/*! \page mod_ingest_page Developing Ingest Modules

\section ingest_modules_getting_started Getting Started

This page describes how to develop ingest modules. It assumes you have already set up your development environment as described in \ref mod_dev_page.

Ingest modules analyze data from a data source (e.g., a disk image or a folder of logical files). Each ingest module typically focuses on a single, specific type of analysis. Every ingest module is part of a sequence of ingest modules in one or more ingest pipelines configured for an ingest job. An ingest job is the processing of a single data source by all of the modules in the pipelines for the job.

There are two types of ingest modules:
- Data-source-level ingest modules
- File-level ingest modules

Here are some guidelines for deciding the type of your ingest module:
- Your module should be a data-source-level ingest module if it only needs to retrieve and analyze a small subset of the files present in a data source. For example, a Windows registry analysis module that only processes registry hive files should be implemented as a data-source-level ingest module.
- Your module should be a file-level ingest module if it examines most of the files from a data source, one file at a time. For example, a hash lookup module might process every file system file by looking up its hash in one or more known file and known bad file hash sets (hash databases).

As you will learn a little later in this guide, it is possible to package a data-source-level ingest module and a file-level ingest module together. The modules in such a pair will be enabled or disabled together and will share per-ingest-job and global settings.

The following sections of this page delve into what you need to know to develop your own ingest modules:
- \ref ingest_modules_implementing_ingestmodule
- \ref ingest_modules_implementing_datasourceingestmodule
- \ref ingest_modules_implementing_fileingestmodule
- \ref ingest_modules_services
- \ref ingest_modules_making_results
- \ref ingest_modules_implementing_ingestmodulefactory
- \ref ingest_modules_pipeline_configuration
- \ref ingest_modules_api_migration

You may also want to look at the org.sleuthkit.autopsy.examples package to see a sample of each type of module. The sample modules don't do anything particularly useful, but they can serve as templates for developing your own ingest modules.

\section ingest_modules_implementing_ingestmodule Implementing the IngestModule Interface

All ingest modules, whether they are data source or file ingest modules, must implement the two methods defined by the org.sleuthkit.autopsy.ingest.IngestModule interface:
- org.sleuthkit.autopsy.ingest.IngestModule.startUp()
- org.sleuthkit.autopsy.ingest.IngestModule.shutDown()

The startUp() method is invoked by Autopsy when it starts up the ingest pipeline of which the module instance is a part. This gives your ingest module instance an opportunity to set up any internal data structures and acquire any private resources it will need while doing its part of the ingest job. The module instance probably needs to store a reference to the org.sleuthkit.autopsy.ingest.IngestJobContext object that is passed to startUp(). The job context provides data and services specific to the ingest job and the pipeline. If an error occurs during startUp(), the module should throw an org.sleuthkit.autopsy.ingest.IngestModule.IngestModuleException object. If a module instance throws an exception, it will be immediately discarded, so clean-up for exceptional conditions should occur within startUp().

The shutDown() method is invoked by Autopsy when an ingest job is completed or canceled and it is shutting down the pipeline of which the module instance is a part, just before the ingest module instance is discarded. The module should respond by doing things like releasing private resources and, if the job was not canceled, posting final results to the blackboard and perhaps submitting a final message to the user's ingest messages inbox (see \ref ingest_modules_making_results).

As a module developer, it is important for you to realize that Autopsy will generally use several instances of an ingest module for each ingest job it performs. In fact, an ingest job may be processed by multiple pipelines using multiple worker threads. However, you are guaranteed that there will be exactly one thread executing code in any module instance, so you may freely use unsynchronized, non-volatile instance variables. On the other hand, if your module instances must share resources through static class variables or other means, you are responsible for synchronizing access to the shared resources and doing reference counting as required to release those resources correctly. Also, more than one ingest job may be in progress at any given time. This must be taken into consideration when sharing resources or data that may be specific to a particular ingest job. You may want to look at the sample ingest modules in the org.sleuthkit.autopsy.examples package to see a simple example of sharing per-ingest-job state between module instances.

For your convenience, an ingest module that does not require initialization or clean-up may extend the abstract org.sleuthkit.autopsy.ingest.IngestModuleAdapter class to get default "do nothing" implementations of these methods.
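To make the lifecycle concrete, here is a minimal sketch of a startUp() implementation for a hypothetical file ingest module that extends IngestModuleAdapter. The resource-handling helpers (openResources(), closeResources()) are hypothetical; consult the org.sleuthkit.autopsy.ingest.IngestModule documentation for the exact method signatures in your version of Autopsy:

\code
public class SampleIngestModule extends IngestModuleAdapter implements FileIngestModule {

    private IngestJobContext context;

    @Override
    public void startUp(IngestJobContext context) throws IngestModuleException {
        // Store the job context for use while processing.
        this.context = context;
        try {
            // Acquire any private resources needed for this job here.
            // openResources() is a hypothetical helper.
            openResources();
        } catch (Exception ex) {
            // This instance will be discarded when an exception is thrown,
            // so clean up before throwing. closeResources() is hypothetical.
            closeResources();
            throw new IngestModuleException(ex.getMessage());
        }
    }

    // The process() method required by FileIngestModule is omitted here
    // for brevity; see the sections below.
}
\endcode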
\section ingest_modules_implementing_datasourceingestmodule Creating a Data Source Ingest Module

To create a data source ingest module, make a new Java class either manually or using the NetBeans wizards. Make the class implement org.sleuthkit.autopsy.ingest.DataSourceIngestModule and optionally make it extend org.sleuthkit.autopsy.ingest.IngestModuleAdapter. The NetBeans IDE will complain that you have not implemented one or more of the required methods, and you can use its "hints" to automatically generate stubs for them. Use this page and the documentation for the org.sleuthkit.autopsy.ingest.IngestModule and org.sleuthkit.autopsy.ingest.DataSourceIngestModule interfaces for guidance on what each method needs to do. Alternatively, you can copy the code from org.sleuthkit.autopsy.examples.SampleDataSourceIngestModule and use it as a template for your module. The sample module does not do anything particularly useful, but it should provide a skeleton for you to flesh out with your own code.

All data source ingest modules must implement the single method defined by the org.sleuthkit.autopsy.ingest.DataSourceIngestModule interface:
- org.sleuthkit.autopsy.ingest.DataSourceIngestModule.process()

The process() method is where all of the work of a data source ingest module is done. It will be called exactly once between startUp() and shutDown(). The process() method receives a reference to an org.sleuthkit.datamodel.Content object and an org.sleuthkit.autopsy.ingest.DataSourceIngestModuleStatusHelper object. The former is a representation of the data source. The latter should be used by the module instance to be a good citizen within Autopsy as it does its potentially long-running processing.
Here is a code snippet showing the skeleton of a well-behaved data source ingest module process() method:

\code
@Override
public ProcessResult process(Content dataSource, DataSourceIngestModuleStatusHelper statusHelper) {
    // In this case, we know the exact number of analysis tasks to be done, so
    // we use the status helper to set the progress bar to determinate and to
    // set the number of work units to be completed.
    final int totalSubTasks = 12;
    statusHelper.switchToDeterminate(totalSubTasks);

    for (int subTask = 0; subTask < totalSubTasks; ++subTask) {
        // Do our part in fulfilling ingest job cancellation requests by the user.
        if (statusHelper.isIngestJobCancelled()) {
            break;
        }

        // Do a unit of work. An analysis task may post blackboard artifacts,
        // submit messages to the ingest inbox, etc.
        try {
            performSubTask(subTask);
        } catch (Exception ex) {
            Logger logger = IngestServices.getInstance().getLogger(MODULE_NAME);
            logger.log(Level.SEVERE, "Exception occurred while performing sub-task " + subTask, ex);
            return IngestModule.ProcessResult.ERROR;
        }

        // Update progress.
        statusHelper.progress(subTask + 1);
    }

    return IngestModule.ProcessResult.OK;
}
\endcode

Note that data source ingest modules must find the files that they want to analyze. The best way to do that is by using one of the findFiles() methods of the org.sleuthkit.autopsy.casemodule.services.FileManager class. See \ref mod_dev_other_services for more details.
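For example, a module interested in Windows registry hives might locate them within its process() method along these lines. This is a minimal sketch: the file name queried for and the error handling are purely illustrative, and the usual imports (org.sleuthkit.autopsy.casemodule.Case, org.sleuthkit.datamodel.TskCoreException, etc.) are assumed:

\code
// Inside process(): find the files of interest in the data source.
FileManager fileManager = Case.getCurrentCase().getServices().getFileManager();
try {
    List<AbstractFile> hives = fileManager.findFiles(dataSource, "ntuser.dat");
    for (AbstractFile hive : hives) {
        // Analyze each hive file here.
    }
} catch (TskCoreException ex) {
    Logger logger = IngestServices.getInstance().getLogger(MODULE_NAME);
    logger.log(Level.SEVERE, "Failed to query for registry hive files", ex);
    return IngestModule.ProcessResult.ERROR;
}
\endcode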
\section ingest_modules_implementing_fileingestmodule Creating a File Ingest Module

To create a file ingest module, make a new Java class either manually or using the NetBeans wizards. Make the class implement org.sleuthkit.autopsy.ingest.FileIngestModule and optionally make it extend org.sleuthkit.autopsy.ingest.IngestModuleAdapter. The NetBeans IDE will complain that you have not implemented one or more of the required methods, and you can use its "hints" to automatically generate stubs for them. Use this page and the documentation for the org.sleuthkit.autopsy.ingest.IngestModule and org.sleuthkit.autopsy.ingest.FileIngestModule interfaces for guidance on what each method needs to do. Alternatively, you can copy the code from org.sleuthkit.autopsy.examples.SampleFileIngestModule and use it as a template for your module. The sample module does not do anything particularly useful, but it should provide a skeleton for you to flesh out with your own code.

All file ingest modules must implement the single method defined by the org.sleuthkit.autopsy.ingest.FileIngestModule interface:
- org.sleuthkit.autopsy.ingest.FileIngestModule.process()

The process() method is where all of the work of a file ingest module is done. It will be called repeatedly between startUp() and shutDown(), once for each file Autopsy feeds into the pipeline of which the module instance is a part. The process() method receives a reference to an org.sleuthkit.datamodel.AbstractFile object.
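Here is a minimal sketch of a well-behaved file ingest module process() method. The analysis performed by the hypothetical analyzeFile() helper might read the file's content and post any results to the blackboard (see \ref ingest_modules_making_results):

\code
@Override
public ProcessResult process(AbstractFile file) {
    try {
        // Analyze the file that Autopsy has fed into the pipeline.
        // analyzeFile() is a hypothetical helper.
        analyzeFile(file);
    } catch (Exception ex) {
        Logger logger = IngestServices.getInstance().getLogger(MODULE_NAME);
        logger.log(Level.SEVERE, "Error processing file " + file.getName(), ex);
        return IngestModule.ProcessResult.ERROR;
    }
    return IngestModule.ProcessResult.OK;
}
\endcode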
\section ingest_modules_services Ingest Services

The singleton instance of the org.sleuthkit.autopsy.ingest.IngestServices class provides services tailored to the needs of ingest modules. A module developer should use these utilities to log errors, send messages, get the current case, fire events, persist simple global settings, etc. Refer to the documentation of the IngestServices class for method details.

\section ingest_modules_making_results Posting Ingest Module Results

Ingest modules run in the background. There are three ways to send messages and save results so that the user can see them:
- Use the blackboard for long-term storage of analysis results. These results will be displayed in the results tree.
- Use the ingest messages inbox to notify users of high-value analysis results that were also posted to the blackboard.
- Use the logging and/or message box utilities for error messages.

\subsection ingest_modules_making_results_bb Posting Results to Blackboard

The blackboard is used to store results so that they are displayed in the results tree. See \ref platform_blackboard for details on posting results to it.

The blackboard defines artifacts for specific data types (such as web bookmarks). You can use one of the standard artifact types, create your own, or simply post text as an org.sleuthkit.datamodel.BlackboardArtifact.ARTIFACT_TYPE.TSK_TOOL_OUTPUT artifact. The latter is much easier (for example, you can simply copy in the output from an existing tool), but it forces the user to parse the output themselves.

When modules add data to the blackboard, they should notify listeners of the new data by invoking the IngestServices.fireModuleDataEvent() method. Do so as soon as you have added an artifact to the blackboard. This allows other modules (and the main UI) to know when to query the blackboard for the latest data. However, if you are writing a large number of blackboard artifacts in a loop, it is better to invoke IngestServices.fireModuleDataEvent() only once after the bulk write, so as not to flood the system with events.

\subsection ingest_modules_making_results_inbox Posting Results to Message Inbox

Modules should post messages to the inbox when interesting data is found. Of course, such data should also be posted to the blackboard. The idea behind the ingest messages is that they are presented in chronological order so that users can see what was found while they were focusing on something else.

Inbox messages should only be sent if the result has a low false positive rate and will likely be relevant. For example, the core Autopsy hash lookup module sends messages if known bad (notable) files are found, but not if known good (NSRL) files are found. This module also provides a global setting (using its global settings panel) that allows a user to turn these messages on or off.

Messages are created using the org.sleuthkit.autopsy.ingest.IngestMessage class and posted to the inbox using the org.sleuthkit.autopsy.ingest.IngestServices.postMessage() method.

\subsection ingest_modules_making_results_error Reporting Errors

When an error occurs, you could send an error message to the ingest inbox. The downside of this is that the ingest inbox was not really designed for this purpose, and it is easy for the user to miss these messages. Therefore, it is preferable to post a pop-up message that is displayed in the lower right-hand corner of the main window by calling org.sleuthkit.autopsy.coreutils.MessageNotifyUtil.Notify.show().

\section ingest_modules_implementing_ingestmodulefactory Creating an Ingest Module Factory

Autopsy uses ingest module factories to create data source ingest module instances, file ingest module instances, or instances of both types of ingest modules for each ingest job and its pipelines.

An ingest module factory may provide global and per-ingest-job settings user interface panels. The global settings should apply to all module instances. The per-ingest-job settings should apply to all module instances working on a particular ingest job. Autopsy supports context-sensitive and persistent per-ingest-job settings, so per-ingest-job settings must be serializable.

During ingest job configuration, Autopsy bundles the ingest module factory with the ingest job settings specified by the user and expects the factory to be able to create any number of module instances for an ingest job, using those ingest job settings. This implies that the constructors of ingest modules that have per-ingest-job settings must accept ingest job settings arguments.

You must also provide a mechanism for ingest module instances to access global settings, should you choose to have global settings. For example, the Autopsy core hash lookup and keyword search modules come with singleton managers of resources such as hash databases and keyword search lists, respectively. The factory is responsible for persisting global settings and may use the module settings methods provided by org.sleuthkit.autopsy.ingest.IngestServices for saving simple properties, or the facilities of classes such as org.sleuthkit.autopsy.coreutils.PlatformUtil and org.sleuthkit.autopsy.coreutils.XMLUtil for more sophisticated approaches.

To be discovered at runtime by the ingest framework, IngestModuleFactory implementations must be marked with the following NetBeans ServiceProvider annotation:
- "@ServiceProvider(service = IngestModuleFactory.class)"

The following package import is required for the ServiceProvider annotation:
- import org.openide.util.lookup.ServiceProvider

You will also need to add a dependency on the Lookup API NetBeans module to your NetBeans module to use this import.

Compared to the DataSourceIngestModule and FileIngestModule interfaces, the IngestModuleFactory interface is richer, but also more complex. For your convenience, an ingest module factory that does not require a full implementation of all of the factory features may extend the abstract org.sleuthkit.autopsy.ingest.IngestModuleFactoryAdapter class to get default "do nothing" implementations of most of the methods in the IngestModuleFactory interface. If you do need to implement the full interface, use the documentation for the following classes as a guide:
- org.sleuthkit.autopsy.ingest.IngestModuleFactory
- org.sleuthkit.autopsy.ingest.IngestModuleGlobalSettingsPanel
- org.sleuthkit.autopsy.ingest.IngestModuleIngestJobSettings
- org.sleuthkit.autopsy.ingest.IngestModuleIngestJobSettingsPanel

You can also refer to sample implementations of the interfaces and abstract classes in the org.sleuthkit.autopsy.examples package, although you should note that the samples do not do anything particularly useful.
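Here is a minimal sketch of a factory for a file ingest module that uses the adapter class and the ServiceProvider annotation described above. The module class, display name, and version string are hypothetical, and the methods shown are only a subset of the interface; see the org.sleuthkit.autopsy.ingest.IngestModuleFactory documentation for the full set of methods:

\code
import org.openide.util.lookup.ServiceProvider;
import org.sleuthkit.autopsy.ingest.FileIngestModule;
import org.sleuthkit.autopsy.ingest.IngestModuleFactory;
import org.sleuthkit.autopsy.ingest.IngestModuleFactoryAdapter;
import org.sleuthkit.autopsy.ingest.IngestModuleIngestJobSettings;

@ServiceProvider(service = IngestModuleFactory.class)
public class SampleIngestModuleFactory extends IngestModuleFactoryAdapter {

    @Override
    public String getModuleDisplayName() {
        return "Sample Ingest Module";
    }

    @Override
    public String getModuleDescription() {
        return "A sample file ingest module.";
    }

    @Override
    public String getModuleVersionNumber() {
        return "1.0";
    }

    // Tell the framework that this factory creates file ingest modules...
    @Override
    public boolean isFileIngestModuleFactory() {
        return true;
    }

    // ...and create a new module instance whenever the framework asks for
    // one, passing along the per-ingest-job settings as needed.
    @Override
    public FileIngestModule createFileIngestModule(IngestModuleIngestJobSettings settings) {
        return new SampleIngestModule(); // SampleIngestModule is hypothetical.
    }
}
\endcode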
\section ingest_modules_pipeline_configuration Ordering of Ingest Modules in Ingest Pipelines

By default, ingest modules that are not part of the standard Autopsy installation will run after the core ingest modules, in no particular order. This will likely change in the future, but currently manual configuration is needed to enforce sequencing of ingest modules.

There is an ingest pipeline configuration XML file that specifies the order for running the core ingest modules. If you need to insert your ingest modules in the sequence of core modules or control the ordering of non-core modules, you must edit this file by hand. You will find it in the config directory of your Autopsy installation, typically something like "C:\Users\yourUserName\AppData\Roaming\.autopsy\dev\config\pipeline_config.xml" on a Microsoft Windows platform. Check the user directory ("Userdir") listed in the Autopsy About dialog.

Autopsy will provide tools for reconfiguring the ingest pipeline in the near future. Until that time, there is no guarantee that the schema of this file will remain fixed or that it will not be overwritten when upgrading your Autopsy installation.

\section ingest_modules_api_migration Migrating Ingest Modules from the 3.0.X API

*/