autopsy-flatpak/docs/doxygen/modIngest.dox

/*! \page mod_ingest_page Developing Ingest Modules


\section ingestmodule_modules Ingest Module Basics

This section tells you how to make an Ingest Module.  Ingest modules
analyze data from a data source (a disk image or set of logical
files).  They typically focus on a specific type of data analysis.
The modules are loaded each time that Autopsy starts.  The user can
choose to enable each module when they add an image to the case.
It assumes you have already setup your development environment as
described in \ref mod_dev_page.

First, you need to choose the type of Ingest Module.

- Data Source-level modules are passed in a reference to a top-level data source, such as an Image or folder of logical files.
These modules may query the database for a small set of specific files. For example, a Windows registry module that runs on the hive files.  It is interested in only a small subset of the hard drive files.

- File-level modules are passed in a reference to each file.
The Ingest Manager chooses which files to pass and when.
These modules are intended to analyze most of the files on the system
For example, a hash calculation module that reads in the content of every file.


Refer to org.sleuthkit.autopsy.ingest.example for sample source code of dummy modules.

\section ingest_common Commonalities

There are several things about these module types that are common and we'll outline those here.  For both modules, you will extend an interface and implement some methods.

Refer to the documentation for each method for its use.
- org.sleuthkit.autopsy.ingest.IngestModuleAbstract.init() is invoked when an ingest session starts.
- org.sleuthkit.autopsy.ingest.IngestModuleAbstract.complete() is invoked when an ingest session completes.
- org.sleuthkit.autopsy.ingest.IngestModuleAbstract.stop() is invoked on a module when an ingest session is interrupted by the user or system.
- org.sleuthkit.autopsy.ingest.IngestModuleAbstract.getName() returns the name of the module.
- org.sleuthkit.autopsy.ingest.IngestModuleAbstract.getDescription() returns a short description of the module.
- org.sleuthkit.autopsy.ingest.IngestModuleAbstract.getVersion() returns the version of the module.


The process() method is invoked to analyze the data. This is where
the analysis is done. The specific method depends on the module
type; it is passed either a data source or a file to process.  We'll
cover this in later sections.  This method will post results to the
blackboard and with inbox messages to the user.


\section ingest_datasrc Data Source-level Modules

To make a data source-level module, make a new Java class either manually or using the NetBeans wizards. Edit the class to extend "org.sleuthkit.autopsy.ingest.IngestModuleDataSource". NetBeans will likely complain that you have not implemented the necessary methods and you can use its "hints" to automatically generate stubs for them. Use the documentation for the org.sleuthkit.autopsy.ingest.IngestModuleDataSource class for details on what each needs to do.

Example snippet of an ingest-level module process() method:

\code
@Override
public void process(Content dataSource, IngestDataSourceWorkerController controller) {

    //we have some number workunits / sub-tasks to execute
    //in this case, we know the number of total tasks in advance
    final int totalTasks = 12;

    //initialize the overall image ingest progress
    controller.switchToDeterminate();
    controller.progress(totalTasks);

    for(int subTask = 0; subTask < totalTasks; ++subTask) {
        //add cancellation support
        if (controller.isCancelled() ) {
            break; // break out early to let the thread terminate
        }

         //do the work
        try {
            //sub-task may add blackboard artifacts and create an inbox message
            performSubTask(i);
        } catch (Exception ex) {
            logger.log(Level.WARNING, "Exception occurred in subtask " + subTask, ex);
        }

        //update progress
        controller.progress(i+1);
    }
}
\endcode


\section ingest_file File-level Modules

To make a File-level module, make a new Java class either manually or using the NetBeans wizards. Edit the class to extend "org.sleuthkit.autopsy.ingest.IngestModuleAbstractFile". NetBeans will likely complain that you have not implemented the necessary methods and you can use its "hints" to automatically generate stubs for them. Use the method documentation in the org.sleuthkit.autopsy.ingest.IngestModuleAbstractFile class to fill in the details.

Unlike Data Source-level modules, file-level modules are singletons.  Only a single instance is created for all files.
The same file-level module instance will be used for files in different images and even different cases if new cases are opened.

Every file-level module should support multiple init() -> process() -> complete(), and init() -> process() -> stop() invocations.  It should also support init() -> complete() sequences.  A new case could be open for each call of init().

Currently (and this is likely to change in the future), File-level ingest modules are Singletons (meaning that only a single instance is created for the runtime of Autopsy).
You will need to implement a public static getDefault() method that returns a static instance of the module.  Note that if you skip this step, you will not see an error until Autopsy tries to load your module and the log will say that it does not have a getDefault method.

The implementation of this method is very standard, example:

\code
public static synchronized MyIngestModule getDefault() {

   //defaultInstance is a private static class variable
   if (defaultInstance == null) {
        defaultInstance = new MyIngestModule();
   }
   return defaultInstance;
}
\endcode


You should also make the constructor private to ensure the singleton status.

As a result of the singleton design, init() will be called multiple times and even for different cases.  Ensure that you update local member variables accordingly each time init() is called.  Again, this design will likely change, but it is what it is for now.


\subsection ingestmodule_making_registration Module Registration

Modules are automatically discovered if they implement the proper interface.
Currently, a restart of Autopsy is required after a module is installed before it is discovered.


\subsubsection ingestmodule_making_registration_order Module Order

By default, modules that do not come with a standard Autopsy installation will run after the standard modules. No order
is implied. This design will likely change in the future, but currently manual configuration is needed to enforce order.


There is an XML pipeline configuration that contains the standard modules and specifies the order that they are run in.
If you need to specify the order of modules, then they needed to be manually addded to this file in the correct order.
This file is the same format as The Sleuth Kit Framework configuration file.
Refer to http://sleuthkit.org/sleuthkit/docs/framework-docs/pipeline_config_page.html which is an official documentation
for the pipeline configuration schema.

Autopsy will provide tools for reconfiguring the ingest pipeline in the near future,
and user/developer will be able to reload current view of discovered modules,
reorder modules in the pipeline and set their arguments using GUI.


\subsection ingestmodule_services Ingest Services

Class org.sleuthkit.autopsy.ingest.IngestServices provides services specifically for the ingest modules
and a module developer should use these utilities to send messages, get current case, etc.  Refer to its documentation for method details.

Remember, update references to IngestServices and Cases with each call to init() inside of the module.

Module developers are encouraged to use Autopsy's org.sleuthkit.autopsy.coreutils.Logger
infrastructure to log errors to the Autopsy log.
The logger can also be accessed using the org.sleuthkit.autopsy.ingest.IngestServices class.

Certain modules may need need a persistant store (other than for storing results) for storing and reading
module configurations or state.
The ModuleSettings API can be used also via org.sleuthkit.autopsy.ingest.IngestServices class.


\subsection ingestmodule_making_results Posting Results

Ingest modules run in the background.  There are three ways to send messages and save results:
- Blackboard for long-term storage of analysis results and display in the results tree.
- Ingest Inbox to notify user of high-value analysis results that were also posted to blackboard.
- Error messages.

\subsubsection ingestmodule_making_results_bb Posting Results to Blackboard
The blackboard is used to store results so that they are displayed in the results tree.  See \ref platform_blackboard  for details on posting results to it.

When modules add data to the blackboard,
modules should notify listeners of the new data by
invoking IngestServices.fireModuleDataEvent() method.
Do so as soon as you have added an artifact to the blackboard.
This allows other modules (and the main UI) to know when to query the blackboard for the latest data.
However, if you are writing a larger number of blackboard artifacts in a loop, it is better to invoke
IngestServices.fireModuleDataEvent() only once after the bulk write, not to flood the system with events.

\subsubsection ingestmodule_making_results_inbox Posting Results to Message Inbox

Modules should post messages to the inbox when interesting data is found that has also been posted to the blackboard.
The idea behind these messages are that they are presented in chronological order so that users can see what was
found while they were focusing on something else.


These messages should only be sent if the result has a low false positive rate and will likely be relevant.
For example, the hash lookup module will send messages if known bad (notable) files are found,
but not if known good (NSRL) files are found.  You can provide options to the users on when to make messages.


A single message includes the module name, message subject, message details,
a unique message id (in the context of the originating module), and a uniqueness attribute.
The uniqueness attribute is used to group similar messages together
and to determine the overall importance priority of the message
(if the same message is seen repeatedly, it is considered lower priority).

For example, for a keyword search module, the uniqueness attribute would the keyword that was hit.

Messages are created using the org.sleuthkit.autopsy.ingest.IngestMessage class and posted
using methods in \ref ingestmodule_services.


\subsection ingestmodule_making_configuration Module Configuration

Ingest modules may require user configuration. In \ref mod_dev_adv_options, you learned about Autopsy-wide settings.  There are some
settings that are specific to ingest modules as well.

The framework
supports two levels of configuration: simple and advanced. Simple settings enable the user to enable and disable basic things at run-time (using check boxes and such).
Advanced settings require more in-depth configuration with more powerful interface.

As an example, the advanced configuration for the keyword search module allows you to add and create keyword lists, choose encodings, etc. The simple interface allows
you to enable and disable lists.

Module configuration is module-specific: every module maintains its own configuration state and is responsible for implementing the graphical interface.
If a module needs simple or advanced configuration, it needs to implement methods in its interface.
The org.sleuthkit.autopsy.ingest.IngestModuleAbstract.hasSimpleConfiguration(),
org.sleuthkit.autopsy.ingest.IngestModuleAbstract.getSimpleConfiguration(), and org.sleuthkit.autopsy.ingest.IngestModuleAbstract.saveSimpleConfiguration()
methods should be used for simple configuration.  This panel will be shown when the user chooses which ingest modules to enable.

The advanced configuration is implemented with the
org.sleuthkit.autopsy.ingest.IngestModuleAbstract.hasAdvancedConfiguration(),
org.sleuthkit.autopsy.ingest.IngestModuleAbstract.getAdvancedConfiguration(), and
org.sleuthkit.autopsy.ingest.IngestModuleAbstract.saveAdvancedConfiguration()
methods. This panel can be accessed from the "Advanced" button when the user chooses which ingest modules to enable.
It is recommended that the advanced panel be the same panel that is used in the Options area (see  \ref mod_dev_adv_options).

Refer to \ref mod_dev_adv_properties for details on saving properties from these panels.


<!-- @@@  MOVE THIS TO ADVANED -- I"M NOT SURE WHO NEEDS THIS..
\section ingestmodule_events Getting Ingest Status and Events

NOTE: Sync this up with \ref mod_dev_events.

Other modules and core Autopsy classes may want to get the overall ingest status from the ingest manager.
The IngestManager handle is obtained using org.sleuthkit.autopsy.ingest.IngestManager.getDefault().
The manager provides access to ingest status with the
org.sleuthkit.autopsy.ingest.IngestManager.isIngestRunning() method and related methods
that allow to query ingest status per specific module.

External modules (such as data viewers) can also register themselves as ingest module event listeners
and receive event notifications (when a module is started, stopped, completed or has new data).
Use the IngestManager.addPropertyChangeListener() method to register a module event listener.
Events types received are defined in IngestManager.IngestModuleEvent enum.

At the end of the ingest, IngestManager itself will notify all listeners of IngestModuleEvent.COMPLETED event.
The event is an indication for listeners to perform the final data refresh by quering the blackboard.
Module developers are encouraged to generate periodic IngestModuleEvent.DATA
ModuleDataEvent events when they post data to the blackboard,
but the IngestManager will make a final event to handle scenarios where the module did not notify listeners while it was running.
-->

*/