
/*! \page mod_ingest_page Developing Ingest Modules


\section ingestmodule_modules Ingest Module Basics

Ingest modules analyze data from a disk image.
They typically focus on a specific type of data analysis.
The modules are loaded each time that Autopsy starts.
The user can choose to enable each module when they add an image to the case.

There are two types of ingest modules.
- Image-level modules are passed a reference to an image and perform general analysis on it.
These modules may query the database for a small set of specific files and are therefore generally quicker to execute.

- File-level modules are passed a reference to each file.
The Ingest Manager chooses which files to pass and when.
These modules are intended for analysis that covers most of the files on the system
or that needs to examine the content of every file
(e.g. to detect file type based on signature instead of file extension).

An example of an image-level module is a registry module that extracts only the registry files.
An example of a file-level module is a search module that indexes and searches the content of every file.

Modules post their results to the \ref platform_blackboard
(@@@ NEED REFERENCE FOR THIS -- org.sleuthkit.datamodel)
and can query the blackboard to get the results of previous modules.
For example, the hash database lookup module may want to query for a previously calculated hash value.


\section ingestmodule_relevant_api Relevant APIs

The APIs relevant to an ingest module writer are:

- The org.sleuthkit.autopsy.ingest.IngestServices class, which contains methods to post ingest results and
to get access to services in the framework (Case and blackboard, logging, configuration, and others).
- The interfaces org.sleuthkit.autopsy.ingest.IngestModuleAbstractFile and org.sleuthkit.autopsy.ingest.IngestModuleImage, one of which needs to be implemented by the module.
There is also a parent interface, org.sleuthkit.autopsy.ingest.IngestModuleAbstract, common to all ingest modules.
- Additional utilities in the Autopsy \ref org.sleuthkit.autopsy.coreutils module for getting information about the platform
and versions, and for file operations.


\section ingestmodule_making Making Ingest Modules

First, you will need to set up your environment and
make a new Autopsy module using the NetBeans IDE.  Refer to \ref mod_dev_page for general instructions.

Refer to org.sleuthkit.autopsy.ingest.example for sample source code of dummy modules.

\subsection ingestmodule_making_api Module Interface

The next step is to choose the correct module type and implement the correct interface.

Image-level modules will implement the org.sleuthkit.autopsy.ingest.IngestModuleImage interface
and file-level modules will implement the org.sleuthkit.autopsy.ingest.IngestModuleAbstractFile interface.


The interfaces have several standard methods that need to be implemented:

- org.sleuthkit.autopsy.ingest.IngestModuleAbstract.init() is invoked by the framework every time an ingest session starts.
A module should support multiple invocations of init() throughout the application life-cycle.

In this method, the module should reinitialize its internal objects and resources and get them ready
for a brand new ingest session.

At minimum, the module should get a handle to the ingest services using
org.sleuthkit.autopsy.ingest.IngestServices.getDefault().
If the module posts results, then it needs the blackboard API; the module should get
an updated handle to the current Case database (and the blackboard API)
using org.sleuthkit.autopsy.ingest.IngestServices.getCurrentSleuthkitCaseDb().
This gives access to the underlying datamodel for the current Case.
Make sure not to store stale handles to Case objects across ingests,
 because the Case may well have changed by the next ingest.


- org.sleuthkit.autopsy.ingest.IngestModuleAbstract.complete() is invoked when an ingest session completes.
In this method, the module should clean up any resources (files, handles, caches),
submit final results, and post a final ingest inbox message using org.sleuthkit.autopsy.ingest.IngestServices.

- org.sleuthkit.autopsy.ingest.IngestModuleAbstract.stop() is invoked on a module when an ingest session is interrupted by the user or by the system.
The method implementation should be similar to complete() in that the module should perform any cleanup work.
If stop() is invoked while there is pending data to be processed or pending results to be reported,
the module should discard them and return as early as possible.

The main difference between complete() and stop() is that stop() does not need to
finalize and report final results; it can simply discard them.
The quicker it returns, the easier it will be for the application to transition to the next state.

- The process() method is invoked to analyze the data. The specific method depends on the module type; it is passed
either an Image or a File to process.
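
As a rough illustration of the lifecycle described above, the following sketch uses a simplified stand-in interface (LifecycleSketch) rather than the real Autopsy interfaces. The class names, the String argument to process(), and the reported list are all assumptions made for the example. It shows init() rebuilding per-session state, complete() submitting final results, and stop() discarding them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the lifecycle methods described above.
// It is NOT the real Autopsy interface; the signatures are simplified.
interface LifecycleSketch {
    void init();
    void process(String fileName);
    void complete();
    void stop();
}

class CountingModuleSketch implements LifecycleSketch {
    // Per-session state, rebuilt in init() so that nothing stale survives.
    private List<String> pendingResults = new ArrayList<>();
    // Stand-in for results that were actually reported to the user.
    final List<String> reported = new ArrayList<>();

    @Override
    public void init() {
        // Called at the start of every ingest session: start from scratch.
        pendingResults = new ArrayList<>();
    }

    @Override
    public void process(String fileName) {
        // For the example, treat every ".txt" file as a result.
        if (fileName.endsWith(".txt")) {
            pendingResults.add(fileName);
        }
    }

    @Override
    public void complete() {
        // Normal end of ingest: submit final results, then clean up.
        reported.addAll(pendingResults);
        pendingResults.clear();
    }

    @Override
    public void stop() {
        // Interrupted ingest: discard pending results and return quickly.
        pendingResults.clear();
    }
}
```

Because init() resets the state, the same instance can go through init() - process() - complete() and then init() - process() - stop() without results leaking from one session into the next.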

\subsubsection ingestmodule_making_process Process Method
The process method is where the work is done in each type of module, on the Image or File passed as an argument.

Once a result is found, the module should post it to the blackboard from within this method,
and it may also post an inbox message to the user.

Some notes:
- File-level modules will be called on each file in an order determined by the org.sleuthkit.autopsy.ingest.IngestManager.
Each module is free to quickly ignore a file based on name, signature, etc.

- If a module wants to know the return value from a previously run module on this file,
it should use the org.sleuthkit.autopsy.ingest.IngestServices.getAbstractFileModuleResult() method.

- Image-level modules are not passed specific files and are expected to query the database
to find the files that they are interested in.  They can use the org.sleuthkit.datamodel.SleuthkitCase object handle (initialized in the init() method) to query the database.

- A file-level module could be passed files from different images in consecutive calls to process().
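
The note about quickly ignoring files can be sketched as follows. This is only an illustration: SignatureFilterSketch and its methods are hypothetical helpers, and the file is represented by a name and its first few bytes instead of the real file object. The check relies on the fact that JPEG data starts with the bytes FF D8 FF.

```java
class SignatureFilterSketch {
    // JPEG data starts with the bytes FF D8 FF.
    private static final byte[] JPEG_MAGIC = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    // Returns true if the header begins with the JPEG signature.
    static boolean looksLikeJpeg(byte[] header) {
        if (header == null || header.length < JPEG_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < JPEG_MAGIC.length; i++) {
            if (header[i] != JPEG_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical process() body: returns true if the file was analyzed
    // and false if it was skipped.
    static boolean process(String name, byte[] header) {
        if (!looksLikeJpeg(header)) {
            return false; // ignore quickly; the extension in 'name' is not trusted
        }
        // Real analysis of the JPEG content would go here.
        return true;
    }
}
```

Checking the signature first keeps the common case cheap: most files are rejected after reading three bytes, whatever their extension claims.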


\subsubsection ingestmodule_making_process_controller Image Ingest Controller (Image-level modules only)

In the org.sleuthkit.autopsy.ingest.IngestModuleImage.process() method, image-level modules
receive a reference to an org.sleuthkit.autopsy.ingest.IngestImageWorkerController object.

They should use this object to:

- report progress (number of work units processed),
- add thread cancellation support.

Example snippet of an image-level module process() method:


\code
@Override
public void process(Image image, IngestImageWorkerController controller) {

    //we have some number workunits / sub-tasks to execute
    //in this case, we know the number of total tasks in advance
    final int totalTasks = 12;

    //initialize the overall image ingest progress
    controller.switchToDeterminate();
    controller.progress(totalTasks);

    for(int subTask = 0; subTask < totalTasks; ++subTask) {
        //add cancellation support
        if (controller.isCancelled() ) {
            break; // break out early to let the thread terminate
        }

        //do the work
        try {
            //sub-task may add blackboard artifacts and create an inbox message
            performSubTask(subTask);
        } catch (Exception ex) {
            logger.log(Level.WARNING, "Exception occurred in subtask " + subTask, ex);
        }

        //update progress
        controller.progress(subTask + 1);
    }
}
\endcode


\subsection ingestmodule_module_lifecycle Module Lifecycle

File-level modules are long-lived singletons, whereas image-level modules are short-lived.
The same file-level module instance can be invoked again by the org.sleuthkit.autopsy.ingest.IngestManager when more work is enqueued or
when ingest is restarted.  The same file-level ingest module instance can also be invoked for different Cases or for files from different images in the Case.

Every file-level module should support multiple init() - process() - complete(), and init() - process() - stop() invocations.

New instances of image-level modules will be created when the second image is added.
Therefore, image-level modules can assume that the process() method will be called at most once after init() is called.

Every module (file or image) should also support multiple init() - complete() and init() - stop() invocations,
which can occur if the ingest pipeline is started but no work is enqueued for the particular module.

\subsection ingestmodule_additional_method Additional Methods to Implement

Besides the methods defined in the interfaces, you will need to implement
a public static getDefault() method (for both image and file modules), which needs
to return a static instance of the module.  This is required for proper module registration.

Presence of that method is validated when the module is loaded, and the module will fail validation
and will not be loaded if the method is not implemented.

The implementation of this method is standard. For example:

\code
public static synchronized MyIngestModule getDefault() {

   //defaultInstance is a private static class variable
   if (defaultInstance == null) {
        defaultInstance = new MyIngestModule();
   }
   return defaultInstance;
}
\endcode


File-level modules need to be singletons.  To ensure this, make the constructor private.

Image-level modules require a public constructor.


\subsection ingestmodule_making_registration Module Registration

Modules are automatically discovered and registered when \ref mod_dev_plugin "added as a plugin to Autopsy".
Currently, a restart of Autopsy is required for the newly discovered ingest module to be fully registered and functional.
All you need to worry about is to implement the ingest module interface and the required methods and the module will be
automatically discovered by the framework.

\subsubsection ingestmodule_making_registration_pipeline_config Pipeline Configuration

Autopsy maintains an ordered list of autodiscovered modules.  The order of a module in the pipeline determines
when it will be run relative to other modules.  The order can be important for some modules and it can be adjusted.

When a module is installed using the plugin installer, it is automatically discovered and validated.
The validation process ensures that the module meets the inter-module dependencies, implements the right interfaces and methods,
and meets the constraints enforced by the schema.  If the module is valid, it is registered and added to the right pipeline
based on the interfaces it implements, and the pipeline configuration is updated with the new module.
If the module is not already known to the pipeline configuration, it gets the default order assigned --
 it is added to the end of the pipeline.

If the module is invalid, it will still be added to the pipeline configuration, but it will not be instantiated.
It will be re-validated the next time module discovery runs.  The validation process logs its results to the main
Autopsy log files.

The pipeline configuration is an XML file with currently discovered modules
and the modules that were discovered in the past but are not currently loaded.
Autopsy maintains that file in the Autopsy user configuration directory.
The XML file defines the module location, its place in the ingest pipeline, along with optional configuration arguments.
An example of the pipeline configuration is given below:

\code
<PIPELINE_CONFIG>
    <PIPELINE type="FileAnalysis">
      <MODULE order="1" type="plugin" location="org.sleuthkit.autopsy.hashdatabase.HashDbIngestModule" arguments="" />
      <MODULE order="2" type="plugin" location="org.sleuthkit.autopsy.exifparser.ExifParserFileIngestModule"/>
    </PIPELINE>

    <PIPELINE type="ImageAnalysis">
      <MODULE order="1" type="plugin" location="org.sleuthkit.autopsy.recentactivity.RAImageIngestModule" arguments=""/>
    </PIPELINE>
</PIPELINE_CONFIG>
\endcode

Refer to http://sleuthkit.org/sleuthkit/docs/framework-docs/pipeline_config_page.html for the official documentation
of the pipeline configuration schema.
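
To make the role of the order attribute concrete, the sketch below reads a configuration in the format shown above and lists the module locations of one pipeline in execution order. It is an illustration only: PipelineConfigSketch is a hypothetical helper built on the standard Java XML parser, not the code Autopsy uses to load the file, and it assumes every MODULE element carries a distinct integer order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PipelineConfigSketch {
    // Returns the module locations of one pipeline type, sorted by "order".
    // Assumes order values are distinct; a duplicate would overwrite an entry.
    static List<String> modulesInOrder(String xml, String pipelineType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<Integer, String> byOrder = new TreeMap<>();
            NodeList pipelines = doc.getElementsByTagName("PIPELINE");
            for (int p = 0; p < pipelines.getLength(); p++) {
                Element pipeline = (Element) pipelines.item(p);
                if (!pipelineType.equals(pipeline.getAttribute("type"))) {
                    continue; // a different pipeline, e.g. ImageAnalysis
                }
                NodeList modules = pipeline.getElementsByTagName("MODULE");
                for (int m = 0; m < modules.getLength(); m++) {
                    Element module = (Element) modules.item(m);
                    int order = Integer.parseInt(module.getAttribute("order"));
                    byOrder.put(order, module.getAttribute("location"));
                }
            }
            return new ArrayList<>(byOrder.values());
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not read pipeline configuration", ex);
        }
    }
}
```

Calling modulesInOrder(xml, "FileAnalysis") on the example configuration would list the hash database module before the EXIF parser module, matching their order values.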

The pipeline configuration file should not be directly edited by a regular user,
but it can be edited by a developer to test their module.

Autopsy will provide tools for reconfiguring the ingest pipeline in the near future,
and users and developers will be able to reload the current view of discovered modules,
reorder modules in the pipeline, and set their arguments using the GUI.


\subsection ingestmodule_using_services Using Ingest Services

The org.sleuthkit.autopsy.ingest.IngestServices class provides services specifically for ingest modules,
and module developers should use these utilities.

The class has methods for ingest modules to:
- send ingest messages to the inbox,
- send new data events to registered listeners (such as viewers),
- check for errors in the file-level pipeline,
- get access to the org.sleuthkit.datamodel.SleuthkitCase database and the blackboard,
- get loggers,
- get and set module settings.

To use it, a handle to org.sleuthkit.autopsy.ingest.IngestServices singleton object should be initialized
 in the module's \c init() method.  For example:

\code
services = IngestServices.getDefault();
\endcode

It is safe to store the handle to the services in a private member variable of your module.

However, DO NOT initialize the services handle statically in the member variable declaration.
Use the init() method to do that to ensure proper order of initialization of objects in the platform.

Module developers are encouraged to use Autopsy's org.sleuthkit.autopsy.coreutils.Logger
infrastructure to log errors to the Autopsy log.
The logger can also be accessed using the org.sleuthkit.autopsy.ingest.IngestServices class.

Certain modules may need a persistent store (other than for storing results) for storing and reading
module configurations or state.
The ModuleSettings API can also be used via the org.sleuthkit.autopsy.ingest.IngestServices class.


\subsection ingestmodule_making_results Posting Results

<!-- @@@ -->
NOTE: This needs to be made more in sync with the \ref platform_blackboard.  This section (or one near this) needs to be the sole source of ingest inbox API information.

Users will see the results from ingest modules in one of two ways:
- Results are posted to the blackboard and will be displayed in the navigation tree
- For selected results, messages are sent to the Ingest Inbox to notify a user of what has recently been found.

\subsubsection ingestmodule_making_results_bb Posting Results to Blackboard

See the Blackboard (REFERENCE) documentation for posting results to it.

Modules are free to immediately post results when they find them
or they can wait until they are ready to post the results or until ingest is done.

An example of waiting to post results is the keyword search module.
It is resource intensive to commit the keyword index and do a keyword search.
Therefore, when its process() method is invoked,
it compares its internal timer with the result posting frequency setting
to check whether enough time has passed since the last time it did a keyword search.
If so, it commits the index and performs the search.
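
The timer check described above can be sketched roughly as follows. PostingThrottleSketch is a hypothetical helper, not the keyword search module's actual code, and the current time is passed in as a parameter only so that the behavior is easy to follow and test.

```java
class PostingThrottleSketch {
    private final long intervalMillis; // stand-in for the posting frequency setting
    private long lastRunMillis;        // when the expensive step last ran

    PostingThrottleSketch(long intervalMillis, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.lastRunMillis = startMillis;
    }

    // Returns true when the expensive step (commit and search) should run now,
    // and records the time of that run.
    boolean shouldRun(long nowMillis) {
        if (nowMillis - lastRunMillis < intervalMillis) {
            return false; // too soon since the last run
        }
        lastRunMillis = nowMillis;
        return true;
    }
}
```

A process() implementation would call shouldRun(System.currentTimeMillis()) on each invocation and commit and search only when it returns true. Any work still pending at the end would be handled in complete().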

When modules add data to the blackboard,
modules should notify listeners of the new data by
invoking IngestServices.fireModuleDataEvent() method.
Do so as soon as you have added an artifact to the blackboard.
This allows other modules (and the main UI) to know when to query the blackboard for the latest data.
However, if you are writing a large number of blackboard artifacts in a loop, it is better to invoke
IngestServices.fireModuleDataEvent() only once after the bulk write, so as not to flood the system with events.
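
The difference between per-artifact and bulk notification can be sketched as below. Everything here is a stand-in chosen for the example: a list plays the role of the blackboard, a counter plays the role of the listeners, and fireDataEvent() merely marks the place where a module would call IngestServices.fireModuleDataEvent().

```java
import java.util.ArrayList;
import java.util.List;

class BulkEventSketch {
    // Stand-ins: a list plays the blackboard, a counter plays the listeners.
    final List<String> blackboard = new ArrayList<>();
    int eventsFired = 0;

    // Marks where IngestServices.fireModuleDataEvent() would be called.
    private void fireDataEvent() {
        eventsFired++;
    }

    // Single result: write it and notify listeners right away.
    void postSingle(String artifact) {
        blackboard.add(artifact);
        fireDataEvent();
    }

    // Many results: write them all, then notify listeners once.
    void postBulk(List<String> artifacts) {
        blackboard.addAll(artifacts);
        if (!artifacts.isEmpty()) {
            fireDataEvent();
        }
    }
}
```

With postBulk(), writing three artifacts produces one event instead of three, so listeners refresh once and still see all of the new data.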

\subsubsection ingestmodule_making_results_inbox Posting Results to Message Inbox

Modules should post messages to the inbox when interesting data is found and has been posted to the blackboard.

A single message includes the module name, message subject, message details,
a unique message id (in the context of the originating module), and a uniqueness attribute.
The uniqueness attribute is used to group similar messages together
and to determine the overall importance priority of the message
(if the same message is seen repeatedly, it is considered lower priority).

For example, for a keyword search module, the uniqueness attribute would be the keyword that was hit.
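
One way to picture the grouping is the sketch below, which counts messages per uniqueness attribute. InboxGroupingSketch is a hypothetical illustration of the idea, not the inbox implementation, and the way the real inbox derives priority from repetition is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InboxGroupingSketch {
    // Uniqueness attribute mapped to the number of messages seen with it.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Records one message under its uniqueness attribute.
    void post(String uniquenessKey) {
        counts.merge(uniquenessKey, 1, Integer::sum);
    }

    // Number of distinct groups shown to the user.
    int groupCount() {
        return counts.size();
    }

    // How often a given attribute has been seen; repeated keys are the
    // ones the text above describes as lower priority.
    int timesSeen(String uniquenessKey) {
        return counts.getOrDefault(uniquenessKey, 0);
    }
}
```

For a keyword search module, three hits on one keyword and one hit on another would collapse into two groups rather than four separate inbox entries.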

It is important though to not fill up the inbox with messages.  Do not post an inbox message for every
result artifact written to the blackboard.

These messages should only be sent if the result has a low false positive rate and will likely be relevant.
For example, the hash lookup module will send messages if known bad (notable) files are found,
but not if known good (NSRL) files are found.
The keyword search module will send messages if a specific keyword matches,
but will not send messages (by default) if a regular expression for URLs matches
(because many of the URL hits will be false positives and can generate thousands of messages on a typical system).


Ingest messages have different types:
there are info messages, warning messages, error messages and data messages.

Modules will mostly post data messages about the most relevant results.
The data messages contain encapsulated blackboard artifacts and attributes.
The passed in data is used by the ingest inbox GUI widget to navigate
to the artifact view in the directory tree, if requested by the user.

The ingest message API is defined in the IngestMessage class.
The class also contains factory methods to create new messages.
Messages are posted using IngestServices.postMessage() method,
which accepts a message object created using one of the factory methods.

Modules should post info messages (with no data) to the inbox
at minimum when stop() or complete() is invoked (refer to the examples).
It is recommended to populate the description field of the completion inbox message to give the user
a summary of the module's ingest run and of any errors that were encountered.

Modules should also post error messages, but only at a high level, so that the inbox is not flooded
 when the same error is encountered for a large number of files.


\subsection ingestmodule_making_configuration Module Configuration

<!-- @@@ -->
NOTE: Make sure we update this to reflect \ref mod_dev_properties and reduce duplicate comments.

Ingest modules may require user configuration. The framework
supports two levels of configuration: run-time and general. Run-time configuration
occurs when the user selects which ingest modules to run when an image is added.  This level
of configuration should allow the user to enable or disable settings.  General configuration is more in-depth and
may require an interface that is more powerful than simple check boxes.

As an example, the keyword search module uses both configuration methods.  The run-time configuration allows the user
to choose which lists of keywords to search for.  However, if the user wants to edit the lists or create lists, they
need to go to the general configuration window.

Module configuration is module-specific: every module maintains its own configuration state and is responsible for implementing the graphical interface. However, Autopsy does provide \ref mod_dev_configuration "a centralized location to display your settings to the user".

The run-time configuration (also called simple configuration) is achieved by each
ingest module providing a JPanel.  The IngestModuleAbstract.hasSimpleConfiguration(),
IngestModuleAbstract.getSimpleConfiguration(), and IngestModuleAbstract.saveSimpleConfiguration()
methods should be used for run-time configuration.

The general configuration is also achieved by the module returning a JPanel. A link will be provided to the general configuration from the ingest manager if it exists.
The IngestModuleAbstract.hasAdvancedConfiguration(),
IngestModuleAbstract.getAdvancedConfiguration(), and IngestModuleAbstract.saveAdvancedConfiguration()
methods should be used for general configuration.


\section ingestmodule_events Getting Ingest Status and Events

<!-- @@@ -->
NOTE: Sync this up with \ref mod_dev_events.

Other modules and core Autopsy classes may want to get the overall ingest status from the ingest manager.
The IngestManager handle is obtained using org.sleuthkit.autopsy.ingest.IngestManager.getDefault().
The manager provides access to ingest status with the
org.sleuthkit.autopsy.ingest.IngestManager.isIngestRunning() method and related methods
that allow querying the ingest status of a specific module.

External modules (such as data viewers) can also register themselves as ingest module event listeners
and receive event notifications (when a module is started, stopped, completed or has new data).
Use the IngestManager.addPropertyChangeListener() method to register a module event listener.
The event types received are defined in the IngestManager.IngestModuleEvent enum.

At the end of the ingest, IngestManager itself will notify all listeners of the IngestModuleEvent.COMPLETED event.
The event is an indication for listeners to perform the final data refresh by querying the blackboard.
Module developers are encouraged to generate periodic IngestModuleEvent.DATA
ModuleDataEvent events when they post data to the blackboard,
but the IngestManager will fire a final event to handle scenarios where the module did not notify listeners while it was running.
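
A listener of this kind can be sketched with the standard java.beans classes. The sketch is self-contained and makes several assumptions: a local PropertyChangeSupport object stands in for the IngestManager, and the events are identified by the plain strings "DATA", "COMPLETED" and "STARTED", which may not match how the real manager names its property change events.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

class IngestEventSketch {
    // Stand-in for the listener registry kept by the ingest manager.
    final PropertyChangeSupport events = new PropertyChangeSupport(this);
    // Counts how many times the viewer would re-query the blackboard.
    int refreshes = 0;

    // Registers a listener that refreshes on new data and on completion.
    void register() {
        PropertyChangeListener listener = evt -> {
            String name = evt.getPropertyName();
            // Event names are assumptions made for this sketch.
            if ("DATA".equals(name) || "COMPLETED".equals(name)) {
                refreshes++;
            }
        };
        events.addPropertyChangeListener(listener);
    }
}
```

Refreshing on the completion event as well as on data events means the viewer still ends up with current results when a module never announced its data while running.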
*/