autopsy-flatpak/docs/doxygen/modIngest.dox

/*! \page mod_ingest_page Developing Ingest Modules


\section ingest_modules_getting_started Getting Started

This page describes how to develop ingest modules using either Java or Jython.  It assumes you have
already set up your Java development environment as described in \ref mod_dev_page.  Note that the Jython JAR file
has been added to the Autopsy-Core NetBeans project as a wrapped JAR.

Ingest modules developed using Java can be added directly to your private copy of the source code, but if you
want to share your module with others, it should be developed as a NetBeans module. You can then distribute the module
and users can install it in Autopsy using the NetBeans Plugin Manager accessed through Autopsy's Tools -> Plugins menu item.
The user will go to the Downloaded tab, push the Add Plugins... button to navigate to and select your NetBeans Module file,
and then push the Install button to complete the installation.

Ingest modules developed using Jython are installed in Autopsy by placing them in their own subdirectory of the python_modules
directory. A window into the python_modules directory can be opened through the Autopsy's Tools -> Python Plugins menu item.
Create a folder in this directory and create or place your Python scripts in this folder.

\section ingest_module_types Ingest Module Types

Ingest modules analyze data from a data source (e.g., a disk image or a folder
of logical files). There are two types of ingest modules in Autopsy:

- Data-source-level ingest modules
- File-level ingest modules

The difference between these two types of modules is what gets passed in to their process() methods during a data source ingest.
The process() method of a data-source-level module is called once per ingest is passed a reference to the data source.
The process() method of a file-level ingest module is called for each file in the data source and is passed in a reference to the current file. Here are some guidelines for choosing the type of your ingest module:

- Your module should be a data-source-level ingest module if it only needs to
retrieve and analyze a small subset of the files present in a data source and it can find those files based on data in the database (such as file names).
For example, a Windows registry analysis module that only processes
registry hive files should be implemented as a data-source-level ingest module because there are only a few registry hives and you can find them by name.
- Your module should be a file-level ingest module if it analyzes most or all of
the files from a data source, one file at a time. If you cannot rely on finding a file based on its name, it will need to be a file-level ingest module. For example, a hash look up
module might process every file system file by looking up its hash in one or
more known files and known bad files hash sets (hash databases).

As you will learn a little later in this guide, it is possible to package a
data-source-level ingest module and a file-level ingest module together. You
would do this when you need to work at both levels to get all of your analysis
done. The modules in such a pair will be enabled or disabled together and will
have common per ingest job and global settings.

The text below will refer to example code in the org.sleuthkit.autopsy.examples package.
The sample modules don't do anything
particularly useful, but they can serve as templates for developing your own
ingest modules.


\section ingest_modules_lifecycle Ingest Module Life Cycle

Before we dive into the details of creating a module, it is important to understand the life cycle of the module. Note that this life cycle is much different for Autopsy 3.1 modules compared to Autopsy 3.0 modules. This section only talks about 3.1 modules.

You will need to implement at least two classes to make an ingest module:
-# A factory class that will be created when Autopsy starts and will provide configuration panels to Autopsy and will create instances of your ingest module.
-# An ingest module class that will be instantiated by the factory when the ingest modules are run.  A new instance of this class will be created for each ingest thread.

Here is an example sequence of events.  Details will be provided below.
-# User launches Autopsy and it looks for classes that implement the org.sleuthkit.autopsy.ingest.IngestModuleFactory interface.
-# Autopsy finds and creates an instance of your FooIngestModuleFactory class.
-# User adds a disk image.
-# Autopsy presents the list of available ingest modules to the user and uses the utility methods from FooIngestModuleFactory class to get the module's name, description, and configuration panels.
-# User enables your module (and others).
-# Autopsy uses FooIngestModuleFactory to create two instances of FooIngestModule (Autopsy is using two threads to process the files).
-# Autopsy starts up the module and calls its process method.


\section ingest_modules_implementing_ingestmodulefactory_basic Creating a Basic Ingest Module

\subsection ingest_modules_implementing_basic_factory Basic Ingest Module Factory

The first step to write an ingest module is to make its factory.  There are three general types of things that a factory does:
-# Provides  basic information such as the module's name, version, and description. (required)
-# Creates ingest modules. (required)
-# Provides panels so that the user can configure the module. (optional)

This section covers the required parts of a basic factory so that we can make the ingest module.  A later section (\ref ingest_modules_making_options)
covers how you can use the factory to provide options to the user.

To make writing a simple factory easier, Autopsy provides an adapter class that implements the "optional" methods in the interface.
Our basic factory will use the adapter.

-# Create a class either manually or using the NetBeans wizards. Edit the class to extend org.sleuthkit.autopsy.ingest.IngestModuleFactoryAdapter. NetBeans will likely complain that you have not implemented the necessary methods and you can use its "hints" to automatically generate stubs for them.
-# Use the documentation for the org.sleuthkit.autopsy.ingest.IngestModuleFactory interface for details on what each method needs to do. You can also refer to org.sleuthkit.autopsy.examples.SampleIngestModuleFactory as an example.
-# Add a NetBeans ServiceProvider annotation so that the factory is found at run time:
\code
@ServiceProvider(service = IngestModuleFactory.class)
\endcode

You will also need to import org.openide.util.lookup.ServiceProvider and add a dependency on the NetBeans Lookup
API module to the NetBeans module that contains your ingest module.

At this point, you should be able to compile your NetBeans module and run it. When you add a data source,
you should see the module in the list of ingest modules.  If you don't see it, double check that you either implemented org.sleuthkit.autopsy.ingest.IngestModuleFactory
or extended org.sleuthkit.autopsy.ingest.IngestModuleFactoryAdapter and that you added the service provider annotation.


\subsection ingest_modules_implementing_ingestmodule Understanding the IngestModule Interface

Data source and file ingest modules have similar APIs. The main difference is what data gets passed
to the methods.  Let's first cover the common concepts.

Both modules implement the org.sleuthkit.autopsy.ingest.IngestModule interface, which defines a module initialization method:
- org.sleuthkit.autopsy.ingest.IngestModule.startUp()

Use the previous links to get the details of this method. The ingest modules have to implement a process()
method that will get passed in either a data source or a file. A file ingest module will also have to implement a shutDown() method.

- startUp() will be called before any data is analyzed to initialize and allocate resources. Any setup procedures that could fail should be done in startUp() so that it can throw an exception and cause the ingest job to stop and notify the user.
- shutDown() will be called on file ingest modules after all of the files have been analyzed.
- startUp(), process(), and shutDown() will be called from a single thread.  So, basic modules do not need to worry about thread safety if they allocate resources for each instance of the module. However, if the module wants to share resources between instances, then it is responsible for  synchronizing access to the shared resources.  See org.sleuthkit.autopsy.examples.SampleFileIngestModule as an example that shares resources.

The org.sleuthkit.autopsy.ingest.DataSourceIngestModule and org.sleuthkit.autopsy.ingest.FileIngestModule
interfaces both extend org.sleuthkit.autopsy.ingest.IngestModule.

\subsection ingest_modules_implementing_datasourceingestmodule Creating a Data Source Ingest Module

To create a data source ingest module:
-# Make a new Java class either manually or
using the NetBeans wizards.
-# Make the class implement
org.sleuthkit.autopsy.ingest.DataSourceIngestModule.
-# The NetBeans IDE
will complain that you have not implemented one or more of the required methods.
You can use its "hints" to automatically generate stubs for the missing methods.  Use this page and the
documentation for the org.sleuthkit.autopsy.ingest.IngestModule and
org.sleuthkit.autopsy.ingest.DataSourceIngestModule interfaces for guidance on
what each method needs to do.  Or you can copy the code from
org.sleuthkit.autopsy.examples.SampleDataSourceIngestModule and use it as a
template for your module.

All data source ingest modules must implement the single method defined by the
org.sleuthkit.autopsy.ingest.DataSourceIngestModule interface:

- org.sleuthkit.autopsy.ingest.DataSourceIngestModule.process()

The process() method is where all of the work of a data source ingest module is
done. It will be called exactly once. The
process() method receives a reference to an org.sleuthkit.datamodel.Content object
and an org.sleuthkit.autopsy.ingest.DataSourceIngestModuleProgress object.
The former is a representation of the data source. The latter should be used
by the module instance to report progress as it does its
potentially long-running processing. To be a good citizen within Autopsy, the module should also periodically call
org.sleuthkit.autopsy.ingest.IngestJobContext.isJobCancelled() and break off processing if the ingest is canceled.

Note that data source ingest modules must find the files that they want to analyze.
The best way to do that is using one of the findFiles() methods of the
org.sleuthkit.autopsy.casemodule.services.FileManager class.  See
\ref mod_dev_other_services for more details.

The final step to getting the basic ingest module working is to configure your factory class to create instances of it. To do this, you will need to change the isDataSourceIngestModuleFactory() method to return true and have the createDataSourceIngestModule() method return a new instance of your ingest module. Both of these methods have default "no-op" implementations in the IngestModuleFactoryAdapter that we used.  Your factory should have code similar to:

\code
    @Override
    public boolean isDataSourceIngestModuleFactory() {
        return true;
    }

    @Override
    public DataSourceIngestModule createDataSourceIngestModule(IngestModuleIngestJobSettings ingestOptions) {
        return new FooDataSourceIngestModule();  // replace this class name with the name of your class
    }
\endcode


\subsection ingest_modules_implementing_fileingestmodule Creating a File Ingest Module

To create a file ingest module:
-# Make a new Java class either manually or
using the NetBeans wizards.
-# Make the class implement
org.sleuthkit.autopsy.ingest.FileIngestModule.
-# The NetBeans IDE
will complain that you have not implemented one or more of the required methods.
You can use its "hints" to automatically generate stubs for the missing methods.  Use this page and the
documentation for the org.sleuthkit.autopsy.ingest.IngestModule and
org.sleuthkit.autopsy.ingest.FileIngestModule interfaces for guidance on what
each method needs to do.  Or you can copy the code from
org.sleuthkit.autopsy.examples.SampleFileIngestModule and use it as a
template for your module.

All file ingest modules must implement the two methods defined by the
org.sleuthkit.autopsy.ingest.FileIngestModule interface:

- org.sleuthkit.autopsy.ingest.FileIngestModule.process()
- org.sleuthkit.autopsy.ingest.FileIngestModule.shutDown()

The process() method is where all of the work of a file ingest module is
done. It will be called repeatedly between startUp() and shutDown(), once for
each file Autopsy feeds into the pipeline of which the module instance is a part. The
process() method receives a reference to a org.sleuthkit.datamodel.AbstractFile
object.

The final step to getting the basic ingest module working is to configure your factory class to create instances of it. To do this, you will need to change the isFileIngestModuleFactory() method to return true and have the createFileIngestModule() method return a new instance of your ingest module. Both of these methods have default "no-op" implementations in the IngestModuleFactoryAdapter that we used.  Your factory should have code similar to:

\code
    @Override
    public boolean isFileIngestModuleFactory() {
        return true;
    }

    @Override
    public FileIngestModule createFileIngestModule(IngestModuleIngestJobSettings ingestOptions) {
        return new FooFileIngestModule(); // replace this class name with the name of your class
    }
\endcode


\section ingest_modules_services Platform Services

The previous section will allow you to get a module up and running that will be passed in either a file or a data source to analyze.
This section covers how you get access to more data and how you can display data to the user.


\subsection ingest_modules_services_ingest Ingest Services

The singleton instance of the org.sleuthkit.autopsy.ingest.IngestServices class
provides services tailored to the needs of ingest modules, and a module developer
should use these utilities to log errors, send messages, get the current case,
fire events, persist simple global settings, etc.  Refer to the documentation
of the IngestServices class for method details.

\subsection ingest_modules_making_results Giving the User Feedback

Ingest modules run in the background.  There are three ways to send messages and
save results so that the user can see them:

- Use the blackboard for long-term storage of analysis results. These results
will be displayed in the results tree.
- Use the ingest messages inbox to notify users of high-value analysis results
that were also posted to the blackboard.
- Use the logging and/or message box utilities for error messages.


\subsection ingest_modules_making_results_bb Posting Results to the Blackboard
The blackboard is used to store results so that they are displayed in the results tree.
See \ref platform_blackboard  for details on posting results to it.

The blackboard defines artifacts for specific data types (such as web bookmarks).
You can use one of the standard artifact types, create your own, or simply post text
as a org.sleuthkit.datamodel.BlackboardArtifact.ARTIFACT_TYPE.TSK_TOOL_OUTPUT artifact.
The latter is much easier (for example, you can simply copy in the output from
an existing tool), but it forces the user to parse the output themselves.

When modules add data to the blackboard, they should notify listeners of the new
data by invoking the org.sleuthkit.autopsy.ingest.IngestServices.fireModuleDataEvent() method.
Do so as soon as you have added an artifact to the blackboard.
This allows other modules (and the main UI) to know when to query the blackboard
for the latest data.  However, if you are writing a large number of blackboard
artifacts in a loop, it is better to invoke org.sleuthkit.autopsy.ingest.IngestServices.fireModuleDataEvent()
only once after the bulk write, so as not to flood the system with events.


\subsection ingest_modules_making_results_inbox Posting Results to the Message Inbox

Modules should post messages to the inbox when interesting data is found.
Of course, such data should also be posted to the blackboard as described above.  The idea behind
the ingest messages is that they are presented in chronological order so that
users can see what was found while they were focusing on something else.

Inbox messages should only be sent if the result has a low false positive rate
and will likely be relevant.  For example, the core Autopsy hash lookup module
sends messages if known bad (notable) files are found, but not if known good
(NSRL) files are found. This module also provides a global setting
(using its global settings panel) that allows a user to turn these messages on
or off.

Messages are created using the org.sleuthkit.autopsy.ingest.IngestMessage class
and posted to the inbox using the org.sleuthkit.autopsy.ingest.IngestServices.postMessage()
method.


\subsection ingest_modules_making_results_error Reporting Errors

When an error occurs, you should write an error message to the Autopsy logs, using a
logger obtained from org.sleuthkit.autopsy.ingest.IngestServices.getLogger().
You could also send an error message to the ingest inbox. The
downside of this is that the ingest inbox was not really designed for this
purpose and it is easy for the user to miss these messages.  Therefore, it is
preferable to post a pop-up message that is displayed in the lower right hand
corner of the main window by calling
org.sleuthkit.autopsy.coreutils.MessageNotifyUtil.Notify.show().


\section ingest_modules_making_options User Options

Autopsy allows a module to provide two levels of configuration:
- When an ingest job is being configured, the user can choose settings that are unique to that ingest job / pipeline.  For example, to enable a certain hash set.
- The user can configure global settings that apply to all jobs. For example, to add or delete a hash set.

To provide either or both of these options to the user, we need to implement methods defined in the IngestModuleFactory interface.  You can either add them to your class that extends the IngestModuleFactoryAdapter or decide to simply implement the interface.

You can also refer to sample implementations of the interfaces and abstract
classes in the org.sleuthkit.autopsy.examples package, although you should note
that the samples do not do anything particularly useful.


\subsection ingest_modules_making_options_ingest Ingest Job Options

To provide options for each ingest job:
- hasIngestJobSettingsPanel() must return true.
- getIngestJobSettingsPanel() must return a IngestModuleIngestJobSettingsPanel that displays the needed configuration options and returns a IngestModuleIngestJobSettings object based on the settings.
- You are free to implement IngestModuleIngestJobSettings and store whatever you want in it (as long as it is serializable).
- The IngestModuleIngestJobSettings object that was created during configuration will be passed back to the factory with each call to createDataSourceIngestModule() or createFileIngestModule().  The factory should cast it to its internal class that implements IngestModuleIngestJobSettings and pass that object into the constructor of its ingest module so that it can use the settings when it runs.


You can also implement the getDefaultIngestJobSettings() method to return the default settings that Autopsy should use when the module has not been run before.

NOTE: We recommend storing simple data in the IngestModuleIngestJobSettings-based class.  In the case of our hash lookup module, we store the string names of the hash databases to do lookups in.  We then get the hash database handles in the call to startUp() using the global module settings.


\subsection ingest_modules_making_options_global Global Options

To provide global options:
- hasGlobalSettingsPanel() must return true.
- getGlobalSettingsPanel() must return a org.sleuthkit.autopsy.ingest.IngestModuleGlobalSetttingsPanel with widgets to support the global settings.
- You are responsible for persisting global settings and may use the module
settings methods provided by org.sleuthkit.autopsy.ingest.IngestServices for
saving simple properties, or the facilities of classes such as
org.sleuthkit.autopsy.coreutils.PlatformUtil and org.sleuthkit.autopsy.coreutils.XMLUtil
for more sophisticated approaches.
- You are responsible for providing a way for the ingest module to obtain the global settings. For
example, the Autopsy core hash look up module comes with a singleton hash databases
manager. Users import and create hash databases using the global settings panel.
Then they select which hash databases to use for a particular job using the
ingest job settings panel. When a module instance runs, it gets the relevant
databases from the hash databases manager.
- You are responsible for having the ingest job options panel update itself if the global settings change (i.e. if a new item is added that must be listed on the ingest panel).

\section ingest_modules_api_migration Migrating Ingest Modules to the Current API

This section is a guide for module developers who wrote modules for the 3.0 API.  These API changes occurred so that
we could make parallel pipelines of the file-level ingest modules.   This section assumes you've read the above description of the new API.

There are three big changes to make in your module:
-# Modules are no longer singletons. Autopsy will make one of your factory classes and many instances of the ingest modules.  As part of the migration to the new classes, your singleton infrastructure will disappear.
-# You'll need to move the UI/Configuration methods into the factory class and the ingest module methods into their own class.  You'll also need to update the APIs for the methods a bit.
-# You'll need to review your ingest module code for thread safety if you are using any static member variables.


We recommend that you:
-# Create a new factory class and move over the UI panels, configuration code, and standard methods (name, description, version, etc.). You'll probably want the name in the ingest module code, so you should also store the name in a package-wide static member variable.
-# Get the factory to compile and work. You can do basic testing by running Autopsy and verifying that you see your module and its panels.
-# Change your old ingest module to implement the new interface and adjust it (see the name changes below). Then update the factory to create it.
-# Review the ingest module code for thread safety (especially look for static member variables).


The following table provides a mapping of the methods of the old abstract classes to
the new interfaces:

Old method | New Method |
---------- | ---------- |
IngestModuleAbstract.getType() | N/A |
IngestModuleAbstract.init() | IngestModule.startUp() |
IngestModuleAbstract.getName() | IngestModuleFactory.getModuleName() |
IngestModuleAbstract.getDescription() | IngestModuleFactory.getModuleDescription() |
IngestModuleAbstract.getVersion() | IngestModuleFactory.getModuleVersion() |
IngestModuleAbstract.hasBackgroundJobsRunning | N/A |
IngestModuleAbstract.complete() | IngestModule.shutDown() for file ingest modules; data source ingest modules should do anything they did in complete() at the end of the process() method |
IngestModuleAbstract.hasAdvancedConfiguration() | IngestModuleFactory.hasGlobalSettingsPanel() |
IngestModuleAbstract.getAdvancedConfiguration() | IngestModuleFactory.getGlobalSettingsPanel() |
IngestModuleAbstract.saveAdvancedConfiguration() | IngestModuleGlobalSetttingsPanel.saveSettings() |
N/A | IngestModuleFactory.getDefaultIngestJobSettings() |
IngestModuleAbstract.hasSimpleConfiguration() | IngestModuleFactory.hasIngestJobSettingsPanel() |
IngestModuleAbstract.getSimpleConfiguration() | IngestModuleFactory.getIngestJobSettingsPanel() |
IngestModuleAbstract.saveSimpleConfiguration() | N/A |
N/A | IngestModuleIngestJobSettingsPanel.getSettings()  |
N/A | IngestModuleFactory.isDataSourceIngestModuleFactory() |
N/A | IngestModuleFactory.createDataSourceIngestModule() |
N/A | IngestModuleFactory.isFileIngestModuleFactory() |
N/A | IngestModuleFactory.createFileIngestModule() |

Notes:
- IngestModuleFactory.getModuleName() should delegate to a static class method
that can also be called by ingest module instances.
- The IngestJobContext object passed to startUp() can be queried to determine whether the ingest
job completed or was cancelled.
- The global settings panel (formerly "advanced") for a module must implement
IngestModuleGlobalSettingsPanel which extends JPanel. Global settings are those
that affect all modules, regardless of ingest job and pipeline.
- The per ingest job settings panel (formerly "simple") for a module must implement
IngestModuleIngestJobSettingsPanel which extends JPanel. It takes the settings
for the current context as a serializable IngestModuleIngestJobSettings object
and its getSettings() methods returns a serializable IngestModuleIngestJobSettings object.
The IngestModuleIngestJobSettingsPanel.getSettings() method replaces the saveSimpleSettings() method,
except that now Autopsy persists the settings in a context-sensitive fashion.
- The IngestModuleFactory creation methods replace the getInstance() methods of
the former singletons and receive a IngestModuleIngestJobSettings object that should be
passed to the constructors of the module instances the factory creates.

*/