
/*! \page mod_ingest_page Developing Ingest Modules


\section ingest_modules_getting_started Getting Started

This page describes how to develop ingest modules using either Java or Python (Jython).  It assumes you have
already set up your development environment as described in \ref mod_dev_page or \ref mod_dev_py_page.


\section ingest_module_types Ingest Module Types

Ingest modules analyze data from a data source (e.g., a disk image or a folder
of logical files). There are two types of ingest modules in Autopsy:

- Data-source-level ingest modules
- File-level ingest modules

The difference between these two types of modules is what gets passed in to them:
- Data-source-level ingest modules are passed a reference to a data source, and it is up to the module to search for the files that it wants to analyze.
- File-level ingest modules are passed a reference to each file, one at a time, and analyze the file that is passed in.

Here are some guidelines for choosing the type of your ingest module:

- Your module should be a data-source-level ingest module only if the following are true:
 - It needs to retrieve and analyze only a small number of files.
 - It can find the files by name or other metadata in the database.
 - It does not depend on results from other file-level ingest modules.
 - It does not need access to the contents of ZIP and other archive files.
For example, our Windows registry analysis is done as a data-source-level ingest module because it can query for the registry hives and process the small number of files that are found.
- If your needs are not met by a data-source-level ingest module, then it should be a file-level ingest module.

As you will learn a little later in this guide, it is possible to make an ingest module that has both file-level and data-source-level capabilities. You
would do this when you need to work at both levels to get all of your analysis
done. The modules in such a pair will be enabled or disabled together and will
share per-ingest-job and global settings.


The text below will refer to example code in the org.sleuthkit.autopsy.examples package.
The sample modules don't do anything
particularly useful, but they can serve as templates for developing your own
ingest modules.


\section ingest_modules_lifecycle Ingest Module Life Cycle

Before we dive into the details of creating a module, it is important to understand the life cycle of the module. Note that this life cycle is much different for Autopsy 3.1 and newer modules compared to Autopsy 3.0 modules. This section only talks about 3.1 and newer modules.

You will need to implement at least two classes to make an ingest module:
-# A factory class that is created when Autopsy starts, provides configuration panels to Autopsy, and creates instances of your ingest module.
-# An ingest module class that will be instantiated by the factory when the ingest modules are run.  A new instance of this class will be created for each ingest thread.

Here is an example sequence of events.  Details will be provided below.
-# User launches Autopsy and it looks for classes that implement the org.sleuthkit.autopsy.ingest.IngestModuleFactory interface.
-# Autopsy finds and creates an instance of your FooIngestModuleFactory class.
-# User adds a disk image.
-# Autopsy presents the list of available ingest modules to the user and uses the utility methods from FooIngestModuleFactory class to get the module's name, description, and configuration panels.
-# User enables your module (and others).
-# Autopsy uses FooIngestModuleFactory to create two instances of FooIngestModule (Autopsy is using two threads to process the files).
-# Autopsy calls FooIngestModule.startUp() on each thread and then calls FooIngestModule.process() to pass in each file.


\section ingest_modules_implementing_ingestmodulefactory_basic Creating a Basic Ingest Module

\subsection ingest_modules_implementing_basic_factory Basic Ingest Module Factory

The first step in writing an ingest module is to make its factory.  There are three general types of things that a factory does:
-# Provides basic information such as the module's name, version, and description. (required)
-# Creates ingest modules. (required)
-# Provides panels so that the user can configure the module. (optional)

This section covers the required parts of a basic factory so that we can make the ingest module.  A later section (\ref ingest_modules_making_options) covers how you can use the factory to provide options to the user.

To make writing a simple factory easier, Autopsy provides an adapter class that implements the "optional" methods in the interface.  Our basic factory will use the adapter.

-# Make a factory class by either:
 - Copying and pasting the sample code from org.sleuthkit.autopsy.examples.SampleIngestModuleFactory (Java) or org.sleuthkit.autopsy.examples.ingestmodule.py.
 - Manually defining a new class that extends (Java) or inherits (Jython) org.sleuthkit.autopsy.ingest.IngestModuleFactoryAdapter. If you are using Java, NetBeans will likely complain that you have not implemented the necessary methods and you can use its "hints" to automatically generate stubs for them.
-# Update and create the needed methods using the org.sleuthkit.autopsy.ingest.IngestModuleFactory interface documentation.  You can also use the sample code mentioned above for examples of what the methods should do if you did not already copy and paste them.
-# If you are using Java, import org.openide.util.lookup.ServiceProvider and add a dependency on the NetBeans Lookup
API module to the NetBeans module that contains your ingest module. Then add a NetBeans ServiceProvider annotation so that the factory is found at run time:
\code
@ServiceProvider(service = IngestModuleFactory.class)
\endcode

At this point, when you add a data source to an Autopsy case, you should see the module in the list of ingest modules.  If you don't see it, double check that you either implemented org.sleuthkit.autopsy.ingest.IngestModuleFactory
or extended or inherited org.sleuthkit.autopsy.ingest.IngestModuleFactoryAdapter. If using Java, make sure that you added the service provider annotation.


\subsection ingest_modules_implementing_ingestmodule Common Concepts of Ingest Modules

Before we cover the specific interfaces of the two different types of modules, let's talk about some common things.
- Both have a startUp() method. If there is any error in setting up the module, it should throw an exception from this method so that ingest can be stopped immediately, before analysis begins, and the user can fix the problem.
- Autopsy will call a module from only a single thread.  It may make several threads, but it will make an instance of the module for each thread. You do not need to worry about thread safety unless you are using static variables or other kinds of variables that would be shared among multiple instances of the module.
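
The thread-safety point can be illustrated with a minimal sketch. The class and its fields are hypothetical. The only thing it is meant to show is that a per-instance field needs no locking, while a static field shared by all instances does need a thread-safe type.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical module class; only the field handling matters here.
class CountingModuleSketch {
    // Shared by every instance (and therefore every ingest thread): must be thread-safe.
    private static final AtomicLong totalFilesSeen = new AtomicLong();

    // Owned by one instance, which is only ever called from one thread: no locking needed.
    private long filesSeenByThisInstance = 0;

    void process(String fileName) {
        filesSeenByThisInstance++;
        totalFilesSeen.incrementAndGet();
    }

    long getFilesSeenByThisInstance() {
        return filesSeenByThisInstance;
    }

    static long getTotalFilesSeen() {
        return totalFilesSeen.get();
    }

    // Runs two instances on two threads and returns how much the shared total grew.
    static long demo(int filesPerThread) {
        long before = getTotalFilesSeen();
        Thread[] workers = new Thread[2];
        for (int t = 0; t < workers.length; t++) {
            CountingModuleSketch module = new CountingModuleSketch();
            workers[t] = new Thread(() -> {
                for (int i = 0; i < filesPerThread; i++) {
                    module.process("file" + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return getTotalFilesSeen() - before;
    }
}
```

Had totalFilesSeen been a plain static long, concurrent increments from the two threads could be lost.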


\subsection ingest_modules_implementing_datasourceingestmodule Creating a Data Source-level Ingest Module

To create a data source ingest module:
-# Create the ingest module class by either:
 - Copy and paste the sample modules from:
  - Java: Core/src/org/sleuthkit/autopsy/examples/SampleDataSourceIngestModule.java
  - Python: pythonExamples/dataSourceIngestModule.py (https://github.com/sleuthkit/autopsy/tree/develop/pythonExamples)
 - Or the manual approach is to define a new class that implements (Java) or inherits (Jython) org.sleuthkit.autopsy.ingest.DataSourceIngestModule.  If you are using Java, the NetBeans IDE
will complain that you have not implemented one or more of the required methods.
You can use its "hints" to automatically generate stubs for the missing methods.
-# Configure your factory class to create instances of the new ingest module class. To do this, you will need to change the isDataSourceIngestModuleFactory() method to return true and have the createDataSourceIngestModule() method return a new instance of your ingest module. Both of these methods have default "no-op" implementations in the IngestModuleFactoryAdapter that we used.  Your factory should have code similar to this Java code:

\code
    @Override
    public boolean isDataSourceIngestModuleFactory() {
        return true;
    }

    @Override
    public DataSourceIngestModule createDataSourceIngestModule(IngestModuleIngestJobSettings ingestOptions) {
        return new FooDataSourceIngestModule();  // replace this class name with the name of your class
    }
\endcode
-# Use this page, the sample, and the documentation for the
org.sleuthkit.autopsy.ingest.DataSourceIngestModule interfaces to implement the startUp() and process() methods.
 - org.sleuthkit.autopsy.ingest.DataSourceIngestModule.startUp() is where any initialization occurs.  If your module has a critical failure and will not be able to run, your startUp method should throw an IngestModuleException to stop ingest.
 - org.sleuthkit.autopsy.ingest.DataSourceIngestModule.process() is where all of the work of a data source ingest module is
done. It will be called exactly once. The
process() method receives a reference to an org.sleuthkit.datamodel.Content object
and an org.sleuthkit.autopsy.ingest.DataSourceIngestModuleProgress object.
The former is a representation of the data source. The latter should be used
by the module instance to report progress as it does its
potentially long-running processing. To be a good citizen within Autopsy, the module should also periodically call
org.sleuthkit.autopsy.ingest.IngestJobContext.isJobCancelled() and break off processing if the ingest is canceled.

Note that data source ingest modules must find the files that they want to analyze.
The best way to do that is using one of the findFiles() methods of the
org.sleuthkit.autopsy.casemodule.services.FileManager class.  See
\ref mod_dev_other_services for more details.
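
The progress and cancellation pattern described above might look roughly like the following sketch. SketchProgress is a stand-in for DataSourceIngestModuleProgress, the BooleanSupplier stands in for IngestJobContext.isJobCancelled(), and the list of file names stands in for the result of a FileManager query. None of these are the real Autopsy types.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Stand-in for the progress object; illustrative only.
interface SketchProgress {
    void switchToDeterminate(int workUnits);
    void progress(int workUnitsDone);
}

class DataSourceProcessSketch {
    // Returns the number of files analyzed before completion or cancellation.
    static int process(List<String> files, SketchProgress progressBar, BooleanSupplier isJobCancelled) {
        progressBar.switchToDeterminate(files.size());
        int done = 0;
        for (String file : files) {
            if (isJobCancelled.getAsBoolean()) {
                break; // stop promptly when the user cancels the ingest job
            }
            analyze(file);
            done++;
            progressBar.progress(done);
        }
        return done;
    }

    // Placeholder for the module's real analysis.
    static void analyze(String file) {
    }
}
```

Checking for cancellation once per file keeps the module responsive without adding measurable overhead to the loop.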


\subsection ingest_modules_implementing_fileingestmodule Creating a File Ingest Module

To create a file ingest module:
-# Create the ingest module class by either:
 - Copy and paste the sample modules from:
  - Java: Core/src/org/sleuthkit/autopsy/examples/SampleFileIngestModule.java
  - Python: pythonExamples/fileIngestModule.py (https://github.com/sleuthkit/autopsy/tree/develop/pythonExamples)
 - Or the manual approach is to define a new class that implements (Java) or inherits (Jython) org.sleuthkit.autopsy.ingest.FileIngestModule.  If you are using Java, the NetBeans IDE
will complain that you have not implemented one or more of the required methods.
You can use its "hints" to automatically generate stubs for the missing methods.
-# Configure your factory class to create instances of the new ingest module class. To do this, you will need to change the isFileIngestModuleFactory() method to return true and have the createFileIngestModule() method return a new instance of your ingest module. Both of these methods have default "no-op" implementations in the IngestModuleFactoryAdapter that we used.  Your factory should have code similar to this Java code:

\code
    @Override
    public boolean isFileIngestModuleFactory() {
        return true;
    }

    @Override
    public FileIngestModule createFileIngestModule(IngestModuleIngestJobSettings ingestOptions) {
        return new FooFileIngestModule(); // replace this class name with the name of your class
    }
\endcode

-# Use this page, the sample, and the documentation for the
org.sleuthkit.autopsy.ingest.FileIngestModule interface to implement the startUp(), process(), and shutDown() methods.
 - org.sleuthkit.autopsy.ingest.FileIngestModule.startUp() should have any code that you need to initialize your module.  If you have any startup errors, be sure to throw an IngestModuleException to stop ingest.
 - org.sleuthkit.autopsy.ingest.FileIngestModule.process() is where all of the work of a file ingest module is
done. It will be called repeatedly between startUp() and shutDown(), once for
each file Autopsy feeds into the pipeline of which the module instance is a part. The
process() method receives a reference to an org.sleuthkit.datamodel.AbstractFile
object.
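
As a rough illustration of the startUp(), process(), shutDown() call order, here is a self-contained sketch of a hypothetical module that tallies files by extension. It uses plain strings instead of AbstractFile and a stand-in result enum, and its shutDown() returns the summary only so that the sketch can be inspected. It is not written against the real FileIngestModule interface.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Stand-in result type; illustrative only.
enum SketchProcessResult { OK, ERROR }

// Hypothetical file-level module that tallies files by extension.
class ExtensionCountSketch {
    private Map<String, Integer> counts;

    // Called once before any file is processed; fail here if setup cannot succeed.
    void startUp() {
        counts = new HashMap<>();
    }

    // Called once per file between startUp() and shutDown().
    SketchProcessResult process(String fileName) {
        if (fileName == null || fileName.isEmpty()) {
            return SketchProcessResult.ERROR;
        }
        int dot = fileName.lastIndexOf('.');
        String ext = (dot < 0) ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        counts.merge(ext, 1, Integer::sum);
        return SketchProcessResult.OK;
    }

    // Called once after the last file; a natural place to summarize or release resources.
    Map<String, Integer> shutDown() {
        Map<String, Integer> summary = counts;
        counts = null;
        return summary;
    }
}
```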


\subsection ingest_modules_implementing_next Next Steps

This section gave you the outline of making the module.  Now we'll cover what you can do in the module. The following sections
often make use of the org.sleuthkit.autopsy.ingest.IngestServices class, which provides many convenient services to make module writing easier.
Also make sure you refer to \ref mod_dev_other_services if you are looking for a feature.


\section ingest_modules_making_results Doing Something With Your Results
The previous section outlined how to make the basic module and how to get access to the data.  The next
step is to then do some fancy analysis and present the results to the user.

The first question that you must answer is what type of data you want the user to see.  There are two options:

-# Data that can be accessed from the tree on the left-hand side of the UI and can be displayed in a table.  To do this, you will make blackboard artifacts.
-# Data that is in a big text file or some other report that the user can review. To do this, you will use the Case.addReport() method to make the output available in the directory tree.


\subsection ingest_modules_making_results_bb Posting Results to the Blackboard
The blackboard is used to store results so that they are displayed in the results tree.
See \ref platform_blackboard  for details on posting results to it.  You use the blackboard when you have specific items to show the user.  If you just want to show them a big report from another library or tool, see \ref mod_report_page.
The blackboard defines artifacts for specific data types (such as web bookmarks).
You can use one of the standard artifact types or create your own.

After you make an artifact, you should call <a href="http://sleuthkit.org/sleuthkit/docs/jni-docs/4.6/classorg_1_1sleuthkit_1_1datamodel_1_1_blackboard.html">sleuthkit.Blackboard.postArtifact()</a>, which will:
<ul>
<li>Analyze the artifact and add any timestamps to the Timeline tables
<li>Send an event over the EventBus that the artifact(s) was added
<ul>
<li>Autopsy is a listener of this EventBus and will rebroadcast it to other Autopsy modules
<li>Keyword search will also listen for this and will index the artifact
</ul>
</ul>

This means you no longer have to make calls to:
 - Index the artifact
 - Fire the event to refresh the UI.

If you are creating a large number of artifacts, you may see better performance if you save all the artifacts and do one bulk write at the end using <a href="http://sleuthkit.org/sleuthkit/docs/jni-docs/4.6/classorg_1_1sleuthkit_1_1datamodel_1_1_blackboard.html">sleuthkit.Blackboard.postArtifacts()</a>.
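
The collect-then-post pattern can be sketched as follows. SketchBlackboard is a stand-in that only counts calls, and strings stand in for blackboard artifacts. The method name postArtifacts() is taken from the text above, but the parameters shown here are assumptions for the sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Stand-in for the TSK Blackboard; it only counts how many posts were made.
class SketchBlackboard {
    int postCalls = 0;
    int artifactsPosted = 0;

    void postArtifacts(Collection<String> artifacts, String moduleName) {
        postCalls++;
        artifactsPosted += artifacts.size();
    }
}

class BulkPostSketch {
    static void analyzeAndPost(List<String> files, SketchBlackboard blackboard) {
        List<String> pending = new ArrayList<>();
        for (String file : files) {
            pending.add("artifact for " + file); // stands in for creating an artifact
        }
        if (!pending.isEmpty()) {
            blackboard.postArtifacts(pending, "FooModule"); // one bulk post at the end
        }
    }
}
```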

You should not use the Autopsy version of Blackboard. Its methods have all been deprecated, which is another example of us moving "services" into the TSK data model.


\subsection ingest_modules_making_results_report Making a Report

If your module makes a text or HTML file (or some other report format) that has some complex structure (either because you prefer to write output that way or your module is simply a wrapper around another tool), then you can simply call the output a report and then it will be shown in the UI in the reports area.  You can do this by calling the org.sleuthkit.autopsy.casemodule.Case.addReport() method.


\section ingest_modules_users Getting User Attention

Ingest modules run in the background, so the user will not notice everything that you add to the tree.


\subsection ingest_modules_making_results_inbox Posting Results to the Message Inbox

Modules should post messages to the inbox when interesting data is found.
Of course, such data should also be posted to the blackboard as described above.  The idea behind
the ingest messages is that they are presented in chronological order so that
users can see what was found while they were focusing on something else.

Inbox messages should only be sent if the result has a low false positive rate
and will likely be relevant.  For example, the core Autopsy hash lookup module
sends messages if known bad (notable) files are found, but not if known good
(NSRL) files are found. This module also provides a global setting
(using its global settings panel) that allows a user to turn these messages on
or off.

Messages are created using the org.sleuthkit.autopsy.ingest.IngestMessage class
and posted to the inbox using the org.sleuthkit.autopsy.ingest.IngestServices.postMessage()
method.


\subsection ingest_modules_making_results_error Reporting Errors

When an error occurs, you should write an error message to the Autopsy logs, using a
logger obtained from org.sleuthkit.autopsy.ingest.IngestServices.getLogger().
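
A minimal sketch of logging an error with its exception attached is shown below. It uses a plain java.util.logging.Logger as a stand-in, on the assumption that the logger returned by IngestServices.getLogger() is used in a similar way. The class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class ErrorLoggingSketch {
    // Assumption: plain java.util.logging stands in for the logger obtained from IngestServices.
    static final Logger logger = Logger.getLogger("FooIngestModule");

    // Returns false instead of propagating, so one bad value does not abort the whole job.
    static boolean parseSize(String fileName, String rawSize) {
        try {
            Long.parseLong(rawSize);
            return true;
        } catch (NumberFormatException ex) {
            // Pass the exception itself so the stack trace ends up in the log.
            logger.log(Level.SEVERE, "Could not parse size for " + fileName, ex);
            return false;
        }
    }
}
```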

You could also send an error message to the ingest inbox. The
downside of this is that the ingest inbox was not really designed for this
purpose and it is easy for the user to miss these messages.  Therefore, it is
preferable to post a pop-up message that is displayed in the lower right hand
corner of the main window by calling
org.sleuthkit.autopsy.coreutils.MessageNotifyUtil.Notify.show().


\section ingest_modules_making_options User Options and Configuration

Autopsy allows a module to provide two levels of configuration:
- When an ingest job is being configured, the user can choose settings that are unique to that ingest job / pipeline.  For example, to enable a certain hash set.
- The user can configure global settings that apply to all jobs. For example, to add or delete a hash set.

To provide either or both of these options to the user, we need to implement methods defined in the IngestModuleFactory interface.  You can either add them to your class that extends the IngestModuleFactoryAdapter or decide to simply implement the interface.

You can also refer to sample implementations of the interfaces and abstract
classes in the org.sleuthkit.autopsy.examples package, although you should note
that the samples do not do anything particularly useful.


\subsection ingest_modules_making_options_ingest Ingest Job Options

Autopsy allows you to provide a graphical panel that will be displayed when the user decides to enable the ingest module.  This panel is supposed to be for settings that the user may turn on or off for different data sources.


To provide options for each ingest job:
- Update org.sleuthkit.autopsy.ingest.IngestModuleFactory.hasIngestJobSettingsPanel() in your factory class to return true.
- Update org.sleuthkit.autopsy.ingest.IngestModuleFactory.getIngestJobSettingsPanel() in your factory class to return an IngestModuleIngestJobSettingsPanel that displays the needed configuration options. The recommended size for this panel is 300 x 300 pixels. The org.sleuthkit.autopsy.ingest.IngestModuleIngestJobSettingsPanel.getSettings() method should return an instance of an org.sleuthkit.autopsy.ingest.IngestModuleIngestJobSettings object based on the user-specified settings (see next bullet).
- Create a class that implements org.sleuthkit.autopsy.ingest.IngestModuleIngestJobSettings. Your IngestModuleIngestJobSettingsPanel should store settings in here.  This class needs to be Serializable, so keep all data types simple or mark them as transient with some custom deserialization code. You should also set the serialVersionUID (see http://stackoverflow.com/questions/285793/what-is-a-serialversionuid-and-why-should-i-use-it).
- If you decide to store settings internal to the module (NOT RECOMMENDED), the getSettings() method can return an instance of NoIngestModuleIngestJobSettings.
- Your instance of IngestModuleIngestJobSettings will be saved and passed to your panel the next time so that you can pre-populate it accordingly.

Your panel should create the IngestModuleIngestJobSettings class to store the settings and that will be passed back into your factory with each call to createDataSourceIngestModule() or createFileIngestModule().  The way that we have implemented this in Autopsy modules is that the factory casts the IngestModuleIngestJobSettings object to the module-specific implementation and then passes it into the constructor of the ingest module.  The ingest module can then call whatever getter methods were defined based on the panel settings.
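
The cast-then-construct pattern, together with a serializable settings class, can be sketched as below. SketchJobSettings is a stand-in marker for IngestModuleIngestJobSettings, and all class names and fields are hypothetical. The roundTrip() helper only demonstrates that the settings survive Java serialization. It is not how Autopsy itself persists them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in marker for the per-job settings interface; illustrative only.
interface SketchJobSettings extends Serializable {
}

// Hypothetical per-job settings: simple fields only, so default serialization works.
class FooJobSettings implements SketchJobSettings {
    private static final long serialVersionUID = 1L;
    private final boolean skipKnownFiles;
    private final String hashSetName;

    FooJobSettings(boolean skipKnownFiles, String hashSetName) {
        this.skipKnownFiles = skipKnownFiles;
        this.hashSetName = hashSetName;
    }

    boolean skipKnownFiles() {
        return skipKnownFiles;
    }

    String getHashSetName() {
        return hashSetName;
    }
}

// Hypothetical module that receives its settings through the constructor.
class FooModuleSketch {
    private final FooJobSettings settings;

    FooModuleSketch(FooJobSettings settings) {
        this.settings = settings;
    }

    boolean skipsKnownFiles() {
        return settings.skipKnownFiles();
    }

    String getHashSetName() {
        return settings.getHashSetName();
    }
}

class FooFactorySketch {
    // Mirrors the pattern described above: cast to the module-specific type, then construct.
    static FooModuleSketch createModule(SketchJobSettings settings) {
        if (!(settings instanceof FooJobSettings)) {
            throw new IllegalArgumentException("Expected FooJobSettings");
        }
        return new FooModuleSketch((FooJobSettings) settings);
    }

    // Round-trips the settings through Java serialization to show that they can be persisted.
    static SketchJobSettings roundTrip(SketchJobSettings settings) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(settings);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchJobSettings) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Checking the type before casting gives a clear error message if the factory is ever handed settings that belong to a different module.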


You can also implement the getDefaultIngestJobSettings() method to return an instance of your IngestModuleIngestJobSettings class with default settings.  Autopsy will call this when the module has not been run before.

NOTE: We recommend storing simple data in the IngestModuleIngestJobSettings-based class.  In the case of our hash lookup module, we store the string names of the hash databases to do lookups in.  We then get the hash database handles in the call to startUp() using the global module settings.


NOTE: The main benefits of using the IngestModuleIngestJobSettings-based class to store settings (versus some static variables in your package) are:
- When multiple jobs are running at the same time, each can have its own settings.
- Autopsy persists them so that the last used settings get passed into the call to getIngestJobSettingsPanel() and you don't need to save them yourself to give the user the benefit of re-using the last settings.


\subsection ingest_modules_making_options_global Global Options

Global options are those that are not specific to a data source or ingest pipeline.  They are big-picture settings.

To provide global options:
- Update org.sleuthkit.autopsy.ingest.IngestModuleFactory.hasGlobalSettingsPanel() in your factory class to return true.
- Update org.sleuthkit.autopsy.ingest.IngestModuleFactory.getGlobalSettingsPanel() to return an org.sleuthkit.autopsy.ingest.IngestModuleGlobalSettingsPanel with widgets to support the global settings.
- You are responsible for persisting global settings and may use the module
settings methods provided by org.sleuthkit.autopsy.ingest.IngestServices for
saving simple properties, or the facilities of classes such as
org.sleuthkit.autopsy.coreutils.PlatformUtil and org.sleuthkit.autopsy.coreutils.XMLUtil
for more sophisticated approaches.
- You are responsible for providing a way for the ingest module to obtain the global settings. For
example, the Autopsy core hash lookup module comes with a singleton hash databases
manager. Users import and create hash databases using the global settings panel.
Then they select which hash databases to use for a particular job using the
ingest job settings panel. When a module instance runs, it gets the relevant
databases from the hash databases manager.
- You are responsible for having the ingest job options panel update itself if the global settings change (i.e. if a new item is added that must be listed on the ingest panel).
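
One possible way to meet the last two responsibilities is a singleton settings manager that notifies listeners on change, so that a job settings panel can refresh itself. The sketch below is loosely modeled on the hash databases manager described above, but the class, its methods, and the property name are hypothetical and use only the standard java.beans support classes.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical singleton that holds global settings and announces changes.
class HashSetManagerSketch {
    private static final HashSetManagerSketch INSTANCE = new HashSetManagerSketch();
    private final List<String> hashSetNames = new ArrayList<>();
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);

    private HashSetManagerSketch() {
    }

    static HashSetManagerSketch getInstance() {
        return INSTANCE;
    }

    // Called from the global settings panel when the user adds a hash set.
    void addHashSet(String name) {
        List<String> before;
        List<String> after;
        synchronized (this) {
            before = new ArrayList<>(hashSetNames);
            hashSetNames.add(name);
            after = new ArrayList<>(hashSetNames);
        }
        // Fire outside the lock so that listeners cannot deadlock against the manager.
        changes.firePropertyChange("hashSetNames", before, after);
    }

    // Called by module instances and by the job settings panel.
    synchronized List<String> getHashSetNames() {
        return Collections.unmodifiableList(new ArrayList<>(hashSetNames));
    }

    // A job settings panel would register here to refresh its list of choices.
    void addListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    void removeListener(PropertyChangeListener listener) {
        changes.removePropertyChangeListener(listener);
    }
}
```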


\section ingest_modules_api_migration Migrating 3.0 Java Ingest Modules to the 3.1 and newer API

This section is a guide for module developers who wrote modules for the 3.0 API.  These API changes occurred so that
we could make parallel pipelines of the file-level ingest modules.   This section assumes you've read the above description of the new API.

There are three big changes to make in your module:
-# Modules are no longer singletons. Autopsy will make one of your factory classes and many instances of the ingest modules.  As part of the migration to the new classes, your singleton infrastructure will disappear.
-# You'll need to move the UI/Configuration methods into the factory class and the ingest module methods into their own class.  You'll also need to update the APIs for the methods a bit.
-# You'll need to review your ingest module code for thread safety if you are using any static member variables.


We recommend that you:
-# Create a new factory class and move over the UI panels, configuration code, and standard methods (name, description, version, etc.). You'll probably want the name in the ingest module code, so you should also store the name in a package-wide static member variable.
-# Get the factory to compile and work. You can do basic testing by running Autopsy and verifying that you see your module and its panels.
-# Change your old ingest module to implement the new interface and adjust it (see the name changes below). Then update the factory to create it.
-# Review the ingest module code for thread safety (especially look for static member variables).


The following table provides a mapping of the methods of the old abstract classes to
the new interfaces:

Old method | New Method |
---------- | ---------- |
IngestModuleAbstract.getType() | N/A |
IngestModuleAbstract.init() | IngestModule.startUp() |
IngestModuleAbstract.getName() | IngestModuleFactory.getModuleName() |
IngestModuleAbstract.getDescription() | IngestModuleFactory.getModuleDescription() |
IngestModuleAbstract.getVersion() | IngestModuleFactory.getModuleVersion() |
IngestModuleAbstract.hasBackgroundJobsRunning() | N/A |
IngestModuleAbstract.complete() | IngestModule.shutDown() for file ingest modules; data source ingest modules should do anything they did in complete() at the end of the process() method |
IngestModuleAbstract.hasAdvancedConfiguration() | IngestModuleFactory.hasGlobalSettingsPanel() |
IngestModuleAbstract.getAdvancedConfiguration() | IngestModuleFactory.getGlobalSettingsPanel() |
IngestModuleAbstract.saveAdvancedConfiguration() | IngestModuleGlobalSettingsPanel.saveSettings() |
N/A | IngestModuleFactory.getDefaultIngestJobSettings() |
IngestModuleAbstract.hasSimpleConfiguration() | IngestModuleFactory.hasIngestJobSettingsPanel() |
IngestModuleAbstract.getSimpleConfiguration() | IngestModuleFactory.getIngestJobSettingsPanel() |
IngestModuleAbstract.saveSimpleConfiguration() | N/A |
N/A | IngestModuleIngestJobSettingsPanel.getSettings()  |
N/A | IngestModuleFactory.isDataSourceIngestModuleFactory() |
N/A | IngestModuleFactory.createDataSourceIngestModule() |
N/A | IngestModuleFactory.isFileIngestModuleFactory() |
N/A | IngestModuleFactory.createFileIngestModule() |

Notes:
- IngestModuleFactory.getModuleName() should delegate to a static class method
that can also be called by ingest module instances.
- The IngestJobContext object passed to startUp() can be queried to determine whether the ingest
job completed or was cancelled.
- The global settings panel (formerly "advanced") for a module must implement
IngestModuleGlobalSettingsPanel which extends JPanel. Global settings are those
that affect all instances of the module, regardless of ingest job and pipeline.
- The per ingest job settings panel (formerly "simple") for a module must implement
IngestModuleIngestJobSettingsPanel which extends JPanel. It takes the settings
for the current context as a serializable IngestModuleIngestJobSettings object
and its getSettings() method returns a serializable IngestModuleIngestJobSettings object.
The IngestModuleIngestJobSettingsPanel.getSettings() method replaces the saveSimpleConfiguration() method,
except that now Autopsy persists the settings in a context-sensitive fashion.
- The IngestModuleFactory creation methods replace the getInstance() methods of
the former singletons and receive an IngestModuleIngestJobSettings object that should be
passed to the constructors of the module instances the factory creates.
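
The first and last notes can be combined into a short sketch of a migrated module. All names are hypothetical and the classes do not implement the real Autopsy interfaces. The sketch only shows the static name delegation and the factory creation method that replaces getInstance().

```java
// Stand-in for per-job settings; illustrative only.
class BarJobSettings {
    final boolean verbose;

    BarJobSettings(boolean verbose) {
        this.verbose = verbose;
    }
}

// Hypothetical migrated module: no getInstance() and no static singleton state.
class BarIngestModule {
    private final BarJobSettings settings;

    BarIngestModule(BarJobSettings settings) {
        this.settings = settings;
    }

    // Module instances reuse the factory's static name, for example when labeling messages.
    String describe() {
        return BarIngestModuleFactory.moduleName() + (settings.verbose ? " (verbose)" : "");
    }
}

// Hypothetical factory; in Autopsy it would implement IngestModuleFactory.
class BarIngestModuleFactory {
    // Static so that module instances can call it without holding a factory.
    static String moduleName() {
        return "Bar Module";
    }

    // The instance method required by the factory interface simply delegates.
    String getModuleName() {
        return moduleName();
    }

    // Replaces the old singleton getInstance(): every call yields a new module.
    BarIngestModule createFileIngestModule(BarJobSettings settings) {
        return new BarIngestModule(settings);
    }
}
```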

*/