Finished second and third tutorials.

This commit is contained in:
Ann Priestman 2019-10-18 11:24:36 -04:00
parent d9dea468f3
commit e4299efff1
6 changed files with 314 additions and 24 deletions

View File

@ -772,8 +772,9 @@ INPUT = main.dox \
regressionTesting.dox \
native_libs.dox \
modDevPython.dox \
modFileIngestTutorial.dox \
modDSIngestTutorial.dox \
modFileIngestTutorial.dox \
modDSIngestTutorial.dox \
modReportModuleTutorial.dox \
debugTsk.dox \
../../Core/src \
../../CoreLibs/src \

Binary file not shown.

View File

@ -11,7 +11,7 @@ If these pages don't answer your question, then send the question to the <a href
If you want to write Java or Python modules, then there are some tutorials and detailed pages in this document. The Python tutorials include:
- File Ingest Modules: \subpage mod_python_file_ingest_tutorial_page
- Data Source Ingest Modules: \subpage mod_python_ds_ingest_tutorial_page
- Report Modules: http://www.basistech.com/python-autopsy-module-tutorial-3-the-report-module/
- Report Modules: \subpage mod_python_report_tutorial_page
This document contains the following pages:
- \subpage platform_page

View File

@ -1,5 +1,171 @@
/*! \page mod_python_ds_ingest_tutorial_page Python Tutorial #2: Writing a Data Source Ingest Module
In the \ref mod_python_file_ingest_tutorial_page "first tutorial" we built a basic Python Autopsy module that looked for big and round files. In this tutorial we're going to make two data source ingest modules. The first focuses on finding SQLite databases and parsing them, and the second focuses on running a command line tool on a disk image.
The main difference from the first tutorial, which focused on file ingest modules, is that these are data source ingest modules. Data source-ingest modules are given a reference to a data source and the module needs to find the files to analyze, whereas file-level ingest modules are given a reference to each file in the data source.
\section python_tutorial2_assumptions Assumptions
This tutorial assumes you've read the \ref mod_python_file_ingest_tutorial_page "first tutorial". That means that you should know why it is better to write an Autopsy module versus a stand-alone tool, and what you need to set up (Autopsy installed, text editor, etc.). You may also recall the limitations (and benefits) of data source ingest modules. The most notable difference from file ingest modules is that data source ingest modules may not have access to carved files or files that are inside of ZIP files. For our example in this tutorial, we are looking for a SQLite database with a specific name, and it will not be inside of a ZIP file, so a data source ingest module is the most efficient choice and will get us results faster.
The other assumption is that you know something about SQL queries. We have some example queries below and we don't go into detail about how they work.
\section python_tutorial2_getting_started Getting Started
\subsection python_tutorial2_folder Making Your Module Folder
We'll start by making our module folder. As we learned in the \ref mod_python_file_ingest_tutorial_page "first tutorial", every Python module in Autopsy gets its own folder. To find out where you should put your Python module, launch Autopsy and choose the Tools->Python Plugins menu item. That will open a subfolder in your AppData folder, such as "C:\Users\JDoe\AppData\Roaming\Autopsy\python_modules".
Make a folder inside of there to store your module. Call it "DemoScript2". Copy the <a href="https://github.com/sleuthkit/autopsy/blob/develop/pythonExamples/dataSourceIngestModule.py" target="_blank" rel="noopener noreferrer">dataSourceIngestModule.py</a> sample file from github into this new folder and rename it to FindContactsDb.py.
\subsection python_tutorial2_script Writing The Script
We are going to write a script that:
<ul>
<li>Queries the backend database for files of a given name</li>
<li>Opens the database</li>
<li>Queries data from the database and makes an artifact for each row</li>
</ul>
Open the FindContactsDb.py script in your favorite text editor. The sample Autopsy Python modules all have TODO entries in them to let you know what you should change. The below steps jump from one TODO to the next.
<ol>
<li><strong>Factory Class Name</strong>: The first thing to do is rename the sample class name from "SampleJythonDataSourceIngestModuleFactory" to "ContactsDbIngestModuleFactory". In the sample module, there are several uses of this class name, so you should search and replace for these strings.</li>
<li><strong>Name and Description</strong>: The next TODO entries are for names and descriptions. These are shown to users. For this example, we'll name it "Contacts Db Analyzer". The description can be anything you want. Note that Autopsy requires that modules have unique names, so don't make it too generic.</li>
<li><strong>Ingest Module Class Name</strong>: The next thing to do is rename the ingest module class from "SampleJythonDataSourceIngestModule" to "ContactsDbIngestModule". Our usual naming convention is that this class is the same as the factory class with "Factory" removed from the end. There are a couple of places where this name is used, so do a search and replace in your code.</li>
<li><strong>startUp() method</strong>: The startUp() method is where each module initializes. For our example, we don't need to do anything special in here except save a reference to the passed in context object. This is used later on to see if the module has been cancelled.</li>
<li><strong>process() method</strong>: This is where we do our analysis and we'll focus on this more in the next section.</li>
</ol>
That's it. In the file-level ingest module, we had a shutDown() method, but we do not need that with data source-level ingest modules. The process() method will be called only once, so the module can clean up after itself when that method finishes.
\subsection python_tutorial2_process The process() Method
The process method in a data source-level ingest module is passed a reference to the data source as a <a href="https://www.sleuthkit.org/sleuthkit/docs/jni-docs/interfaceorg_1_1sleuthkit_1_1datamodel_1_1_content.html" target="_blank" rel="noopener noreferrer">Content</a> object and a <a href="https://sleuthkit.org/autopsy/docs/api-docs/3.1/classorg_1_1sleuthkit_1_1autopsy_1_1ingest_1_1_data_source_ingest_module_progress.html" target="_blank" rel="noopener noreferrer">Progress Bar</a> class to update our progress.
For this tutorial, you can start by deleting the contents of the existing process() method in the sample module. The full source code is linked to later in this tutorial and shows more detail about a fully fledged module. We'll just cover the analysis here.
\subsubsection python_tutorial2_getting_files Getting Files
Because data source-level ingest modules are not passed in specific files to analyze, nearly all of these types of modules will need to use the org.sleuthkit.autopsy.casemodule.services.FileManager service to find relevant files. Check out the methods on that class to see the different ways that you can find files.
NOTE: See the \ref python_tutorial2_running_exes section for an example of when you simply want to run a command line tool on a disk image instead of querying for files to analyze.
For our example, we want to find all files named "contacts.db". The org.sleuthkit.autopsy.casemodule.services.FileManager class contains several findFiles() methods to help. You can search for all files with a given name or files with a given name in a particular folder. You can also use SQL syntax to match file patterns, such as "%.jpg" to find all files with a JPEG extension.
Our example needs these two lines to get the FileManager for the current case and to find the files.
\verbatim
fileManager = Case.getCurrentCase().getServices().getFileManager()
files = fileManager.findFiles(dataSource, "contacts.db")\endverbatim
findFiles() returns a list of <a href="https://sleuthkit.org/sleuthkit/docs/jni-docs/classorg_1_1sleuthkit_1_1datamodel_1_1_abstract_file.html">AbstractFile</a> objects. This gives you access to the file's metadata and content.
For our example, we are going to open these SQLite files. That means that we need to save them to disk. This is less than ideal because it wastes time writing the data to disk and then reading it back in, but it is the only option with many libraries. If you are doing some other type of analysis on the content, then you do not need to write it to disk. You can read directly from the AbstractFile (see the sample modules for specific code to do this).
The org.sleuthkit.autopsy.datamodel.ContentUtils class provides a utility to save file content to disk. We'll make a path in the temp folder of our case directory. To prevent naming collisions, we'll name the file based on its unique ID. The following two lines save the file to lclDbPath.
\verbatim
lclDbPath = os.path.join(Case.getCurrentCase().getTempDirectory(), str(file.getId()) + ".db")
ContentUtils.writeToFile(file, File(lclDbPath))\endverbatim
\subsubsection python_tutorial2_analyzing_sqlite Analyzing SQLite
Next, we need to open the SQLite database. We are going to use the Java JDBC infrastructure for this. JDBC is Java's generic way of dealing with different types of databases. To open the database, we do this:
\verbatim
Class.forName("org.sqlite.JDBC").newInstance()
dbConn = DriverManager.getConnection("jdbc:sqlite:%s" % lclDbPath)\endverbatim
With our connection in hand, we can do some queries. In our sample database, we have a single table named "contacts", which has columns for name, email, and phone. We first start by querying for all rows in our simple table:
\verbatim
stmt = dbConn.createStatement()
resultSet = stmt.executeQuery("SELECT * FROM contacts")\endverbatim
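When you are working out the query, it can be convenient to try it against a copy of the database outside of Autopsy first. The following is a minimal sketch of that, assuming a regular (non-Jython) Python installation with the standard sqlite3 module. It is a prototyping aid only: sqlite3 is typically not usable from Jython, which is why the module itself goes through JDBC. The helper names and the sample row are hypothetical.

```python
import sqlite3


def read_contacts(db_path):
    """Return (name, email, phone) tuples from the tutorial's contacts table."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute("SELECT name, email, phone FROM contacts")
        return cursor.fetchall()
    finally:
        # Always release the file handle, even if the query fails.
        conn.close()


def make_sample_db(db_path):
    """Create a tiny stand-in for contacts.db (hypothetical test data)."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE contacts (name TEXT, email TEXT, phone TEXT)")
        conn.execute(
            "INSERT INTO contacts VALUES (?, ?, ?)",
            ("Jane Doe", "jane@example.com", "555-0100"),
        )
        conn.commit()
    finally:
        conn.close()
```

Once the query returns the rows you expect here, the same SQL string can be passed to executeQuery() in the module.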
For each row, we are going to get the values for the name, e-mail, and phone number and make a TSK_CONTACT artifact. Recall from the first tutorial that posting artifacts to the blackboard allows modules to communicate with each other and also allows you to easily display data to the user. The TSK_CONTACT artifact is for storing contact information.
The basic approach in our example is to make an artifact of a given type (TSK_CONTACT) and have it be associated with the database it came from. We then make attributes for the name, email, and phone. The following code does this for each row in the database:
\verbatim
while resultSet.next():
# Make an artifact on the blackboard and give it attributes
art = file.newArtifact(BlackboardArtifact.ARTIFACT_TYPE.TSK_CONTACT)
name = resultSet.getString("name")
art.addAttribute(BlackboardAttribute(
BlackboardAttribute.ATTRIBUTE_TYPE.TSK_NAME_PERSON.getTypeID(),
ContactsDbIngestModuleFactory.moduleName, name))
email = resultSet.getString("email")
art.addAttribute(BlackboardAttribute(
BlackboardAttribute.ATTRIBUTE_TYPE.TSK_EMAIL.getTypeID(),
ContactsDbIngestModuleFactory.moduleName, email))
phone = resultSet.getString("phone")
art.addAttribute(BlackboardAttribute(
BlackboardAttribute.ATTRIBUTE_TYPE.TSK_PHONE_NUMBER.getTypeID(),
ContactsDbIngestModuleFactory.moduleName, phone))\endverbatim
That's it. We've just found the databases, queried them, and made artifacts for the user to see. There are some final things though. First, we should fire off an event so that the UI updates and refreshes with the new artifacts. We can fire just one event after each database is parsed (or you could fire one for each artifact - it's up to you).
\verbatim
IngestServices.getInstance().fireModuleDataEvent(
ModuleDataEvent(ContactsDbIngestModuleFactory.moduleName,
BlackboardArtifact.ARTIFACT_TYPE.TSK_CONTACT, None))\endverbatim
And the final thing is to clean up. We should close the database connections and delete our temporary file.
\verbatim
stmt.close()
dbConn.close()
os.remove(lclDbPath)\endverbatim
\subsection python_tutorial2_niceties Niceties
Data source-level ingest modules can run for quite some time. Therefore, data source-level ingest modules should do some additional things that file-level ingest modules do not need to.
<ul>
<li>Progress bars: Each data source-level ingest module will have its own progress bar in the lower right. A reference to it is passed into the process() method. You should update it to provide user feedback.</li>
<li>Cancellation: A user could cancel ingest while your module is running. You should periodically check if that occurred so that you can bail out as soon as possible. You can do that with a check of:
\verbatim if self.context.isJobCancelled():\endverbatim </li>
</ul>
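The two points above can be combined into one loop. The sketch below is an illustration rather than code from the sample module: process_files() and the two Fake classes are hypothetical names, and the fakes exist only so the loop can be exercised outside of Autopsy. It assumes the progress object offers switchToDeterminate() and progress(), as the sample data source ingest module uses them.

```python
def process_files(files, context, progress_bar, analyze):
    """Analyze each file, updating progress and stopping early if cancelled.

    Returns the number of files that were actually analyzed.
    """
    # Tell the progress bar how many units of work to expect.
    progress_bar.switchToDeterminate(len(files))
    done = 0
    for f in files:
        # Bail out as soon as the user cancels the ingest job.
        if context.isJobCancelled():
            break
        analyze(f)
        done += 1
        progress_bar.progress(done)
    return done


class FakeContext(object):
    """Hypothetical stand-in that reports cancellation after N checks."""

    def __init__(self, cancel_after):
        self.checks = 0
        self.cancel_after = cancel_after

    def isJobCancelled(self):
        self.checks += 1
        return self.checks > self.cancel_after


class FakeProgress(object):
    """Hypothetical stand-in that records what the loop reported."""

    def __init__(self):
        self.total = None
        self.current = 0

    def switchToDeterminate(self, total):
        self.total = total

    def progress(self, units_done):
        self.current = units_done
```

Inside a real module, context would be the object saved in startUp() and progress_bar the one passed into process().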
\subsection python_tutorial2_tips Debugging and Development Tips
You can find the full file along with a small sample database on <a href="https://github.com/sleuthkit/autopsy/tree/develop/pythonExamples/Aug2015DataSourceTutorial">github</a>. To use the database, add it as a logical file and run your module on it.
Whenever you have syntax errors or other errors in your script, you will get some form of dialog from Autopsy when you try to run ingest modules. If that happens, fix the problem and run ingest modules again. You don't need to restart Autopsy each time!
The sample module has some log statements in there to help debug what is going on since we don't know of better ways to debug the scripts while running in Autopsy.
\section python_tutorial2_running_exes Running Executables
While the above example outlined using the FileManager to find files to analyze, the other common use of data source-level ingest modules is to wrap a command line tool that takes a disk image as input. A sample program (RunExe.py) that does that can be found on <a href="https://github.com/sleuthkit/autopsy/tree/develop/pythonExamples/Aug2015DataSourceTutorial">github</a>. I'll cover the big topics of that program in this section. There are more details in the script about error checking and such.
\subsection python_tutorial2_finding_exe Finding The Executable
To write this kind of data source-level ingest module, put the executable in your module's folder (the DemoScript2 folder we previously made). Use "__file__" to get the path to where your script is and then use some os.path methods to get to the executable in the same folder.
\verbatim
path_to_exe = os.path.join(os.path.dirname(os.path.abspath(__file__)), "img_stat.exe")\endverbatim
In our sample program, we do this in the startUp() method and verify that the executable exists, so that ingest never starts if it is missing.
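A minimal sketch of that check is shown here. find_tool() is a hypothetical helper name, and IOError is used only to keep the sketch free of Autopsy imports; an Autopsy module would presumably raise its ingest module exception type from startUp() instead.

```python
import os


def find_tool(script_path, exe_name):
    """Return the absolute path of exe_name next to script_path, or raise."""
    folder = os.path.dirname(os.path.abspath(script_path))
    path_to_exe = os.path.join(folder, exe_name)
    # Fail early so the problem is reported before any analysis starts.
    if not os.path.isfile(path_to_exe):
        raise IOError("Executable not found: " + path_to_exe)
    return path_to_exe
```

In a module this would be called as `find_tool(__file__, "img_stat.exe")`.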
\subsection python_tutorial2_running_the_exe Running The Executable
Data sources can be disk images, but they can also be a folder of files. We only want to run our executable on a disk image. So, verify that:
\verbatim
if not isinstance(dataSource, Image):
self.log(Level.INFO, "Ignoring data source. Not an image")
return IngestModule.ProcessResult.OK \endverbatim
You can get the path to the disk image using dataSource.getPaths().
Once you have the EXE and the disk image, you can use the various <a href="https://pymotw.com/2/subprocess/">subprocess</a> methods to run them.
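As one possible way to do this, the sketch below runs a command with subprocess.Popen and captures everything it prints. run_tool() is a hypothetical helper name, not something from the sample program. In a module, the argument list would be the executable path followed by the image path (for example the first entry returned by dataSource.getPaths()).

```python
import subprocess


def run_tool(args):
    """Run a command line tool and return (exit_code, output_text)."""
    # Merge stderr into stdout so error messages are captured as well.
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    return proc.returncode, output.decode("utf-8", "replace")
```

Passing the arguments as a list avoids quoting problems when the image path contains spaces.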
\subsection python_tutorial2_showing_results Showing the User Results
After the command line tool runs, you have the option of either showing the user the raw output of the tool or parsing it into individual artifacts. Refer to previous sections of this tutorial and the previous tutorial for making artifacts. If you want to simply show the user the output of the tool, then save the output to the Reports folder in the Case directory:
\verbatim
reportPath = os.path.join(Case.getCurrentCase().getCaseDirectory(),
"Reports", "img_stat-" + str(dataSource.getId()) + ".txt") \endverbatim
Then you can add the report to the case so that it shows up in the tree in the main UI panel.
\verbatim Case.getCurrentCase().addReport(reportPath, "Run EXE", "img_stat output")\endverbatim
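The report file has to exist before it is added to the case. Here is a minimal sketch of writing the tool output to the path built above, assuming the output is already available as a string. save_tool_output() is a hypothetical helper name, and the case directory and data source ID are passed in as plain values so the sketch can run outside of Autopsy.

```python
import os


def save_tool_output(case_dir, data_source_id, output_text):
    """Write output_text to Reports/img_stat-<id>.txt and return the path."""
    reports_dir = os.path.join(case_dir, "Reports")
    # Create the folder if this is the first report for the case.
    if not os.path.isdir(reports_dir):
        os.makedirs(reports_dir)
    report_path = os.path.join(
        reports_dir, "img_stat-" + str(data_source_id) + ".txt")
    with open(report_path, "w") as report_file:
        report_file.write(output_text)
    return report_path
```

The returned path is what you would hand to addReport().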
\section python_tutorial2_conclusion Conclusion
Data source-level ingest modules allow you to query for a subset of files by name or to run on an entire disk image. This tutorial has shown an example of both use cases and shown how to use SQLite in Jython.
*/

View File

@ -3,32 +3,32 @@
\section python_tutorial1_why Why Write a File Ingest Module?
<ul>
<li>Autopsy hides the fact that a file is coming from a file system, was carved, was from inside of a ZIP file, or was part of a local file. So, you dont need to spend time supporting all of the ways that your user may want to get data to you. You just need to worry about analyzing the content.</li>
<li>Autopsy displays files automatically and can include them in reports if you use standard blackboard artifacts (described later). That means you dont need to worry about UIs and reports.</li>
<li>Autopsy hides the fact that a file is coming from a file system, was carved, was from inside of a ZIP file, or was part of a local file. So, you don't need to spend time supporting all of the ways that your user may want to get data to you. You just need to worry about analyzing the content.</li>
<li>Autopsy displays files automatically and can include them in reports if you use standard blackboard artifacts (described later). That means you don't need to worry about UIs and reports.</li>
<li>Autopsy gives you access to results from other modules. So, you can build on top of their results instead of duplicating them.</li>
</ul>
\section python_tutorial1_ingest_modules Ingest Modules
For our first example, were going to write an ingest module. Ingest modules in Autopsy run on the data sources that are added to a case. When you add a disk image (or local drive or logical folder) in Autopsy, youll be presented with a list of modules to run (such as hash lookup and keyword search).
For our first example, we're going to write an ingest module. Ingest modules in Autopsy run on the data sources that are added to a case. When you add a disk image (or local drive or logical folder) in Autopsy, you'll be presented with a list of modules to run (such as hash lookup and keyword search).
\image html ingest-modules.PNG
Those are all ingest modules. Were going to write one of those. There are two types of ingest modules that we can build:
Those are all ingest modules. We're going to write one of those. There are two types of ingest modules that we can build:
<ul>
<li>File Ingest Modules are the easiest to write. During their lifetime, they will get passed in each file in the data source. This includes files that are found via carving or inside of ZIP files (if those modules are also enabled).</li>
<li>Data Source Ingest Modules require slightly more work because you have to query the database for the files of interest. If you only care about a small number of files, know their name, and know they wont be inside of ZIP files, then these are your best bet.</li>
<li>Data Source Ingest Modules require slightly more work because you have to query the database for the files of interest. If you only care about a small number of files, know their name, and know they won't be inside of ZIP files, then these are your best bet.</li>
</ul>
For this first tutorial, were going to write a file ingest module. The \ref mod_python_ds_ingest_tutorial_page "second tutorial" will focus on data source ingest modules. Regardless of the type of ingest module you are writing, you will need to work with two classes:
For this first tutorial, we're going to write a file ingest module. The \ref mod_python_ds_ingest_tutorial_page "second tutorial" will focus on data source ingest modules. Regardless of the type of ingest module you are writing, you will need to work with two classes:
<ul>
<li>The factory class provides Autopsy with module information such as display name and version. It also creates instances of ingest modules as needed.</li>
<li>The ingest module class will do the actual analysis. One of these will be created per thread. For file ingest modules, Autopsy will typically create two or more of these at a time so that it can analyze files in parallel. If you keep things simple, and dont use static variables, then you dont have to think about anything multithreaded.</li>
<li>The ingest module class will do the actual analysis. One of these will be created per thread. For file ingest modules, Autopsy will typically create two or more of these at a time so that it can analyze files in parallel. If you keep things simple, and don't use static variables, then you don't have to think about anything multithreaded.</li>
</ul>
\section python_tutorial1_getting_started Getting Started
To write your first file ingest module, youll need:
To write your first file ingest module, you'll need:
<ul>
<li>An installed copy of Autopsy available from <a href="https://www.sleuthkit.org/autopsy/download.php" target="_blank" rel="noopener noreferrer">SleuthKit</a></li>
<li>A text editor.</li>
@ -37,8 +37,8 @@ To write your first file ingest module, youll need:
Some other general notes are that you will be writing in Jython, which converts Python-looking code into Java. It has some limitations, including:
<ul>
<li>You cant use Python 3 (you are limited to Python 2.7)</li>
<li>You cant use libraries that use native code</li>
<li>You can't use Python 3 (you are limited to Python 2.7)</li>
<li>You can't use libraries that use native code</li>
</ul>
But, Jython will give you access to all of the Java classes and services that Autopsy provides. So, if you want to stray from this example, then refer to the Developer docs on what classes and methods you have access to. The comments in the sample file will identify what type of object is being passed in along with a URL to its documentation.
@ -53,21 +53,21 @@ Every Python module in Autopsy gets its own folder. This reduces naming collisio
\subsection python_tutorial1_writing Writing the Script
We are going to write a script that flags any file that is larger than 10MB and whose size is a multiple of 4096. Well call these big and round files. This kind of technique could be useful for finding encrypted files. An additional check would be for entropy of the file, but well keep the example simple.
We are going to write a script that flags any file that is larger than 10MB and whose size is a multiple of 4096. We'll call these big and round files. This kind of technique could be useful for finding encrypted files. An additional check would be for entropy of the file, but we'll keep the example simple.
Open the FindBigRoundFiles.py file in your favorite python text editor. The sample Autopsy Python modules all have TODO entries in them to let you know what you should change. The below steps jump from one TODO to the next.
<ol>
<li><b>Factory Class Name</b>: The first thing to do is rename the sample class name from “SampleJythonFileIngestModuleFactory” to “FindBigRoundFilesIngestModuleFactory”. In the sample module, there are several uses of this class name, so you should search and replace for these strings.</li>
<li><b>Name and Description</b>: The next TODO entries are for names and descriptions. These are shown to users. For this example, well name it “Big and Round File Finder”. The description can be anything you want. Note that Autopsy requires that modules have unique names, so dont make it too generic.</li>
<li><b>Ingest Module Class Name</b>: The next thing to do is rename the ingest module class from “SampleJythonFileIngestModule” to “FindBigRoundFilesIngestModule”. Our usual naming convention is that this class is the same as the factory class with “Factory” removed from the end.</li>
<li><b>startUp() method</b>: The startUp() method is where each module initializes. For our example, we dont need to do anything special in here. Typically though, this is where you want to do stuff that could fail because throwing an exception here causes the entire ingest to stop.</li>
<li><b>process() method</b>: This is where we do our analysis. The sample module is well documented with what it does. It ignores non-files, looks at the file name, and makes a blackboard artifact for “.txt” files. There are also a bunch of other things that it does to show examples for easy copy and pasting, but we dont need them in our module. Well cover what goes into this method in the next section.</li>
<li><b>Factory Class Name</b>: The first thing to do is rename the sample class name from "SampleJythonFileIngestModuleFactory" to "FindBigRoundFilesIngestModuleFactory". In the sample module, there are several uses of this class name, so you should search and replace for these strings.</li>
<li><b>Name and Description</b>: The next TODO entries are for names and descriptions. These are shown to users. For this example, we'll name it "Big and Round File Finder". The description can be anything you want. Note that Autopsy requires that modules have unique names, so don't make it too generic.</li>
<li><b>Ingest Module Class Name</b>: The next thing to do is rename the ingest module class from "SampleJythonFileIngestModule" to "FindBigRoundFilesIngestModule". Our usual naming convention is that this class is the same as the factory class with "Factory" removed from the end.</li>
<li><b>startUp() method</b>: The startUp() method is where each module initializes. For our example, we don't need to do anything special in here. Typically though, this is where you want to do stuff that could fail because throwing an exception here causes the entire ingest to stop.</li>
<li><b>process() method</b>: This is where we do our analysis. The sample module is well documented with what it does. It ignores non-files, looks at the file name, and makes a blackboard artifact for ".txt" files. There are also a bunch of other things that it does to show examples for easy copy and pasting, but we don't need them in our module. We'll cover what goes into this method in the next section.</li>
<li><b>shutdown() method</b>: The shutDown() method either frees resources that were allocated or sends summary messages. For our module, it will do nothing.</li>
</ol>
\subsection python_tutorial1_process The process() Method
The process() method is passed in a reference to an AbstractFile Object. With this, you have access to all of a files contents and metadata. We want to flag files that are larger than 10MB and that are a multiple of 4096 bytes. The following code does that:
The process() method is passed in a reference to an AbstractFile Object. With this, you have access to all of a file's contents and metadata. We want to flag files that are larger than 10MB and that are a multiple of 4096 bytes. The following code does that:
\verbatim if ((file.getSize() > 10485760) and ((file.getSize() % 4096) == 0)):
\endverbatim
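To make the boundary cases of that condition explicit, the same test can be written as a small stand-alone function. This is only a sketch for experimenting outside of Autopsy: is_big_and_round() and TEN_MB are hypothetical names, and the function takes a plain size in bytes rather than an AbstractFile.

```python
TEN_MB = 10 * 1024 * 1024  # 10485760 bytes, the threshold used above


def is_big_and_round(size):
    """True if size is strictly larger than 10MB and a multiple of 4096."""
    return size > TEN_MB and size % 4096 == 0
```

Note that a file of exactly 10485760 bytes is not flagged, because the comparison is strictly greater than.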
@ -92,7 +92,7 @@ The above code adds the artifact and a single attribute to the blackboard in the
ModuleDataEvent(FindBigRoundFilesIngestModuleFactory.moduleName,
BlackboardArtifact.ARTIFACT_TYPE.TSK_INTERESTING_FILE_HIT, None))\endverbatim
Thats it. Your process() method should look something like this:
That's it. Your process() method should look something like this:
\verbatim
def process(self, file):
@ -140,15 +140,15 @@ Thats it. Your process() method should look something like this:
return IngestModule.ProcessResult.OK\endverbatim
Save this file and run the module on some of your data. If you have any big and round files, you should see an entry under the “Interesting Items” node in the tree.
Save this file and run the module on some of your data. If you have any big and round files, you should see an entry under the "Interesting Items" node in the tree.
\image html bigAndRoundFiles.png
\subsection python_tutorial1_debug Debugging and Development Tips
Whenever you have syntax errors or other errors in your script, you will get some form of dialog from Autopsy when you try to run ingest modules. If that happens, fix the problem and run ingest modules again. You dont need to restart Autopsy each time!
Whenever you have syntax errors or other errors in your script, you will get some form of dialog from Autopsy when you try to run ingest modules. If that happens, fix the problem and run ingest modules again. You don't need to restart Autopsy each time!
The sample module has some log statements in there to help debug what is going on since we dont know of better ways to debug the scripts while running in Autopsy.
The sample module has some log statements in there to help debug what is going on since we don't know of better ways to debug the scripts while running in Autopsy.
*/

View File

@ -0,0 +1,123 @@
/*! \page mod_python_report_tutorial_page Python Tutorial #3: Writing a Report Module
In our last two tutorials, we built Python Autopsy \ref mod_python_file_ingest_tutorial_page "file ingest modules" and \ref mod_python_ds_ingest_tutorial_page "data source ingest modules" that analyzed the data sources as they were added to cases. In this third tutorial, we're going to make an entirely different kind of module, a report module.
Report modules are typically run after the user has completed their analysis. Autopsy comes with report modules to generate HTML, Excel, KML, and other types of reports. We're going to make a report module that outputs data in CSV.
Like in the second tutorial, we are going to assume that you've read at least the \ref mod_python_file_ingest_tutorial_page "first tutorial" to know how to get your environment set up. As a reminder, Python modules in Autopsy are written in Jython and have access to all of the Java classes (which is why we have links to Java documentation below).
\section python_tutorial3_report_modules Report Modules
Autopsy report modules are often run after the user has run some ingest modules, reviewed the results, and tagged some files of interest. The user will be given a list of report modules to choose from.
\image html reports_select.png
The main reasons for writing an Autopsy report module are:
<ul>
<li>You need the results in a custom output format, such as XML or JSON.</li>
<li>You want to upload results to a central location.</li>
<li>You want to perform additional analysis after all ingest modules have run. While the modules have the word "report" in them, there is no actual requirement that they produce a report or export data. The module can simply perform data analysis and post artifacts to the blackboard like ingest modules do.</li>
</ul>
As we dive into the details, you will notice that the report module API is fairly generic. This is because reports are created at a case level, not a data source level. So, when a user chooses to run a report module, all Autopsy does is tell it to run and gives it a path to a directory to store its results in. The report module can store whatever it wants in the directory.
Note that if you look at the \ref mod_report_page "full developer docs", there are other report module types that are supported in Java. They are not supported in Python, though.
\subsection python_tutorial3_getting_content Getting Content
With report modules, it is up to you to find the content that you want to include in your report or analysis. Generally, you will want to access some or all of the files, tagged files, or blackboard artifacts. As you may recall from the previous tutorials, blackboard artifacts are how ingest modules in Autopsy store their results so that they can be shown in the UI, used by other modules, and included in the final report. In this tutorial, we will introduce the <a href="https://sleuthkit.org/sleuthkit/docs/jni-docs/classorg_1_1sleuthkit_1_1datamodel_1_1_sleuthkit_case.html">SleuthkitCase</a> class, which we generally don't introduce to module writers because it has lots of methods, many of which are low-level, and there are other classes, such as FileManager, that are more focused and easier to use.
\subsubsection python_tutorial3_getting_files Getting Files
You have three choices for getting files to report on. The first is the FileManager, which we used in \ref mod_python_ds_ingest_tutorial_page "the last Data Source-level Ingest Module tutorial". The only change is that you will need to call it multiple times, once for each data source in the case. You will have code that looks something like this:
\verbatim
dataSources = Case.getCurrentCase().getDataSources()
fileManager = Case.getCurrentCase().getServices().getFileManager()
for dataSource in dataSources:
    files = fileManager.findFiles(dataSource, "%.txt")\endverbatim
Another approach is to use the <a href="https://sleuthkit.org/sleuthkit/docs/jni-docs/classorg_1_1sleuthkit_1_1datamodel_1_1_sleuthkit_case.html#a6b14c6b82bbc1cf71aa108f9e5c5ccc1">SleuthkitCase.findAllFilesWhere()</a> method that allows you to specify a SQL query. To use this method, you must know the schema of the database (which makes this a bit more challenging, but more powerful). The schema is defined on the <a href="https://wiki.sleuthkit.org/index.php?title=SQLite_Database_v3_Schema">wiki</a>.
Usually, you just need to focus on the <a href="https://wiki.sleuthkit.org/index.php?title=SQLite_Database_v3_Schema#tsk_files">tsk_files</a> table. If a query returns so many files that you run into memory problems, you can instead use <a href="https://sleuthkit.org/sleuthkit/docs/jni-docs/classorg_1_1sleuthkit_1_1datamodel_1_1_sleuthkit_case.html#a2faec4e68be17f67db298a4ed3933bc3">SleuthkitCase.findAllFileIdsWhere()</a> to get just the IDs and then call <a href="https://sleuthkit.org/sleuthkit/docs/jni-docs/classorg_1_1sleuthkit_1_1datamodel_1_1_sleuthkit_case.html#a8cdd6582b18e9bfa814cffed8302e4b9">SleuthkitCase.getAbstractFileById()</a> to get files as needed.
A third approach is to call org.sleuthkit.autopsy.casemodule.Case.getDataSources(), and then recursively call getChildren() on each Content object. This will traverse all of the folders and files in the case. This is the most memory-efficient approach, but it is also the most complex to code.
\subsubsection python_tutorial3_getting_artifacts Getting Blackboard Artifacts
The blackboard is where modules store their analysis results. If you want to include them in your report, then there are several methods that you could use. If you want all artifacts of a given type, then you can use <a href="https://sleuthkit.org/sleuthkit/docs/jni-docs/classorg_1_1sleuthkit_1_1datamodel_1_1_sleuthkit_case.html#a0b8396fac6c40d8291cc48732dd15d74">SleuthkitCase.getBlackboardArtifacts()</a>. There are many variations of this method that take different arguments. Look at them to find the one that is most convenient for you.
\subsubsection python_tutorial3_getting_tags Getting Tagged Files or Artifacts
If you want to find files or artifacts that are tagged, then you can use the org.sleuthkit.autopsy.casemodule.services.TagsManager. It has methods to get all tags of a given name, such as org.sleuthkit.autopsy.casemodule.services.TagsManager.getContentTagsByTagName().
\section python_tutorial3_getting_started Getting Started
\subsection python_tutorial3_making_the_folder Making the Folder
We'll start by making our module folder. As we learned in \ref mod_python_file_ingest_tutorial_page "the first tutorial", every Python module in Autopsy gets its own folder. To find out where you should put your Python module, launch Autopsy and choose the Tools->Python Plugins menu item. That will open a subfolder in your AppData folder, such as "C:\Users\JDoe\AppData\Roaming\Autopsy\python_modules".
Make a folder inside of there to store your module. Call it "DemoScript3". Copy the <a href="https://github.com/sleuthkit/autopsy/blob/develop/pythonExamples/reportmodule.py">reportmodule.py</a> sample file into this new folder and rename it to CSVReport.py.
\subsection python_tutorial3_writing_script Writing the Script
We are going to write a script that makes some basic CSV output: file name and MD5 hash. Open the CSVReport.py file in your favorite Python text editor. The sample Autopsy Python modules all have TODO entries in them to let you know what you should change. The steps below jump from one TODO to the next.
<ol>
<li>Factory Class Name: The first thing to do is rename the sample class from "SampleGeneralReportModule" to "CSVReportModule". The sample module uses this class name in several places, so you should search for the old name and replace every occurrence.</li>
<li>Name and Description: The next TODO entries are for names and descriptions. These are shown to users. For this example, we'll name it "CSV Hash Report Module". The description can be anything you want. Note that Autopsy requires that modules have unique names, so don't make it too generic.</li>
<li>Relative File Path: The next step is to specify the filename that your module is going to use for the report. Autopsy will later provide you with a folder name to save your report in. If you have multiple file names, then pick the main one. This path will be shown to the user after the report has been generated so that they can open it. For this example, we'll call it "hashes.csv" in the getRelativeFilePath() method.</li>
<li>generateReport() Method: This method is what is called when the user wants to run the module. It gets passed in the base directory to store the results in and a progress bar. It is responsible for making the report and calling Case.addReport() so that it will be shown in the tree. We'll cover the details of this method in a later section.</li>
</ol>
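After those steps, the overall shape of the class might look like the sketch below. It is illustrative only: the base class is plain object so that the snippet runs outside of Autopsy, whereas in the real module you should leave the base class and imports from the sample file in place. The description string is an arbitrary example.

```python
class CSVReportModule(object):  # stand-in base class; keep the sample's base class in Autopsy
    # Name shown to the user; Autopsy requires it to be unique.
    moduleName = "CSV Hash Report Module"

    def getName(self):
        return self.moduleName

    def getDescription(self):
        # Any text you like; this wording is only an example.
        return "Writes the path and MD5 hash of every file to a CSV file"

    def getRelativeFilePath(self):
        # Main report file, relative to the folder that Autopsy provides.
        return "hashes.csv"

    def generateReport(self, baseReportDir, progressBar):
        # Filled in later in this tutorial.
        pass


demo_module = CSVReportModule()
```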
\subsection python_tutorial3_generate_report The generateReport() method
The generateReport() method is where the work is done. The baseReportDir argument is a string for the base directory to store results in. The progressBar argument is an org.sleuthkit.autopsy.report.ReportProgressPanel that shows the user progress while a long report is being made and that can turn red if an error occurs.
We'll use one of the basic ideas from the sample, so you can copy and paste from that as you see fit to make this method. Our general approach is going to be this:
<ol>
<li>Open the CSV file.</li>
<li>Query for all files.</li>
<li>Cycle through each of the files and print a line of text.</li>
<li>Add the report to the Case database.</li>
</ol>
To focus on the essential code, we'll skip the progress bar details. However, the final solution that we'll link to at the end contains the progress bar code.
To open the report file in the right folder, we'll need a line such as this:
\verbatim
fileName = os.path.join(baseReportDir, self.getRelativeFilePath())
report = open(fileName, 'w')\endverbatim
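The file also needs to be closed once the report has been written, even if an error occurs partway through. One way to guarantee that is a try/finally block, sketched here as a standalone function. The write_report name, the temporary folder, and the sample line are hypothetical and only make the snippet runnable outside of Autopsy. In the module, baseReportDir comes from generateReport().

```python
import os
import tempfile


def write_report(base_report_dir, relative_path, lines):
    """Write the given lines to the report file and always close it."""
    file_name = os.path.join(base_report_dir, relative_path)
    report = open(file_name, 'w')
    try:
        for line in lines:
            report.write(line)
    finally:
        # Runs on success and on error, so the file handle is never leaked.
        report.close()
    return file_name


demo_dir = tempfile.mkdtemp()  # stands in for the folder that Autopsy provides
demo_path = write_report(demo_dir, "hashes.csv", ["/docs/a.txt,abc123\n"])
```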
Next we need to query for the files. In our case, we want all of the files, but can skip the directories. We'll use lines such as this to get the current case and then call the SleuthkitCase.findAllFilesWhere() method.
\verbatim
sleuthkitCase = Case.getCurrentCase().getSleuthkitCase()
files = sleuthkitCase.findAllFilesWhere("NOT meta_type = " +
        str(TskData.TSK_FS_META_TYPE_ENUM.TSK_FS_META_TYPE_DIR.getValue()))\endverbatim
Now, we want to print a line for each file. To do this, you'll need something like:
\verbatim
for file in files:
    md5 = file.getMd5Hash()
    if md5 is None:
        md5 = ""
    report.write(file.getParentPath() + file.getName() + "," + md5 + "\n")\endverbatim
Note that the file will only have an MD5 value if the Hash Lookup ingest module was run on the data source.
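One caveat with joining the fields by hand is that a file name containing a comma or a double quote will produce a malformed row. A small helper that applies the usual CSV quoting rules avoids this. The csv_field and csv_line names below are hypothetical and are not part of the sample module, so treat this as an optional refinement rather than what the final solution does.

```python
def csv_field(value):
    """Quote a value if it contains a comma, a double quote, or a line break."""
    if value is None:
        value = ""
    if any(ch in value for ch in (',', '"', '\n', '\r')):
        # Double any embedded quotes and wrap the whole field in quotes.
        return '"' + value.replace('"', '""') + '"'
    return value


def csv_line(path, md5):
    """Build one CSV row from a file path and an optional MD5 hash."""
    return csv_field(path) + "," + csv_field(md5) + "\n"


demo_plain = csv_line("/docs/a.txt", "abc123")
demo_comma = csv_line("/docs/a,b.txt", None)
```

In the loop, the call to report.write() could then pass csv_line(file.getParentPath() + file.getName(), md5), and the separate check for a missing hash would no longer be needed.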
Lastly, we want to add the report to the case database so that the user can later find it from the tree and we want to report that we completed successfully.
\verbatim
Case.getCurrentCase().addReport(fileName, self.moduleName, "Hashes CSV")
progressBar.complete(ReportStatus.COMPLETE)\endverbatim
That's it. The final code can be found <a href="https://github.com/sleuthkit/autopsy/tree/develop/pythonExamples/Sept2015ReportTutorial_CSV">on GitHub</a>.
\subsection python_tutorial3_conclusions Conclusions
In this tutorial, we made a basic report module that creates a custom CSV file. The most challenging part of writing a report module is knowing how to get all of the data that you need. Hopefully, the \ref python_tutorial3_getting_content section above covered what you need, but if not, then go on the <a href="https://sleuthkit.discourse.group/">Sleuthkit forum</a> and we'll try to point you in the right direction.</p>
*/