/*! \page mod_python_ds_ingest_tutorial_page Python Tutorial #2: Writing a Data Source Ingest Module

In the \ref mod_python_file_ingest_tutorial_page "first tutorial" we built a basic Python Autopsy module that looked for big and round files. In this tutorial we're going to make two data source ingest modules. The first focuses on finding SQLite databases and parsing them, and the second focuses on running a command line tool on a disk image.

The main difference from the first tutorial, which focused on file ingest modules, is that these are data source ingest modules. Data source ingest modules are given a reference to a data source and must find the files to analyze themselves, whereas file ingest modules are given a reference to each file in the data source.

\section python_tutorial2_assumptions Assumptions

This post assumes you've read the \ref mod_python_file_ingest_tutorial_page "first tutorial". That means that you should know why it is better to write an Autopsy module than a stand-alone tool, and what you need to set up (Autopsy installed, text editor, etc.).

You may also recall the limitations (and benefits) of data source ingest modules. The most notable difference from file ingest modules is that data source ingest modules may not have access to carved files or files that are inside of ZIP files. For our example in this post, we are looking for a SQLite database with a specific name, and it will not be inside of a ZIP file, so a data source ingest module is the most efficient choice and will get us results faster.

The other assumption is that you know something about SQL queries. We have some example queries below, and we don't go into detail about how they work.

\section python_tutorial2_getting_started Getting Started

\subsection python_tutorial2_folder Making Your Module Folder

We'll start by making our module folder. As we learned in the \ref mod_python_file_ingest_tutorial_page "first tutorial", every Python module in Autopsy gets its own folder.
To find out where you should put your Python module, launch Autopsy and choose the Tools->Python Plugins menu item. That will open a subfolder in your AppData folder, such as "C:\Users\JDoe\AppData\Roaming\Autopsy\python_modules".

Make a folder inside of there to store your module. Call it "DemoScript2". Copy the dataSourceIngestModule.py sample file from github into this new folder and rename it to FindContactsDb.py.

\subsection python_tutorial2_script Writing The Script

We are going to write a script that finds files named "contacts.db", opens each one as a SQLite database, and makes a TSK_CONTACT artifact for each row it contains.
For this tutorial, you can start by deleting the contents of the existing process() method in the sample module. The full source code is linked to at the end of this blog and shows more detail about a fully fledged module. We'll just cover the analytics in the blog.
\subsubsection python_tutorial2_getting_files Getting Files

Because data source-level ingest modules are not passed specific files to analyze, nearly all modules of this type will need to use the org.sleuthkit.autopsy.casemodule.services.FileManager service to find relevant files. Check out the methods on that class to see the different ways that you can find files.

NOTE: See the \ref python_tutorial2_running_exes section for an example of when you simply want to run a command line tool on a disk image instead of querying for files to analyze.

For our example, we want to find all files named "contacts.db". The org.sleuthkit.autopsy.casemodule.services.FileManager class contains several findFiles() methods to help. You can search for all files with a given name or files with a given name in a particular folder. You can also use SQL syntax to match file patterns, such as "%.jpg" to find all files with a JPEG extension.

Our example needs these two lines to get the FileManager for the current case and to find the files.
\verbatim
fileManager = Case.getCurrentCase().getServices().getFileManager()
files = fileManager.findFiles(dataSource, "contacts.db")\endverbatim

findFiles() returns a list of AbstractFile objects. This gives you access to the file's metadata and content.

For our example, we are going to open these SQLite files. That means that we need to save them to disk. This is less than ideal because it wastes time writing the data to disk and then reading it back in, but it is the only option with many libraries. If you are doing some other type of analysis on the content, then you do not need to write it to disk. You can read directly from the AbstractFile (see the sample modules for specific code to do this).

The org.sleuthkit.autopsy.datamodel.ContentUtils class provides a utility to save file content to disk. We'll make a path in the temp folder of our case directory. To prevent naming collisions, we'll name the file based on its unique ID.
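
To see why the unique ID makes a safe file name, consider two "contacts.db" files found in different folders of the same data source. The sketch below is plain Python and is not part of the Autopsy API: the helper temp_db_path() is hypothetical and the two IDs are invented for illustration.

```python
import os
import tempfile


def temp_db_path(temp_dir, file_id):
    """Hypothetical helper: build a temp file path from a file's unique ID."""
    return os.path.join(temp_dir, str(file_id) + ".db")


# Stand-in for the case temp folder (an assumption for this sketch).
temp_dir = tempfile.gettempdir()

# Two files that share the name "contacts.db" but have different IDs.
# The ID values are made up.
first = temp_db_path(temp_dir, 1201)
second = temp_db_path(temp_dir, 4877)

# Naming by ID keeps the two copies apart; naming both copies
# "contacts.db" would make the second overwrite the first.
print(first)
print(second)
```
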
The following two lines save the file to lclDbPath.
\verbatim
lclDbPath = os.path.join(Case.getCurrentCase().getTempDirectory(), str(file.getId()) + ".db")
ContentUtils.writeToFile(file, File(lclDbPath))\endverbatim

\subsubsection python_tutorial2_analyzing_sqlite Analyzing SQLite

Next, we need to open the SQLite database. We are going to use the Java JDBC infrastructure for this. JDBC is Java's generic way of dealing with different types of databases. To open the database, we do this:
\verbatim
Class.forName("org.sqlite.JDBC").newInstance()
dbConn = DriverManager.getConnection("jdbc:sqlite:%s" % lclDbPath)\endverbatim

With our connection in hand, we can do some queries. In our sample database, we have a single table named "contacts", which has columns for name, email, and phone. We start by querying for all rows in our simple table:
\verbatim
stmt = dbConn.createStatement()
resultSet = stmt.executeQuery("SELECT * FROM contacts")\endverbatim

For each row, we are going to get the values for the name, email, and phone number and make a TSK_CONTACT artifact. Recall from the first tutorial that posting artifacts to the blackboard allows modules to communicate with each other and also allows you to easily display data to the user. The TSK_CONTACT artifact is for storing contact information. The artifact catalog shows that TSK_CONTACT is a data artifact, so we will be using the newDataArtifact() method to create each one.

The basic approach in our example is to make an artifact of a given type (TSK_CONTACT) and have it be associated with the database it came from. We then make attributes for the name, email, and phone.
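
If you do not have a suitable database to test against, the sketch below builds one and runs the same query against it. It uses the sqlite3 module from standard CPython rather than the JDBC calls used inside Autopsy. Only the table and column names come from the description above; the helpers build_sample_db() and read_contacts() and the two rows are made up for illustration.

```python
import os
import shutil
import sqlite3
import tempfile


def build_sample_db(path):
    """Hypothetical helper: create a contacts table holding two invented rows."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE contacts (name TEXT, email TEXT, phone TEXT)")
        conn.executemany(
            "INSERT INTO contacts (name, email, phone) VALUES (?, ?, ?)",
            [
                ("Alice Example", "alice@example.com", "555-0100"),
                ("Bob Example", "bob@example.com", "555-0101"),
            ],
        )
        conn.commit()
    finally:
        conn.close()


def read_contacts(path):
    """Run the tutorial's query and return every row as a tuple."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT * FROM contacts").fetchall()
    finally:
        conn.close()


work_dir = tempfile.mkdtemp()
db_path = os.path.join(work_dir, "contacts.db")
build_sample_db(db_path)
rows = read_contacts(db_path)
for name, email, phone in rows:
    print(name, email, phone)

# Remove the scratch folder once the rows have been read.
shutil.rmtree(work_dir)
```

Adding a file like this to a test data source gives the module something to find. Each row it reads back then becomes one TSK_CONTACT artifact.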
The following code makes the artifact for each row in the database:
\verbatim
while resultSet.next():
    try:
        name = resultSet.getString("name")
        email = resultSet.getString("email")
        phone = resultSet.getString("phone")
    except SQLException as e:
        self.log(Level.INFO, "Error getting values from contacts table (" + e.getMessage() + ")")

    # Make an artifact on the blackboard, TSK_CONTACT and give it attributes for each of the fields
    art = file.newDataArtifact(BlackboardArtifact.Type.TSK_CONTACT, Arrays.asList(
        BlackboardAttribute(BlackboardAttribute.Type.TSK_NAME_PERSON,
                            ContactsDbIngestModuleFactory.moduleName, name),
        BlackboardAttribute(BlackboardAttribute.Type.TSK_EMAIL,
                            ContactsDbIngestModuleFactory.moduleName, email),
        BlackboardAttribute(BlackboardAttribute.Type.TSK_PHONE_NUMBER,
                            ContactsDbIngestModuleFactory.moduleName, phone)
    ))\endverbatim

That's it. We've just found the databases, queried them, and made artifacts for the user to see. There are a few final things to do, though. First, we should fire off an event so that the UI updates and refreshes with the new artifacts. We can fire just one event after each database is parsed (or you could fire one for each artifact; it's up to you).
\verbatim
IngestServices.getInstance().fireModuleDataEvent(
    ModuleDataEvent(ContactsDbIngestModuleFactory.moduleName,
                    BlackboardArtifact.ARTIFACT_TYPE.TSK_CONTACT, None))\endverbatim

And the final thing is to clean up. We should close the database connections and delete our temporary file.
\verbatim
stmt.close()
dbConn.close()
os.remove(lclDbPath)\endverbatim

The final version of FindContactsDb.py can be found on github.

\subsection python_tutorial2_niceties Niceties

Data source-level ingest modules can run for quite some time. Therefore, they should do some additional things that file-level ingest modules do not need to.