/*! \page file_discovery_page File Discovery
\section file_disc_prereq Prerequisites
We suggest running all \ref ingest_page "ingest modules" before launching file discovery, but if time is a factor, the modules listed below are the most important. You will see a warning if you open file discovery without having run the \ref file_type_identification_page and the \ref EXIF_parser_page.
Required ingest modules:
- \ref file_type_identification_page
Optional ingest modules:
- \ref cr_ingest_module - Needed to use the \ref file_disc_occur_filter
- \ref EXIF_parser_page - Needed to use the \ref file_disc_user_filter
- \ref hash_db_page - Needed to use the \ref file_disc_hash_filter and to de-duplicate files
- \ref interesting_files_identifier_page - Needed to use the \ref file_disc_int_filter
- \ref object_detection_page - Needed to use the \ref file_disc_obj_filter
\section file_disc_run Running File Discovery
To launch file discovery, either click the "File Discovery" icon near the top of the Autopsy UI or go to "Tools", "File Discovery". There are three steps when setting up file discovery, which flow from the top of the panel to the bottom:
- \ref file_disc_type "Choose the file type"
- \ref file_disc_filtering "Set up filters"
- \ref file_disc_grouping "Choose how to group and sort the results"
Once everything is set up, use the "Show" button at the bottom of the left panel to display your results. If you want to cancel a search in progress you can use the "Cancel" button.
\image html FileDiscovery/fd_main.png
\subsection file_disc_type File Type
The first step is choosing whether you want to display images or videos. The file type is determined by the MIME type of the file, which is why the \ref file_type_identification_page must be run to see any results. Switching between the file types will clear any results being displayed and reset the filters.
\image html FileDiscovery/fd_fileType.png
\subsection file_disc_filtering Filtering
The second step is to select and configure your filters. For most filters, you enable them using the checkbox on the left and then select your options. Multiple options can be selected by using CTRL + left click. Files must pass all enabled filters to be displayed.
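The "must pass all enabled filters" rule amounts to a logical AND across the filters. The following is a minimal Python sketch of that idea only; it is not Autopsy's implementation, and the record fields, filter functions, and data source names are hypothetical.

```python
# Illustrative only: hypothetical file records and filters, not Autopsy code.
def passes_all(file_record, enabled_filters):
    """Return True only if every enabled filter accepts the file."""
    return all(accepts(file_record) for accepts in enabled_filters)


# Two hypothetical filters: a size limit and a data source restriction.
def size_under_16kb(rec):
    return rec["size"] < 16 * 1024


def from_first_data_source(rec):
    return rec["data_source"] == "image1.dd"


files = [
    {"name": "a.jpg", "size": 10_000, "data_source": "image1.dd"},
    {"name": "b.jpg", "size": 10_000, "data_source": "image2.dd"},
    {"name": "c.jpg", "size": 90_000, "data_source": "image1.dd"},
]

# Only a.jpg satisfies both filters; b.jpg fails the data source check
# and c.jpg fails the size check.
shown = [
    rec["name"]
    for rec in files
    if passes_all(rec, [size_under_16kb, from_first_data_source])
]
```

With no filters enabled, `passes_all` accepts every file, which mirrors the fact that disabled filters do not restrict the results.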
\subsubsection file_disc_size_filter File Size Filter
The file size filter lets you restrict the size of your results. The options are different for images and videos - an extra small image might be under 16 KB while an extra small video is anything under 500 KB.
\image html FileDiscovery/fd_fileSizeFilter.png
\subsubsection file_disc_ds_filter Data Source Filter
The data source filter lets you restrict which data sources in your case to include in the results.
\image html FileDiscovery/fd_dataSourceFilter.png
\subsubsection file_disc_occur_filter Past Occurrences Filter
The past occurrences filter uses the \ref central_repo_page "central repository" and \ref hash_db_page "known hash sets" to restrict how common or rare a file must be to be included in the results. By default, the "Known Files" option is disabled, meaning that any file matching the NSRL or other white-listed hash set will not be displayed.
\image html FileDiscovery/fd_pastOccur.png
The counts for the rest of the options are based on how many data sources in your central repository contain a copy of this file (based on hash). If a file only appears in one data source in the current case, then it will match "Unique(1)". If it has only been seen in a few other data sources, it will match "Rare(2-10)". Note that it doesn't matter how many times a file appears in each data source - a file could have twenty copies in one data source and still be "unique".
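The counting rule can be sketched as follows. This is a hedged Python illustration, not the central repository's actual query: the `(data_source, hash)` pairs are made-up data, and only the "Unique(1)" and "Rare(2-10)" labels come from this page; the other return values are placeholders.

```python
# Illustrative only: occurrences are hypothetical (data_source, hash) pairs.
def count_data_sources(occurrences, file_hash):
    """Count distinct data sources that contain the hash.

    Multiple copies inside one data source are counted once.
    """
    return len({ds for ds, h in occurrences if h == file_hash})


def occurrence_bucket(data_source_count):
    """Map a data source count to a past occurrences option."""
    if data_source_count == 1:
        return "Unique(1)"
    if 2 <= data_source_count <= 10:
        return "Rare(2-10)"
    if data_source_count > 10:
        return "more than 10"  # placeholder label, not an Autopsy option name
    return "not seen"  # placeholder for a hash with no recorded occurrences


# Twenty copies of hash "abc" in a single data source, and one copy of
# hash "def" in each of three data sources.
occurrences = [("ds1", "abc")] * 20 + [("ds1", "def"), ("ds2", "def"), ("ds3", "def")]

abc_bucket = occurrence_bucket(count_data_sources(occurrences, "abc"))
def_bucket = occurrence_bucket(count_data_sources(occurrences, "def"))
```

Here `abc_bucket` is "Unique(1)" despite the twenty copies, because all of them live in one data source.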
\subsubsection file_disc_user_filter Possibly User Created
The possibly user created filter restricts the results to files that are suspected to be raw images or videos.
\image html FileDiscovery/fd_userCreatedFilter.png
This means the image or video must have a "User Content Suspected" result associated with it. These primarily come from the \ref EXIF_parser_page "Exif parser module".
\image html FileDiscovery/fd_userContentArtifact.png
\subsubsection file_disc_hash_filter Hash Set Filter
The hash set filter restricts the results to files found in the selected hash sets. Only notable hash sets that have hits in the current case are listed (though those hits may not be images or videos). See the \ref hash_db_page page for more information on creating and using hash sets.
\image html FileDiscovery/fd_hashSetFilter.png
\subsubsection file_disc_int_filter Interesting Item Filter
The interesting item filter restricts the results to files found in the selected interesting item rule sets. Only interesting file rule sets that have results in the current case are listed (though those matches may not be images or videos). See the \ref interesting_files_identifier_page page for more information on creating and using interesting item rule sets.
\image html FileDiscovery/fd_interestingItemsFilter.png
\subsubsection file_disc_obj_filter Object Detected Filter
The object detected filter restricts the results to files that matched the selected classifiers. Only classifiers that have results in the current case are listed. Note that currently the built-in \ref object_detection_page ingest module only works on images, so you should generally not use this filter with videos. See the \ref object_detection_page page for more information on setting up classifiers.
\image html FileDiscovery/fd_objectFilter.png
\subsubsection file_disc_parent_filter Parent Folder Filter
The parent folder filter restricts the paths the files can be on. This filter works differently than the others in that the individual options do not have to be selected - every rule that has been entered will be applied.
\image html FileDiscovery/fd_parentFilter.png
You can enter paths that should be included and paths that should be ignored. For both you then specify whether the path string you entered is a full path or a substring. For full path matches you'll need to include the leading and trailing slashes. Full path matches are also case-sensitive.
The default options, shown above, will exclude any file that has a "Windows" folder or a "Program Files" folder in its path. It would exclude files like "/Windows/System32/image1.jpg" but would not exclude "/My Pictures/Bay Windows/image2.jpg" because the slashes around "Windows" force it to match the exact folder name.
Here is another example. This rule was created with "Full" and "Include" selected.
\image html FileDiscovery/fd_parentEx2.png
This matches the file "/LogicalFileSet2/File Discovery/bird1.tif".
When considering multiple "Include" rules, remember that all rules are applied to each file path. So making a rule to include "My Documents" and another to include "My Pictures" will mean that only files that contain both folders in their path (e.g., "/My Documents/files/My Pictures/image3.png") will appear in the results.
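The rule semantics described above can be sketched in Python. This is an illustration under stated assumptions, not Autopsy's code: the `PathRule` type and function names are hypothetical, the default rules are assumed to be substring rules with the text "/Windows/" and "/Program Files/", and substring matching is assumed to ignore case (the text above only states that full path matches are case-sensitive).

```python
from dataclasses import dataclass


@dataclass
class PathRule:
    """Hypothetical representation of one parent folder rule."""
    text: str
    full: bool     # True: parent path must equal text; False: substring match
    include: bool  # True: path must match; False: path must not match


def rule_matches(rule, parent_path):
    if rule.full:
        # Full matches are case-sensitive and need leading and trailing slashes.
        return parent_path == rule.text
    # Assumption: substring matches ignore case.
    return rule.text.lower() in parent_path.lower()


def passes_parent_filter(rules, file_path):
    """Every entered rule is applied; the file must satisfy all of them."""
    parent_path = file_path[: file_path.rfind("/") + 1]
    return all(rule_matches(r, parent_path) == r.include for r in rules)


# Assumed form of the default exclude rules.
defaults = [
    PathRule("/Windows/", full=False, include=False),
    PathRule("/Program Files/", full=False, include=False),
]
excluded = not passes_parent_filter(defaults, "/Windows/System32/image1.jpg")
kept = passes_parent_filter(defaults, "/My Pictures/Bay Windows/image2.jpg")

# Two include rules: both must appear in the path.
includes = [
    PathRule("My Documents", full=False, include=True),
    PathRule("My Pictures", full=False, include=True),
]
both = passes_parent_filter(includes, "/My Documents/files/My Pictures/image3.png")
only_one = passes_parent_filter(includes, "/My Pictures/image4.png")
```

In this sketch `excluded`, `kept`, and `both` are true while `only_one` is false, which matches the behavior described for the default rules and for multiple "Include" rules.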
\subsection file_disc_grouping Grouping and Sorting
The final options are for how you want to group and sort your results.
\image html FileDiscovery/fd_grouping.png
The first option lets you choose the top level grouping for your results and the second option lets you choose how to sort them. The groups appear in the middle column of the file discovery panel. Note that some of the grouping options may not always appear - for example, grouping by past occurrences will only be present if the \ref central_repo_page is enabled, and grouping by hash set will only be present if there are hash set hits in your current case. The example below shows the groups created using the default options (group by file size, order groups by group name):
\image html FileDiscovery/fd_groupingSize.png
In the case of file size and past occurrences, ordering by group name is based on the natural ordering of the group (largest to smallest or most rare to most common). For the other groups it will be alphabetical. Ordering groups by size will sort them based on how many files each group contains, going largest to smallest. For example, here we've grouped by interesting item set and ordered the groups by their size.
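The two group orderings for a non-size, non-occurrence grouping can be sketched as below. This is a simplified Python illustration with made-up data and hypothetical field names; it does not cover the natural ordering used for file size and past occurrences groups.

```python
from collections import defaultdict


def group_files(files, key):
    """Bucket files by the value returned from key(file)."""
    groups = defaultdict(list)
    for f in files:
        groups[key(f)].append(f)
    return dict(groups)


def order_by_group_name(groups):
    """Alphabetical ordering, as used for most grouping attributes."""
    return sorted(groups)


def order_by_group_size(groups):
    """Order groups by how many files they contain, largest first."""
    return sorted(groups, key=lambda name: len(groups[name]), reverse=True)


# Hypothetical results: the set names are invented for the example.
files = [
    {"name": "img1.jpg", "interesting_set": "None"},
    {"name": "img2.jpg", "interesting_set": "None"},
    {"name": "img3.jpg", "interesting_set": "None"},
    {"name": "img4.jpg", "interesting_set": "Birds"},
    {"name": "img5.jpg", "interesting_set": "Birds"},
    {"name": "img6.jpg", "interesting_set": "Cats"},
]

groups = group_files(files, key=lambda f: f["interesting_set"])
by_name = order_by_group_name(groups)
by_size = order_by_group_size(groups)
```

Ordering by name puts "Birds" first, while ordering by size puts the three-file "None" group first.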
\image html FileDiscovery/fd_groupingInt.png
The interesting items filter was not enabled, so most images ended up in the "None" group, meaning they have no interesting file result associated with them. The final group in the list contains a file that matched both interesting item rule sets.
The last grouping and sorting option is choosing how to sort the results within a group. This is the order of the results in the top right panel after selecting a group from the middle column. Note that due to the merging of results with the same hash in that panel, ordering by file name, path, or data source can vary. See the \ref file_disc_dedupe section below for more information.
\section file_disc_results Viewing Results
\subsection file_disc_dedupe De-duplication
*/