mirror of
https://github.com/overcuriousity/autopsy-flatpak.git
synced 2025-07-06 21:00:22 +00:00
100 lines
6.8 KiB
Plaintext
100 lines
6.8 KiB
Plaintext
/*! \page ad_hoc_keyword_search_page Ad Hoc Keyword Search
|
|
|
|
|
|
\section ad_hoc_kw_overview Overview
|
|
|
|
The ad hoc keyword search features allows you to run single keyword terms or lists of keywords against all images in a case. Both options are located in the top right of the main Autopsy window.
|
|
|
|
\image html keyword-search-ad-hoc.PNG
|
|
|
|
The \ref keyword_search_page must be selected during ingest before doing an ad hoc keyword search. If you don't want to search for any of the existing keyword lists, you can deselect everything to just index the files for later searching.
|
|
|
|
\section ad_hoc_kw_types_section Creating Keywords
|
|
|
|
The following sections will give a description of each keyword type, then will show some sample text and how various search terms would work against it.
|
|
|
|
## Exact match
|
|
|
|
Exact match should be used in cases where the search term is expected to always be surrounded by non-word characters (typically whitespace or punctuation). Spaces/punctuation are allowed in the search term, and capitalization is ignored.
|
|
|
|
> The quick reddish-brown fox jumps over the lazy dog.
|
|
|
|
- "quick", "brown", "dog" will match
|
|
- "FOX", "Fox", "fox" will all match
|
|
- "reddish-brown fox", "brown fox", "LAZY DOG" will match
|
|
- "rown" and "lazy do" will not match since they are not bounded by non-word characters in the text
|
|
|
|
## Substring match
|
|
|
|
Substring match should be used where the search term is just part of a word, or to allow for different word endings. Capitalization is ignored but spaces and other punctuation can not appear in the search term.
|
|
|
|
> The quick reddish-brown fox jumps over the lazy dog.
|
|
|
|
- "jump" will match "jumps", and would also match "jumping", "jumped", etc.
|
|
- "dog" will match
|
|
- "UMP", "oX" will match
|
|
- "y dog", "ish-brown" will not match
|
|
|
|
## Regex match
|
|
|
|
Regex match can be used to search for a specific pattern. Regular expressions are supported using Lucene Regex Syntax which is documented here: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-regexp-query.html#regexp-syntax. Wildcards are automatically added to the beginning and end of the regular expressions to ensure all matches are found. Additionally, the resulting hits are split on common token separator boundaries (e.g. space, newline, colon, exclamation point etc.) to make the resulting keyword hit more amenable to highlighting.
|
|
|
|
There is some validation on the regex but it's best to test on a sample image to make sure your regexes are correct and working as expected. One simple way to test is by creating a sample text file that your expression should match, ingesting it as a \ref ds_log "Logical File Set" and then running the regex query.
|
|
|
|
> In the year 1885 in an article titled Current Notes, the quick brown fox first jumped over the lazy dog.
|
|
|
|
- "qu.ck", "Cu.*es" will match
|
|
- "[Ff][Oo][Xx]" will match any version of "fox". There is no way to specify that an entire regex should be case-insensitive.
|
|
- "[0-9]{4}" will match 1885. Character classes like "\d" are not supported. Backreferences are also not supported (but will not generate an error), so "Cu(.)\1ent" would not work to find "Current"
|
|
|
|
## Other notes
|
|
|
|
### Built-in keywords
|
|
|
|
The \ref keyword_search_page has several built-in searches that can not be edited. The ones that are most prone to false hits (IP Address and Phone Number) require that the matching text is surrounded by boundary characters, such as spaces or certain punctuation. For example:
|
|
- "Address 10.1.5.127 is unavailable" - The built-in IP Address search would find "10.1.5.127" because it is surrounded by whitespace
|
|
- "abc10.1.7.99xyz" - The built-in IP Address search would not find it because it is surrounded by letters
|
|
|
|
If you want to override this default behavior:
|
|
- Copy the existing regex. The easiest way to do this is to click on Keyword Lists, the list you want then the specific entry you want and hit control+c to copy. It will need a bit of cleanup afterward.
|
|
- Remove the boundary characters on the beginning and end of the regex
|
|
- Make a new keyword list containing the result and run it either during ingest or through the Keyword Lists button.
|
|
|
|
### Non-Latin text
|
|
In general all three types of keyword searches will work as expected but the feature has not been thoroughly tested with all character sets. As with regex above, we suggest testing on a sample file. Some notes:
|
|
- Exact match and substring match may no longer be case-insensitive
|
|
- In languages like Japanese that don't contain word breaks, every character is processed as a separate word. This tends to make substring match fail, but those searches can be run using exact match. For example, if the text contained 日本語, an exact match search on 日本 would find it (a substring search on 日本 would fail).
|
|
|
|
\section ad_hoc_kw_search Keyword Search
|
|
|
|
Individual keyword or regular expressions can quickly be searched using the search text box widget. You can select "Exact Match", "Substring Match" and "Regular Expression" match. See the earlier \ref ad_hoc_kw_types_section section for information on each keyword type.
|
|
|
|
\image html keyword-search-bar.PNG
|
|
|
|
Results will be opened in a separate Results Viewer for every search executed and they will also be saved in the Directory Tree as shown in the screenshot below.
|
|
|
|
\image html keyword-search-hits.PNG
|
|
|
|
\section ad_hoc_kw_lists Keyword Lists
|
|
|
|
In addition to being selected during ingest, keyword lists can also be run through the Keyword Lists button. For information on setting up these keyword lists, see the \ref keywordListsTab section of the ingest module documentation.
|
|
|
|
Lists created using the Keyword Search Configuration Dialog can be manually searched by the user by pressing on the 'Keyword Lists' button, selecting the check boxes corresponding to the lists to be searched, and pressing the 'Search' button.
|
|
|
|
\image html keyword-search-list.PNG
|
|
|
|
The results of the keyword list search are shown in the tree, as shown below.
|
|
|
|
\image html keyword-search-list-results.PNG
|
|
|
|
\section ad_hoc_during_ingest Doing ad hoc searches during ingest
|
|
|
|
Ad hoc searches are intended to be used after ingest completes, but can be used in a limited capacity while ingest is ongoing.
|
|
|
|
Manual \ref ad_hoc_kw_search for individual keywords or regular expressions can be executed while ingest is ongoing, using the current index. Note however, that you may miss some results if the entire index has not yet been populated. Autopsy enables you to perform the search on an incomplete index in order to retrieve some preliminary results in real-time.
|
|
|
|
During the ingest, the normal manual search using \ref ad_hoc_kw_lists behaves differently than after ingest is complete. A selected list can instead be added to the ingest process and it will be searched in the background instead.
|
|
|
|
Most keyword management features are disabled during ingest. You can not edit keyword lists but can create new lists (but not add to them) and copy and export existing lists.
|
|
|
|
*/ |