esaunders
|
ff85d494ed
|
Tightened up IP address regex to not match hits that have more than 4 elements separated by dots. Removed trailing slash from URL regex as it is not needed...the trailing slash is being handled as one of the boundary characters in RegexQuery
|
2017-01-09 16:39:53 -05:00 |
|
Eugene Livis
|
09014d34b6
|
Registering as service providers
|
2017-01-09 15:58:50 -05:00 |
|
Eugene Livis
|
e3ed9dfc34
|
Resolved merge conflicts
|
2017-01-09 14:08:14 -05:00 |
|
Eugene Livis
|
7e291b1124
|
Merge latest
|
2017-01-09 10:37:35 -05:00 |
|
Eugene Livis
|
75441946a1
|
Minor
|
2017-01-09 10:23:03 -05:00 |
|
millmanorama
|
860f4361b3
|
Merge branch '2132-32k-chunks' into 2184-overlapping-chunks
# Conflicts:
# KeywordSearch/src/org/sleuthkit/autopsy/keywordsearch/Ingester.java
|
2017-01-09 11:51:36 +01:00 |
|
millmanorama
|
a8cfbd1e10
|
refactor TextExtractor to an interface and to remove the intermediary getInputStream() method
|
2017-01-09 11:07:12 +01:00 |
|
millmanorama
|
7325174dc3
|
Merge remote-tracking branch 'upstream/develop' into 2132-32k-chunks
|
2017-01-09 10:47:59 +01:00 |
|
Richard Cordovano
|
be7bdced90
|
Merge in develop branch with text extraction refactoring
|
2017-01-08 10:48:17 -05:00 |
|
Richard Cordovano
|
837eb1477f
|
Pull in text extraction refactoring and resolve merge conflicts
|
2017-01-08 10:22:18 -05:00 |
|
Richard Cordovano
|
b0ce3168df
|
Merge pull request #2434 from millmanorama/fix-compiler-warnings
fix compiler warnings about raw types
|
2017-01-07 11:10:24 -05:00 |
|
Richard Cordovano
|
8fbb19a67d
|
Merge remote-tracking branch 'upstream/develop' into search_improvements
|
2017-01-07 10:32:34 -05:00 |
|
Richard Cordovano
|
5463d3a719
|
Remove kws public Server.getIngester, Ingester is not public
|
2017-01-07 10:30:58 -05:00 |
|
Eugene Livis
|
4299a2326e
|
More work
|
2017-01-06 16:22:49 -05:00 |
|
Eugene Livis
|
b5e3639167
|
Fixing comments
|
2017-01-06 16:18:45 -05:00 |
|
Eugene Livis
|
d23b78f57c
|
Fixing comments
|
2017-01-06 16:17:42 -05:00 |
|
Eugene Livis
|
bb0c3e55eb
|
Fixing comments
|
2017-01-06 16:14:53 -05:00 |
|
Eugene Livis
|
21f2efbdcf
|
More work
|
2017-01-06 16:05:11 -05:00 |
|
Eugene Livis
|
b05dded08a
|
Got index folder search algorithm to work
|
2017-01-06 15:48:44 -05:00 |
|
millmanorama
|
161ba2098c
|
cleanup and comments in Chunker
|
2017-01-06 14:53:58 +01:00 |
|
millmanorama
|
990433fc36
|
refactor Chunker read methods to use a common helper method.
|
2017-01-06 13:16:40 +01:00 |
|
millmanorama
|
64ba5f6e66
|
Merge remote-tracking branch 'upstream/develop' into 2184-overlapping-chunks
|
2017-01-06 11:09:27 +01:00 |
|
millmanorama
|
52251bcb2e
|
move Reader reset back to beginning of next() and increase buffer size to 2048.
|
2017-01-06 00:03:45 +01:00 |
|
Eugene Livis
|
40cc726a11
|
First cut at integrating AutopsyServiceProvider
|
2017-01-05 17:16:27 -05:00 |
|
Eugene Livis
|
7d252864a4
|
Index folder finding algorithm seems to work
|
2017-01-05 13:41:12 -05:00 |
|
Eugene Livis
|
4555f7d44d
|
Merge branch 'search_improvements' of https://github.com/sleuthkit/autopsy into solr65
|
2017-01-05 12:42:55 -05:00 |
|
Richard Cordovano
|
210068e241
|
Merge in develop branch
|
2017-01-05 10:24:59 -05:00 |
|
esaunders
|
acae764760
|
Modified the phone number regex to pick up numbers that have spaces in them...perhaps this will produce more false positives but in our test data it produces over 1,000 extra numbers that are not found in Autopsy 4.2. Also updated the email regex to find email addresses surrounded by {}, sometimes seen in academic publications.
|
2017-01-04 17:14:06 -05:00 |
|
esaunders
|
ba7f8ab9b3
|
Consolidated boundary characters into a single list.
|
2017-01-04 15:19:21 -05:00 |
|
esaunders
|
c172e0f16e
|
Fix for missing characters in snippets and reduce length of snippets in an attempt to more closely match previous version of Autopsy.
|
2017-01-04 12:23:42 -05:00 |
|
esaunders
|
8432fec205
|
Updated email and url regexes to be case insensitive.
|
2017-01-04 12:22:17 -05:00 |
|
millmanorama
|
5e0f9abdf9
|
reset at end to avoid "This stream has not been marked" error.
|
2017-01-04 17:22:56 +01:00 |
|
millmanorama
|
151742c21b
|
record length in chars and mark/reset reader to produce overlaps
|
2017-01-04 17:16:20 +01:00 |
|
millmanorama
|
d8ec4290f2
|
reduce max window size to prevent off by one error
|
2017-01-04 17:16:19 +01:00 |
|
millmanorama
|
94e136b451
|
first pass at overlapping chunks
|
2017-01-04 17:16:17 +01:00 |
|
millmanorama
|
d14c15fbdb
|
bump chunk size to exactly 32k, single read chars to 1024
|
2017-01-04 12:25:34 +01:00 |
|
Eugene Livis
|
62ad3e1eb2
|
First cut of index search algorithm
|
2017-01-03 16:51:57 -05:00 |
|
esaunders
|
6304300f62
|
Merge branch 'develop' of github.com:sleuthkit/autopsy into 2121_regex_query
|
2017-01-03 12:56:04 -05:00 |
|
esaunders
|
45c2b0c065
|
Set results max page size to 512.
|
2017-01-03 12:48:39 -05:00 |
|
esaunders
|
c1f326775a
|
Added result paging support.
|
2017-01-03 12:47:16 -05:00 |
|
millmanorama
|
8410970b11
|
Chunker implements Iterator and Iterable
|
2017-01-03 14:57:55 +01:00 |
|
millmanorama
|
15c2d395fa
|
move Chunk and Chunker out of Ingester
|
2017-01-03 14:26:48 +01:00 |
|
millmanorama
|
d2a6fe3fda
|
move chunking algorithm into separate class(es) and reduce chunk size to ~32k
|
2017-01-03 14:26:46 +01:00 |
|
Richard Cordovano
|
46369eff44
|
Update NBM versioning for 4.3.0
|
2017-01-02 18:45:21 -05:00 |
|
Richard Cordovano
|
13411450aa
|
4.3.0 preps: DSPs, public API restore, const name
|
2017-01-02 17:36:59 -05:00 |
|
millmanorama
|
3557f141e1
|
use UTF-8 encoding for ArtifactTextExtractor streams and readers
|
2017-01-02 16:45:51 +01:00 |
|
millmanorama
|
4ae0a688bc
|
don't commit unnecessarily
|
2016-12-31 14:31:11 +01:00 |
|
esaunders
|
681699467d
|
Needed to tweak the CC regex and our boundary characters to successfully match CC numbers in our test data set.
|
2016-12-28 14:37:51 -05:00 |
|
millmanorama
|
8526427b4f
|
cleanup and comment TextExtractor
cleanup and comment TextExtractor implementations more.
remove constants left over from merge
|
2016-12-28 17:30:42 +01:00 |
|
millmanorama
|
f56c2b43c8
|
move all 'appendix' related code into TikaTextExtractor and simplify TextExtractor interface.
|
2016-12-28 17:30:32 +01:00 |
|