Richard Cordovano
|
aa4474d54d
|
Change TikaTextExtractor static init parallelStream use to stream
|
2017-01-11 12:31:15 -05:00 |
|
millmanorama
|
860f4361b3
|
Merge branch '2132-32k-chunks' into 2184-overlapping-chunks
# Conflicts:
# KeywordSearch/src/org/sleuthkit/autopsy/keywordsearch/Ingester.java
|
2017-01-09 11:51:36 +01:00 |
|
millmanorama
|
a8cfbd1e10
|
refactor TextExtractor to an interface and to remove the intermediary getInputStream() method
|
2017-01-09 11:07:12 +01:00 |
|
millmanorama
|
7325174dc3
|
Merge remote-tracking branch 'upstream/develop' into 2132-32k-chunks
|
2017-01-09 10:47:59 +01:00 |
|
Richard Cordovano
|
837eb1477f
|
Pull in text extraction refactoring and resolve merge conflicts
|
2017-01-08 10:22:18 -05:00 |
|
Richard Cordovano
|
5463d3a719
|
Remove kws public Server.getIngester, Ingester is not public
|
2017-01-07 10:30:58 -05:00 |
|
millmanorama
|
161ba2098c
|
cleanup and comments in Chunker
|
2017-01-06 14:53:58 +01:00 |
|
millmanorama
|
990433fc36
|
refactor Chunker read methods to use a common helper method.
|
2017-01-06 13:16:40 +01:00 |
|
millmanorama
|
64ba5f6e66
|
Merge remote-tracking branch 'upstream/develop' into 2184-overlapping-chunks
|
2017-01-06 11:09:27 +01:00 |
|
millmanorama
|
52251bcb2e
|
move Reader reset back to beginning of next() and increase buffer size to 2048.
|
2017-01-06 00:03:45 +01:00 |
|
millmanorama
|
5e0f9abdf9
|
reset at end to avoid "This stream has not been marked" error.
|
2017-01-04 17:22:56 +01:00 |
|
millmanorama
|
151742c21b
|
record length in chars and mark/reset reader to produce overlaps
|
2017-01-04 17:16:20 +01:00 |
|
millmanorama
|
d8ec4290f2
|
reduce max window size to prevent off by one error
|
2017-01-04 17:16:19 +01:00 |
|
millmanorama
|
94e136b451
|
first pass at overlapping chunks
|
2017-01-04 17:16:17 +01:00 |
|
millmanorama
|
d14c15fbdb
|
bump chunk size to exactly 32k, single read chars to 1024
|
2017-01-04 12:25:34 +01:00 |
|
millmanorama
|
8410970b11
|
Chunker implements Iterator and Iterable
|
2017-01-03 14:57:55 +01:00 |
|
millmanorama
|
15c2d395fa
|
move Chunk and Chunker out of Ingester
|
2017-01-03 14:26:48 +01:00 |
|
millmanorama
|
d2a6fe3fda
|
move chunking algorithm into separate class(es) and reduce chunk size to ~32k
|
2017-01-03 14:26:46 +01:00 |
|
Richard Cordovano
|
46369eff44
|
Update NBM versioning for 4.3.0
|
2017-01-02 18:45:21 -05:00 |
|
Richard Cordovano
|
13411450aa
|
4.3.0 preps: DSPs, public API restore, const name
|
2017-01-02 17:36:59 -05:00 |
|
millmanorama
|
3557f141e1
|
use UTF-8 encoding for ArtifactTextExtractor streams and readers
|
2017-01-02 16:45:51 +01:00 |
|
millmanorama
|
4ae0a688bc
|
don't commit unnecessarily
|
2016-12-31 14:31:11 +01:00 |
|
millmanorama
|
8526427b4f
|
cleanup and comment TextExtractor
cleanup and comment TextExtractor implementations more.
remove constants left over from merge
|
2016-12-28 17:30:42 +01:00 |
|
millmanorama
|
f56c2b43c8
|
move all 'appendix' related code into TikaTextExtractor and simplify TextExtractor interface.
|
2016-12-28 17:30:32 +01:00 |
|
millmanorama
|
8841f6e773
|
minor fixes
|
2016-12-28 17:30:30 +01:00 |
|
millmanorama
|
2d5cd2efc1
|
comment up Ingester
|
2016-12-28 17:30:27 +01:00 |
|
millmanorama
|
c94d3de872
|
move encoding options to StringsTextExtractor
|
2016-12-28 17:30:25 +01:00 |
|
millmanorama
|
9b85284194
|
remove unused outerclasses that have copies as innerclasses
|
2016-12-28 17:30:23 +01:00 |
|
millmanorama
|
c42f687bfb
|
more cleanup
|
2016-12-28 17:30:15 +01:00 |
|
millmanorama
|
b904c37dd2
|
remove more unneeded ContentStreams and cleanup logging
|
2016-12-28 15:03:45 +01:00 |
|
millmanorama
|
0303c96d41
|
cleanup Ingester.indexChunk
|
2016-12-28 15:03:04 +01:00 |
|
millmanorama
|
abf21f58ee
|
remove obsolete and unused ContentStreams
|
2016-12-28 15:03:03 +01:00 |
|
millmanorama
|
2b4bb33798
|
clean up ArtifactExtractor; reduce use of ContentStream
|
2016-12-28 15:03:01 +01:00 |
|
millmanorama
|
697a7d7a58
|
reduce method overloads for indexing artifacts
|
2016-12-28 15:02:59 +01:00 |
|
millmanorama
|
b38171dbd7
|
make the ByteXXXStream classes inner classes of the TextExtractors that use them.
|
2016-12-28 15:02:58 +01:00 |
|
millmanorama
|
85af7c57b6
|
build out ArtifactExtractor
|
2016-12-28 15:02:56 +01:00 |
|
millmanorama
|
1a70a4e8b2
|
introduce ArtifactExtractor
|
2016-12-28 15:02:39 +01:00 |
|
millmanorama
|
359dc16ee5
|
inline indexChunk
|
2016-12-28 15:02:23 +01:00 |
|
millmanorama
|
c9795cabcb
|
pull up methods from TextExtractorBase into TextExtractor.java
|
2016-12-28 15:02:21 +01:00 |
|
millmanorama
|
0f1f8b2211
|
refactor common chunking algorithm into TextExtractorBase, remove AbstractFileChunk
|
2016-12-28 15:02:18 +01:00 |
|
Richard Cordovano
|
a5902d50f5
|
Correctly handle CancellationException in KeywordSearchResultFactory.BlackboardResultWriter
|
2016-12-19 17:27:42 -05:00 |
|
Eugene Livis
|
d1616cdeb6
|
Fixed a very misleading error message
|
2016-12-14 09:56:25 -05:00 |
|
Richard Cordovano
|
bb1975b9c4
|
Merge pull request #2428 from zhhl/2123-sortSolrResultToKeepConsistantKeywordPreview
2123: Sort the Solr results to keep KeywordSearch Preview pick up the…
|
2016-12-14 09:51:08 -05:00 |
|
U-BASIS\zhaohui
|
2711788582
|
2123: correction
|
2016-12-13 17:42:02 -05:00 |
|
U-BASIS\zhaohui
|
05a6fa8d37
|
2123: clean up
|
2016-12-13 17:38:22 -05:00 |
|
U-BASIS\zhaohui
|
8a1f272738
|
2123: let Solr do ascending sorting to let us have a consistent result
|
2016-12-13 17:33:41 -05:00 |
|
U-BASIS\zhaohui
|
4a0202cea9
|
2123: Sort the Solr results so that KeywordSearch Preview picks up the same result each time
|
2016-12-11 09:56:57 -05:00 |
|
Ann Priestman
|
231e87187d
|
Add dialog to allow the user to add multiple keywords at a time.
|
2016-12-08 09:58:31 -05:00 |
|
esaunders
|
a782e52f80
|
Removed filterOneHitPerDocument() since (a) its use prevents the display of hits across multiple pages/chunks and (b) QueryResults.writeAllHitsToBlackBoard() takes care of ensuring that only a single blackboard artifact is created per document.
|
2016-12-07 16:17:24 -05:00 |
|
esaunders
|
83f8d575e9
|
Add quotes around the keyword when the search results are not available to make highlighting work correctly.
|
2016-12-07 16:14:00 -05:00 |
|