Avoided the need to install Tesseract

Eugene Livis 2018-05-04 15:05:07 -04:00
parent 4f6cbe910c
commit e1aad0f798
103 changed files with 641811 additions and 0 deletions

@@ -39,6 +39,11 @@
<copy todir="${basedir}/release/Volatility" >
<fileset dir="${thirdparty.dir}/Volatility"/>
</copy>
<!--Copy Tesseract OCR to release-->
<copy todir="${basedir}/release/Tesseract-OCR" >
<fileset dir="${thirdparty.dir}/Tesseract-OCR"/>
</copy>
<!--Copy other jars-->
<copy file="${thirdparty.dir}/rejistry/Rejistry-1.0-SNAPSHOT.jar" todir="${ext.dir}" />
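
The copy step in this hunk places the vendored Tesseract tree under `release/Tesseract-OCR`, so a caller no longer needs a system-wide install. As a minimal sketch of how a caller might resolve the bundled binary, assuming the executable keeps the name `tesseract.exe` listed in this commit (the helper `find_bundled_tesseract` is hypothetical and not part of this change):

```python
from pathlib import Path
from typing import Optional


def find_bundled_tesseract(release_dir: str) -> Optional[Path]:
    """Return the bundled tesseract.exe under release_dir, or None if absent.

    Hypothetical helper: the directory name mirrors the Ant copy target
    (release/Tesseract-OCR); it is not taken from the project's own code.
    """
    candidate = Path(release_dir) / "Tesseract-OCR" / "tesseract.exe"
    # Only report the path when the file actually exists on disk.
    return candidate if candidate.is_file() else None
```

Returning `None` rather than raising lets the caller decide whether a missing OCR engine is fatal or merely disables OCR.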

BIN
thirdparty/Tesseract-OCR/ambiguous_words.exe vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/classifier_tester.exe vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/cntraining.exe vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/combine_tessdata.exe vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/dawg2wordlist.exe vendored Executable file

Binary file not shown.

42
thirdparty/Tesseract-OCR/doc/AUTHORS vendored Executable file
@@ -0,0 +1,42 @@
Ray Smith (lead developer) <theraysmith@gmail.com>
Ahmad Abdulkader
Rika Antonova
Nicholas Beato
Jeff Breidenbach
Samuel Charron
Phil Cheatle
Simon Crouch
David Eger
Sheelagh Huddleston
Dan Johnson
Rajesh Katikam
Thomas Kielbus
Dar-Shyang Lee
Zongyi (Joe) Liu
Robert Moss
Chris Newton
Michael Reimer
Marius Renn
Raquel Romano
Christy Russon
Shobhit Saxena
Mark Seaman
Faisal Shafait
Hiroshi Takenaka
Ranjith Unnikrishnan
Joern Wanke
Ping Ping Xiu
Andrew Ziem
Oscar Zuniga
Community Contributors:
Zdenko Podobný (Maintainer)
Jim Regan (Maintainer)
James R Barlow
Amit Dovev
Martin Ettl
Tom Morris
Tobias Müller
Egor Pugin
Sundar M. Vaidya
Stefan Weil

21
thirdparty/Tesseract-OCR/doc/COPYING vendored Executable file
@@ -0,0 +1,21 @@
This package contains the Tesseract Open Source OCR Engine.
Originally developed at Hewlett Packard Laboratories Bristol and
at Hewlett Packard Co, Greeley Colorado, all the code
in this distribution is now licensed under the Apache License:
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
Other Dependencies and Licenses:
================================
Tesseract uses Leptonica library (http://leptonica.com/) which essentially
uses a BSD 2-clause license. (http://leptonica.com/about-the-license.html)

43
thirdparty/Tesseract-OCR/doc/README vendored Executable file
@@ -0,0 +1,43 @@
How to run UNLV tests.
The scripts in this directory make it possible to duplicate the tests
published in the Fourth Annual Test of OCR Accuracy.
See http://www.isri.unlv.edu/downloads/AT-1995.pdf
but first you have to get the tools and data from UNLV:
Step 1: to download the images go to
http://www.isri.unlv.edu/ISRI/OCRtk
and get 3b.tgz, Bb.tgz, Mb.tgz and Nb.tgz.
Step 2: extract the files. It doesn't really matter where
in your filesystem you put them, but they must go under a common
root so you have directories 3, B, M and N in, for example,
/users/me/ISRI-OCRtk.
Step 3: Reorg the files
The lack of tif extensions on the images is inconvenient, so there
is a script to reorganize the data to match the rest of the test
scripts.
cd to /users/me/ISRI-OCRtk or wherever 3, B, M and N ended up and run
/blah/blah/tesseract-ocr/testing/reorgdata.sh 3B
This makes directories doe3.3B, bus.3B, mag.3B and news.3B.
You can now get rid of 3, B, M, and N unless you want to get some of the
other scanning resolutions out of them.
Step 4: Download the ISRI toolkit from:
http://www.isri.unlv.edu/downloads/ftk-1.0.tgz
Step 5: If they work for you, use the binaries directly from the bin
directory and put them in tesseract-ocr/testing/unlv
otherwise build the tools for yourself and put them there.
Step 6: cd back to your main tesseract-ocr dir and build Tesseract.
Step 7: run testing/runalltests.sh with the root data dir and testname:
testing/runalltests.sh /users/me/ISRI-OCRtk tess2.0
and go to the gym, have lunch etc.
Step 8: There should be a file
testing/reports/tess2.0.summary that contains the final summarized accuracy
report and comparison with the 1995 results.

BIN
thirdparty/Tesseract-OCR/doc/eurotext.tif vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/doc/phototest.tif vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/iconv.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/icudata51.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/icui18n51.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/icuuc51.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/java/ScrollView.jar vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libbz2-1.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libcairo-2.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libexpat-1.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libffi-6.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libfontconfig-1.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libfreetype-6.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libgcc_s_sjlj-1.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libgif-4.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libglib-2.0-0.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libgobject-2.0-0.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libgomp-1.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libharfbuzz-0.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libintl-8.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libjbig-2.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libjpeg-8.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/liblept-5.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/liblzma-5.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libopenjp2.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libpango-1.0-0.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libpangoft2-1.0-0.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libpixman-1-0.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libpng16-16.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libstdc++-6.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libtesseract-3.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libtiff-5.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libwebp-5.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/libwinpthread-1.dll vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/mftraining.exe vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/shapeclustering.exe vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/tar.exe vendored Executable file

Binary file not shown.

@@ -0,0 +1 @@
.٠

@@ -0,0 +1,7 @@
LeadPunc="({[`'«
TrailPunc=}:;-]!?`,.)"'»،؛؟
NumLeadPunc=#({[@$
NumTrailPunc=}):;].,%،؛٪
Operators=*+-/.:,()[]،؛
Digits=٠١٢٣٤٥٦٧٨٩0123456789
Alphas=ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي

BIN
thirdparty/Tesseract-OCR/tessdata/ara.cube.nn vendored Executable file

Binary file not shown.

@@ -0,0 +1,14 @@
RecoWgt=0.8354
SizeWgt=0.05
OODWgt=0.0331
NumWgt=-0.0626
CharBigramsWgt=-0.0643
MaxSegPerChar=10
BeamWidth=12
ConvGridSize=48
HistWindWid=0
WordUnigramsWgt=0.0100
MaxWordAspectRatio=5.0
MinSpaceHeightRatio=0.15
MaxSpaceHeightRatio=0.4
MinConCompSize=5

194342
thirdparty/Tesseract-OCR/tessdata/ara.cube.size vendored Executable file

File diff suppressed because it is too large.

@@ -0,0 +1,7 @@
tessedit_ambigs_training 1
load_freq_dawg 0
load_punc_dawg 0
load_system_dawg 0
load_number_dawg 0
ambigs_debug_level 3
load_fixed_length_dawgs 0

@@ -0,0 +1 @@
tessedit_zero_rejection T

@@ -0,0 +1,5 @@
load_bigram_dawg True
tessedit_enable_bigram_correction True
tessedit_bigram_debug 3
save_raw_choices True
save_alt_choices True

@@ -0,0 +1,14 @@
disable_character_fragments T
file_type .bl
textord_fast_pitch_test T
tessedit_single_match 0
tessedit_zero_rejection T
tessedit_minimal_rejection F
tessedit_write_rep_codes F
il1_adaption_test 1
edges_children_fix F
edges_childarea 0.65
edges_boxarea 0.9
tessedit_resegment_from_boxes T
tessedit_train_from_boxes T
textord_no_rejects T

@@ -0,0 +1,15 @@
file_type .bl
#tessedit_use_nn F
textord_fast_pitch_test T
tessedit_single_match 0
tessedit_zero_rejection T
tessedit_minimal_rejection F
tessedit_write_rep_codes F
il1_adaption_test 1
edges_children_fix F
edges_childarea 0.65
edges_boxarea 0.9
tessedit_resegment_from_boxes T
tessedit_train_from_boxes T
#textord_repeat_extraction F
textord_no_rejects T

@@ -0,0 +1 @@
tessedit_char_whitelist 0123456789-.

@@ -0,0 +1,3 @@
tessedit_create_hocr 1
tessedit_pageseg_mode 1
hocr_font_info 0

@@ -0,0 +1,2 @@
interactive_display_mode T
tessedit_display_outwords T

@@ -0,0 +1,4 @@
textord_skewsmooth_offset 8
textord_skewsmooth_offset2 8
textord_merge_desc 0.5
textord_no_rejects 1

@@ -0,0 +1,2 @@
tessedit_resegment_from_line_boxes 1
tessedit_make_boxes_from_boxes 1

@@ -0,0 +1 @@
debug_file tesseract.log

@@ -0,0 +1 @@
tessedit_create_boxfile 1

@@ -0,0 +1,2 @@
tessedit_create_pdf 1
tessedit_pageseg_mode 1

@@ -0,0 +1 @@
debug_file /dev/null

@@ -0,0 +1,2 @@
tessedit_resegment_from_boxes 1
tessedit_make_boxes_from_boxes 1

@@ -0,0 +1,12 @@
textord_show_blobs 0
textord_debug_tabfind 3
textord_tabfind_show_partitions 1
textord_tabfind_show_initial_partitions 1
textord_tabfind_show_columns 1
textord_tabfind_show_blocks 1
textord_tabfind_show_initialtabs 1
textord_tabfind_show_finaltabs 1
textord_tabfind_show_strokewidths 1
textord_tabfind_show_vlines 0
textord_tabfind_show_images 1
tessedit_dump_pageseg_images 0

@@ -0,0 +1,2 @@
tessedit_create_tsv 1
tessedit_pageseg_mode 1

@@ -0,0 +1,3 @@
# This config file should be used with other config files that create renderers.
# usage example: tesseract eurotext.tif eurotext txt hocr pdf
tessedit_create_txt 1
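
The usage comment in this config shows several config names passed after the output base on one command line. A minimal sketch of assembling that argument list, assuming the usual `tesseract <image> <outputbase> [configfile ...]` ordering seen in the usage line (the helper name `tesseract_command` and the `exe` parameter are illustrative, not part of this commit):

```python
from typing import List, Sequence


def tesseract_command(image: str, output_base: str,
                      configs: Sequence[str],
                      exe: str = "tesseract") -> List[str]:
    """Build argv in the order: executable, image, output base, config names."""
    return [exe, image, output_base, *configs]


# Reproduces the usage line quoted in the config file's comment.
print(" ".join(tesseract_command("eurotext.tif", "eurotext",
                                 ["txt", "hocr", "pdf"])))
# → tesseract eurotext.tif eurotext txt hocr pdf
```

Keeping the arguments as a list, rather than one joined string, lets them be handed to a process launcher without shell quoting concerns.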

@@ -0,0 +1,2 @@
tessedit_write_unlv 1
tessedit_pageseg_mode 6

@@ -0,0 +1,12 @@
0oO
lI1
cC
kK
pP
sS
uU
vV
wW
xX
yY
zZ

@@ -0,0 +1,7 @@
LeadPunc="({[`'
TrailPunc=}:;-]!?`,.)"'
NumLeadPunc=#({[@$
NumTrailPunc=}):;].,%
Operators=*+-/.:,()[]
Digits=0123456789
Alphas=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
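
The file above is a flat list of `Name=value` lines. A minimal sketch of reading such a file into a dictionary, assuming each line is split at its first `=` and the value is kept verbatim (the function `parse_key_value` is hypothetical and says nothing about how Tesseract itself loads these files):

```python
from typing import Dict


def parse_key_value(text: str) -> Dict[str, str]:
    """Parse Name=value lines, splitting each line at its first '='."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            params[key.strip()] = value
    return params


sample = "Digits=0123456789\nOperators=*+-/.:,()[]\n"
print(parse_key_value(sample)["Digits"])  # → 0123456789
```

Splitting only at the first `=` keeps punctuation-heavy values intact even if a value were to contain `=` itself.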

BIN
thirdparty/Tesseract-OCR/tessdata/eng.cube.nn vendored Executable file

Binary file not shown.

@@ -0,0 +1,14 @@
RecoWgt=1.0
SizeWgt=0.2435
OODWgt=0.0214
NumWgt=0.036
CharBigramsWgt=0.1567
MaxSegPerChar=8
BeamWidth=10
ConvGridSize=48
WordUnigramsWgt=0.01
MaxWordAspectRatio=20.0000
MinSpaceHeightRatio=0.5000
MaxSpaceHeightRatio=0.6000
HistWindWid=2
MinConCompSize=0

194633
thirdparty/Tesseract-OCR/tessdata/eng.cube.size vendored Executable file

File diff suppressed because it is too large.

@@ -0,0 +1,2 @@
1-\d\d\d-GOOG-411
www.\n\\\*.com

@@ -0,0 +1,5 @@
the
quick
brown
fox
jumped

BIN
thirdparty/Tesseract-OCR/tessdata/pdf.ttf vendored Executable file

Binary file not shown.

@@ -0,0 +1,2 @@
# No content needed as all defaults are correct.

@@ -0,0 +1,2 @@
chop_enable 0
wordrec_enable_assoc 0

@@ -0,0 +1,7 @@
#################################################
# Adaptive Matcher Using PreAdapted Templates
#################################################
classify_enable_adaptive_debugger 1
matcher_debug_flags 6
matcher_debug_level 1

@@ -0,0 +1,13 @@
#################################################
# Adaptive Matcher Using PreAdapted Templates
#################################################
classify_enable_adaptive_debugger 1
matcher_debug_flags 6
matcher_debug_level 1
wordrec_display_splits 0
wordrec_display_all_words 1
wordrec_display_all_blobs 1
wordrec_display_segmentations 2
classify_debug_level 1

@@ -0,0 +1 @@

@@ -0,0 +1,10 @@
#################################################
# Adaptive Matcher Using PreAdapted Templates
#################################################
wordrec_display_splits 0
wordrec_display_all_words 1
wordrec_display_all_blobs 1
wordrec_display_segmentations 2
classify_debug_level 1
stopper_debug_level 1

BIN
thirdparty/Tesseract-OCR/tesseract.exe vendored Executable file

Binary file not shown.

BIN
thirdparty/Tesseract-OCR/text2image.exe vendored Executable file

Binary file not shown.

Some files were not shown because too many files have changed in this diff.