Note that this is a text-only and possibly out-of-date version of the wiki
ReadMe, which is located at:
  http://code.google.com/p/tesseract-ocr/wiki/ReadMe

Introduction
============

This package contains the Tesseract Open Source OCR Engine.
Originally developed at Hewlett Packard Laboratories Bristol and
at Hewlett Packard Co, Greeley, Colorado, all the code
in this distribution is now licensed under the Apache License:

** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.

Other Dependencies and Licenses:
================================

The Aspirin/MIGRAINES system is no longer required.

Tesseract can also make use of the libtiff library (www.libtiff.org).
See http://code.google.com/p/tesseract-ocr/wiki/FAQ for details.
Without libtiff, Tesseract can only read uncompressed and G3 compressed
TIFF files.

Installing and Running Tesseract
================================

All Users Do NOT Ignore!
========================

The tarballs are split into pieces.

tesseract-2.04.tar.gz contains all the source code.

tesseract-2.00..tar.gz contains the language data files for . You need at
least one of these or tesseract will not work.

Note that tesseract-2.04.tar.gz unpacks to the tesseract-2.04 directory.
tesseract-2.00. .tar.gz unpacks to the tessdata directory, which belongs
inside your tesseract-2.04 directory. It is therefore best to download them
into your tesseract-2.04 directory, so you can use "unpack here" or
equivalent. You can unpack as many of the language packs as you care to, as
they all contain different files.
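The unpacking step above can be sketched in Python. This is only an
illustration, not part of the Tesseract distribution: the function name
unpack_language_data and both of its arguments are hypothetical, and it
assumes the language archive holds nothing but a top-level tessdata/
directory, as described above.

```python
import tarfile
from pathlib import Path


def unpack_language_data(archive, source_dir):
    """Extract a language-data tarball into the source tree.

    Both arguments are hypothetical paths chosen by the caller. The archive
    is assumed to contain only a top-level tessdata/ directory.
    """
    source_dir = Path(source_dir)
    with tarfile.open(archive, "r:gz") as tar:
        for name in tar.getnames():
            parts = Path(name).parts
            # Refuse anything that would land outside source_dir/tessdata.
            if parts[:1] != ("tessdata",) or ".." in parts:
                raise ValueError(f"unexpected archive member: {name}")
        # Newer Python versions offer a safer extraction filter; use it
        # when it is available and fall back silently otherwise.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(source_dir, **kwargs)
    # Report what ended up in tessdata/ so the caller can check it.
    return sorted(p.name for p in (source_dir / "tessdata").iterdir())
```

Called with the path of a downloaded language archive and the path of the
tesseract-2.04 directory, it returns the names of the files now present in
tessdata. Running it as an ordinary user, before make install, avoids the
ownership problem described in the note that follows.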
Note that if you are using make install, you should unpack your language data
to your source tree before you run make install. If you unpack them as root to
the destination directory of make install, then the user ids and access
permissions might be messed up.

boxtiff-2.01. .tar.gz contains data that was used in training, for those that
want to do their own training. Most users should NOT download these files.

Instructions for using the training tools are documented separately at
TrainingTesseract, and for testing at TestingTesseract.

Without Additional Libraries, Image format support is limited!
==============================================================

Without additional libraries, Tesseract can only read uncompressed TIFF
(and some versions of BMP).
Up to version 2.04, you can add libtiff-dev. See the FAQ question on
compressed TIFF for installation instructions.
Version 3.00 will support additional formats via Leptonica, but requires more
libraries to be added.

Windows:
========

There is no Windows installer! (Still looking for volunteers to create one.)

There are Windows executables: tesseract-2.04.exe.tar.gz
(It is not for the 'exe' language.)
They are built with VC++ Express 2008 and come with absolutely no warranty.
If they work for you then great, otherwise get Visual C++ Express 2008 with
service pack 1 and build from the source.
You can also try tesseract-2.01.exe.tar.gz, which is built with VC++6 and may
work better if your Windows is old, but note that this is an older version of
Tesseract.

If you are building from the sources, there are still (up to v2.04) .dsw and
.dsp files for VC++6, but the recommended build platform is now VC++ Express
2008. There are also .sln and .vcproj files for VC++ Express 2008, but these
files are not backward compatible with any previous version - not even
VC++ Express 2005.

Note that the executables produced with the newer compiler are smaller,
faster, and, believe it or not, more accurate. (See TestingTesseract.)
New with 2.04: the executables are built with static linking, so they stand
more chance of working out of the box on more Windows systems.

The executable must reside in the same directory as the tessdata directory.
(The Visual Studio projects build the release executable directly to the
correct place!)

The command line is: tesseract
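
Two of the constraints in this ReadMe can be checked before the executable is
run: tessdata must sit in the same directory as the executable, and without
additional libraries only uncompressed TIFF is readable. The Python sketch
below is a rough illustration and not part of Tesseract. The names
tiff_compression and preflight are hypothetical. It reads the Compression
field (tag 259) of the first image directory as laid out in the TIFF 6.0
specification, and it checks for the stricter "uncompressed only" case, so a
G3 compressed file is also reported.

```python
import struct
from pathlib import Path

COMPRESSION_TAG = 259  # TIFF 6.0 "Compression" field
UNCOMPRESSED = 1       # value meaning "no compression"


def tiff_compression(path):
    """Return the Compression value of the first image in a TIFF file."""
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) < 8 or header[:2] not in (b"II", b"MM"):
            raise ValueError("not a TIFF file")
        # "II" marks little-endian data, "MM" big-endian.
        order = "<" if header[:2] == b"II" else ">"
        magic, ifd_offset = struct.unpack(order + "HI", header[2:8])
        if magic != 42:
            raise ValueError("not a classic TIFF file")
        f.seek(ifd_offset)
        (count,) = struct.unpack(order + "H", f.read(2))
        for _ in range(count):
            entry = f.read(12)
            if len(entry) < 12:
                raise ValueError("truncated image file directory")
            tag = struct.unpack(order + "H", entry[:2])[0]
            if tag == COMPRESSION_TAG:
                # A single SHORT sits in the first two value bytes.
                return struct.unpack(order + "H", entry[8:10])[0]
    return UNCOMPRESSED  # the field is optional and defaults to 1


def preflight(exe_path, image_path):
    """Return a list of problems found (empty if none)."""
    problems = []
    if not (Path(exe_path).parent / "tessdata").is_dir():
        problems.append("no tessdata directory beside the executable")
    if tiff_compression(image_path) != UNCOMPRESSED:
        problems.append("TIFF is compressed; libtiff support would be needed")
    return problems
```

Both arguments to preflight are paths chosen by the caller. An empty list
means neither problem was detected. It does not guarantee that recognition
will succeed.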