
OCR Service

OCR as a service


Host and run OCR as a service within your organisation or community.

OCR Service depends on the following:

Java
Maven
Olena
Tesseract
Tessdata (for Indic script support)
Varnam Project (libvarnam); follow the Varnam project's own install instructions

Check out the code

git clone https://github.com/indic-ocr/ocrservice.git

To compile and start the server, run the following command, passing the path to Olena's scribo content_in_doc:

mvn package && java -jar target/IndicOCR-jar-with-dependencies.jar <path_to_olena>/scribo/src/content_in_doc

On my local system it looks like this:

mvn package && java -jar target/IndicOCR-jar-with-dependencies.jar ~/ocr/olena/olena/scribo/src/content_in_doc
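The build-and-start step can be wrapped in a few shell functions that fail early when a prerequisite is missing. This is a minimal sketch, not part of the project: the function names are hypothetical, and it checks only java, mvn and the content_in_doc path (assumed to be an executable file), not Tesseract, tessdata or libvarnam.

```shell
# Hypothetical helpers (not part of ocrservice): check some prerequisites
# before running the documented build-and-start command.
# Only java, mvn and the content_in_doc path are checked; Tesseract,
# tessdata and libvarnam are NOT verified here.

# Succeeds if every named command is on PATH; reports each missing one.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeeds if the path is an executable regular file.
# Assumption: content_in_doc is an executable built from Olena's scribo sources.
check_content_in_doc() {
  if [ -f "$1" ] && [ -x "$1" ]; then
    return 0
  fi
  echo "not an executable file: $1" >&2
  return 1
}

# Runs the same command as the README (server listens on port 8081).
# Must be called from the root of the checked-out ocrservice directory.
start_ocr_service() {
  check_commands java mvn || return 1
  check_content_in_doc "$1" || return 1
  mvn package && java -jar target/IndicOCR-jar-with-dependencies.jar "$1"
}
```

After sourcing these functions, `start_ocr_service ~/ocr/olena/olena/scribo/src/content_in_doc` behaves like the command above, but stops with a message if java, mvn or the content_in_doc path is missing.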

The server starts on port 8081 and exposes three web service APIs:

/ocr, which converts an image to an ODT file
/india, which converts an image to text using the Scribo engine
/indiastring, which converts an image (uploaded, HTTP URL or data URL) to text using Tesseract or Scribo, and can optionally invert or binarize the image before passing it to the OCR engine

An experimental server is available at http://35.164.84.230:8081/. Images are not stored: all images are removed from the server at least once a day.

Usage Examples

/ocr

curl -F "dpi=300" -F "lang=eng" -F "myfile=@<path_to_image_file>" http://35.164.84.230:8081/ocr
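To OCR a file straight into a document on disk, the /ocr call can be wrapped in a shell function. A sketch under assumptions: the function name and the OCR_URL variable are hypothetical, and it assumes the response body is the ODT file itself, which is saved with curl's -o option.

```shell
# Hypothetical wrapper around the /ocr request shown above.
# Assumption: the HTTP response body is the ODT document itself.
OCR_URL="${OCR_URL:-http://35.164.84.230:8081/ocr}"

# Usage: ocr_to_odt <image> [output.odt] [lang]
ocr_to_odt() {
  image=$1
  out=${2:-${image%.*}.odt}   # default: image name with an .odt suffix
  lang=${3:-eng}              # three-letter language code
  if [ ! -f "$image" ]; then
    echo "no such image: $image" >&2
    return 1
  fi
  # --fail: return non-zero on HTTP errors instead of saving an error page
  curl --fail --silent --show-error \
    -F "dpi=300" -F "lang=$lang" -F "myfile=@$image" \
    -o "$out" "$OCR_URL"
}
```

Under these assumptions, `ocr_to_odt scan.png` would write scan.odt, and `ocr_to_odt scan.png out.odt tam` sets both the output name and the lang parameter.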

/india

curl -F "tolang=eng" -F "sourcelang=pan" -F "myfile=@<path_to_binarized_image>" http://35.164.84.230:8081/india

/indiastring

curl -H "Content-Type: application/json" -X POST -d '{"filePath":"<http_url_or_data_url>", "sourcelang":"pan", "tolang":"eng", "operation":"invert", "engine":"tesseract"}' http://35.164.84.230:8081/indiastring

Allowed values for operation are normal, invert and binarize.
Allowed values for engine are tesseract and scribo.
All language parameters (lang, sourcelang, tolang) must be three-letter codes (e.g. eng for English, tam for Tamil).
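The rules above can be checked on the client before sending a request, and a local image can be passed to /indiastring as a data URL. This is a minimal sketch with hypothetical function names; the MIME-type-by-extension mapping is an assumption, and values are not JSON-escaped, so it is only suitable for URLs without quotes or backslashes (base64 data URLs are safe).

```shell
# Hypothetical client-side helpers for /indiastring (not part of ocrservice).

# Three lower-case letters, e.g. eng, tam, pan.
is_lang_code() {
  case $1 in
    [a-z][a-z][a-z]) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints data:<mime>;base64,<payload> for a local image file.
# Assumption: the MIME type is inferred from the file extension only.
to_data_url() {
  case $1 in
    *.png|*.PNG) mime=image/png ;;
    *.jpg|*.jpeg|*.JPG|*.JPEG) mime=image/jpeg ;;
    *.tif|*.tiff|*.TIF|*.TIFF) mime=image/tiff ;;
    *) echo "unknown image type: $1" >&2; return 1 ;;
  esac
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
  # tr removes the line wrapping some base64 implementations add
  printf 'data:%s;base64,%s' "$mime" "$(base64 < "$1" | tr -d '\n')"
}

# Usage: indiastring_payload <filePath> <sourcelang> <tolang> <operation> <engine>
# Prints the JSON body; values are inserted without JSON escaping.
indiastring_payload() {
  case $4 in normal|invert|binarize) ;; *) echo "bad operation: $4" >&2; return 1 ;; esac
  case $5 in tesseract|scribo) ;; *) echo "bad engine: $5" >&2; return 1 ;; esac
  is_lang_code "$2" && is_lang_code "$3" || { echo "bad language code" >&2; return 1; }
  printf '{"filePath":"%s","sourcelang":"%s","tolang":"%s","operation":"%s","engine":"%s"}' \
    "$1" "$2" "$3" "$4" "$5"
}
```

For example, `curl -H "Content-Type: application/json" -X POST -d "$(indiastring_payload "$(to_data_url page.png)" pan eng binarize tesseract)" http://35.164.84.230:8081/indiastring` sends a local page.png. For large images, writing the payload to a file and passing it with `-d @payload.json` avoids command-line length limits.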

Authors and Contributors

@rkvsraman

Help

Please join the project and help by contributing code or reporting bugs.