
OCR Service

OCR as a service



Host and run OCR as a service within your organisation or community.

The OCR service depends on the following:

  1. Java
  2. Maven
  3. Olena
  4. Tesseract
  5. Tessdata (for Indic scripts support)
  6. Varnam Project (libvarnam); see its install instructions
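
Before building, it can help to confirm that the command-line prerequisites are on the PATH. The following is a minimal sketch, not part of the project: check_deps is a hypothetical helper, and the binary names java, mvn and tesseract are assumed to be the usual ones for Java, Maven and Tesseract. Olena, Tessdata and libvarnam are libraries or data files and are not covered by this check.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent command and returns
# non-zero if at least one command was not found.
check_deps() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}

# Assumed binary names for Java, Maven and Tesseract.
check_deps java mvn tesseract && echo "command-line prerequisites found"
```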

Check out the code

git clone https://github.com/indic-ocr/ocrservice.git 

To compile and start the server, use the following command:

mvn package  && java -jar target/IndicOCR-jar-with-dependencies.jar <path_to_olena>/scribo/src/content_in_doc

On my local system, it looks like this:

mvn package  && java -jar target/IndicOCR-jar-with-dependencies.jar ~/ocr/olena/olena/scribo/src/content_in_doc

The server starts on port 8081 and exposes three web service APIs: /ocr, /india and /indiastring.

An experimental server is available at http://35.164.84.230:8081/. Uploaded images are not retained; all images are removed from the server at least once a day.
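
The usage examples target the experimental server. To point the same requests at another instance, one option is to keep the base URL in a variable. This is only a sketch: OCR_HOST and ocr_url are hypothetical names, and http://localhost:8081 assumes a server started locally on the port mentioned above.

```shell
#!/bin/sh
# Base URL of the OCR service; set OCR_HOST to target another instance.
# The default assumes a locally started server on port 8081.
OCR_HOST="${OCR_HOST:-http://localhost:8081}"

# Print the full URL for an endpoint name such as ocr, india or indiastring.
# A trailing slash on OCR_HOST is dropped so the result has a single slash.
ocr_url() {
    printf '%s/%s\n' "${OCR_HOST%/}" "$1"
}

# Example: upload an image to the /ocr endpoint of the selected instance.
# curl -F "dpi=300" -F "lang=eng" -F "myfile=@<path_to_image_file>" "$(ocr_url ocr)"
```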

Usage Examples

/ocr

curl   -F "dpi=300"   -F "lang=eng"   -F "myfile=@<path_to_image_file>" http://35.164.84.230:8081/ocr

/india

curl   -F "tolang=eng"   -F "sourcelang=pan"   -F "myfile=@<path_to_binarized_image>" http://35.164.84.230:8081/india

/indiastring

curl -H "Content-Type: application/json" -X POST -d '{"filePath":"<http url or data url>", "sourcelang":"pan","tolang":"eng","operation":"invert","engine":"tesseract"}' http://35.164.84.230:8081/indiastring
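
Since filePath accepts a data URL, a local image can be embedded directly in the request body. The sketch below shows one way to do that; build_payload is a hypothetical helper, and it assumes the service accepts the standard data:<mime type>;base64,<data> form. The other JSON fields are copied unchanged from the example above.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for /indiastring from a local
# image file by embedding the image as a base64 data URL.
# Arguments: <image file> <MIME type> <sourcelang> <tolang>
build_payload() {
    # tr removes the line wrapping that some base64 implementations add.
    encoded=$(base64 < "$1" | tr -d '\n')
    printf '{"filePath":"data:%s;base64,%s","sourcelang":"%s","tolang":"%s","operation":"invert","engine":"tesseract"}' \
        "$2" "$encoded" "$3" "$4"
}

# Example usage (page.png is a placeholder file name); curl reads the
# request body from standard input via -d @-.
# build_payload page.png image/png pan eng |
#     curl -H "Content-Type: application/json" -X POST -d @- http://35.164.84.230:8081/indiastring
```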

Authors and Contributors

@rkvsraman

Help

Please join the project and help by contributing code or reporting bugs.