Host and run OCR as a service within your organisation or community.
The OCR service depends on the following:
- Java
- Maven
- Olena
- Tesseract
- Tessdata (for Indic scripts support)
- Varnam Project (libvarnam); install instructions are in the Varnam Project documentation
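As a quick sanity check before building, something like the sketch below can report which command-line dependencies are missing. It assumes the usual command names `java`, `mvn` and `tesseract`; Olena, tessdata and libvarnam are not commands on PATH, so they are not covered here and need to be checked separately. The function name `missing_tools` is made up for this example.

```shell
# Print each named command that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed command names for a typical install: java, mvn, tesseract.
missing=$(missing_tools java mvn tesseract)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing from PATH:" $missing
else
    echo "java, mvn and tesseract found on PATH"
fi
```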
Check out the code
git clone https://github.com/indic-ocr/ocrservice.git
To compile and start the server, use the following command:
mvn package && java -jar target/IndicOCR-jar-with-dependencies.jar <path_to_olena>/scribo/src/content_in_doc
On my local system it looks like this:
mvn package && java -jar target/IndicOCR-jar-with-dependencies.jar ~/ocr/olena/olena/scribo/src/content_in_doc
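If the path to `content_in_doc` is wrong, the mistake only shows up after the build has finished. A minimal sketch of a wrapper that checks the path first is shown below. The function names `check_content_in_doc` and `start_ocr_server` are hypothetical, and the sketch assumes it is run from the root of the cloned repository and that the server argument is simply a path that must exist.

```shell
# Return 0 if the given path exists; otherwise print a message and return 1.
check_content_in_doc() {
    if [ ! -e "$1" ]; then
        echo "Not found: $1" >&2
        return 1
    fi
}

# start_ocr_server OLENA_DIR
# Builds and starts the server only if the path check passes.
# OLENA_DIR stands in for <path_to_olena> in the command above.
start_ocr_server() {
    scribo_bin="$1/scribo/src/content_in_doc"
    check_content_in_doc "$scribo_bin" || return 1
    mvn package && java -jar target/IndicOCR-jar-with-dependencies.jar "$scribo_bin"
}

# Usage (path taken from the local example above):
#   start_ocr_server "$HOME/ocr/olena/olena"
```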
The server starts on port 8081 and exposes three web service APIs:
- /ocr, which converts an image to an ODT file
- /india which converts an image to text using the scribo engine
- /indiastring, which runs OCR on an image (uploaded, HTTP URL, or data URL) using Tesseract or Scribo, and can optionally invert or binarize the image before passing it to the OCR engine
An experimental server is available at http://35.164.84.230:8081/. Uploaded images are not kept: all images are removed from the server at least once a day.
Usage Examples
/ocr
curl -F "dpi=300" -F "lang=eng" -F "myfile=@<path_to_image_file>" http://35.164.84.230:8081/ocr
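The /ocr command above writes the response to the terminal. A small wrapper along the following lines saves it to a file instead and makes the server address configurable. This is only a sketch: `OCR_BASE_URL` and `ocr_to_odt` are names invented for this example, and it assumes the response body is the ODT document itself.

```shell
# OCR_BASE_URL is a hypothetical variable for this sketch; it defaults to the
# experimental server and can point at a self-hosted instance instead.
OCR_BASE_URL="${OCR_BASE_URL:-http://35.164.84.230:8081}"

# ocr_to_odt IMAGE OUTPUT [LANG] [DPI]
# Uploads IMAGE to /ocr and writes the response body to OUTPUT.
# Assumption: the response body is the ODT document itself.
ocr_to_odt() {
    image="$1"; output="$2"; lang="${3:-eng}"; dpi="${4:-300}"
    if [ ! -f "$image" ]; then
        echo "No such image file: $image" >&2
        return 1
    fi
    # --fail makes curl return a non-zero status on HTTP errors.
    curl --fail --silent --show-error \
        -F "dpi=$dpi" -F "lang=$lang" -F "myfile=@$image" \
        -o "$output" "$OCR_BASE_URL/ocr"
}

# Usage:
#   ocr_to_odt scan.png scan.odt eng 300
```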
/india
curl -F "tolang=eng" -F "sourcelang=pan" -F "myfile=@<path_to_binarized_image>" http://35.164.84.230:8081/india
/indiastring
curl -H "Content-Type: application/json" -X POST -d '{"filePath":"<http url or data url>", "sourcelang":"pan","tolang":"eng","operation":"invert","engine":"tesseract"}' http://35.164.84.230:8081/indiastring
- Allowed operations are normal, invert or binarize
- Allowed values for engine are tesseract or scribo
- All language parameters must be 3-letter codes (e.g. eng for English, tam for Tamil)
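The rules above can be checked on the client before a request is sent. The sketch below validates the parameters, builds the JSON body for /indiastring, and shows one way to turn a local image into a base64 data URL. All function names are hypothetical, and the language check only tests the shape of the code (three letters), not whether the server supports it.

```shell
# Each check succeeds only for values the list above allows.
valid_operation() { case "$1" in normal|invert|binarize) return 0 ;; *) return 1 ;; esac; }
valid_engine()    { case "$1" in tesseract|scribo) return 0 ;; *) return 1 ;; esac; }
# Three letters, e.g. eng, tam, pan (shape check only).
valid_lang()      { case "$1" in [a-z][a-z][a-z]) return 0 ;; *) return 1 ;; esac; }

# image_data_url FILE MIME_TYPE
# Encodes a local file as a single-line base64 data URL.
image_data_url() {
    printf 'data:%s;base64,%s' "$2" "$(base64 < "$1" | tr -d '\n')"
}

# indiastring_payload FILE_PATH SOURCE_LANG TO_LANG OPERATION ENGINE
# Prints the JSON body, or fails if a parameter is not allowed.
# Assumption: FILE_PATH contains no double quotes or backslashes.
indiastring_payload() {
    valid_lang "$2" && valid_lang "$3" && valid_operation "$4" && valid_engine "$5" || {
        echo "invalid parameter" >&2
        return 1
    }
    printf '{"filePath":"%s","sourcelang":"%s","tolang":"%s","operation":"%s","engine":"%s"}' \
        "$1" "$2" "$3" "$4" "$5"
}

# Usage: the body is piped to curl so a long data URL does not hit
# command-line length limits.
#   indiastring_payload "$(image_data_url page.png image/png)" pan eng binarize tesseract |
#       curl -H "Content-Type: application/json" -X POST -d @- http://35.164.84.230:8081/indiastring
```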
Authors and Contributors
Help
Please join the project and help by contributing code or reporting bugs.