

The Indic-OCR Project Site


Indic-OCR is a collection of open source tools that enable OCR for Indic scripts.

Indic-OCR tools use Tesseract for text recognition and Olena for layout detection.

The Indic-OCR project provides a set of Tesseract OCR models trained using special techniques customised for Indic scripts. These are perhaps among the best Tesseract models for Indic scripts you will find in the open source world. Get in touch with us if you want to train models for a particular font, and we will be able to help you out.

So... what are these tools?

Tools and what they are

Tessdata: A set of highly accurate Tesseract OCR models for Indic scripts, including the Ol Chiki (Santali) and Meetei Mayek (Manipuri) scripts.
Olena: Perhaps the best set of tools available for layout detection in the open source world.
OCR Service: Host your own OCR service within your organisation or community (batteries included).
LibreOCR: Open an image in LibreOffice and convert it to an editable document, just like that. Get the extension from here.
Across India: Nearly a Word Lens clone for Indian languages, except that it transliterates instead of translating. Watch this video. Sideload the Android APK from here.
Indic Messenger: Get transliterations of signboards in Indian languages through a Facebook chat bot :-). Watch this video. Add Indic OCR on your Facebook Messenger.
Indic OCR for Chrome: A Project Naptha wannabe that currently supports converting text in images on web pages into editable (copyable) form. Install on Chrome.
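
As a rough illustration of how a model from the Tessdata set might be used, the sketch below calls the standard Tesseract command line from Python. It assumes the `tesseract` executable is installed and that the downloaded `.traineddata` files sit in a local directory; the directory name `tessdata`, the model name `hin`, and the helper names `build_tesseract_command` and `ocr_image` are illustrative assumptions, not something this project prescribes.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang, tessdata_dir):
    """Build the argument list for a Tesseract run that prints text to stdout.

    `lang` is the model name, i.e. the file <lang>.traineddata inside
    `tessdata_dir`. Both values here are assumptions for illustration.
    """
    return [
        "tesseract",
        str(image_path),
        "stdout",                      # send recognised text to standard output
        "-l", lang,                    # which traineddata model to load
        "--tessdata-dir", str(tessdata_dir),
    ]


def ocr_image(image_path, lang="hin", tessdata_dir="tessdata"):
    """Run Tesseract on one image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, lang, tessdata_dir)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

With a scanned page saved as `page.png`, calling `ocr_image("page.png", lang="hin", tessdata_dir="tessdata")` would return the recognised text as a string. Swapping `lang` for another model name in the same directory is the only change needed to try a different script.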


OCR for Indian languages is still a challenging field, and the Indic OCR project is an attempt to build a community of developers who can help solve some of the problems on the way to achieving 100% accuracy.

Please join me in taking this project further and building more tools for Indian users.