Tuesday, April 28, 2009

A first glance at tesseract-ocr

So, I've started on the name card OCR project. As Alex suggested, I'll first look at tesseract-ocr, the OCR engine developed by Google.

From Wikipedia:
Tesseract is a free optical character recognition engine. It was originally developed as proprietary software at Hewlett-Packard between 1985 and 1995. After ten years without any development taking place, Hewlett Packard and UNLV released it as open source in 2005. Tesseract is currently developed by Google and released under the Apache License, Version 2.0.

Strangely, the featured tarballs listed on the project home page fail to compile. After patching the source code, I managed to build the program binaries, but the language charsets weren't built. I then switched to the svn HEAD version, and it worked.

To use tesseract, I simply type:
[bergwolf@bin]$./tesseract phototest.tif result
Tesseract Open Source OCR Engine
[bergwolf@bin]$cat result.txt
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.
The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.


The Tesseract OCR engine is very accurate on this test image, and it looks well suited to our name card OCR service, since our images usually contain just black letters on a white background.

However, its drawback is that it supports few image formats: currently tesseract can only read TIFF and BMP images, while most mobile devices save camera photos as JPEG. If we want to use it as our OCR engine, there are two options: either patch tesseract to support other image formats, or use another tool such as ImageMagick to convert images to TIFF or BMP first. Neither should be very hard.
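The second option can be sketched as a small shell function. This is a minimal sketch that assumes ImageMagick's `convert` and the `tesseract` binary built above are both on the PATH; `ocr_jpeg` is just a hypothetical name for illustration. The `-compress none` flag asks ImageMagick for an uncompressed TIFF, on the assumption that this is the safest input for tesseract.

```shell
#!/bin/sh
# Minimal sketch: OCR a JPEG camera photo by converting it to TIFF first.
# Assumes ImageMagick's `convert` and `tesseract` are both on PATH.

# ocr_jpeg PHOTO: writes the recognized text to <PHOTO without extension>.txt
ocr_jpeg() {
    jpg=$1
    base=${jpg%.*}      # strip the extension: card.jpg -> card
    tif="$base.tif"

    # tesseract only reads TIFF and BMP, so let ImageMagick convert the JPEG.
    # -compress none requests an uncompressed TIFF (assumed safest for tesseract).
    convert "$jpg" -compress none "$tif" || return 1

    # Same invocation as above: tesseract appends .txt to the output base name.
    tesseract "$tif" "$base" || return 1
}
```

For example, `ocr_jpeg card.jpg && cat card.txt` should print the recognized text, just like the `phototest.tif` run above. Converting on the server side keeps tesseract itself unpatched, at the cost of one extra process per image.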

Saturday, April 18, 2009

Being one of the open source apprentices

A few days ago, I applied to be one of Alex Lau's 12 open source apprentices. After several email exchanges with Alex, I finally got a chance to meet him face to face.

It was a wonderful and pleasant talk. Alex gave me a lot of advice on my technical path and my future plans. We talked about many open source projects, like name card OCR, SyncML, iFolder, LogFS (and FTL), as well as P2P file systems. At this point, I'm planning to look at the online name card OCR project first. It is the most closely related to the Maemo project in my laboratory, and maybe I can find someone in the lab to implement it with me.

My problem is that I have also applied to this year's GSoC program, and the GSoC slot allocation hasn't been decided yet, so I don't know how much time I can devote to this apprenticeship. Anyway, let's wait for the GSoC results first, which are due soon.