Just a quick post about tesseract, a fairly good OCR solution for GNU/Linux (specifically Ubuntu Karmic Koala).
First install it through apt-get
sudo apt-get install tesseract-ocr
Also install the package for your preferred language (in my case eng -> tesseract-ocr-eng and ita -> tesseract-ocr-ita).
Ok, we are ready to do some text recognition…
But under Karmic Koala there is a problem with tif images, which I reported here: https://bugs.launchpad.net/ubuntu/+source/tesseract/+bug/461177
The problem is due to a transparent alpha layer that some tif images have (investigation needed here…*), so it has to be removed before doing text recognition; otherwise tesseract will generate an empty file…
Just install imagemagick and run these steps from a shell:
convert inputimage.tif inputimage_tmp.pbm
convert inputimage_tmp.pbm inputimage_ok.tif
Original solution found here.
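If you have several images to fix, the two convert commands above can be wrapped in a small shell function. This is only a sketch: the name strip_alpha is made up for this post, and it assumes the input file name ends in .tif. Keep in mind that PBM is a black-and-white format, so the round trip also reduces the image to two colors, which is usually fine for scanned text.

```shell
# strip_alpha INPUT.tif
# Round-trips a tif through PBM to drop its alpha layer, exactly like the
# two convert commands above, then prints the name of the cleaned file.
# (strip_alpha is an illustrative helper, not part of imagemagick.)
strip_alpha() {
    src=$1
    base=${src%.tif}          # file name without the .tif extension
    tmp="${base}_tmp.pbm"     # intermediate file, removed at the end
    out="${base}_ok.tif"      # cleaned image to feed to tesseract
    convert "$src" "$tmp" &&
    convert "$tmp" "$out" &&
    rm -f "$tmp" &&
    printf '%s\n' "$out"
}
```

For example, strip_alpha inputimage.tif should leave inputimage_ok.tif in the same directory and print its name.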
Now we are finally ready to launch tesseract on our tif image.
Just do
tesseract inputimage_ok.tif outputfile
and tesseract will generate outputfile.txt with the recognized text.
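To use one of the installed languages, tesseract takes a -l option after the output name. The function below is a rough sketch of how the command above could be reused for many files; ocr_tif is a name invented for this post, and it assumes the images end in .tif and that the matching language package (for example tesseract-ocr-ita) is installed.

```shell
# ocr_tif INPUT.tif [LANGUAGE]
# Runs tesseract on one image and prints the name of the text file it wrote.
# LANGUAGE defaults to eng. (ocr_tif is an illustrative helper.)
ocr_tif() {
    img=$1
    lang=${2:-eng}            # language code passed to tesseract -l
    out=${img%.tif}           # tesseract adds the .txt extension itself
    tesseract "$img" "$out" -l "$lang" && printf '%s\n' "$out.txt"
}
```

A loop such as for f in *_ok.tif; do ocr_tif "$f" ita; done would then process every cleaned image in the current directory as Italian text.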
P.S. The version packaged for Karmic is 2.03, not the latest one, 2.04, which, as noted on this page, fixes the problem. So if you prefer, remove the old version and install the new one from source.
* Solved. In GIMP it is possible to remove the alpha channel. Just go to
Layer (Livello) -> Transparency (Trasparenza) -> Remove Alpha Channel (Rimuovi canale alpha)

It is still in development but it is a great initiative of 

In the past few days I’ve spent my time at work setting up a Linux server with an Oracle database. Since I had some problems, I decided to write up how to install it without any trouble.