How-To Geek

Extract Text from Images: 10 OCR Tools Compared

Optical Character Recognition (OCR) is an amazing time saver when it works well and a huge time sink when it malfunctions. Check out this comparison of 10 OCR tools to find one best suited for your project.

Freeware Genius pitted 5 web-based OCR services and 5 desktop OCR apps against each other. All the services and apps are either free or offer a trial or a free-for-home-use version. Their reviews include the license, system requirements, input file types, output file types, dictionary languages, and a Pro/Con section highlighting the best and worst of each tool.

Hit up the link below to check out all the reviews plus some additional tips on improving your OCR scans.

How To Extract Text From Images: A Comparison of 10 Free OCR Tools [Freeware Genius]

Jason Fitzpatrick is a warranty-voiding DIYer and all-around geek. When he's not documenting mods and hacks he's doing his best to make sure a generation of college students graduate knowing they should put their pants on one leg at a time and go on to greatness, just like Bruce Dickinson. You can follow him on if you'd like.

  • Published 11/2/11

Comments (6)

  1. Wayne

    Currently use FreeOCR to convert confidential TIFF files to text for further formatting and processing. Works well enough for me and allows you to select around images.

  2. BigT

    I tend to shy away from OCR altogether. The problem is that for a small amount of text it’s impractical, and for large amounts of text it’s time consuming, since you will in most cases have to comb through what you’ve scanned for errors anyway; it’s literally a matter of checking your p’s and q’s.

    Now, most OCR desktop software can be trained, but once again this is time consuming, and typically the program has to be retrained when you change the source details, i.e., when the document is in a different font family.

    It only becomes practical for, funnily enough, very large documents, say on the order of 20 pages and up… you’ll still have to comb through, but the time spent combing vs. the time spent to actually type and proof the document is more favourable.

  3. Gregg DesElms

    I agree with BigT… but only to a point. It depends, in large measure, on both the quality of the OCR software, and the quality of the text being scanned. If both are sufficiently good, then the number of errors tends to be very small, even in a very large document.

    As to the quality of the document, the text must be black on a white background; it must be clear and crisp; and the font matters. If ever the old computer maxim “garbage in, garbage out” applied to anything, it’s to OCR. If the document being OCRed isn’t of high quality, then there will be lots of errors.

    As to the quality of the OCR software, there’s a lot of it out there that’s not very good… really bad, in fact. That’s a huge part of the problem. All it takes is a little experience with one of the particularly bad OCR tools out there, coupled with a few maybe not-quite-as-good-as-they-really-need-to-be documents, and it’s enough to put one off of OCR for life… as is clearly almost the case with BigT.

    Among freeware and/or open source OCR products, only the ones which incorporate the Tesseract engine tend to be considered “good” OCR tools; and even some of those, because of their odd interfaces, still kinda’ suck (though, when the reason is interface, for an obviously different reason).

    Among commercial, fee-based OCR products, good, old Omnipage Pro remains best-of-breed. I haven’t even read (yet) the article to which this is a comment, but it would surprise me very much if Omnipage, despite its being a serious old-timer among OCR products, isn’t listed as best (or at least nearly best) among commercial products… that is, if commercial products are even included among the ten compared OCR tools. But there are others that are VERY good, too. Omnipage does not have a lock on that market. It’s simply good enough that one can’t go wrong to purchase it. There are at least two other products I can think of, off the top of my head, which are probably better, but they’re both a LOT more expensive. “ABBYY Fine Reader,” for example, comes to mind. But there are others.

    Surprisingly, the OCR capabilities of the not-free Adobe Acrobat Pro can be better than I would have originally guessed. I know quite a few people who forgo purchasing something like Omnipage Pro, and just pop for Acrobat Pro (since they want to be able to do what it will help them do to PDF files, anyway), and they just use the OCR part of it to do all their OCRing. Of course, it means they have to scan the to-be-OCRed document into Acrobat as if it were going to be turned into a PDF file, but that’s not a big deal. Once OCRed, they simply copy-and-paste the OCRed text from Acrobat into Word or wherever they want to use it… then just close Acrobat without actually saving the OCRed text as a PDF file. To my surprise, that works fairly well… though, again, the quality of the scan(ned document) seriously affects just exactly HOW well.

    The quality of the “I.R.I.S.” OCR software which comes with the software which drives most HP printers is an example of a normally-commercial OCR product that’s probably not so good. It’s serviceable, but that’s about it.

    Among freeware OCR tools, FreeOCR, over at paperfile.net, because it incorporates the Tesseract engine, is probably about as good as free OCR software gets. It’s a bit too simple, but it gets the job done with remarkable accuracy… even more accurate than the freeware or “lite” or “special edition” versions of Omnipage which used to come with certain apps, or with pre-installed copies of Windows on big-name desktop machines during the Win9x days.

    Older versions of Microsoft Office (ending with Office 2003, I believe… but don’t hold me to that) used to come with a tool called “Microsoft Office Document Imaging” (MODI) which did OCR… and fairly well, I was surprised to see. It was actually not half bad; and I believe it’s still available out there… via a Microsoft Knowledge Base article, as I recall.

    Though the freeware NitroPDF Reader obviously can’t compare with Acrobat Pro (though, in my opinion, the commercial version of NitroPDF can), the fact is that many people only use Acrobat Pro to create PDF files by “printing” them to the Acrobat printer driver. They, a lot of them, rarely use any of Acrobat Pro’s other features. For people like that, coupling the freeware NitroPDF Reader with the earlier-herein-mentioned FreeOCR tool will absolutely duplicate, only for free, the ability of Acrobat Pro to simply create PDF files, and to OCR them (since the freeware NitroPDF Reader, unlike the freeware Adobe Acrobat Reader, comes with the ability to create PDF files via the “print to the PDF printer driver” method, and the FreeOCR product will OCR PDF files). So, for those who only use Acrobat Pro to create PDF files (but not necessarily edit or do anything else with them), and who have also discovered that Acrobat will OCR fairly decently, the NitroPDF Reader/FreeOCR combo would save a ton of money.

    For those who, like me, believe that the commercial ($99, last time I checked) NitroPDF Pro is better than the commercial (and three times more expensive) Adobe Acrobat Pro, but who feel like they have to pay $30 more to get the version of NitroPDF Pro which includes the ability to OCR, adding the FreeOCR tool to regular NitroPDF Pro can save people thirty bucks because, believe me, the little FreeOCR tool is sufficiently as good as the OCR that comes in the OCR version of NitroPDF Pro to substitute for it.

    Those, in any case, are my opinions. Now I’ll go read the article and see how I did.

    ___________________________________
    Gregg DesElms
    Napa, California USA
    gregg at greggdeselms dot com

  4. Gregg DesElms

    Post-article ADDENDUM…

    Among desktop freeware OCR tools, the article liked “Cuneiform OpenOCR” a little better than my favorite, FreeOCR… but mostly, it seems, only because it better handles artistic fonts. I guess I have always considered artistic fonts not OCR-able; that one simply never tries to OCR artistic fonts, just like one never tries to drive a car on the sidewalk.

    Making software recommendations is a bit of an art. I’ve been doing it for 35 years, and I’m proud to say that it only took me five or so of those years to confine my criteria to the both possible and useful, else I end up doing as the article does: liking a piece of software because it’ll do a certain thing which might appeal to me, but which isn’t relevant in the real-world office, despite said piece of software’s both significant and glaring impediments. I tested “Cuneiform OpenOCR” in 2007, and found it so awful in an overall sense (including its reckless installation and housekeeping methodologies, and that it can’t OCR a PDF file) that I summarily dismissed it out-of-hand. I’m surprised that the Freeware Genius site was even slightly impressed.

    There is no magic bullet in the world of OCR. OCR can do a very narrowly-focused thing, and nothing else. The text needs to be high-contrast (black on white); the edges of the letters need to be razor sharp; and the font needs to be one of the “normal” ones (Arial, Helvetica, Times, etc.), and not some fancy artistic or display font. Period. If a person doesn’t make sure that those things are true before embarking on an OCR project (and if s/he also doesn’t make sure that the OCR software is as good as it can possibly be), then said person is begging to be frustrated…

    …like BigT, here. Expecting any OCR product to be able to handle display/artistic fonts with even a tiny bit of accuracy is unreasonable; and so being impressed with an otherwise godawful tool like “Cuneiform OpenOCR” because it can curiously handle them better than other tools does nothing but add confusion to the recommendation, and decrease the credibility of the recommender.

    As for FreeOCR’s inability to handle columns… well… that’s true. But it’s FREEWARE, for the love of Mike. If a person wants columns, then that’s what the likes of Omnipage provides! That’s the way it is with nearly all freeware. With the freeware version of whatever is the software in question, the user typically needs to do a bit more work. So, what ELSE is new!?!

    Sheesh!

    ___________________________________
    Gregg L. DesElms
    Napa, California USA
    gregg at greggdeselms dot com

  5. ouman

    As we would say here “downunder” – “good on yer, mate.”
    I liked your pre- and post- approaches. You got to tell it like it is, and left a bit of room for those with different tastes/needs to discover for themselves. (personally ~ I’ve always enjoyed Omnipage, from ver.10 [my first] to ver.16, which I don’t think can be improved on. I have tried others as well.)

  6. Ricardo Garcia Ramírez

    Word 2007 and Word 2010 continue to include an OCR tool, and a very good one. I use it often. I consider this free because I have always used the latest version of Word from way way back. Say what you like about Microsoft, Word is one product that has improved with every new version.
