Apr 28 2011

The OCR Quality of Google Docs

Category: Reminisces, Software
Oren Hurvitz @ 11:03 pm

Yesterday Google released an Android app for Google Docs. The flashiest feature of this app is the ability to take a photo and convert it to text using OCR. I used to work at an OCR company back when a “smart phone” was a phone that had three lines of black-on-white text instead of two, so I was interested to see how well this newfangled OCR works. I tested the OCR capabilities of Google Docs for Android, and compared them with several other OCR programs. My conclusion: under ideal conditions the OCR works pretty well. However, real-life conditions are much worse, so this feature is like a conscience in a politician: widely trumpeted but rarely used.

The Test Page

My test document was the first page of Steven Levy’s article about Google from the April 2011 issue of Wired, titled “An Unconventional CEO”. Here’s what the page looks like:

Original Full Page

This is not a very challenging document. The text is very high resolution and doesn’t use many fonts. The layout is a little more challenging: there’s a header; two columns; and a few graphical elements. Good OCR programs will recognize and preserve this layout.

Google Docs for Android

For this test I used a Nexus S phone. Getting good photos was difficult because my hand and the phone itself kept casting shadows on the page. I tried to perform OCR anyway, but this uneven lighting resulted in horrendous performance. (I also tried using the camera’s flash, but that produced even worse results because the lighting was extremely uneven.) I am nothing if not fair, so I twisted myself like a pretzel until I managed to get a photo with even lighting across the page. For best results you should have the steady hands of a brain surgeon.

Once I managed to get some reasonable photos, I had the Google Docs app convert them to documents. The app doesn’t perform OCR on the device: it sends the photo to Google’s servers for processing. This was pretty fast. Here are the results of OCRing a corner of the page:

Android Test

ONE AFTERNOON A-BOUT 12 years ago, Larry Page and Sergey Brin gave John Doerr a call. A few months earlier, the Google cofounders had accepted $12.5 million from Kleiner Perkins Canfield & Byers, DoerrÂ’s venture-capital firm, as well as an equal amount from Sequoia Capital. When they took the cash, they agreed
that they would hire an outsider to replace Page as CEO,
a common strategy to provide “adult supervision” to inexperienced founders. But now they were reneging. “They said, ‘WeÂ’ve changed our mind. We think we can run the company between the two of us/ Doerr recalls. ,ny between me »wu Ul nb, We.. Doerrfg instinct was to immediately sell his shares, but he held off. He m .ide Page and Brin an offer: He would set up meetings for them with the most brilliant Silicon Valley, so they could get a better sense of what the job entailed «After th t h 3 . E
told them, “if you think we should do a search, we will. And if you don’t want to, then

This was the best result out of all of my attempts. You can see that even this result contains many errors: so many that I won’t bother pointing them out. Nevertheless, most of the text was OCR’d correctly.

For comparison, here’s the same page, but this time the photo wasn’t taken quite as well. The results are much worse:

Poor Lighting

i
ome ABOUT 12 years ago, Larry Page and Sergey Brin gave John Doerr a call. A few months earlier, the Google cofounders had accepted $12.5 million from Kleiner Perkins Caufield & Byers, DoerrÂ’s venture-capital firm, as well as an equal amount from Sequoia Capital. When
they took the cash, they agreed that they would hire an 0
utsider to replace as CEO
a common strategy to provide “adult supervìsion” to inexperienced founders. But now the y were renegìng. “They said, ‘WeÂ’ve changed our mind. We thi
e changed our mind. We think we can run the company between the two of us,”Â’ Doerr recalls.
immediately sell his shares, but he held off. He made P
. ne mane rage and Brin an offer: He would set up meetings for them with the most brilliant CEOs in Silicon Valley, 5° *hey Could get a better sense of what the job entailed. “After that. ” he told them, “if you think we should do a search. we willi And if vnu don’
El E El APH 2011
arch, we will. And if you want to, then

The second photo is only slightly worse than the first one, but it resulted in a huge drop in OCR quality. (And some of the text is even repeated, e.g. “changed our mind”. How does that happen?) Unfortunately, in real-life situations the photos that users take are more likely to resemble the second photo than the first one.

These tests show only a corner of the page. I also tried to photograph the entire page, but this failed miserably: the resulting document contained no text at all. I suspect Google Docs for Android reduces the resolution of the photo before sending it to the server for OCR! Otherwise, the OCR should have produced some results. The Nexus S camera has a resolution of 2560×1920. The test page was 10.8″ × 7.9″, which means that the photo was about 240 dpi (dots per inch), which is high enough to produce good OCR results. (The web version of Google Docs managed to produce reasonable results even with a 120 dpi image.) The most likely explanation I can think of is that the OCR was working with a low-resolution version of the page.

Another reason that I suspect Google Docs is downsampling the image is that when I exported the resulting document to HTML, the image in the HTML file was only 1280×960 pixels, i.e. 1.2 megapixels. This is anecdotal evidence, but it’s consistent with the complete failure to perform OCR on the full page.
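As a rough check on the resolution arithmetic above, here is a minimal Python sketch. It assumes the page fills the camera frame exactly, with the long side of the page along the long side of the photo; the function name `effective_dpi` is made up for this illustration.

```python
def effective_dpi(pixels_wide, pixels_high, inches_wide, inches_high):
    """Return (horizontal, vertical) dots per inch, assuming the page fills the frame."""
    return pixels_wide / inches_wide, pixels_high / inches_high

# Page size quoted in the post, in inches (long side first).
PAGE_IN = (10.8, 7.9)

# Full Nexus S photo versus the image found in the exported HTML file.
full = effective_dpi(2560, 1920, *PAGE_IN)
exported = effective_dpi(1280, 960, *PAGE_IN)
exported_megapixels = 1280 * 960 / 1e6

print("full photo:     %.0f x %.0f dpi" % full)
print("exported image: %.0f x %.0f dpi" % exported)
print("exported image: %.2f megapixels" % exported_megapixels)
```

Under that assumption the full photo comes out at roughly 240 dpi and the 1280×960 image at roughly 120 dpi. A real photo has margins and perspective distortion, so the usable resolution on the text itself would be somewhat lower than these figures.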

What does this mean? If indeed Google is downsampling the photos before performing OCR, then they could provide much better OCR results simply by sending the entire photo for processing instead of a lower-resolution version. And if they’re not downsampling the photos, then they have some other processing problem, but that also means that there’s a lot they could do to improve the results. Let’s see if they do anything about it!

Google Docs on the Web

The rest of these tests were all performed on my PC. I scanned the page using a flatbed scanner (el cheapo Canon, but it’s more than good enough).

The web version of Google Docs can perform OCR on images. They restrict the uploaded images to 2 MB, so I couldn’t upload the full 300 dpi page because it was 9.5 MB (as a PNG; I didn’t want to use a lossy format such as JPEG). I reduced the page to 120 dpi (1.7 MB) and uploaded that file. Here are the results:

AN UNCONVENTIONALCEO Ten



H | s BY STEVEN LEVY



ONE AFTERNOON ABOUT 12 years ago, Larry Page and Sergey Brin gave John Doerr a call. A few months earlier, the Google cofounders had accepted $12.5 million from Kle`iner Perkins Caufielcl & Byers, DoerrÂ’s venture-capital ?rm, as well as an equal amount from Sequoia Capital. when they took the cash, they agreed that they would hire an outsider to replace Page as CEO, a common strategy to provide “adult supervision” to inexperienced founders. But now they were reneging. “They said, ‘WeÂ’ve changed our mind. We think we can run the company between the two of us,Â’ ” Doerr recalls. Doerr’s ?rst instinct was to immediately sell his shares, but he held He made Page and Brin an He would set up meetings for them with the most brilliant CEOs in Silicon Valley, so they could get a better sense of what the job entailed. “After that,” he told them, “if you think we should do a search, we will. And if you don’t want to, then



24,000 employees later, cofounder retakes the topjob at Google. run the company like a startup



I’ll make a decision about that.” Page and Brin took a Magical Mystery Tour of high tech royalty: Apple’s Steve Jobs, Intel’s Andy Grove, lntuitÂ’s Scott Cook, Amazon .com’s Jeff Bezus, and others. Then they came back to Doerr. “We agree with you,” they told him; they were ready to hire a CEO. But they would

 

 

This is much better than Google Docs for Android: most of the text was converted correctly, even though the scan was very low resolution (120 dpi). However, the layout detection algorithm is pretty poor. To its credit, it correctly detected that the text was in two columns. But it mistakenly thought the header was using the same columns as the text below, so it broke it up in a ridiculous way. This is why the sentence “24,000 employees later, cofounder retakes the topjob at Google. run the company like a startup” appears in the middle of the text.

Other problems include: not recognizing word breaks (“UNCONVENTIONALCEO”; “topjob”); confusing “d” with “cl” in “Caufield”; converting “Bezos” to “Bezus”; not recognizing the “ff” ligature (this is why the words “off” and “offer” are missing).
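Errors like these can be found mechanically by comparing the OCR output against a known-good transcription, word by word. The following is only a sketch of that idea using Python's standard `difflib` module. The reference sentence is reconstructed from the samples in this post, and the helper name `word_diffs` is invented for the example.

```python
import difflib

def word_diffs(reference, ocr_output):
    """List the word-level differences between a reference text and OCR output."""
    ref_words = reference.split()
    ocr_words = ocr_output.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record what the reference had and what the OCR produced instead.
            diffs.append((tag, " ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2])))
    return diffs, matcher.ratio()

# Reference reconstructed from the post; OCR text copied from the web result.
REFERENCE = "but he held off. He made Page and Brin an offer: He would set up meetings"
OCR = "but he held He made Page and Brin an He would set up meetings"

diffs, ratio = word_diffs(REFERENCE, OCR)
for tag, expected, got in diffs:
    print(tag, repr(expected), "->", repr(got))
print("similarity: %.2f" % ratio)
```

On this sentence the comparison reports two deletions, the words containing the “ff” ligature. The similarity ratio is a crude score, and it says nothing about layout errors such as the misplaced header.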

If Google Docs relaxes its file size limit so that higher-resolution scans can be uploaded, then it’s likely that it will be able to produce very accurate results on the text itself. However, the layout still won’t be preserved.
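To get a feel for how much the 2 MB limit costs in resolution, here is a back-of-the-envelope sketch in Python. It assumes the scanned page is the same 10.8″ × 7.9″ page as before and, naively, that PNG file size scales with the number of pixels. Neither assumption is exact, and all function names are made up for the illustration.

```python
import math

PAGE_IN = (10.8, 7.9)   # page size in inches, as quoted earlier in the post
SIZE_300_MB = 9.5       # PNG size of the 300 dpi scan, from the post
LIMIT_MB = 2.0          # Google Docs upload limit, from the post

def pixels(dpi):
    """Pixel dimensions of the page when scanned at the given dpi."""
    return round(PAGE_IN[0] * dpi), round(PAGE_IN[1] * dpi)

def estimated_size_mb(dpi):
    """Naive estimate: file size grows with pixel count, i.e. with dpi squared."""
    return SIZE_300_MB * (dpi / 300) ** 2

def max_dpi_under_limit():
    """Largest whole dpi whose naive size estimate still fits under the limit."""
    return math.floor(300 * math.sqrt(LIMIT_MB / SIZE_300_MB))

print("120 dpi scan:", pixels(120), "pixels, estimated %.2f MB" % estimated_size_mb(120))
print("highest dpi under the limit (naive):", max_dpi_under_limit())
```

The naive estimate for 120 dpi is lower than the 1.7 MB that the actual file weighed, so compression evidently does not scale this neatly and the real headroom under the limit is smaller than the estimate suggests.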

Tesseract

Tesseract is a free, open-source OCR program. Since Google had a hand in making it open-source, I thought perhaps this is the OCR engine that they use in Google Docs, so I wanted to try it out. Here are the results from OCR-ing the 300 dpi scan:

AN UNCONVENTIONAL CEO
Terr years and 24,090 emptayees tater,rf;ofsrrrrder
Larry Page retakes the top get; at
His goalzto run the cemparsy trite a ata.rttrp aggaia.
BY STEVEN LEVY
ONE AFTERNOON ABOUT
12 years ago, Larry Page and
Sergey Brin gave John Doerr
a call. A few months earlier,
the Google cofounders had
accepted $12.5 million from
Kleiner Perkins Caufield &
Byers, DoerrÂ’s venture-capital
firm, as well as an equal amount
from Sequoia Capital. When
they took the cash, they agreed that they would hire an outsider to replace Page as CEO,
a common strategy to provide “adult supervision” to inexperienced founders. But now
they were reneging. “They said, ‘We’ve changed our mind. We think we can run the com-
pany between the two of us,” ” Doerr recalls.
DoerrÂ’s first instinct was to immediately sell his shares, but he held OIT. He made Page
and Brin an offer: He would set up meetings for them with the most brilliant CEOs in
Silicon Valley, so they could get a better sense of what the job entailed. “After that,” he
told them, “if you think we should do a search, we will. And if you don’t want to, then
_-”
.ra ‘T
E] E] E] Aeazou
.-’,,~.4′;?}.7 l
si fl
I’ll make a decision about that.” Page and
Brin took a Magical Mystery Tour of high
tech royalty: AppleÂ’s Steve Jobs, IntelÂ’s
Andy Grove, IntuitÂ’s Scott Cook, Amazon
.comÂ’s Jeff Bezos, and others. Then they
came back to Doerr.
“We agree with you,” they told him; they
were ready to hire a CEO. But they would
sr, {“G1′Bfi1L1

Most of the text was recognized correctly, including words that Google Docs missed such as “offer” and “Caufield” (although it still got “off” wrong). Tesseract did a bad job on the header, and apparently thought some of the image was text (and didn’t use a dictionary to realize that it was producing gibberish).
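A dictionary check of the kind mentioned above can be sketched in a few lines of Python. This is not how Tesseract or any other engine actually does it. It is an illustration with a tiny stand-in word list (`KNOWN_WORDS`) and an arbitrary threshold, both of which are assumptions made for the example.

```python
import re

# Stand-in word list for the example; a real check would load a full dictionary.
KNOWN_WORDS = {"a", "of", "high", "took", "tour", "brin", "magical", "mystery"}

def known_word_fraction(line, known_words=KNOWN_WORDS):
    """Fraction of alphabetic tokens in the line that appear in the word list."""
    tokens = re.findall(r"[A-Za-z]+", line.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in known_words)
    return hits / len(tokens)

def looks_like_gibberish(line, threshold=0.5):
    """Flag a line when fewer than `threshold` of its tokens are known words."""
    return known_word_fraction(line) < threshold

# Two lines copied from the Tesseract output above.
for line in ["Brin took a Magical Mystery Tour of high", "E] E] E] Aeazou"]:
    print(repr(line), "->", "gibberish" if looks_like_gibberish(line) else "text")
```

A filter like this would drop lines that came from the illustration rather than the text. The threshold matters: set it too aggressively and legitimate names and unusual words get thrown out as well.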

Of course, Tesseract had an easier task than Google Docs because it got a 300 dpi scan whereas Google Docs had only 120 dpi to work with. When I tried to give the 120 dpi scan to Tesseract it failed miserably, and produced 100% garbage.

ABBYY FineReader

Finally, I tested what a commercial OCR program can do. ABBYY FineReader is one of the best programs today, and they have a free trial version, so that’s the one I used. The results were by far the best of the bunch. It recognized almost all of the text correctly; preserved fonts and layout; and recognized the images and saved them. Here’s a screenshot of the PDF that FineReader created. Note that unlike all the other images in this post, all the text here is editable:

PDF Created by ABBYY FineReader

I found only two mistakes in the output from ABBYY FineReader: it changed “Amazon.com” to “Amazon.corn” (probably due to an overzealous use of the dictionary), and changed “ILLUSTRATION BY Grafilu” to “LLUSTRATIDN BY Grafilll” in the footer.

Conclusion

Between 1996 and 2000 I worked at Ligature, an OCR company, so I’m familiar with the quality of commercial OCR programs. Even back in 1996, all of the top commercial OCR programs produced results similar to what ABBYY FineReader produced in this roundup. I was shocked by how bad the free OCR solutions are. Google Docs for Web is the best of them, but even that program is problematic because of its file-size limit and the loss of layout.

As for Google Docs for Android: it produces mediocre results, even when the user goes to great lengths to give it good input. When using a mobile phone in the real world there will usually be many more challenges: the lighting is often bad; the camera isn’t held precisely perpendicular to the page; the user’s hand shakes; etc. So my advice is: if you’re Julian Assange and you want to duplicate super-secret documents in a hurry, nothing beats a flatbed scanner and a top-tier OCR program.


Jun 05 2008

Anatomy of a Con

Category: Conferences, Reminisces
Oren Hurvitz @ 10:10 pm

This is the tale of how I was conned at a conference. (As far as alliterative woes are concerned, I could have done worse: I could have been shafted at a shindig. Hoodwinked at a hootenanny. Mauled at a meal. You get the picture.)

Amsterdam, June 2000. The conference was about WAP. Do you remember WAP? It was an attempt to rewrite the entire web infrastructure from scratch for mobile phones. Instead of HTML we were supposed to use WML: a markup language which is almost, but not quite, entirely unlike HTML. WAP flopped, but not before dumping a sediment of useless software on every mobile phone, and an 800-page tome in my suitcase (it was given away at the conference).

But I didn’t care about any of that in 2000. This was the dot-com era before the bubble burst, the weather was sunny and Amsterdam beautiful. After the conference ended I had some time to walk around Amsterdam and take in the canals, the bikes, and the coffee shops. The next day I took a train to the airport, and that’s when I was conned and relieved of my briefcase, passport, plane ticket, camera, and various other items (but sadly, not the huge book).

Con Man

It was mid-morning, and the train was almost empty. I had an entire car to myself at first. After a few stops one other guy came in and sat across the aisle from me. He seemed quite ordinary: in his 30s, some stubble, no distinguishing characteristics. He asked me something trivial about the stops that the train would make, but mostly just looked out the window and fiddled with his prepaid phone cards. (A note to my younger readers: in Ye Olden Days, before everyone had cellphones, people made calls using public phone booths. Phone cards were used to pay for these calls.)

A couple of stops before the airport Phone Card Guy jumped up as if he’d just noticed that this was his stop, and hurried out, dropping a few of his phone cards in his haste. I looked at the cards on the floor, and then around the train. There was no one else there. So I picked up the cards, went to the door of the train and shouted after him, “You dropped your phone cards!” Phone Card Guy was already some distance away from the train, but he came back and took the cards, thanked me, and walked away. While this was happening, a passenger that I hadn’t seen before came up behind me and left the train through the doorway I was standing in. He looked like a businessman: he wore a suit, and was in his 50s.

I returned to my seat, and the train started moving again. It was then that I noticed that my briefcase and camera were gone from the seat where I’d left them, and in a flash I realized what had happened.

In con movies, at this point we would see a quick succession of scenes from earlier in the movie, explaining how the con was put together and making us see everything in a different light. This is how it worked: Phone Card Guy established rapport with me, so that I’d be motivated to go to the door of the train and tell him that he dropped his phone cards. Suit Guy was his accomplice: his job was to lurk one car over and watch to see when I had left my seat and had my back turned. At that point Suit Guy came into the car, grabbed what he could, and left through the same door I was standing at! Phone Card Guy had gone one way and Suit Guy the opposite way, so I was looking in the wrong direction and didn’t notice that Suit Guy was holding my briefcase. This was all timed so that the train started moving just as I realized what had happened, so I couldn’t run after them or call for help.

I was so full of admiration for their smooth technique that I almost didn’t mind losing my stuff. Fortunately there was enough time for me to get replacement travel documents at the airport. They didn’t issue me a new passport on the spot, of course: instead they had me travel with the sort of papers that are normally used to transport pets. Wuf!

What I regret most is the loss of my camera, with its photos of Amsterdam. I hope the con men liked them.