Yesterday Google released an Android app for Google Docs. The flashiest feature of this app is the ability to take a photo and convert it to text using OCR. I used to work at an OCR company back when a “smart phone” was a phone that had three lines of black-on-white text instead of two, so I was interested to see how well this newfangled OCR works. I tested the OCR capabilities of Google Docs for Android and compared them with several other OCR programs. My conclusion: under ideal conditions the OCR works pretty well. However, real-life conditions are much worse, so this feature is like a conscience in a politician: widely trumpeted but rarely used.
The Test Page
My test document was the first page of Steven Levy’s article about Google from the April 2011 issue of Wired, titled “An Unconventional CEO”. Here’s what the page looks like:

Original Full Page
This is not a very challenging document. The text is very high resolution and doesn’t use many fonts. The layout is a little more challenging: there’s a header, two columns, and a few graphical elements. Good OCR programs will recognize and preserve this layout.
Google Docs for Android
For this test I used a Nexus S phone. Getting good photos was difficult because my hand and the phone itself kept casting shadows on the page. I tried to perform OCR anyway, but this uneven lighting resulted in horrendous performance. (I also tried using the camera’s flash, but that produced even worse results because the lighting was extremely uneven.) I am nothing if not fair, so I twisted myself like a pretzel until I managed to get a photo with even lighting across the page. For best results you should have the steady hands of a brain surgeon.
Once I managed to get some reasonable photos, I had the Google Docs app convert them to documents. The app doesn’t perform OCR on the device: it sends the photo to Google’s servers for processing. This was pretty fast. Here are the results of OCRing a corner of the page:

Android Test
ONE AFTERNOON A-BOUT 12 years ago, Larry Page and Sergey Brin gave John Doerr a call. A few months earlier, the Google cofounders had accepted $12.5 million from Kleiner Perkins Canfield & Byers, Doerr’s venture-capital firm, as well as an equal amount from Sequoia Capital. When they took the cash, they agreed
that they would hire an outsider to replace Page as CEO,
a common strategy to provide “adult supervision” to inexperienced founders. But now they were reneging. “They said, ‘We’ve changed our mind. We think we can run the company between the two of us/ Doerr recalls. ,ny between me »wu Ul nb, We.. Doerrfg instinct was to immediately sell his shares, but he held off. He m .ide Page and Brin an offer: He would set up meetings for them with the most brilliant Silicon Valley, so they could get a better sense of what the job entailed «After th t h 3 . E
told them, “if you think we should do a search, we will. And if you don’t want to, then
This was the best result out of all of my attempts. You can see that even this result contains many errors: so many that I won’t bother pointing them out. Nevertheless, most of the text was OCR’d correctly.
For comparison, here’s the same page, but this time the photo wasn’t taken quite as well. The results are much worse:

Poor Lighting
i
ome ABOUT 12 years ago, Larry Page and Sergey Brin gave John Doerr a call. A few months earlier, the Google cofounders had accepted $12.5 million from Kleiner Perkins Caufield & Byers, Doerr’s venture-capital firm, as well as an equal amount from Sequoia Capital. When
they took the cash, they agreed that they would hire an 0
utsider to replace as CEO
a common strategy to provide “adult supervìsion” to inexperienced founders. But now the y were renegìng. “They said, ‘We’ve changed our mind. We thi
e changed our mind. We think we can run the company between the two of us,”’ Doerr recalls.
immediately sell his shares, but he held off. He made P
. ne mane rage and Brin an offer: He would set up meetings for them with the most brilliant CEOs in Silicon Valley, 5° *hey Could get a better sense of what the job entailed. “After that. ” he told them, “if you think we should do a search. we willi And if vnu don’
El E El APH 2011
arch, we will. And if you want to, then
The second photo is only slightly worse than the first one, but it resulted in a huge drop in OCR quality. (And some of the text is even repeated, e.g. “changed our mind”. How does that happen?) Unfortunately, in real-life situations the photos that users take are more likely to resemble the second photo than the first one.
These tests show only a corner of the page. I also tried to photograph the entire page, but this failed miserably: the resulting document contained no text at all. I suspect Google Docs for Android reduces the resolution of the photo before sending it to the server for OCR; otherwise, the OCR should have produced some results. The Nexus S camera has a resolution of 2560×1920. The test page was 10.8″×7.9″, which means that the photo was about 240 dpi (dots per inch), which is high enough to produce good OCR results. (The web version of Google Docs managed to produce reasonable results even with a 120 dpi image.) The most likely explanation I can think of is that the OCR was working with a low-resolution version of the page.
Another reason that I suspect Google Docs is downsampling the image is that when I exported the resulting document to HTML, the image in the HTML file was only 1280×960 pixels, i.e. 1.2 megapixels. This is anecdotal evidence, but it’s consistent with the complete failure to perform OCR on the full page.
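For readers who want to check the arithmetic, here is a minimal Python sketch of the resolution figures above. It assumes the page exactly fills the frame, with the long edge of the page along the long edge of the sensor; the function names are mine, not part of any app or library. The resulting numbers are nominal: they say nothing about focus, shake, or lighting, which matter at least as much for a handheld photo.

```python
def nominal_dpi(pixels_w, pixels_h, inches_w, inches_h):
    """Return (horizontal, vertical) dots per inch, assuming the page fills the frame."""
    return pixels_w / inches_w, pixels_h / inches_h


def megapixels(pixels_w, pixels_h):
    """Return the pixel count in millions."""
    return pixels_w * pixels_h / 1_000_000


# Page size from the post, in inches (long edge, short edge).
PAGE_LONG_IN, PAGE_SHORT_IN = 10.8, 7.9

# Full Nexus S photo versus the image found in the exported HTML file.
full = nominal_dpi(2560, 1920, PAGE_LONG_IN, PAGE_SHORT_IN)
small = nominal_dpi(1280, 960, PAGE_LONG_IN, PAGE_SHORT_IN)

print(f"full photo:     {full[0]:.0f} x {full[1]:.0f} dpi, {megapixels(2560, 1920):.1f} MP")
print(f"exported image: {small[0]:.0f} x {small[1]:.0f} dpi, {megapixels(1280, 960):.1f} MP")
```

Halving each dimension halves the nominal dpi and cuts the pixel count to a quarter, which is why a downsampled full-page photo would be a much harder input than the original.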
What does this mean? If indeed Google is downsampling the photos before performing OCR then they could provide much better OCR results simply by sending the entire photo for processing instead of a lower-resolution version. And if they’re not downsampling the photos then they have some other processing problem, but that also means that there’s a lot they could do to improve the results. Let’s see if they do anything about it!
Google Docs on the Web
The rest of these tests were all performed on my PC. I scanned the page using a flatbed scanner (el cheapo Canon, but it’s more than good enough).
The web version of Google Docs can perform OCR on images. They restrict the uploaded images to 2 MB, so I couldn’t upload the full 300 dpi page because it was 9.5 MB (as a PNG; I didn’t want to use a lossy format such as JPEG). I reduced the page to 120 dpi (1.7 MB) and uploaded that file. Here are the results:
AN UNCONVENTIONALCEO Ten
H | s BY STEVEN LEVY
ONE AFTERNOON ABOUT 12 years ago, Larry Page and Sergey Brin gave John Doerr a call. A few months earlier, the Google cofounders had accepted $12.5 million from Kle`iner Perkins Caufielcl & Byers, Doerr’s venture-capital ?rm, as well as an equal amount from Sequoia Capital. when they took the cash, they agreed that they would hire an outsider to replace Page as CEO, a common strategy to provide “adult supervision” to inexperienced founders. But now they were reneging. “They said, ‘We’ve changed our mind. We think we can run the company between the two of us,’ ” Doerr recalls. Doerr’s ?rst instinct was to immediately sell his shares, but he held He made Page and Brin an He would set up meetings for them with the most brilliant CEOs in Silicon Valley, so they could get a better sense of what the job entailed. “After that,” he told them, “if you think we should do a search, we will. And if you don’t want to, then
24,000 employees later, cofounder retakes the topjob at Google. run the company like a startup
I’ll make a decision about that.” Page and Brin took a Magical Mystery Tour of high tech royalty: Apple’s Steve Jobs, Intel’s Andy Grove, lntuit’s Scott Cook, Amazon .com’s Jeff Bezus, and others. Then they came back to Doerr. “We agree with you,” they told him; they were ready to hire a CEO. But they would
This is much better than Google Docs for Android: most of the text was converted correctly, even though the scan was very low resolution (120 dpi). However, the layout detection algorithm is pretty poor. To its credit, it correctly detected that the text was in two columns. But it mistakenly thought the header was using the same columns as the text below, so it broke the header up in a ridiculous way. This is why the sentence “24,000 employees later, cofounder retakes the topjob at Google. run the company like a startup” appears in the middle of the text.
Other problems include: not recognizing word breaks (“UNCONVENTIONALCEO”; “topjob”); confusing “d” with “cl” in “Caufield”; converting “Bezos” to “Bezus”; not recognizing the “ff” ligature (this is why the words “off” and “offer” are missing).
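I only eyeballed these errors, but if you want to put a number on them, one common yardstick is the character error rate: the edit distance between the OCR output and the correct text, divided by the length of the correct text. The sketch below is my own illustration of that measure applied to the word-level mistakes listed above; it is not something any of the tested programs report, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    return levenshtein(reference, ocr_output) / len(reference)


# (correct text, what Google Docs on the web produced)
pairs = [
    ("Caufield", "Caufielcl"),
    ("Bezos", "Bezus"),
    ("UNCONVENTIONAL CEO", "UNCONVENTIONALCEO"),
    ("top job", "topjob"),
]

for reference, ocr_output in pairs:
    distance = levenshtein(reference, ocr_output)
    rate = char_error_rate(reference, ocr_output)
    print(f"{reference!r} -> {ocr_output!r}: distance {distance}, CER {rate:.2f}")
```

Note that “d” read as “cl” costs two edits (one substitution plus one insertion), while a dropped space costs only one, so this measure weighs errors by how many characters are wrong rather than by how annoying they are to a reader.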
If Google Docs relaxes its file size limit so that higher-resolution scans can be uploaded then it’s likely that it will be able to produce very accurate results on the text itself. However, the layout still won’t be preserved.
Tesseract
Tesseract is a free, open-source OCR program. Since Google had a hand in making it open-source, I thought perhaps this is the OCR engine that they use in Google Docs, so I wanted to try it out. Here are the results from OCR-ing the 300 dpi scan:
AN UNCONVENTIONAL CEO
Terr years and 24,090 emptayees tater,rf;ofsrrrrder
Larry Page retakes the top get; at
His goalzto run the cemparsy trite a ata.rttrp aggaia.
BY STEVEN LEVY
ONE AFTERNOON ABOUT
12 years ago, Larry Page and
Sergey Brin gave John Doerr
a call. A few months earlier,
the Google cofounders had
accepted $12.5 million from
Kleiner Perkins Caufield &
Byers, Doerr’s venture-capital
firm, as well as an equal amount
from Sequoia Capital. When
they took the cash, they agreed that they would hire an outsider to replace Page as CEO,
a common strategy to provide “adult supervision” to inexperienced founders. But now
they were reneging. “They said, ‘We’ve changed our mind. We think we can run the com-
pany between the two of us,” ” Doerr recalls.
Doerr’s first instinct was to immediately sell his shares, but he held OIT. He made Page
and Brin an offer: He would set up meetings for them with the most brilliant CEOs in
Silicon Valley, so they could get a better sense of what the job entailed. “After that,” he
told them, “if you think we should do a search, we will. And if you don’t want to, then
_-”
.ra ‘T
E] E] E] Aeazou
.-’,,~.4′;?}.7 l
si fl
I’ll make a decision about that.” Page and
Brin took a Magical Mystery Tour of high
tech royalty: Apple’s Steve Jobs, Intel’s
Andy Grove, Intuit’s Scott Cook, Amazon
.com’s Jeff Bezos, and others. Then they
came back to Doerr.
“We agree with you,” they told him; they
were ready to hire a CEO. But they would
sr, {“G1′Bfi1L1
Most of the text was recognized correctly, including words that Google Docs missed such as “offer” and “Caufield” (although it still got “off” wrong). Tesseract did a bad job on the header, and apparently thought some of the image was text (and didn’t use a dictionary to realize that it was producing gibberish).
Of course, Tesseract had an easier task than Google Docs because it got a 300 dpi scan whereas Google Docs had only 120 dpi to work with. When I tried to give the 120 dpi scan to Tesseract it failed miserably, and produced 100% garbage.
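If you want to repeat the Tesseract comparison yourself, something like the following Python sketch should do. It is not the exact procedure I used: it assumes the `tesseract` command-line tool is installed and on your PATH, it relies on the CLI convention of passing `stdout` as the output base to print the text instead of writing a file, and the file names `page_300dpi.png` and `page_120dpi.png` are placeholders for your own scans.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_command(image_path: str, language: str = "eng") -> List[str]:
    """Build a command that asks Tesseract to print the recognized text to stdout."""
    # "stdout" as the output base makes the CLI write text to standard output
    # instead of creating <outputbase>.txt; "-l" selects the language data.
    return ["tesseract", image_path, "stdout", "-l", language]


def run_tesseract(image_path: str) -> Optional[str]:
    """Return the recognized text, or None if Tesseract or the image is missing."""
    if shutil.which("tesseract") is None or not Path(image_path).is_file():
        return None
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder file names: substitute your own 300 dpi and 120 dpi scans.
    for scan in ("page_300dpi.png", "page_120dpi.png"):
        text = run_tesseract(scan)
        if text is None:
            print(f"{scan}: skipped (tesseract or file not found)")
        else:
            print(f"{scan}: {len(text.split())} words recognized")
```

Counting words is only a crude sanity check; reading the output is still the fastest way to tell usable text from garbage.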
ABBYY FineReader
Finally, I tested what a commercial OCR program can do. ABBYY FineReader is one of the best programs today, and they have a free trial version, so that’s the one I used. The results were by far the best of the bunch. It recognized almost all of the text correctly; preserved fonts and layout; and recognized the images and saved them. Here’s a screenshot of the PDF that FineReader created. Note that unlike all the other images in this post, all the text here is editable:

PDF Created by ABBYY FineReader
I found only two mistakes in the output from ABBYY FineReader: it changed “Amazon.com” to “Amazon.corn” (probably due to an overzealous use of the dictionary), and changed “ILLUSTRATION BY Grafilu” to “LLUSTRATIDN BY Grafilll” in the footer.
Conclusion
Between 1996 and 2000 I worked at Ligature, an OCR company, so I’m familiar with the quality of commercial OCR programs. Even back in 1996, all of the top commercial OCR programs produced results similar to what ABBYY FineReader produced in this roundup. I was shocked by how bad the free OCR solutions are. Google Docs for Web is the best of them, but even that program is problematic because of its file-size limit and the loss of layout.
As for Google Docs for Android: it produces mediocre results, even when the user goes to great lengths to give it good input. When using a mobile phone in the real world there will usually be many more challenges: the lighting is often bad; the camera isn’t held precisely perpendicular to the page; the user’s hand shakes; etc. So my advice is: if you’re Julian Assange and you want to duplicate super-secret documents in a hurry, nothing beats a flatbed scanner and a top-tier OCR program.