“2001: A Space Odyssey” as Inspiration

Apple is currently suing Samsung, claiming that Samsung copied the appearance of Apple’s phones and tablets. In response, Samsung says that it was inspired by the 1968 movie “2001: A Space Odyssey”. I was very excited when I saw this headline on Slashdot:

Samsung Cites 2001: A Space Odyssey In Apple Patent Case

Because I thought that this was what they were citing as prior art:

Monolith from 2001: A Space Odyssey

But sadly, it seems Samsung were only referring to some tablet computers that were shown briefly in use by the astronauts in the movie. You can see that picture in CNET’s coverage.

Samsung would have made a better argument, and a winning one, if they’d claimed that all of the current smartphones and tablets, which are black and shiny rectangles, are derived from the Monolith. As has been foretold.

(By the way, there’s an “Action Figure” toy of the Monolith. It’s just a featureless black plastic rectangle. This is a real thing.)

The OCR Quality of Google Docs

Yesterday Google released an Android app for Google Docs. The flashiest feature of this app is the ability to take a photo and convert it to text using OCR. I used to work at an OCR company back when a “smart phone” was a phone that had three lines of black-on-white text instead of two, so I was interested to see how well this newfangled OCR works. I tested the OCR capabilities of Google Docs for Android, and compared it with several other OCR programs. My conclusion: under ideal conditions the OCR works pretty well. However, real-life conditions are much worse, so this feature is like a conscience in a politician: widely trumpeted but rarely used.

The Test Page

My test document was the first page of Steven Levy’s article about Google from the April 2011 issue of Wired, titled “An Unconventional CEO”. Here’s what the page looks like:

Original Full Page

This is not a very challenging document. The text is very high resolution and doesn’t use many fonts. The layout is a little more challenging: there’s a header; two columns; and a few graphical elements. Good OCR programs will recognize and preserve this layout.

Google Docs for Android

For this test I used a Nexus S phone. Getting good photos was difficult because my hand and the phone itself kept casting shadows on the page. I tried to perform OCR anyway, but this uneven lighting resulted in horrendous performance. (I also tried using the camera’s flash, but that produced even worse results because the lighting was extremely uneven.) I am nothing if not fair, so I twisted myself like a pretzel until I managed to get a photo with even lighting across the page. For best results you should have the steady hands of a brain surgeon.

Once I managed to get some reasonable photos, I had the Google Docs app convert them to documents. The app doesn’t perform OCR on the device: it sends the photo to Google’s servers for processing. This was pretty fast. Here are the results of OCRing a corner of the page:

Android Test

ONE AFTERNOON A-BOUT 12 years ago, Larry Page and Sergey Brin gave John Doerr a call. A few months earlier, the Google cofounders had accepted $12.5 million from Kleiner Perkins Canfield & Byers, Doerr’s venture-capital firm, as well as an equal amount from Sequoia Capital. When they took the cash, they agreed
that they would hire an outsider to replace Page as CEO,
a common strategy to provide “adult supervision” to inexperienced founders. But now they were reneging. “They said, ‘We’ve changed our mind. We think we can run the company between the two of us/ Doerr recalls. ,ny between me »wu Ul nb, We.. Doerrfg instinct was to immediately sell his shares, but he held off. He m .ide Page and Brin an offer: He would set up meetings for them with the most brilliant Silicon Valley, so they could get a better sense of what the job entailed «After th t h 3 . E
told them, “if you think we should do a search, we will. And if you don’t want to, then

This was the best result out of all of my attempts. You can see that even this result contains many errors: so many that I won’t bother pointing them out. Nevertheless, most of the text was OCR’d correctly.
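If you want to put a number on “most of the text was OCR’d correctly”, here is a minimal sketch that scores an OCR result against the known text. It uses Python’s standard difflib module; the function name ocr_similarity and the choice of this particular similarity measure are assumptions made for illustration, not something used in the tests above.

```python
from difflib import SequenceMatcher


def ocr_similarity(reference: str, ocr_output: str) -> float:
    """Return a score between 0 and 1; 1.0 means the texts are identical."""
    # Collapse runs of whitespace so that line breaks inserted by the
    # OCR engine are not counted as recognition errors.
    ref = " ".join(reference.split())
    out = " ".join(ocr_output.split())
    return SequenceMatcher(None, ref, out, autojunk=False).ratio()


# First sentence of the test page, and the same sentence as it came
# back from the best Android attempt (note the stray hyphen).
reference = ("ONE AFTERNOON ABOUT 12 years ago, Larry Page and "
             "Sergey Brin gave John Doerr a call.")
ocr_output = ("ONE AFTERNOON A-BOUT 12 years ago, Larry Page and "
              "Sergey Brin gave John Doerr a call.")

score = ocr_similarity(reference, ocr_output)
print(f"similarity: {score:.3f}")
```

A score like this is only a rough proxy: it treats every character the same, so a wrong digit in “$12.5 million” costs no more than a stray hyphen.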

For comparison, here’s the same page, but this time the photo wasn’t taken quite as well. The results are much worse:

Poor Lighting

i
ome ABOUT 12 years ago, Larry Page and Sergey Brin gave John Doerr a call. A few months earlier, the Google cofounders had accepted $12.5 million from Kleiner Perkins Caufield & Byers, Doerr’s venture-capital firm, as well as an equal amount from Sequoia Capital. When
they took the cash, they agreed that they would hire an 0
utsider to replace as CEO
a common strategy to provide “adult supervìsion” to inexperienced founders. But now the y were renegìng. “They said, ‘We’ve changed our mind. We thi
e changed our mind. We think we can run the company between the two of us,”’ Doerr recalls.
immediately sell his shares, but he held off. He made P
. ne mane rage and Brin an offer: He would set up meetings for them with the most brilliant CEOs in Silicon Valley, 5° *hey Could get a better sense of what the job entailed. “After that. ” he told them, “if you think we should do a search. we willi And if vnu don’
El E El APH 2011
arch, we will. And if you want to, then

The second photo is only slightly worse than the first one, but it resulted in a huge drop in OCR quality. (And some of the text is even repeated, e.g. “changed our mind”. How does that happen?) Unfortunately, in real-life situations the photos that users take are more likely to resemble the second photo than the first one.

These tests show only a corner of the page. I also tried to photograph the entire page, but this failed miserably: the resulting document contained no text at all. I suspect Google Docs for Android reduces the resolution of the photo before sending it to the server for OCR! Otherwise, the OCR should have produced some results. The Nexus S camera has a resolution of 2560×1920. The test page was 10.8″×7.9″, which means that the photo was about 240 dpi (dots per inch), which is high enough to produce good OCR results. (The web version of Google Docs managed to produce reasonable results even with a 120 dpi image.) The most likely explanation I can think of is that the OCR was working with a low-resolution version of the page.

Another reason that I suspect Google Docs is downsampling the image is that when I exported the resulting document to HTML, the image in the HTML file was only 1280×960 pixels, i.e. 1.2 megapixels. This is anecdotal evidence, but it’s consistent with the complete failure to perform OCR on the full page.
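The resolution arithmetic is easy to check. Here is a small sketch that redoes it, assuming (optimistically) that the page exactly fills the camera frame, with the long side of the page along the long side of the sensor. The function name effective_dpi is made up for this example.

```python
def effective_dpi(pixels_wide, pixels_high, inches_wide, inches_high):
    """Return (horizontal dpi, vertical dpi) for a photo that exactly frames the page."""
    return pixels_wide / inches_wide, pixels_high / inches_high


# Test page dimensions in inches, long side first.
PAGE_LONG_IN, PAGE_SHORT_IN = 10.8, 7.9

# Full Nexus S photo versus the image found in the exported HTML.
full = effective_dpi(2560, 1920, PAGE_LONG_IN, PAGE_SHORT_IN)
exported = effective_dpi(1280, 960, PAGE_LONG_IN, PAGE_SHORT_IN)
exported_megapixels = 1280 * 960 / 1_000_000

print(f"full photo:     {full[0]:.0f} x {full[1]:.0f} dpi")
print(f"exported image: {exported[0]:.0f} x {exported[1]:.0f} dpi")
print(f"exported image: {exported_megapixels:.1f} megapixels")
```

Under that assumption the full photo works out to roughly 237 by 243 dpi, and the 1280×960 image to roughly half of that. A real photo has margins around the page, so the true figures would be lower in both cases.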

What does this mean? If indeed Google is downsampling the photos before performing OCR then they could provide much better OCR results simply by sending the entire photo for processing instead of a lower-resolution version. And if they’re not downsampling the photos then they have some other processing problem, but that also means that there’s a lot they could do to improve the results. Let’s see if they do anything about it!

Google Docs on the Web

The rest of these tests were all performed on my PC. I scanned the page using a flatbed scanner (el cheapo Canon, but it’s more than good enough).

The web version of Google Docs can perform OCR on images. They restrict the uploaded images to 2 MB, so I couldn’t upload the full 300 dpi page because it was 9.5 MB (as a PNG; I didn’t want to use a lossy format such as JPEG). I reduced the page to 120 dpi (1.7 MB) and uploaded that file. Here are the results:

AN UNCONVENTIONALCEO Ten



H | s BY STEVEN LEVY



ONE AFTERNOON ABOUT 12 years ago, Larry Page and Sergey Brin gave John Doerr a call. A few months earlier, the Google cofounders had accepted $12.5 million from Kle`iner Perkins Caufielcl & Byers, Doerr’s venture-capital ?rm, as well as an equal amount from Sequoia Capital. when they took the cash, they agreed that they would hire an outsider to replace Page as CEO, a common strategy to provide “adult supervision” to inexperienced founders. But now they were reneging. “They said, ‘We’ve changed our mind. We think we can run the company between the two of us,’ ” Doerr recalls. Doerr’s ?rst instinct was to immediately sell his shares, but he held He made Page and Brin an He would set up meetings for them with the most brilliant CEOs in Silicon Valley, so they could get a better sense of what the job entailed. “After that,” he told them, “if you think we should do a search, we will. And if you don’t want to, then



24,000 employees later, cofounder retakes the topjob at Google. run the company like a startup



I’ll make a decision about that.” Page and Brin took a Magical Mystery Tour of high tech royalty: Apple’s Steve Jobs, Intel’s Andy Grove, lntuit’s Scott Cook, Amazon .com’s Jeff Bezus, and others. Then they came back to Doerr. “We agree with you,” they told him; they were ready to hire a CEO. But they would

 

 

This is much better than Google Docs for Android: most of the text was converted correctly, even though the scan was very low resolution (120 dpi). However, the layout detection algorithm is pretty poor. To its credit, it detected correctly that the text was in two columns. But it mistakenly thought the header was using the same columns as the text below, so it broke it up in a ridiculous way. This is why the sentence “24,000 employees later, cofounder retakes the topjob at Google. run the company like a startup” appears in the middle of the text.

Other problems include: not recognizing word breaks (“UNCONVENTIONALCEO”; “topjob”); confusing “d” with “cl” in “Caufield”; converting “Bezos” to “Bezus”; not recognizing the “ff” ligature (this is why the words “off” and “offer” are missing).

If Google Docs relaxes its file size limit so that higher-resolution scans can be uploaded then it’s likely that it will be able to produce very accurate results on the text itself. However, the layout still won’t be preserved.

Tesseract

Tesseract is a free, open-source OCR program. Since Google had a hand in making it open-source, I thought perhaps this is the OCR engine that they use in Google Docs, so I wanted to try it out. Here are the results from OCR-ing the 300 dpi scan:

AN UNCONVENTIONAL CEO
Terr years and 24,090 emptayees tater,rf;ofsrrrrder
Larry Page retakes the top get; at
His goalzto run the cemparsy trite a ata.rttrp aggaia.
BY STEVEN LEVY
ONE AFTERNOON ABOUT
12 years ago, Larry Page and
Sergey Brin gave John Doerr
a call. A few months earlier,
the Google cofounders had
accepted $12.5 million from
Kleiner Perkins Caufield &
Byers, Doerr’s venture-capital
firm, as well as an equal amount
from Sequoia Capital. When
they took the cash, they agreed that they would hire an outsider to replace Page as CEO,
a common strategy to provide “adult supervision” to inexperienced founders. But now
they were reneging. “They said, ‘We’ve changed our mind. We think we can run the com-
pany between the two of us,” ” Doerr recalls.
Doerr’s first instinct was to immediately sell his shares, but he held OIT. He made Page
and Brin an offer: He would set up meetings for them with the most brilliant CEOs in
Silicon Valley, so they could get a better sense of what the job entailed. “After that,” he
told them, “if you think we should do a search, we will. And if you don’t want to, then
_-“
.ra ‘T
E] E] E] Aeazou
.-‘,,~.4′;?}.7 l
si fl
I’ll make a decision about that.” Page and
Brin took a Magical Mystery Tour of high
tech royalty: Apple’s Steve Jobs, Intel’s
Andy Grove, Intuit’s Scott Cook, Amazon
.com’s Jeff Bezos, and others. Then they
came back to Doerr.
“We agree with you,” they told him; they
were ready to hire a CEO. But they would
sr, {“G1’Bfi1L1

Most of the text was recognized correctly, including words that Google Docs missed such as “offer” and “Caufield” (although it still got “off” wrong). Tesseract did a bad job on the header, and apparently thought some of the image was text (and didn’t use a dictionary to realize that it’s producing gibberish).
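To illustrate what a dictionary check could look like, here is a minimal sketch that flags an output line as gibberish when too few of its words are known. This is not how Tesseract or any other OCR program actually works; the tiny KNOWN_WORDS list, the function names, and the 0.5 threshold are all invented for the example, and a real check would use a full dictionary.

```python
import re

# Tiny illustrative word list (an assumption); a real check would load
# a full dictionary for the document's language.
KNOWN_WORDS = {
    "a", "at", "by", "his", "larry", "levy", "page", "retakes",
    "run", "steven", "the", "to", "top",
}


def known_word_ratio(line, dictionary=KNOWN_WORDS):
    """Return the fraction of alphabetic tokens in the line that are known words."""
    tokens = re.findall(r"[A-Za-z]+", line)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in dictionary)
    return hits / len(tokens)


def looks_like_gibberish(line, threshold=0.5):
    """Flag a line when fewer than `threshold` of its tokens are known words."""
    return known_word_ratio(line) < threshold


# Three lines taken from the Tesseract output above.
for line in ["BY STEVEN LEVY", "Larry Page retakes the top get; at", "E] E] E] Aeazou"]:
    verdict = "gibberish" if looks_like_gibberish(line) else "ok"
    print(f"{verdict:9}  {line}")
```

A check like this catches lines that are pure noise, but a line with a single misread word (“get” instead of “job”) still passes, because the wrong word is itself a real word or the rest of the line outvotes it.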

Of course, Tesseract had an easier task than Google Docs because it got a 300 dpi scan whereas Google Docs had only 120 dpi to work with. When I tried to give the 120 dpi scan to Tesseract it failed miserably, and produced 100% garbage.

ABBYY FineReader

Finally, I tested what a commercial OCR program can do. ABBYY FineReader is one of the best programs today, and they have a free trial version, so that’s the one I used. The results were by far the best of the bunch. It recognized almost all of the text correctly; preserved fonts and layout; and recognized the images and saved them. Here’s a screenshot of the PDF that FineReader created. Note that unlike all the other images in this post, all the text here is editable:

PDF Created by ABBYY FineReader

I found only two mistakes in the output from ABBYY FineReader: it changed “Amazon.com” to “Amazon.corn” (probably due to an overzealous use of the dictionary), and changed “ILLUSTRATION BY Grafilu” to “LLUSTRATIDN BY Grafilll” in the footer.

Conclusion

Between 1996 and 2000 I worked at Ligature, an OCR company, so I’m familiar with the quality of commercial OCR programs. Even back in 1996, all of the top commercial OCR programs produced results similar to what ABBYY FineReader produced in this roundup. I was shocked by how bad the free OCR solutions are. Google Docs for Web is the best of them, but even that program is problematic because of its file-size limit and the loss of layout.

As for Google Docs for Android: it produces mediocre results, even when the user goes to great lengths to give it good input. When using a mobile phone in the real world there will usually be many more challenges: the lighting is often bad; the camera isn’t held precisely perpendicular to the page; the user’s hand shakes; etc. So my advice is: if you’re Julian Assange and you want to duplicate super-secret documents in a hurry, nothing beats a flatbed scanner and a top-tier OCR program.

Kitely Launched

Kitely

Come up to the lab and see what’s on the slab

– The Rocky Horror Picture Show

Kitely… it’s alive!

For the past few years I have been working on a startup with my partner, Ilan Tochner. Now we have finally opened it up to the world! It’s called Kitely, and what we do is provide virtual worlds on-demand. This means that instead of an always-on virtual world, these worlds exist only while there are people inside them. If the world becomes empty then we shut it down. When someone wants to enter the world, we restart it. Our users pay for their worlds only while they’re active.

In order to implement this we have created a cloud-based system, running on Amazon EC2. We currently use one Large instance to host our main server, and additional Large instances to host the virtual worlds. These instances are called “World Nodes”, and we start and stop them automatically in order to have enough instances to host all of the currently active worlds. We also keep a few empty nodes on standby, so that when someone wants to start a world we can load it into an existing EC2 instance instead of having to start a new instance (which takes much longer).
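The scaling decision described above can be sketched in a few lines. This is only an illustration of the idea, not Kitely’s actual code: the function names, the number of worlds that fit on one node, and the size of the standby pool are all hypothetical values chosen for the example.

```python
import math


def nodes_needed(active_worlds, worlds_per_node, standby_nodes):
    """Nodes required to host every active world, plus empty standby nodes."""
    busy_nodes = math.ceil(active_worlds / worlds_per_node)
    return busy_nodes + standby_nodes


def scaling_action(running_nodes, active_worlds, worlds_per_node=4, standby_nodes=2):
    """Decide whether to start nodes, stop nodes, or leave the pool alone.

    worlds_per_node and standby_nodes are made-up defaults for illustration.
    """
    target = nodes_needed(active_worlds, worlds_per_node, standby_nodes)
    delta = target - running_nodes
    if delta > 0:
        return ("start", delta)
    if delta < 0:
        return ("stop", -delta)
    return ("hold", 0)


# Nine active worlds on three running nodes: three busy nodes are needed,
# plus two on standby, so two more nodes have to be started.
print(scaling_action(running_nodes=3, active_worlds=9))
# Everyone has left: shrink the pool back down to the standby nodes.
print(scaling_action(running_nodes=5, active_worlds=0))
```

The standby term is what hides the slow part from users: a new world is loaded into a node that is already running, and the replacement standby node is started in the background.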

We opened up our service last week, but in the beginning only our friends and family came to check it out. That changed on Sunday when there was a MASSIVE spike in interest! Suddenly we were getting a steady stream of emails from the server telling us that new World Nodes had been started; tweets mentioning Kitely began flying; and we’ve been getting many questions and requests from our community. Ilan, our CEO, has been working around the clock to engage with everyone, while I have been working on adding the many features that we are still missing, and fixing the inevitable bugs that have been discovered by our users.

So come up to the lab, err, website, and try our service! You can create a virtual world in minutes, and it’s free while we’re in beta.

Books You Shouldn’t Read

This is not a novel to be tossed aside lightly. It should be thrown with great force.
– Dorothy Parker

If you seek book recommendations then there’s no shortage of sources. But unrecommendations are far more valuable: which books are overrated, overcelebrated, or just plain overwrought.

The Huffington Post has just published an article, 13 Books Nobody’s Read But Say They Have. Their list includes these books:

  • Geoffrey Chaucer – The Canterbury Tales
  • Alexis de Tocqueville – Democracy In America
  • James Joyce – Ulysses
  • Charles Dickens – A Christmas Carol
  • Salman Rushdie – The Satanic Verses
  • Herman Melville – Moby Dick
  • Stephen Hawking – A Brief History of Time
  • David Foster Wallace – Infinite Jest
  • Umberto Eco – The Name of the Rose
  • Marcel Proust – “Remembrance Of Things Past” or “In Search Of Lost Time” (the book so nice they named it twice)
  • Cervantes – Don Quixote
  • William Faulkner – As I Lay Dying
  • Leo Tolstoy – War and Peace

That article sure hit a nerve, with over 1,800 comments so far! Most of the commenters tell of books that they tried to read but couldn’t, each comment a small capsule of human anguish. I wish to add a few unrecommendations of my own.

Don’t read Moby Dick, by Herman Melville. This sneaky book pulls you in with clever patter and a wry, sympathetic narrator (Ishmael). But the good times only last about a third of the book (which, to be fair, is almost 200 pages!). As Ishmael begins to catalogue every part of the whale, from spout to tail, devoting a chapter to each (there’s even a whole chapter on the color white!), you may well consider whether you’re reading a piece of fiction or a textbook from the 1850s. In the latter part of the book Captain Ahab becomes increasingly prominent, delivering one tirade after another in impenetrable English that bears little similarity to the fluid prose from the beginning of the story. I have struggled to get through Moby Dick almost as much as Ahab fought against the White Whale himself, and although my fate was better than Ahab’s since I did finish the book (oops, was this a spoiler?), I don’t recommend the experience.

Don’t read The Da Vinci Code, by Dan Brown. The story is little more than a skeleton for Brown to hang his (admittedly interesting) research about early Christianity. The characters are sketchily drawn and emote about as much as Tom Hanks’s ridiculous hairdo in the movie adaptation of this book. The upside: if you do read this book it won’t take you much time as it’s easy to get through and far shorter than Moby Dick.

Most of all, under no circumstances should you read Gravity’s Rainbow by Thomas Pynchon! That book nearly killed me as I struggled to get through it. Parts of the book are well written, but often it’s opaque and it’s difficult to understand what the author meant. Perhaps I would have enjoyed it more if I had read it in a different mindset, skipping over difficult-to-understand passages in order to get the gist of the story. If I had two lives to live I would put this to the test.

I read a couple of books in the middle of reading Gravity’s Rainbow in order to refresh myself, much as the contestants of a reality show, when forced to eat something disgusting such as raw cows’ eyes, cleanse their palates with water between bites. After spending six months to read 200 pages I finally gave up, and the day I decided to stop reading this book the sun broke through the overcast sky, a warm breeze scented with wildflowers began to stir the grass, and little children ran laughing through dewy meadows as excited dogs ran beside them, barking and wagging their tails. Perhaps these were the very same dogs that were experimented upon in Gravity’s Rainbow, happy to be freed at last.

My advice is: if a book is not interesting after 100 pages, stop reading. There’s too much good literature to waste time on something you hate, no matter how critically acclaimed it is. As The Onion put it: failure is an option.

(Photo by gaspi *yg)

Confluence and Jira for $5

The good people at Atlassian are running an incredible offer: get full versions of Confluence (a wiki) and Jira (a bug tracker) for $5 each, for up to 5 users. These are outstanding products that normally cost $1200 each (but those versions support more users). The offer is valid only until April 24, so act now, while supplies last! At my company we bought both Confluence and Jira, and they’re absolutely indispensable.

I was surprised by this promotion because Confluence and Jira are not second-tier products: they’re leaders in their class. I’m a Wiki fanatic, and a few years ago I performed a comprehensive review of all the Wiki products I could find (about 20). Two wikis were head and shoulders above the competition: Confluence and JotSpot. They both had a very long list of features, such as rich-text editing, comprehensive security, good collaboration tools, etc. Both were highly polished: the features were complete and easy to use. The biggest difference between them was that Confluence was meant to be installed on the user’s server whereas JotSpot was hosted. I prefer to have the Wiki installed on my own server, for fast access from within the company’s LAN, so I chose Confluence. (Nowadays there’s also a hosted version of Confluence, if you’re one of Those People who value access over speed.)

Some time after my Great Wiki Evaluation JotSpot was acquired by Google and shut down, to reappear later as Google Sites. Google Sites is far simpler than JotSpot had been, which means it’s not as feature-rich and no longer in the same class as Confluence and other leading Wikis. Google have turned a Harley Davidson into a pink bicycle with training wheels, and that’s the end of JotSpot.

I also recommend Jira, although to a lesser extent than Confluence. Jira is very powerful and customizable, but I find its workflow to be rather cumbersome. I had used FogBugz before Jira, and it was a smoother experience. They’re both much better than open-source products such as Bugzilla and Trac, however (both of which I have also used).

Preemptive disclaimer: I’m not getting anything from Atlassian for this post; I just really, really like Confluence, and I want to spread the word.

The Real Trouble with Streakers

My brother recently started working at SportVU, a company that uses video analysis to track the positions of players in live sport events. It provides real-time statistics about the players’ average speed, distance covered, etc.

We were watching a basketball match tonight between Maccabi Tel-Aviv (Israel) and Barcelona (Spain). During the first quarter, seven Spanish demonstrators ran onto the court waving Palestinian flags to protest Israel’s recent Gaza offensive. My brother wasn’t paying attention at the moment, so I pointed out to him that there was a real-time political demonstration taking place before his eyes. His immediate comment: “this will wreak havoc with the tracking”.

Lego Streaker

(Photo by themattharris)

Namephreaks: Avian Edition

The famed San Francisco columnist Herb Caen had a regular feature called “Namephreaks”, which featured people whose names are related to their occupations. Well, meet Carla J. Dove, director of the Feather Identification Lab in the National Museum of Natural History. Dr. Dove is in charge of identifying the birds that hit US Airways Flight 1549 and forced it to crash-land in the Hudson River.

Nils Holgersson Airlines

“Not X. Y.”: An Israeli Snowclone

Snowclones

A snowclone is a sentence template that can be used to construct many similar phrases. For example, the snowclone “X is the new Y” has been used extensively, e.g.: “White is the new black”, “40 is the new 30”, and “Snowclone is the new Cliché”.

My favorite snowclone is “In Soviet Russia, X Y’s You”. I find these jokes endlessly amusing, and it is my earnest belief that everyone else does, too. For example:

My mom: I’m going to program the VCR to record the news.
Me: In Soviet Russia, VCRs program you!

Neighbor: This bag? It’s dog food.
Me: In Soviet Russia, dogs feed you!

Coworker: Oren, can you help me debug this program? Something’s not right.
Me: In Soviet Russia, programs debug you!
Coworker: Never mind, I’ll ask Larry.

A snowclone is born

Here in Israel a new snowclone was recently coined, courtesy of our upcoming elections. Israel’s defense minister, Ehud Barak, is one of the contenders in the elections but trails behind two more popular politicians. Barak is considered smart and capable, but he’s also viewed as calculating and aloof, and he has a reputation for discarding his allies when he doesn’t need them anymore. Barak’s campaign managers decided to tackle this reputation with a series of ads that, unusually for Israeli politicians, poke fun at the candidate.

The ad campaign began with huge outdoor signs that listed Barak’s well-known flaws: “Not a pal”, “Not trendy”, “Not nice”.

Not a pal

After a few days the signs were replaced. In the new signs, the snowclone was completed: “Not a pal. A leader”.

Not a pal. A leader.
"Not trendy. A leader."; "Not likeable. A leader."

The snowclone spreads

The ad campaign was wildly successful in capturing the public’s attention. Political analysts self-importantly explained how the ads were either clever, or self-defeating. Advertising executives debated whether mentioning a product’s negative attributes is a good idea. Satiric shows riffed on the ad campaign with glee. But the meme didn’t remain confined to the context of Ehud Barak’s election campaign. The phrase “Not X. Y” became instantly recognizable and was adapted to every possible context.

One internet poster, in a forum for copywriters, said that the ad campaign makes Ehud Barak seem like a historical figure who was also known for being a strong leader:

Not nice. A leader.
Not nice. Not likeable. Not a pal. A leader.

Predictably, sports editors were quick to seize on the snowclone to spice up their stories. For example, one article described a basketball team’s new and tough-minded coach as “Not a pal. A coach”:

Not a pal. A coach.

World-affairs stories were next. This story describes Putin’s foray into painting, which contrasts with his well-known tough image. The title is “Not a hunter, a painter: Vladimir Putin’s gentle side”.

Not a hunter. A painter.

Ubiquity

Yesterday the snowclone scored a major coup, which cements its leading position in Israeli culture. Here is the front page of yesterday’s Yedioth Ahronoth, the most widely-read newspaper in Israel. It prominently displays a teaser for an interview with Gabi Ashkenazi, the current CJCS (Chairman of the Joint Chiefs of Staff), with the following title: “Not a pal. CJCS.”

Not a pal. CJCS.

In the same issue of Yedioth Ahronoth, there was a story in the entertainment section about the difficulty of having famous actors portray characters that are different from their well-known public image. The story mentioned Brad Pitt’s performance in his new movie, The Curious Case of Benjamin Button, where he portrays an old man for a large part of the movie. The accompanying photo had the following caption: “Not an icon. An actor.”

Not an icon. An actor.

Two mentions in one issue, including on the front page. Truly, the snowclone has arrived.

The snowclone has also spread to informal communications. The residents of one apartment building had a problem: people were putting trash in a decorative column near the building’s entrance. So they put up a sign: “Not an ashtray. Not a trash can.” Some wit added in handwriting: “A leader”.

Not an ashtray. Not a trash can. A leader.

PureText

One of my favorite utilities is Steve Miller’s PureText. It lets you copy-and-paste text while removing all of the formatting. This is extremely useful; in fact, I used it twice just while writing this blog post.

For example, I often want to copy code snippets from my IDE, Eclipse, into Microsoft Word. Here’s what the code looks like in Eclipse:

Code in Eclipse

And here’s what it looks like after pasting into Word:

Code in Microsoft Word

I blame Bill Gates.

However, with PureText, I simply paste using a different shortcut (Windows+V), and get only the text, without any of the formatting:

Code in Microsoft Word, without formatting

Mission accomplished.

(By the way, Microsoft Word allows you to remove formatting from pasted text, but only after you paste it. Using PureText is faster, and it works with all applications, not just Word.)