The first rule of crisis management is to get ahead of the story. Since my shameful secret is about to be revealed, I decided to break it here first. I’d rather you heard it from me than from the media:
In March 2008 I watched Rick Astley’s music video Never Gonna Give You Up on YouTube. It’s widely considered to be the corniest music video ever created. I have no excuse; I can’t even claim to have been RickRolled. I heard about the video, and willingly went and viewed it. It was me, just me, officer!
The reason for this confession is that Google is about to hand over to Viacom a complete list of every video watched by YouTube users:
[...] the judge granted a Viacom motion that records of every video watched by YouTube users, including their login names and IP addresses, be turned over to the entertainment giant.
The order prevents Viacom from using this information to target lawsuits at users. But it makes no sense to give this information to Viacom in the first place: Google could easily make this data anonymous, and they’ve asked Viacom to do just that. Viacom have said that they won’t use any personally identifiable data, but they haven’t replied to Google’s request directly. These mixed signals make me lunge for my tin foil hat: what could explain Viacom’s behavior? Perhaps, once they have the logs in their possession, they intend to ask the judge to allow them greater use of the data. Or perhaps the data will be “accidentally” leaked — after all, that sort of thing happens all the time.
Google says it keeps its search logs for these reasons:
- To improve search results
- To maintain the security of their systems
- To prevent fraud and other abuses
It’s true that in order to achieve these goals Google needs to save the search logs. However, the problem isn’t that they keep the search logs; it’s that they keep personally identifiable information in the logs, which lets them (or anyone else, such as Viacom) associate searches and clicks with real people. Google keeps this information for 18 months, and that’s far too long. They could erase the personal information much sooner and still achieve all of the goals described above.
For example, Google uses the search logs to find common spelling mistakes made by users, so that it can offer automatic suggestions for the correct spelling. This doesn’t require any personally identifiable information. Another use for the search logs is to detect click fraud. For this purpose it is indeed useful to look at the search and click history of individual users. However, the benefit of this personal data quickly diminishes with time. Data about click fraud that is over a month old should be considered prehistoric; the perpetrators are long gone from whatever IP address they had been using.
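To show that spelling suggestions need nothing but the query text, here is a minimal sketch. It is a simplified frequency-based approach, not Google’s actual method: it assumes the anonymized log is just a list of query strings, and suggests a more popular query that is one edit away (a deletion, transposition, substitution or insertion). The names `edits1`, `suggest` and `anonymous_log` are hypothetical, chosen for illustration.

```python
from collections import Counter

# Characters a one-edit typo may add or substitute (space included for multi-word queries).
LETTERS = "abcdefghijklmnopqrstuvwxyz "

def edits1(query: str) -> set:
    """All strings one delete, transpose, replace or insert away from query."""
    splits = [(query[:i], query[i:]) for i in range(len(query) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def suggest(query: str, counts: Counter) -> str:
    """Suggest a more popular query one edit away, or return query unchanged."""
    candidates = [q for q in edits1(query) if counts[q] > counts[query]]
    return max(candidates, key=lambda q: counts[q]) if candidates else query

# Hypothetical anonymized log: only query strings, no login names, no IP addresses.
anonymous_log = (["britney spears"] * 50 + ["britny spears"] * 3
                 + ["rick astley"] * 20 + ["rick astly"] * 2)
counts = Counter(anonymous_log)
print(suggest("britny spears", counts))  # britney spears
print(suggest("rick astly", counts))     # rick astley
```

Nothing in this computation ever asks who typed a query; the counts alone are enough to spot that “britny” is a common misspelling of “britney”.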
Why are logs kept for 18 months before being anonymized?
We strike a reasonable balance between the competing pressures we face, such as the privacy of our users, the security of our systems and the need for innovation. We believe 18 months strikes the right balance.
It’s time we told Google: 18 months is too long. One month would strike the right balance between privacy, security and the need for innovation. With one month of personally identifiable information, Google will be able to catch all the fraud they are ever likely to catch. After that, it’s time to anonymize the data. The anonymized data is still useful for improving their search engine.
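A one-month policy is simple to express. The sketch below assumes a hypothetical log record with `timestamp`, `query`, `ip_address` and `login` fields (illustrative names, not Google’s real log format) and blanks the identifying fields of every entry older than 30 days, leaving the query text available for improving search. Real anonymization schemes may work differently, for example by truncating IP addresses rather than deleting them.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical retention window: the one month proposed above.
RETENTION = timedelta(days=30)

@dataclass(frozen=True)
class LogEntry:
    """A hypothetical search-log record; field names are illustrative."""
    timestamp: datetime
    query: str
    ip_address: Optional[str]
    login: Optional[str]

def anonymize_old_entries(entries: List[LogEntry], now: datetime) -> List[LogEntry]:
    """Blank the identifying fields of every entry older than RETENTION.

    Recent entries keep their IP address and login (useful for fraud
    detection); older ones keep only the timestamp and query text.
    """
    cutoff = now - RETENTION
    return [
        replace(e, ip_address=None, login=None) if e.timestamp < cutoff else e
        for e in entries
    ]

# Example: one entry from March, one from late June, processed on 1 July 2008.
entries = [
    LogEntry(datetime(2008, 3, 15), "never gonna give you up", "192.0.2.7", "example_user"),
    LogEntry(datetime(2008, 6, 20), "never gonna give you up", "192.0.2.7", "example_user"),
]
cleaned = anonymize_old_entries(entries, now=datetime(2008, 7, 1))
```

The March entry loses its IP address and login; the June entry, still inside the one-month window, keeps them for fraud detection.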
Go to Google’s Privacy Feedback page and ask them to reduce the amount of time they keep personally identifiable data in their logs. You could use a message such as this one:
Google isn’t alone in this. Microsoft also anonymizes its logs after 18 months. Yahoo makes do with just 13 months (how did they come up with that number? Perhaps it also holds occult significance). Ask.com, the fourth-largest search provider, gives its users the option of making completely anonymous searches. But we should focus on Google: where the market leader goes, the rest will surely follow.