Geek Trivia

Google Has Leveraged Security Captcha To Successfully Do What?

Analyze Astronomical Data
Fold Proteins
Solve Encryption Algorithms
Digitize Books

Answer: Digitize Books

A captcha is a small puzzle intended to establish that the user of a service or interface is human. Any time you’ve signed up for an email newsletter, joined a discussion forum, or done something else online and been asked to identify a word, read text with scribbles through it, label a photo, or complete some other task that a computer would be terrible at but even a not-very-clever human would excel at, you’ve experienced a captcha.

Historically, captchas were just short strings of letters or numbers set against a noisy background. A computer couldn’t easily pull the characters out of the image file, but a human could easily see that behind all the yellow and blue marks there was a sequence like “Y345LW8”. Type that sequence in and you confirm that you’re not just a dirty spam bot.

What if you could take the effort people spend on all those otherwise useless captcha solutions and make it useful? That’s exactly what Google did with the reCaptcha system. Instead of giving you a random string of letters and numbers masked by hash marks, it serves up a pair of words from Google’s enormous database of digitized books, newspapers, and other documents. All of those documents have been worked over by Google’s OCR software, but as anyone who has used OCR software can tell you, even the best software stumbles on faded, smudged, or distorted print. Humans, however, are fantastic at reading text that is blurry, smudged, unclear, or otherwise unrecognizable to a machine. So every time you solve a reCaptcha on a web site, you’re actually helping Google figure out exactly what that scanned word is. When enough users return the same value for a word, that reading is treated as confirmed.
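
For the curious, the "enough users return the same value" idea can be sketched in a few lines of Python. This is only an illustration of agreement by voting, not Google's actual implementation: the function names, the threshold of three matching answers, and the lowercase/trim normalization are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical threshold: how many matching answers count as "enough".
MIN_AGREEING_VOTES = 3


def normalize(answer: str) -> str:
    """Trim and lowercase so 'Harbor ' and 'harbor' count as the same vote."""
    return answer.strip().lower()


def confirmed_reading(answers, min_votes=MIN_AGREEING_VOTES):
    """Return the most common reading of a scanned word once at least
    min_votes users have typed it, or None if agreement is still too thin."""
    # Ignore blank submissions, then tally the normalized answers.
    votes = Counter(normalize(a) for a in answers if a.strip())
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


if __name__ == "__main__":
    # Three of four users agree, so the word is accepted.
    print(confirmed_reading(["harbor", "Harbor ", "harbour", "harbor"]))
    # Only two answers, and they disagree, so nothing is confirmed yet.
    print(confirmed_reading(["harbor", "harbour"]))
```

A real system would need more than this, such as a way to tell humans from bots before trusting any answer, but the core idea is the same: one person's guess is just a guess, while many matching guesses are a transcription.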

Image courtesy of Google.