INNOBLOG

the insider's guide to innovation

Friday, May 23rd, 2008

ReCAPTCHA: An Unexpected Innovation

Alex Slawsby

Being a Bostonian and a big sports fan, I found myself with the Ticketmaster website loaded in my web browser at exactly 10:59 am this morning. While I had a sense that my quest for Celtics playoff tickets would shortly present me with a challenge, I had no idea that surmounting that challenge would also feed a multipurpose innovation that makes an important contribution to society.

After a tough loss last night, our beloved Celtics are locked in a 1-1 struggle with the Detroit Pistons. With home-court advantage now gone, the Celtics head to Detroit for two games. Were either the Celtics or the Pistons to win the next two games, at a minimum, a decisive fifth game would take place back in Boston on Wednesday evening, May 28. As a result, the Celtics announced that they would put tickets to that decisive fifth game on sale at 11:00 am this morning.

Refreshing my web browser quickly, I hoped to be one of the first fans to reach the Ticketmaster page through which I would submit my ticket request. 10:58 am. 10:59 am. The page had yet to load. 11:00 am. Suddenly, the page was visible! I entered the number of tickets I hoped to purchase, along with my desired arena section and price level. I clicked Submit.

After a brief delay, I was presented with a CAPTCHA. Developed at Carnegie Mellon University, a CAPTCHA (“Completely Automated Public Turing test to tell Computers and Humans Apart”) is a challenge that websites implement to protect themselves from automated software programs. Modern CAPTCHAs usually display a distorted set of words, or of numbers and letters, that software programs find very hard to read but humans can identify fairly easily.

After the CAPTCHA is presented, the website asks the user to type the exact set of characters into a text field. While humans are able to complete the task, software programs nearly always fail. As a result, requests from humans are allowed into the system (often a service such as email account registration or ticket brokering), while automated requests are denied. Since spammers and ticket resellers often try to use brute-force, automated request programs, CAPTCHAs have become a sometimes bothersome but necessary verification measure.

In my particular case, I encountered a CAPTCHA provided by the reCAPTCHA system, a program also created at Carnegie Mellon and used by Ticketmaster. While completing the verification exercise, I noticed the following text on the right side of the page:

Digitize Books One Word at a Time
By entering the words in the box, you are also helping to digitize books from the Internet Archive and preserve literature that was written before the computer age.

While tickets for the fifth playoff game sold out almost immediately and I was stymied there, I did some research into the reCAPTCHA system. As it turns out, reCAPTCHA was developed to help digitize books while also protecting systems from automated requests. The reCAPTCHA system presents two words to the website visitor: one that an optical character recognition (OCR) program has been unable to read, and another whose correct reading is already known because several humans have identified it. The visitor can only be checked against the known word; if that word is typed correctly, the system records the visitor's answer for the unknown word and presents the same unknown word to additional visitors. If those visitors enter the same text as the first, the system accepts the transcription.
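The two-word scheme described above can be sketched in a few lines of Python. This is only my own illustration of the idea, not reCAPTCHA's actual code: the names `UnknownWord`, `check_response`, and `AGREEMENT_THRESHOLD` are hypothetical, and the number of matching answers required before a reading is accepted is an assumption, since the sources I read do not give one.

```python
from collections import Counter

# Hypothetical threshold: how many matching human answers are required
# before a reading of an unknown word is accepted (an assumption, not a
# figure from the article).
AGREEMENT_THRESHOLD = 3


class UnknownWord:
    """A scanned word that OCR could not read, plus the guesses typed so far."""

    def __init__(self, image_id):
        self.image_id = image_id
        self.votes = Counter()

    def record(self, guess):
        # Normalize case and surrounding whitespace before counting the vote.
        self.votes[guess.strip().lower()] += 1

    def accepted_reading(self):
        """Return the agreed transcription, or None if there is no consensus yet."""
        if not self.votes:
            return None
        guess, count = self.votes.most_common(1)[0]
        return guess if count >= AGREEMENT_THRESHOLD else None


def check_response(control_answer, unknown, typed_control, typed_unknown):
    """Verify the visitor on the known (control) word only.

    If the control word is correct, the visitor is treated as human and
    their reading of the unknown word is counted as one vote.
    """
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        unknown.record(typed_unknown)
    return passed


# Usage: three visitors pass the control word and agree on the unknown word;
# one automated request fails the control word, so its guess is ignored.
word = UnknownWord("scan-0001")
check_response("morning", word, "morning", "upon")
check_response("morning", word, "m0rn1ng", "xyz")
check_response("morning", word, "Morning", "upon")
check_response("morning", word, "morning ", "Upon")
print(word.accepted_reading())  # prints: upon
```

The point of the design is that the site never needs to know the answer to the unknown word in advance: the control word does the gatekeeping, and agreement among verified humans does the digitizing.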

According to a recent article, the use of reCAPTCHA by Ticketmaster, Facebook, Twitter, StumbleUpon, and other sites is helping Carnegie Mellon’s book archiving project successfully identify approximately one million words every day. Carnegie Mellon nonetheless reported in late 2007 that approximately 100 million books remained to be digitized, a task that will still take 400 years to complete, even with the contribution of reCAPTCHA.
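As a rough check on how those figures fit together, here is a back-of-the-envelope calculation in Python. It takes the numbers quoted above at face value and assumes, for simplicity, a constant pace and a 365-day year; both simplifications are mine, not the article's.

```python
# Figures quoted in the post (taken at face value, not independently verified).
WORDS_PER_DAY = 1_000_000
BOOKS_REMAINING = 100_000_000
YEARS_TO_COMPLETE = 400

DAYS_PER_YEAR = 365  # assumption: ignore leap years for a rough estimate

# Total words humans would identify over the whole period at a constant pace.
total_words = WORDS_PER_DAY * DAYS_PER_YEAR * YEARS_TO_COMPLETE

# How many human-identified words that works out to per remaining book.
words_per_book = total_words / BOOKS_REMAINING

print(f"Words identified over {YEARS_TO_COMPLETE} years: {total_words:,}")
print(f"Implied reCAPTCHA words per book: {words_per_book:,.0f}")
# prints:
# Words identified over 400 years: 146,000,000,000
# Implied reCAPTCHA words per book: 1,460
```

Read this way, the 400-year estimate works out to roughly 1,460 human-identified words per book, which seems consistent with reCAPTCHA handling only the words that OCR cannot read rather than every word on every page.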

While I was unable to secure a pair of Celtics playoff tickets, I was somewhat heartened by the fact that I had contributed to a worthwhile effort without knowing it.


Discussion

From: Connie Michener
Posted: Saturday, May 24th, 2008 - 10:16 am EDT

I have not yet seen a CAPTCHA that is intentionally a real word! I often wonder when solving CAPTCHAs if there is more to know about types of human mistakes (keyboard fingering) versus machine/optical mistakes -- also, whatever happened to OCR? In the early '90's, I thought I'd be able to scan and translate my handwritten documents soon! This WIRED article offers more on CAPTCHAs, and the "Recaptcha" project: http://www.wired.com/techbiz/it/magazine/15-07/ff_humancomp

Connie Michener


