About a year ago, I wrote about reCAPTCHA and thought it was an interesting alternative to the eye-crossing patterns and letters that many web sites use. I said “I’m not yet convinced it will work, but it’s an interesting idea.” Since then, I’ve come across reCAPTCHA on a number of sites and I really like it. It’s easy to use and it’s nice knowing that I’m helping convert scanned works to text.
One thing I’ve wondered, however, is just how accurate reCAPTCHA is. I just read a post over at Ars Technica that answers that question.
“The researchers tested the system using a random sampling of 250 New York Times articles from different eras where the identity of every word was confirmed by two independent transcription experts. Each OCR software program managed about 84 percent accuracy but, when their results were combined with the reCAPTCHA system, the overall accuracy shot up to 99.1 percent. That’s actually within the bounds of professional transcription services that use two independent experts to generate copies that are then examined by a third party.”
Very cool! Looks like a win-win situation for all.
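To put those percentages in perspective, here’s a quick back-of-the-envelope sketch in Python. Only the two accuracy figures come from the quoted post; the 1,000-word page length and the helper name expected_errors are my own assumptions, picked just to make the numbers concrete.

```python
# Accuracy figures quoted in the Ars Technica post.
OCR_ACCURACY = 0.84
RECAPTCHA_ACCURACY = 0.991

# Hypothetical page length, chosen only for illustration.
WORDS_PER_PAGE = 1000


def expected_errors(accuracy: float, words: int) -> int:
    """Return the expected number of wrong words, rounded to the nearest word."""
    return round((1 - accuracy) * words)


ocr_errors = expected_errors(OCR_ACCURACY, WORDS_PER_PAGE)
recaptcha_errors = expected_errors(RECAPTCHA_ACCURACY, WORDS_PER_PAGE)

print(f"OCR alone: about {ocr_errors} wrong words per {WORDS_PER_PAGE}")
print(f"With reCAPTCHA: about {recaptcha_errors} wrong words per {WORDS_PER_PAGE}")
```

Under that assumption, the jump from 84 to 99.1 percent means going from roughly 160 wrong words per thousand down to about 9.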


One Comment
We’re using it on our site at PLCH!
http://virtuallibrary.cincinnatilibrary.org:8000/CAPTCHA_Web/PLCH/contact.aspx?