Digitisation 2.0 style?

Most people are probably familiar with CAPTCHA-style authentication, in which an online transaction is verified as coming from a human rather than an automated program by asking the user to type some distorted text displayed on screen. This spam-filtering mechanism is now common when setting up email accounts, buying goods online or even booking a place at a CILIPS CPD event!

However, one of the creators of CAPTCHA, a computer scientist at Carnegie Mellon University in Pittsburgh, has taken the system a step further. reCAPTCHA aims to assist in the digitisation of old texts by displaying words that baffle Optical Character Recognition (OCR) software. When texts are scanned, words that the OCR process cannot decipher are sent out as CAPTCHA words, so every user who solves one is also helping to transcribe the text. More information about how this works in practice can be found on the CAPTCHA website.
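
For the technically curious, the idea can be sketched in a few lines of Python. This is a simplified, hypothetical model rather than reCAPTCHA's actual code. It assumes that each challenge pairs a "control" word whose answer is already known with a word the OCR software could not read, that a user's reading of the unknown word is only counted if they typed the control word correctly, and that a reading is accepted once enough users agree on it. The class name `TranscriptionPool`, the sample words and the `votes_needed` threshold are all illustrative inventions.

```python
import random
from collections import Counter, defaultdict


class TranscriptionPool:
    """Hypothetical, simplified model of a reCAPTCHA-style word pool.

    Assumption: each challenge shows one control word (answer known) and one
    word the OCR could not read. Only users who get the control word right
    are trusted, and only their guesses for the unknown word are counted.
    """

    def __init__(self, control_words, unknown_ids, votes_needed=3):
        self.control_words = list(control_words)
        self.pending = set(unknown_ids)     # unknown words still to transcribe
        self.votes = defaultdict(Counter)   # unknown word id -> guess counts
        self.solved = {}                    # unknown word id -> accepted text
        self.votes_needed = votes_needed    # illustrative agreement threshold

    def new_challenge(self, rng=random):
        """Pick a (control word, unknown word id) pair, or None if all done."""
        if not self.pending:
            return None  # nothing left to transcribe
        control = rng.choice(self.control_words)
        unknown = rng.choice(sorted(self.pending))
        return control, unknown

    def submit(self, control, unknown, control_answer, unknown_answer):
        """Return True if the user passes; record their guess if they do."""
        if control_answer.strip().lower() != control.lower():
            return False  # failed the known word: treat as a bot, ignore guess
        if unknown in self.pending:
            guess = unknown_answer.strip().lower()
            self.votes[unknown][guess] += 1
            if self.votes[unknown][guess] >= self.votes_needed:
                self.solved[unknown] = guess   # enough humans agree
                self.pending.discard(unknown)
        return True


# Usage: three users are shown the same pair of words.
pool = TranscriptionPool(control_words=["morning"], unknown_ids=["w1"], votes_needed=2)
passed = [
    pool.submit("morning", "w1", "morning", "Hapless"),   # passes, first vote
    pool.submit("morning", "w1", "mourning", "hapless"),  # fails control word, vote ignored
    pool.submit("morning", "w1", "Morning", "hapless"),   # passes, second vote
]
print(passed)       # [True, False, True]
print(pool.solved)  # {'w1': 'hapless'}
```

The point of the control word is that the system never needs to know the answer to the unknown word in advance: passing the known word is what establishes that a human is typing, and agreement between several humans is what establishes the transcription.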

The system is currently being used to digitise books from the Internet Archive and old editions of the New York Times. According to an interview in the journal Science, the reCAPTCHA team reported that web users had transcribed enough text to fill more than 17,600 books, with better than 99% accuracy.

Could this be a viable, large scale digitisation strategy or is it just an example of Web 2.0 gone wild?!
