I’ve come across another pastime which I’ve been working on over the last few months. I’m helping to correct the OCR’d text from digitised Australian newspapers which are being loaded into Trove. It’s a very simple, but addictive, task which anyone can do. All you have to do is search the newspapers on Trove and check whether the machine-generated OCR text needs any corrections. If so, you can make the corrections straight away. The original text remains untouched (so that any vandalism can be reverted); it’s only the OCR text which changes. The aim is to make Trove easier to search by making sure that the machine-readable text is accurate.
The Manager of the Australian Newspapers Digitisation Project at the National Library of Australia, Rose Holley, has written several publications describing this crowdsourcing effort, including:
- Crowdsourcing: How and Why Should Libraries Do It?
- Many Hands Make Light Work: Public Collaborative OCR Text Correction in Australian Historic Newspapers
- A success story – Australian Newspapers Digitisation Program, 2009
- Harnessing the cognitive surplus of the nation: new opportunities for libraries in a time of change
I think helping out with this project is something that librarians are well-suited to. We generally have a good eye for detail, and we like to ensure that our clients have access to accurate information. So next time you’re searching in Trove, do a quick search in the newspapers on a topic that interests you, and see what comes up. Maybe you could look for articles that relate to your local area, e.g. historically important people or places, or even your own library. However, I should warn you that it’s quite addictive once you start.
There are similar crowdsourcing projects which have been set up by libraries around the world. Rose Holley lists three of them in her blog: one in Finland (correcting newspapers), one at the New York Public Library (correcting digitised recipes in its collection), and one at the Bodleian Library at Oxford University (describing digitised music scores). There is also Distributed Proofreaders, which carries out proofreading and formatting of public domain e-books for Project Gutenberg and other public domain e-book providers. There’s something for everyone, so why not become part of the crowd?