berg's perspective is rather different from that
of the Million Book Project, which was launched by several professors at
Carnegie Mellon University and whose collections (10,611 books on June 1st,
2005) are hosted by the Internet Archive (which is also the backup
distribution site of Project Gutenberg). In the case of the Million Book
Project, books are scanned and "OCRized", but they are not proofread. The main
formats used are XML, TIF and DjVu.
On Project Gutenberg's website, a File Recode Service allows users to convert
books from one character encoding (ASCII, ISO-8859, Unicode or Big-5) to
another. A much more powerful conversion program may be launched in the future,
adding conversion into still more formats (XML, HTML, PDF, TeX, RTF) as well as
Braille and voice. It will then also be possible to choose the font and size of
characters and the background color. Another eagerly expected conversion is that
of a book from one language to another by machine translation software. This may
become possible in a few years, once machine translation reaches 99% accuracy.
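The kind of conversion offered by the File Recode Service can be illustrated
with a short Python sketch. This is not the code behind the service, only a
minimal example of the same idea: the function name recode and the sample
strings are hypothetical, and the codec names (iso-8859-1, utf-8, ascii, big5)
are those of Python's standard library.

```python
def recode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target.

    Hypothetical helper for illustration; not Project Gutenberg's own code.
    """
    text = data.decode(source)
    # errors="replace" puts "?" wherever the target encoding has no
    # equivalent character, instead of raising an exception.
    return text.encode(target, errors="replace")


# A Latin-1 (ISO-8859-1) text converted to UTF-8 and to plain ASCII.
latin1_bytes = "café".encode("iso-8859-1")
utf8_bytes = recode(latin1_bytes, "iso-8859-1", "utf-8")
ascii_bytes = recode(latin1_bytes, "iso-8859-1", "ascii")

# A Big-5 text converted to UTF-8.
big5_bytes = "中文".encode("big5")
big5_as_utf8 = recode(big5_bytes, "big5", "utf-8")
```

Converting to a smaller character set such as ASCII is lossy: the accented
letter in the sample becomes a question mark, which is why converting a book
back "the other way" does not always restore the original text.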
5. DISTRIBUTED PROOFREADERS, TO HANDLE SHARED PROOFREADING
The main "leap forward" of Project Gutenberg in the last few years is due to
Distributed Proofreaders.
Distributed Proofreaders was conceived in 2000 by Charles Franks to help in the
digitizing of public domain books. Originally meant to assist Project Gutenberg
in the handling of shared proofreading, Distributed Proofreaders became the main
source of Project Gutenberg eBooks. In 2002, Distributed Proofreaders became an
official Project Gutenberg site.
The number of eBooks that have been processed through Distributed Proofreaders
has grown fast, with a total of 3,000 eBooks in February 2004, 5,000 eBooks in
October 2004 and 7,000 eBooks in May 2005. On August 3, 2005, 7,639 books were
complete (processed through the site and posted to Project Gutenberg), 1,250
books were in progress (processed through the site but not yet posted, because
they were still going through their final proofreading and assembly), and 831
books were being proofread (currently being processed).
From the website one can access a program that allows several proofreaders to
work on the same book at the same time, each proofreading different pages.
This significantly speeds up the proofreading process. Volunteers register and
receive detailed instructions. For example, words in bold, italic