idth use. They can be copied as much as needed to
produce new digital or print versions for free. Typos reported after a text is
released can be fixed at any time. Readers can change the font and size
of characters, the margins or the number of lines per page. Visually impaired
readers can increase the letter size. Blind readers can use text-to-speech
software. All this is very difficult, if not impossible, with many other
formats.
The goal is to release books that are 99.9% accurate in the eyes of the general
reader, not to create authoritative editions or to argue with a picky reader
over whether a certain sentence should have a colon or a semicolon between its
clauses.
Project Gutenberg is convinced that proofreading by human beings is a very
important step, one that makes all the difference. Using scanned books as is
(converted to text by OCR software, with no proofreading) gives a much
lower-quality result. After OCR, the text is 99% reliable in the best of cases.
After proofreading, it becomes 99.95% reliable, a high percentage that is also
the standard at the Library of Congress.
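To make the gap between these two figures concrete, the short Python sketch below turns a reliability percentage into an expected number of errors. It rests on two assumptions not stated in the text: that the percentages measure per-character accuracy (rather than per word or per line), and that a book is about 500,000 characters long. The function name `expected_errors` and the book length are purely illustrative.

```python
# Assumption: "reliability" is read as per-character accuracy; the source
# does not say whether it counts characters, words or lines.
BOOK_CHARS = 500_000  # hypothetical length of a long novel, in characters


def expected_errors(accuracy_percent: float, n_chars: int) -> int:
    """Expected number of incorrect characters at a given accuracy."""
    error_rate = (100 - accuracy_percent) / 100
    # round() absorbs tiny floating-point noise such as 249.99999999998
    return round(n_chars * error_rate)


for label, accuracy in [("OCR only", 99.0), ("proofread", 99.95)]:
    print(f"{label}: about {expected_errors(accuracy, BOOK_CHARS)} errors")
```

Under these assumptions, the script prints about 5000 errors for raw OCR output and about 250 after proofreading, a twentyfold reduction for the same book.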
For this reason, Project Gutenberg's approach is rather different from that of
the Internet Archive. In its Text Archive, books are scanned and "OCRized", but
they are not proofread. The main formats used are XML, TIFF and DjVu. Books are
not proofread in the other major collections either: the Open Content Alliance
(OCA), Google Book Search and Microsoft Live Search Books.
Project Gutenberg provides a "Nearly Full Text" search (covering the first 100 KB
of each file) powered by Google, with a database updated approximately monthly.
It also provides a search of book metadata (author, title, brief description,
keywords) as a participant in Yahoo!'s Content Acquisition Program, with a
database updated weekly. Both searches are available from the Online Book
Catalog (at the bottom of the page). The Advanced Search offers the following
fields: author, title, subject, language, category (any, audio book, music,
pictures), LoCC (Library of Congress Classification), filetype (text, PDF, HTML,
XML, JPEG, etc.) and eText/eBook number. A "Full Text" field has also been added
as an experimental feature.
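Limiting the "Nearly Full Text" search to the beginning of each file has a practical consequence: words that occur only later in a long book are not found. The minimal Python sketch below illustrates that behaviour; it is not Project Gutenberg's or Google's implementation. The function name `nearly_full_text_match`, the UTF-8 decoding and the reading of the limit as 100 * 1,024 bytes are all assumptions made for illustration.

```python
import io
from typing import BinaryIO

# Assumption: the 100 KB limit is taken to mean 100 * 1024 bytes; the source
# does not say whether K stands for 1,000 or 1,024.
PREFIX_BYTES = 100 * 1024


def nearly_full_text_match(stream: BinaryIO, query: str,
                           limit: int = PREFIX_BYTES) -> bool:
    """Case-insensitive substring search over the first `limit` bytes only."""
    head = stream.read(limit)  # anything after the first `limit` bytes is never seen
    # errors="ignore" drops a multi-byte UTF-8 character cut off at the limit
    # (and any other invalid bytes) instead of raising an exception
    text = head.decode("utf-8", errors="ignore")
    return query.lower() in text.lower()


if __name__ == "__main__":
    # A hypothetical ~200 KB text whose last words fall outside the searched prefix.
    book = ("Chapter One. " + "x" * 200_000 + " The End").encode("utf-8")
    print(nearly_full_text_match(io.BytesIO(book), "chapter one"))  # True
    print(nearly_full_text_match(io.BytesIO(book), "the end"))      # False
```

Reading only a fixed-size prefix presumably keeps the indexing cost per file bounded, at the price of missing matches in the rest of the text.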
On Project Gutenberg's website, a File Recode Service allows users to convert
books from one character encoding (ASCII, ISO-8859, Unicode and others) to
another. A much more powerful conversion