the lowest common denominator". It can be read, written, copied and printed by any simple text editor or word processor on every computer in the world. It is the only format compatible with 99% of hardware and software. It can be used as is or to create versions in many other formats. It will still be in use when other formats are obsolete (some already are, like the formats of a few short-lived reading devices launched between 1999 and 2003). It is the assurance that the collections will never become obsolete and will survive future technological changes. The goal is to preserve the texts not only over decades but over centuries. No other standard is as widely used as ASCII right now, not even Unicode, a "universal" encoding system created in 1991.

Project Gutenberg also publishes eBooks in well-known formats like HTML, XML or RTF. There are Unicode files too. Any other format provided by volunteers (PDF, LIT, TeX and many others) is usually accepted, as long as they also supply an ASCII version where possible. Large-scale conversion into other formats, however, is handed over to other organizations. Examples are Blackmask Online, which uses Project Gutenberg's collections to offer thousands of free eBooks in eight different formats based on the Open eBook (OeB) format; Manybooks.net, which converts Project Gutenberg's eBooks into formats readable on PDAs; and Bookshare.org, the main digital library for the visually impaired community in the US, which converts books from Project Gutenberg into Braille and DAISY (Digital Accessible Information System) formats.

What exactly is entailed once copyright clearance is received? Digitization is done by scanning the book page by page to get "image" files. Then volunteers run OCR (Optical Character Recognition) software to convert the "image" files into text files. Then each text file is proofread (i.e. re-read and corrected) by comparing it to the "image" file or the original page of the print version.
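As a rough illustration of what a proofreading pass changes, the sketch below compares a line of raw OCR output with its corrected version and lists the words that differ. It is only a minimal example under stated assumptions: the function name `word_corrections` and the sample strings are hypothetical, and nothing here describes the tools Project Gutenberg volunteers actually use.

```python
import difflib


def word_corrections(ocr_text: str, proofread_text: str) -> list[tuple[str, str]]:
    """Return (ocr_words, corrected_words) pairs where the two texts differ.

    Hypothetical helper for illustration only.
    """
    ocr_words = ocr_text.split()
    good_words = proofread_text.split()
    # Compare the two texts word by word rather than character by character.
    matcher = difflib.SequenceMatcher(a=ocr_words, b=good_words, autojunk=False)
    corrections = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the OCR words and the words that replaced them.
            corrections.append(
                (" ".join(ocr_words[i1:i2]), " ".join(good_words[j1:j2]))
            )
    return corrections


# Hypothetical sample: a line as OCR software might misread it, and the fixed line.
ocr_line = "It was the hest of tirnes, it was the worst of times"
fixed_line = "It was the best of times, it was the worst of times"

for wrong, right in word_corrections(ocr_line, fixed_line):
    print(f"{wrong!r} -> {right!r}")
```

Counting the pairs returned for a whole page would give a per-page correction count, comparable in spirit to a mistakes-per-page figure.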
A good OCR package makes an average of 10 mistakes per page, and many more if the scanner or the OCR package is of lower quality. The book is proofread twice on the computer screen by two different people, who make any necessary corrections. When the original is in poor condition, as with very old books, it is keyed in manually, word by word. Some volunteers prefer to type in short texts, or works they particularly like. But most books are sc