read on a cell phone. Or Wattpad, a free service for reading and sharing stories on a mobile phone: once the app is downloaded to a phone, the service gives instant access to works from Project Gutenberg. For a volunteer, the wisest choice is a book published before 1923. Before work starts on any book, copyright clearance must also be confirmed by sending Michael Hart a photocopy of the title page and the verso page (even if the latter is blank). The pages should be sent as scans so they can be uploaded to the website; people who cannot make scans may send photocopies by postal mail. The pages are then filed, on paper or electronically, so that the proof remains available in the future to demonstrate, if necessary, that the book is in the public domain under US law. Project Gutenberg does not release any book until its copyright status has been confirmed.
What exactly is involved once copyright clearance has been received? The book is first digitized by scanning it page by page to produce "image" files. Volunteers then run OCR (Optical Character Recognition) software to convert the "image" files into text files. Each text file is then proofread (i.e. re-read and corrected) by comparing it with the "image" file or with the original page of the print version. A good OCR package makes an average of 10 mistakes per page, and many more when the scanner or the OCR package is of poor quality. The book is proofread twice on screen by two different people, who make any necessary corrections. When the original is in poor condition, as with very old books, it is keyed in manually, word by word. Some volunteers also prefer to type short texts, or works they particularly like, by hand. But most books are scanned, "OCRized" and proofread.
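To make the idea of "comparing" and of a per-page mistake count more concrete, here is a minimal Python sketch that aligns OCR output against a corrected transcription of the same page, word by word, and lists the differences a proofreader would have to fix. It is only an illustration under stated assumptions: Project Gutenberg's proofreading is done by people reading on screen, the helper names (list_discrepancies, count_ocr_mistakes, average_mistakes_per_page) are hypothetical, and the counting rule (each substituted, missing or extra word counts as one mistake) is an assumption, not a Project Gutenberg definition.

```python
import difflib


def _word_opcodes(ocr_text: str, reference_text: str):
    """Align OCR words against reference words (hypothetical helper)."""
    ref_words = reference_text.split()
    ocr_words = ocr_text.split()
    # autojunk=False so long pages are not trimmed by difflib's popularity heuristic.
    matcher = difflib.SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    return ref_words, ocr_words, matcher.get_opcodes()


def list_discrepancies(ocr_text: str, reference_text: str) -> list[tuple[str, str]]:
    """Return (reference, ocr) word spans that differ, for a proofreader to check."""
    ref_words, ocr_words, opcodes = _word_opcodes(ocr_text, reference_text)
    return [
        (" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2]))
        for tag, i1, i2, j1, j2 in opcodes
        if tag != "equal"
    ]


def count_ocr_mistakes(ocr_text: str, reference_text: str) -> int:
    """Count word-level mistakes (assumed rule: each wrong, missing or extra word = 1)."""
    _, _, opcodes = _word_opcodes(ocr_text, reference_text)
    # For a replaced block of unequal length, count the longer side.
    return sum(max(i2 - i1, j2 - j1) for tag, i1, i2, j1, j2 in opcodes if tag != "equal")


def average_mistakes_per_page(pages: list[tuple[str, str]]) -> float:
    """Average mistake count over (ocr_text, reference_text) page pairs."""
    if not pages:
        return 0.0
    return sum(count_ocr_mistakes(ocr, ref) for ocr, ref in pages) / len(pages)


if __name__ == "__main__":
    reference = "The quick brown fox jumps over the lazy dog."
    ocr = "Tbe quick brovvn fox jumps ovcr the lazy dog."
    print(list_discrepancies(ocr, reference))
    # [('The', 'Tbe'), ('brown', 'brovvn'), ('over', 'ovcr')]
    print(count_ocr_mistakes(ocr, reference))  # 3
```

Word-level alignment with the standard library's difflib keeps the sketch dependency-free; a character-level comparison would give finer-grained counts, for example when "rn" is misread as "m".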
Unlike digitization in "image format", which consists only of scanning the pages, digitization in "text format" adds the OCR step, with two consequences: a) the book can be copied, indexed, searched, analyzed and compared with other books; b) its content can be searched with the "Find" function available in any browser or software, without a dedicated search engine. Digitization in "text format" has numerous assets. It produces a smaller computer file that is easier to send, whereas digitization in "image format" produces a bulky "photo" file. Unlike files in other formats, the files are accessible for low-bandw