
Linux.com

Feature: Science & Research

Online library reaches million book milestone

By Liz Tay on December 20, 2007 (9:00:00 PM)


An international venture called the Universal Library Project has made more than one million books freely available in digitized format. The joint project of researchers from China, India, Egypt, and the US has the eventual aim of digitizing all published works of man, freeing the availability of information from geographic and socioeconomic boundaries, providing a basis for technological advancement, and preserving published works against time and tide.

One and a half million books in more than 20 languages, including Chinese, English, Arabic, and various Indian languages, are now accessible via a single Web portal. The online library includes rare and out-of-print books from private and public collections around the world.

"There are plenty of books that are no longer in copyright, and that have long been forgotten, but which would be useful to scholars, students, and just the general population," says Michael Shamos, a copyright lawyer, computer science professor, and co-director of the project at Carnegie Mellon University in the US.

"There is a tremendous amount of knowledge that we thought would be lost to mankind if we didn't start digitizing," he says.

The project believes digital books on the Internet should be free to read, instantly available, easily accessible, printable on demand, translatable to any language, and readable by both humans and machines. Additionally, with the advent of low-cost technology like the One Laptop Per Child project's XO laptop and ebook readers, digitized books are expected to reduce the cost of learning by replacing the recurring cost of books with a one-off computer purchase and freely downloadable information.

According to the researchers' estimates, the Universal Library collection currently represents a mere one percent of the approximately 100 million books ever published. Shamos expects only half of the published books in existence to be found in physical libraries around the world, so physically locating a rare book can be a tedious process.

"The only way you can obtain an out-of-print book is to find a library that has one, and either travel to that library, or obtain that book through an interlibrary loan," he says. "It's a very slow process, especially considering that without seeing the book, you might not know if there's anything interesting in it for you."

When the project was initiated in 2002, members expected other research and commercial projects to digitize only around 50,000 books. Google Book Search is one such project launched since then; in recent years, it has come under fire for alleged breaches of copyright. While Shamos expressed a high regard for Google's efforts and the publicity they have attracted to book digitization, he said the Universal Library Project had "similar but different" goals.

"We want to digitize all published works of man; I don't think that anybody at Google would ever say that's what their goal is," he says. "Their goal is to sell advertising, and one of the ways that they find to sell advertising is to create a Web site that has such rich content that people want to visit it all the time. I don't think that Google has any interest in putting Sanskrit works up on their Web site."

Like Google, the Universal Library Project faces issues in publishing copyrighted books online. As such, books currently under copyright are only available in part via the Web portal, while books that are not bound by copyright restrictions are fully and freely available online.

Citing a need for information to be freely available, Shamos expects these copyright restrictions to become less of an issue in time, as publishers adapt to the low-cost business model that digital books offer.

"Copyright is going to become less and less significant [because] through digitization, the cost of publishing is vanishingly small," he says. "As the cost of copying goes down, the value of works goes down, and the ability to make profit from them goes down.

"There is a difference in reading for pleasure and reading for information; what is going to happen, I think, is that copyright is going to end up focusing on works of entertainment and not works of information."

High numbers

The Universal Library Project is the brainchild of researchers at Carnegie Mellon University, and has received $3.5 million in seed funding from the National Science Foundation. The project has also received in-kind contributions from the Zhejiang University in China and the Indian Institute of Science in India that have been valued at $10 million each, and has more recently forged a partnership with the Library at Alexandria in Egypt.

With more than 1,000 workers in about 50 scanning and digitization centres around the world, the Universal Library collection is growing at an estimated 7,000 books per day. There is a fair way to go before the project reaches its lofty book digitization goals; even so, the researchers have set their sights on eventually including content like music, artwork, lectures, and newspapers in the library.

"We believe that by having a universal library with all published works of man, and having multiple sites all around the world that house the entire content, it will be impossible to destroy these works," Shamos says.

"There can never again be a destruction of the library of Alexandria. There could be a destruction of the building, but there can't be a destruction of the works, and so this makes the creation of man impervious to changes in political regime, culture, Moirai."


Comments

Note: Comments are owned by the poster. We are not responsible for their content.

Online library reaches million book milestone

Posted by: Day on December 21, 2007 02:48 AM
Glad to see that we human beings finally recognize what really needs to be protected and how to protect it. It is not the copyright, but the knowledge we discovered and invented! Thanks to all those people.

#

Online library reaches million book milestone

Posted by: Anonymous [ip: 74.227.120.154] on December 21, 2007 06:13 AM
Might be nice if you mentioned Project Gutenberg [http://www.gutenberg.org/wiki/Main_Page], which has been doing this since 1971 with accessibility in mind, as that javascript-infested, unnavigable, undownloadable, and outright non-textual site does not.

(I have no connection whatever to either organization.)

#

Online library reaches million book milestone

Posted by: Anonymous [ip: 59.165.242.3] on December 21, 2007 02:08 PM
I would like to add another thing which went unnoticed. The ancient Indian university Taxila was burnt by invaders and, it seems, it was burning for 1 month - imagine how much knowledge would have vanished into the air. We have lost lots of knowledge. True, such things should not happen again. But whether digitization is the only answer is questionable!

#

Linux.com developers are idiots

Posted by: Anonymous [ip: 68.126.191.42] on December 21, 2007 08:34 PM
Please, figure out where to insert PHP's nl2br function in your code. Bunch of incompetent Geek-Squad-esque idiots.

#

Online library reaches million book milestone

Posted by: Anonymous [ip: 59.178.44.68] on December 24, 2007 03:47 PM
Above, stop peddling your tool on every post. And you just turned outrageous here by ordering around linux.com.

To the anonymous person who mentions Project Gutenberg, it's an English-only website and a US-centric one. This one is multilingual and includes books from other countries.

To the anonymous person who talks about Taxila university, it was actually Nalanda university, one of the world's oldest and biggest universities, which got pillaged and burnt.

#

Re: Online library reaches million book milestone

Posted by: Anonymous [ip: 217.132.157.147] on December 25, 2007 08:51 AM
This project might have some good ground, BUT its Arabic section most certainly sucks! Apparently they didn't bother to get a native Arabic speaker (where do I sign up?) and just gave the job of translating the titles, from the Arabic name written with English letters back to Arabic, to someone with only a basic knowledge of how Arabic letters are pronounced. Although Arabic has very few peculiarities like the k in knife and the w in write, the guy did a TERRIBLE job. I couldn't understand any of those names until I clicked the title and read it in English.

Also, English, for better or worse, IS the most important language in the world today; almost everything that is written today will have an English version released.

Personally I will continue to use Gutenberg for the simple reason that I can do whatever I want with their text files (search? edit? reformat?), which is not true for the TIFF, although I would love to see Gutenberg offer a PDF/ODF format too. It sucks when you print a book and find that there is one last line of chapter 1 on the page where chapter 2 starts...

#

Re: Online library reaches million book milestone

Posted by: Anonymous [ip: 68.126.191.42] on January 02, 2008 10:30 AM
hey fack you buddy. he made a good point. fucking newlines don't work.

#
