Consortium aims to digitize classic books, tech papers

SAN JOSE, Calif. - A consortium backed by Yahoo has launched an ambitious effort to digitize classic books and technical papers and make them freely available on the Web.

One of the Open Content Alliance's first projects will be to digitize the approximately 18,000-title collection of classic fiction and non-fiction American books owned by the University of California, the group said. That could be completed by the end of next year.

The consortium includes Adobe Systems, Hewlett-Packard Labs, the National Archives of the U.K., O'Reilly Media, the Prelinger Archives, the University of California and the University of Toronto.

The announcement of the consortium comes amid furious debate about a similar project called Google Library, in which the Mountain View tech giant is scanning and digitizing millions of books at select libraries.

Google's effort differs, though, because it intends to digitize material regardless of its copyright status. The members of the Open Content Alliance say they will only scan copyrighted material if they have the permission of the rights-holders.

Also, while Google only allows people to view excerpts of copyrighted material, the aim of the alliance is to offer complete texts for viewing and downloading.

"Our goal is to help with the expansion of human knowledge," said Dave Mandelbrot, Yahoo's vice president of search content. "What we'd like to see in two or three years is a major collaborative effort where libraries are contributing material and publishers are providing permission" to digitize their content.

Each of the consortium's partners is providing different areas of expertise. The Internet Archive, a San Francisco non-profit that collects copies of Web pages and other material, is helping with the scanning of materials. Yahoo will index the content, make it searchable and pay for the scanning, about 10 cents a page. And Adobe will help convert some of it into its PDF format so that it can be downloaded from the Web.

The group is launching a Web site where people will be able to access the content. But the digitized materials also will be available through a special page on Yahoo's Web site and through the Internet Archive. In fact, Mandelbrot said, the group's goal is to make the content available so that any search engine can index it.

Digitizing the world's cultural archives, from television shows to classic books, has long been a burning ambition of Internet Archive founder Brewster Kahle.

In fact, Kahle's Internet Archive has already launched an effort to digitize books called the Million Book Project, a collaboration with Indian and Chinese agencies and Carnegie Mellon University.

"The real crime is that we have all these people using the Internet for research, but we don't have some of the best content on it," Kahle said.

An Internet retailer also operates a massive book digitizing project. But its goal is to make the material available to customers to help spur sales.

"At some point, we want to meet in the middle so end-users win," Kahle said. "So they can have access to great works either for free or pay."

In addition to working with libraries to scan older content whose copyright has expired, the consortium will collaborate with publishers and authors who want to make their works available on the Web.

In some instances, the Open Content Alliance will give copyright-holders the option of releasing their material under a Creative Commons license, an alternative licensing scheme that encourages re-use and distribution of content.

The announcement of the consortium comes just days after the 8,000-member Authors Guild sued Google in federal court seeking to stop its Google Library project. The group claims the project violates copyright law because authors have not first given permission for Google to digitize their works.

Google has defended its project by noting that it will only offer short excerpts - not full texts - of copyrighted material on its Web site. That type of use falls under "fair use" laws, the company contends. Also, Google is allowing authors to choose not to have their work digitized. The Electronic Frontier Foundation and others have defended Google in the debate.

A less-controversial companion project called Google Publisher allows publishers to tell Google which books they want to be in the search engine's index.

Google did not respond to a request for comment about the Open Content Alliance project.

The Association of American Publishers has been one of the critics of the Google Library program. Although an association spokeswoman said it had little information about the alliance project, she said it sounded "encouraging."

"At the very least they are approaching it from the standpoint that it's the author who says what can be done with his work," said Judith Platt, the association's director of communications. "There are ways the use of copyrighted material can be approached without violating the rights of the copyright holder."


(c) 2005, San Jose Mercury News (San Jose, Calif.). Distributed by Knight Ridder/Tribune News Service.
