Tuesday, September 2, 2014

Digital Public Library / Internet Archive to Share 14 Million Old-Book Images on Flickr Commons


Internet Archive to Share 14 Million Old-Book Images through the Flickr Commons Program


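The Internet Archive is best known for preserving a complete historical record of websites: it holds as many as 400 billion web pages, the earliest dating back to 1996. In addition, its 19 petabytes of digital data include 2 million public-domain books spanning 500 years of history. These books contain roughly 14 million images, which it recently began uploading to Flickr for public access. As of today, more than 2.6 million book images have been uploaded to Flickr.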


By 陳曉莉 | Published 2014-09-01


Image source: Internet Archive


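The Internet Archive, a nonprofit organization whose mission is to "make knowledge accessible to the whole world," has announced that it is joining the Flickr Commons program. Over the coming months it will upload 14 million images from 2 million books to Flickr for users worldwide to access.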

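Founded in 1996, the Internet Archive preserves a wide range of files and materials and is responsible for digitizing them, including websites, music, films, and public-domain books.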

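The Internet Archive is best known for preserving a complete historical record of websites: it holds as many as 400 billion web pages, the earliest dating back to 1996. In addition, its 19 petabytes of digital data include 60 million pages of digitized text from the past 500 years. The Internet Archive says it has digitized 2 million public-domain books spanning 500 years of history. These books contain roughly 14 million images, which have been annotated with text so that users can search, view, click on, or read them. It recently began uploading them to the photo-sharing site Flickr for public access, and as of today more than 2.6 million book images have been uploaded to Flickr.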

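The images in these old books are fascinating. Searching for the keyword "bird," for example, turns up many different species of birds as depicted in earlier eras; searching for "telephone" shows the evolution of the telephone; and searching for "death" offers a glimpse of how earlier times imagined death.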

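Simon Chaplin, head of the Wellcome Library, which specializes in collecting twentieth-century material and has a particular interest in medicine, said the move will transform collections of medical heritage and make these books accessible to more people. The Internet Archive also believes that working with Flickr and other libraries will make this collection of old-book images more interesting, that more images will follow, and that more ideas will emerge about how to use image-recognition tools. (Compiled by 陳曉莉)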

***** 2012
Digital Public Library
Every "nation" and community organization should care about this issue

Please make good use of
archive.org

Creating a digital public library without Google's money

  • Michael Hiltzik

Google's settlement with authors and publishers has been tossed out, shining a spotlight on copyright law. Maybe we shouldn't entrust that kind of project to a corporation anyway.


Say what you want about Google — whether you believe it invariably adheres to its motto "Don't be evil" or you suspect that its true goal is world domination — the firm's behavior certainly has a way of shining the spotlight on the most important technological issues in our lives.
These include secrecy, privacy and now, in connection with a huge legal fight in which a New York federal judge last week dealt Google a huge defeat, copyright law.
Judge Denny Chin threw a wrench into six years of litigation by tossing out a 165-page settlement reached in 2008 between Google and authors and publishers groups.
At issue was Google's plan to create a global digitized library to "unlock the wisdom" imprisoned in the world's out-of-print books, as its co-founder Sergey Brin described the project in 2009.
Like other authors and researchers, I'm conflicted about the project. On the plus side, the vision of a widely accessible digital library is a worthy one that is, for the first time in human history, technologically achievable.
On the other hand, Google was plotting to acquire effective control over millions of works whose copyrights belong to others.
The Google books case began as a narrow legal dispute but broadened out, like an umbrella unfurled in the rain, into an effort to provide a shelter for a huge, monopolistic profit-oriented corporate enterprise.
The original lawsuit dealt with Google Book Search. The company announced in 2004 that it had made searchable digital copies, or scans, of millions of books contributed by Stanford, Harvard, the University of Michigan and other institutional libraries.
Type a search term into your Web browser, and Google would display "snippets" of its scanned books displaying your term. Since many of those books were still under copyright, the Authors Guild and the Assn. of American Publishers sued Google for copyright infringement.
Google's defense was that the snippets fell under the "fair use" exemption in copyright law, a very murky provision allowing limited use of works, without permission, for comment and criticism, news reporting, scholarship and research.
Had the settlement been limited to that issue, it might have gained Chin's approval and performed a public service besides by clarifying the fair use exemption for digital indexing — for example, the judge might have set a standard for how big a snippet and how many words can be displayed without permission.
But the document went much further. The settlement created a safe harbor for the vast digital bookstore Google hoped to create out of a digital hoard that so far comprises about 12 million volumes, or nearly 10% of the world's published library.
The settlement would have allowed Google to continue scanning and offer access to the results for a fee.
The company was to pay $45 million into a settlement fund for authors whose copyrighted books it had already scanned without permission. But infringement wasn't an issue for many books. Google or anyone else can copy and display the text of those out of copyright, such as the works of Charles Dickens.
Books under copyright and still in print — and therefore whose rights holders are not a mystery — are subject to deals Google makes with their publishers or authors, typically allowing the display of limited chunks, such as several pages, at a time.
The sticking point was "orphan books" — those copyrighted but out of print, and whose rights holders can't be found or identified. Google executives have portrayed their effort as one that would give these forgotten or overlooked tomes a new lease on life.
The settlement required the company to fund an independent registry which would, among other things, oversee interests in yet-unclaimed works and hold payments from Google for their exploitation.
Any author, including the parents of orphaned works when and if they surfaced, could opt out of Google's digital scanning on request. But that reverses the burden of existing copyright law, which forbids use unless the owners give affirmative permission for uses of their work.
Critics observed that the deal would have given the company a huge advantage in the digital marketplace by validating its strategy of scanning books first and worrying about copyrights later.
Google digitized material the ownership of which was unclear "in calculated disregard of authors' rights," observed copyright lawyer Robert Kunstadt in testimony cited by Chin. "Its business plan was, 'So, sue me.'" Rivals who went through the tough process of tracking down owners before scanning their books thus were left in the dust.
Judge Chin concluded that rewriting copyright law is a task that belongs in the halls of Congress, not a courtroom, hinting that he couldn't have approved the settlement even if he wanted to.
But his decision places the spotlight on several questions about the digital present and future.
One is: How to advance the goal of a digital public library without Google's deep pockets?
The rejection of the Google settlement has raised the profile of a leading alternative being promoted by Robert Darnton, a Harvard history professor and director of the university library, and a long-term critic of the Google settlement.
Darnton's idea is for charitable foundations to fund a digital analogue to the Library of Congress, freely available to all citizens and accessible to anyone within reach of the Internet. The Alfred P. Sloan Foundation has agreed to play a leading role.
"Now that the settlement seems to have unraveled, this looks like a serious alternative," Darnton told me.
Darnton's proposal would eliminate the problems of entrusting a major archival project to an entity whose main purpose is commercial, not scholarly. The settlement would have required Google to provide the participating university libraries with a free digital copy of its scanned out-of-print books. But it also would have allowed Google to restrict its use by faculty members to reading, printing, or downloading no more than five pages for free — and only once per person each academic term. For greater access, the institution would have had to buy a subscription.
Even a public digital library might need legal help dealing with orphan books. Chin's advice of referring the issue to Congress ignores the question of whether Congress is up to the job. As recently as 2008 a bill to fill the orphan-books gap sank without a trace in the House.
Chin's ruling may well provoke Google to pressure Congress to solve the problem so it can proceed with its own project.
But there it will face counter-lobbying by publishers, film studios and record labels. "Those content industries don't like any proposal seen as weakening copyright," says Peter Jaszi, an expert in copyright law at American University.
The Google books case now looks like a salvage operation for the dream of a digital library.
"There were many things in the settlement that were innovative and useful, and I'd be sorry to see lost," remarks Lewis Hyde, the author of "Common as Air," a recent book about copyright in the digital era.
Judge Chin's decision forces us — or allows us — to ponder the dream of a digital library without ceding our future to Google.
Michael Hiltzik's column appears Sundays and Wednesdays. Reach him at mhiltzik@latimes.com, read past columns at latimes.com/hiltzik, check out facebook.com/hiltzik and follow @latimeshiltzik on Twitter.
