Research and Advanced Technology for
What's next for
Digital Deposit Libraries?
PRESERVING ONLINE CONTENT FOR FUTURE GENERATIONS
September 8, 2001. Darmstadt, Germany.
The traditional organisation of legal deposit was designed for a publication and distribution environment that the growth of the Internet has upset. Preserving Internet content is a new mission for heritage libraries, one that requires reconsidering traditional archiving methods and policies.
The new publication setting is characterised by a massive reduction in the cost of dissemination through the network, which makes it possible to bypass the traditional editorial filter. This leads to an explosion of available content and obliges heritage institutions to develop ways to find and filter the content to be preserved.
National libraries have adopted two different approaches so far. The first involves selecting sites that should be preserved, then collecting, cataloguing and preserving them as collection items. The national libraries of Canada and Australia have started to build such highly selective collections of online publications. The second strategy is to use automated robots to collect online content on a massive scale and organise it into navigable collections. The National Library of Sweden and the Internet Archive Foundation have done pioneering work in this area.
Neither of these approaches seems fully satisfactory. However, they may be complementary.
The first strategy allows close follow-up of a site's evolution and direct contact with the content provider, in order to obtain what online collecting cannot reach (restricted-access areas, dynamic pages, etc.). But the scope of such a collection is very narrow compared with the potential of the Internet information space. And the legitimacy of the selection criteria may be questioned, as we simply do not know what will be important for future generations.
The second approach makes it possible to collect a large amount of content that is widely distributed and highly representative of the Internet information space. Browsing the Internet archive provides very effective and easy access to online collections. But a large part of the Web, known as the deep Web, remains out of reach. Even if its size has certainly been overestimated, it is a real problem that automatic online content gathering must address. Another major problem with this strategy is that it does not allow site changes to be traced over time if only a few rounds of information gathering take place each year. What about sites that are updated daily or weekly?
Could these two approaches be combined in order to build a better heritage collection?
9:00 - 9:30 Welcome and presentation
Bibliothèque nationale de France
Head of the digital library
Library of Congress
Humanities and Social Sciences Division
(60 min) MINERVA: Mapping the INternet Electronic Resources Virtual Archive - Web Preservation at the Library of Congress
Die Deutsche Bibliothek
(20 min) Collection of German online resources by Die Deutsche Bibliothek
10:45 Coffee break
The Danish Royal Library
Head of Digitization and Web Department
(60 min) 'Danish Legal Deposit on the Internet: Current Solutions and Approaches for the Future'
Bibliothèque nationale du Québec
Coordinator, Legal Deposit Section
Acquisitions Directorate
(60 min) Legal Deposit and the Internet: Reconciling Two Worlds (HTML version)
PowerPoint version with comments (908 KB)
13:00 -14:00 Lunch break
The Swedish Royal Library
(60 min) Harvesting the Swedish web space
Helsinki University Library
Director, Information technology
Database services & Development dept.
(60 min) Harvesting the Finnish Web space - practical experiences
16:00 Coffee break
Bibliothèque nationale de France
Internet Archiving Project Manager
(60 min) The BnF project for Web archiving
17:15 - 18:00 General Q&A. Conclusions.
Extra presentation from:
Dept. of Software Technology
Vienna Univ. of Technology
Austrian on-line archive: current status and next steps (PDF format)
Workshop Organizers: