Full text: Preparations for Drafting the Collection Mandate of the Liechtensteinische Landesbibliothek

Master's thesis, Beat Vogt
We conduct a full domain crawl three times a year. This includes harvesting
items outside of .is that have been determined to be of interest. Targeted
harvesting of notable websites (e.g. news and political sites) is conducted
continuously.
What are your selection criteria for choosing the websites that you collect?
“Web pages form part of literary activity, so work is now being done to
gather all pages on the national web domain (.is) into a web archive to be
preserved for the future” * page 9
“The collection is limited to .is-domains and a hand-picked selection of
Icelandic websites within other top-level domains. Harvesting is done using
the Heritrix webcrawler developed by the Internet Archive and the Nordic
national libraries. Access to the collection is open to all via a Wayback
Machine like the one used by the Internet Archive”
Our large-scale harvests, run three times a year, aim to capture all website
data related to Iceland and/or Icelandic culture. In practice this means all
websites under the .is TLD plus a curated list of material under other TLDs.
Any material in Icelandic, by Icelandic authors or relating to Iceland is
considered within scope.
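The scope rule described above (everything under .is, plus a curated list under other TLDs) could be sketched roughly as follows. This is only an illustration and not the archive's actual configuration: the function name `in_scope`, the `CURATED_HOSTS` set and the host `example.com` are hypothetical, and the content-based criteria (language, authorship, subject) are not modelled here.

```python
from urllib.parse import urlsplit

# Hypothetical curated list of hosts outside .is; the real list is not
# given in the source.
CURATED_HOSTS = {"example.com"}


def in_scope(url, curated_hosts=CURATED_HOSTS):
    """Return True if the URL's host is under .is or on the curated list."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    # Rule 1: every host under the .is top-level domain is in scope.
    if host.endswith(".is"):
        return True
    # Rule 2: a curated host, or any subdomain of one, is in scope.
    return any(host == h or host.endswith("." + h) for h in curated_hosts)
```

A check on the host name alone cannot decide whether a page is "in Icelandic, by Icelandic authors or relating to Iceland"; in this sketch that judgement is assumed to be made by whoever maintains the curated list.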
Targeted harvesting is conducted based on varying criteria, such as a
specific event (e.g. an election) or a site’s importance (e.g. popular news
sites) where it merits more frequent harvesting.
Sources used: 
