Analyzing web archives through topic and event focused sub-collections
Web archives capture the history of the Web and are therefore an important source for studying how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working with Web archives and propose the research methodology of extracting and studying sub-collections of the archive focused on specific topics and events. We discuss the opportunities and challenges of this approach and suggest a framework for creating sub-collections.
Towards Better Understanding Researcher Strategies in Cross-Lingual Event Analytics
With an increasing amount of information on globally important events, there
is a growing demand for efficient analytics of multilingual event-centric
information. Such analytics is particularly challenging due to the large amount
of content, the event dynamics and the language barrier. Although memory
institutions increasingly collect event-centric Web content in different
languages, very little is known about the strategies of researchers who conduct
analytics of such content. In this paper we present researchers' strategies for
the content, method and feature selection in the context of cross-lingual
event-centric analytics observed in two case studies on multilingual Wikipedia.
We discuss the influence factors for these strategies, the findings enabled by
the adopted methods along with the current limitations and provide
recommendations for services supporting researchers in cross-lingual
event-centric analytics.
Comment: In Proceedings of the International Conference on Theory and Practice
of Digital Libraries 201
Creating Structure in Web Archives With Collections: Different Concepts From Web Archivists
As web archives' holdings grow, archivists subdivide them into collections so
they are easier to understand and manage. In this work, we review the
collection structures of eight web archive platforms: Archive-It, Conifer,
the Croatian Web Archive (HAW), the Internet Archive's user account web
archives, Library of Congress (LC), PANDORA, Trove, and the UK Web Archive
(UKWA). We note a plethora of different approaches to web archive collection
structures. Some web archive collections support sub-collections and some
permit embargoes. Curatorial decisions may be attributed to a single
organization or many. Archived web pages are known by many names: mementos,
copies, captures, or snapshots. Some platforms restrict a memento to a single
collection and others allow mementos to cross collections. Knowledge of
collection structures has implications for many different applications and
users. Visitors will need to understand how to navigate collections. Future
archivists will need to understand what options are available for designing
collections. Platform designers need to know what possibilities exist. The
developers of tools that consume collections need to understand collection
structures so they can meet the needs of their users.
Comment: 5 figures, 16 pages, accepted for publication at TPDL 202
Bots, Seeds and People: Web Archives as Infrastructure
The field of web archiving provides a unique mix of human and automated
agents collaborating to achieve the preservation of the web. Centuries-old
theories of archival appraisal are being transplanted into the sociotechnical
environment of the World Wide Web with varying degrees of success. The work of
the archivist and bots in contact with the material of the web presents a
distinctive and understudied CSCW-shaped problem. To investigate this space we
conducted semi-structured interviews with archivists and technologists who were
directly involved in the selection of content from the web for archives. These
semi-structured interviews identified thematic areas that inform the appraisal
process in web archives, some of which are encoded in heuristics and
algorithms. Making the infrastructure of web archives legible to the archivist,
the automated agents and the future researcher is presented as a challenge to
the CSCW and archival community.
Symbiosis between the TRECVid benchmark and video libraries at the Netherlands Institute for Sound and Vision
Audiovisual archives are investing in large-scale digitisation efforts of their analogue holdings and, in parallel, ingesting an ever-increasing amount of born-digital files in their digital storage facilities. Digitisation opens up new access paradigms and boosts re-use of audiovisual content. Query-log analyses show the shortcomings of manual annotation; therefore, archives are complementing these annotations by developing novel search engines that automatically extract information from both the audio and the visual tracks. Over the past few years, the TRECVid benchmark has developed a novel relationship with the Netherlands Institute of Sound and Vision (NISV) which goes beyond the NISV just providing data and use cases to TRECVid. Prototype and demonstrator systems developed as part of TRECVid are set to become a key driver in improving the quality of search engines at the NISV and will ultimately help other audiovisual archives to offer more efficient and more fine-grained access to their collections. This paper reports the experiences of NISV in leveraging the activities of the TRECVid benchmark.
Web archives: the future
This report is structured first, to engage in some speculative thought about the possible futures of the web as an exercise in prompting us to think about what we need to do now in order to make sure that we can reliably and fruitfully use archives of the web in the future. Next, we turn to considering the methods and tools being used to research the live web, as a pointer to the types of things that can be developed to help understand the archived web. Then, we turn to a series of topics and questions that researchers want or may want to address using the archived web. In this final section, we identify some of the challenges individuals, organizations, and international bodies can target to increase our ability to explore these topics and answer these questions. We end the report with some conclusions based on what we have learned from this exercise.