118 research outputs found

    Web as data. Challenges and triumphs of creating and working with a derived web corpus

    Ditte Laursen: Web as data. Challenges and triumphs of creating and working with a derived web corpus, Aarhus Conference 2022, Monday 17 October

    When the present web is later the past: web historiography, digital history and internet studies

    "Taking as point of departure that since the mid-1990s the web has been an essential medium within society as well as in academia this article addresses some fundamental questions related to web historiography, that is the writing of the history of the web. After a brief identification of some limitations within digital history and Internet studies vis-a-vis web historiography it is argued that the web is in itself an important historical source, and that special attention must be drawn to the web in web archives - termed reborn-digital material - since these sources will probably be the only web left for future historians. In line with this argument the remainder of the article discusses the following methodological issues: What characterizes the reborn-digital material in web archives, and how does this affect the historian's use of the material as well as the possible application of digital analytical tools on this kind of material?" (author's abstract)

    Visit at the Royal Library and Netarkivet

    Ditte Laursen: Visit at the Royal Library and Netarkivet, Aarhus Conference 2022, Monday 17 October

    Developing Datasheets for Archived Web Datasets

    Emily Maemura: Developing Datasheets for Archived Web Datasets, Aarhus Conference 2022, Monday 17 October

    ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation

    Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for extraction and derivation of smaller datasets. Besides efficient access we identify five other objectives based on practical researcher needs such as ease of use, extensibility and reusability. Towards these objectives we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing and standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores while improving usability by seamlessly integrating queries and derivations with external tools. Comment: JCDL 2016, Newark, NJ, US

    It Is Hard to Be an Intellectual (Det er svært at være intellektuel)


    The Book as a Medium (Bogen som medie)

    In "Bogen som medie" (The Book as a Medium), Niels Brügger outlines the historical transformations of the book. The medium of literature, the book, is not merely a historical constant that, like a transparent frame, gives life to literature. Rather, the book's historical development and artistic experiments with the book as a medium show that literature and medium are not separate phenomena.

    Digital History and the Archived Web as a Historical Source (DIGITAL HISTORIE OG ARKIVERET WEB SOM HISTORISK KILDE)

    Over the past decade the amount of digitally stored data has grown explosively, and in the same period the amount of born-digital material, such as content on social media and the web, has been growing. The historians of the future will have to navigate a landscape of sources that are increasingly digital and in many cases only digital. This article argues that digital sources are not all alike merely because they are digital, which leads to a fundamental distinction between digitised, born-digital and reborn-digital sources. It then introduces one particular type of reborn-digital material, namely the archived web, which is compared with digitised newspaper archives. Finally, it discusses the consequences that the particular characteristics of the archived web have for its use as a historical source.