261,662 research outputs found

    The information retrieval challenge of human digital memories

    Today people are storing increasing amounts of personal information in digital format. While storing such information is becoming straightforward, retrieval from the vast personal archives that this creates poses significant challenges. Existing retrieval techniques are good at retrieving from non-personal spaces, such as the World Wide Web. However, they are not sufficient for retrieving items from these new unstructured spaces, which contain items that are personal to the individual, of which the user has personal memories, and with which the user has previously interacted. We believe that there are new and exciting possibilities for retrieval from personal archives. Memory cues act as triggers for individuals in the remembering process, and a better understanding of memory cues will enable us to design new and effective retrieval algorithms and systems for personal archives. Context data, such as time and location, is already proving to play a key part in this special retrieval domain, for example in searching personal photo archives. We believe there are many other rich sources of context that can be exploited for retrieval from personal archives.

    The Use of Web 2.0 Technologies in Archives: Developing exemplary practice for use by archival practitioners

    Web 2.0 technologies have fundamentally changed the way in which people interact and find information online. Archives are attempting to utilize Web 2.0 technologies to reach new users and promote their collections, but many have implemented these technologies without a full understanding of how to use them appropriately. Research has been conducted concerning libraries implementing Web 2.0 technologies, but much of the research involving archives has consisted of anecdotal evidence and is limited in scope. This research fills that gap by gathering data on the use of many different technologies by different kinds of archives around the globe. Using surveys and semi-structured interviews, the researcher gathered information on what technologies archives are using as well as how and why they are used. He then discusses the various problems that confront archivists of all types seeking to implement Web 2.0 technologies. Finally, the study concludes with a discussion of the implications of these problems and offers a set of exemplary practices that can be utilized by archives seeking to implement Web 2.0 technologies successfully.

    Improving Collection Understanding for Web Archives with Storytelling: Shining Light Into Dark and Stormy Archives

    Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic. Few collections include standardized metadata. Too many documents from too many collections with insufficient metadata make collection understanding an expensive proposition. This dissertation establishes a five-process model to assist with web archive collection understanding. This model aims to produce a social media story, a visualization with which most web users are familiar. Each social media story contains surrogates, which are summaries of individual documents. These surrogates, when presented together, summarize the topic of the story. After applying our storytelling model, they summarize the topic of a web archive collection. We develop and test a framework to select the best exemplars that represent a collection. We establish that algorithms produced from these primitives select exemplars that are otherwise undiscoverable using conventional search engine methods. We generate story metadata to improve the information scent of a story so users can understand it better. After an analysis showing that existing platforms perform poorly for web archives and a user study establishing the best surrogate type, we generate document metadata for the exemplars with machine learning. We then visualize the story and document metadata together and distribute it to satisfy the information needs of multiple personas who benefit from our model. Our tools serve as a reference implementation of our Dark and Stormy Archives storytelling model. Hypercane selects exemplars and generates story metadata. MementoEmbed generates document metadata. Raintale visualizes and distributes the story based on the story metadata and the document metadata of these exemplars. By providing understanding immediately, our stories save users the time and effort of reading thousands of documents and, most importantly, help them understand web archive collections.

    Archives for the Dark Web: A Field Guide for Study

    This chapter provides a field guide for other digital humanists who want to study the Dark Web. To focus the chapter, I emphasize my belief that, in order to study the cultures of Dark Web sites and users, the digital humanist must engage with these systems' technical infrastructures. I will provide specific reasons why I believe that understanding the technical details of Freenet, Tor, and I2P will benefit any researchers who study these systems, even if they focus on end users, aesthetics, or Dark Web cultures. To this end, I offer a catalog of archives and resources researchers could draw on and a discussion of why researchers should build their own archives. I conclude with some remarks about the ethics of Dark Web research.

    Interrogating the politics and performativity of web archives

    Since the mid-1990s, institutions such as national libraries and the Internet Archive have been ‘archiving the Web’ through the harvesting, collection and preservation of ‘web objects’ (e.g. websites, web pages, social media) in web archives [55]. Much of the focus of the web archiving community has been on the continued development of technologies and practices for web collection development [38], with an increased attention in recent years on facilitating the scholarly use of web archives [25, 24, 61]. This research will take a step back to consider the place of web archives in light of ‘the archival turn’ and emergent questions over the ever-expansive role of the archive and the Web in everyday life. First coined by Stoler [81], ‘the archival turn’ denotes a shift from ‘archive as source’ to ‘archive as subject,’ signalling wide-ranging epistemological questions concerning the role of the archive (and the archivist) in shaping and legitimising knowledge and particular ways of knowing. This research proposes to re-situate web archives as places of knowledge and cultural production in their own right, by implicating both the web archivist and the technologies in the shaping of the ‘politics of ephemerality’ [82] that lead to the creation and maintenance of web archives. This study will identify key underlying assumptions about what the Web is (e.g. a ‘Web of Documents,’ ‘abstract information space’), what of the contemporary Web is (or isn’t) being archived, and the relative affordances for web archival practice and scholarly use. Furthermore, drawing on critical approaches to information, Science and Technology Studies and Web Science, this research will engage with the performativity of web archiving, the practices of selection, collection and classification, and the possible implications for a socio-technical understanding of web archives.

    Exploiting the social and semantic web for guided web archiving

    The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into "community memories" that aim at building a better understanding of the public view on, e.g., celebrities, court decisions, and other events. In this paper we present the ARCOMEM architecture, which uses semantic information such as entities, topics, and events, complemented with information from the social Web, to guide a novel Web crawler. The resulting archives are automatically enriched with semantic meta-information to ease access and allow retrieval based on conditions that involve high-level concepts. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-33290-6_47. German Federal Ministry for the Environment, Nature Conservation and Nuclear Safety/0325296; Solland Solar Cells BV; SolarWorld Innovations GmbH; SCHOTT Solar AG; RENA GmbH; SINGULUS TECHNOLOGIES A

    COVID-19 Community Archives and the Platformization of Digital Cultural Memory

    In this study we aim to understand how GitHub is used by COVID-19 interest groups for organizing community archives to protect their knowledge from the Chinese government’s censorship efforts. We introduce two case studies of such COVID-19 community archives published with GitHub that appeared online in early 2020. Using public GitHub repository documentation and web crawls from the Internet Archive’s Wayback Machine, we describe how these digital community archives emerge and exist on the platform and how knowledge of them circulated on other US-based social media sites, and we show the strategies and tactics these volunteers used to keep these community archives alive, resist censorship, and guard the safety of these collections. We argue that these COVID-19 community archives are at risk because of their platform accessibility as much as the content they document, and that understanding how organizers use GitHub’s platform affordances is essential to theorizing how platforms are impacting approaches to preserving cultural memory.