
    Information access tasks and evaluation for personal lifelogs

    Emerging personal lifelog (PL) collections contain permanent digital records of information associated with individuals’ daily lives. This can include materials such as emails received and sent, web content and other documents with which they have interacted, photographs, videos and music experienced passively or created, logs of phone calls and text messages, and also personal and contextual data such as location (e.g. via GPS sensors), persons and objects present (e.g. via Bluetooth) and physiological state (e.g. via biometric sensors). PLs can be collected by individuals over very extended periods, potentially running to many years. Such archives have many potential applications including helping individuals recover partially forgotten information, sharing experiences with friends or family, telling the story of one’s life, clinical applications for the memory impaired, and fundamental psychological investigations of memory. The Centre for Digital Video Processing (CDVP) at Dublin City University is currently engaged in the collection and exploration of applications of large PLs. We are collecting rich archives of daily life including textual and visual materials, and contextual data. An important part of this work is to consider how the effectiveness of our ideas can be measured in terms of metrics and experimental design. While these studies have considerable similarity with traditional evaluation activities in areas such as information retrieval and summarization, the characteristics of PLs mean that new challenges and questions emerge. We are currently exploring the issues through a series of pilot studies and questionnaires. Our initial results indicate that there are many research questions to be explored and that the relationships between personal memory, context and content for these tasks are complex and fascinating.

    Applying contextual memory cues for retrieval from personal information archives

    Advances in digital technologies for information capture combined with massive increases in the capacity of digital storage media mean that it is now possible to capture and store one’s entire life experiences in a Human Digital Memory (HDM). Information can be captured from a myriad of personal information devices including desktop computers, PDAs, digital cameras, video and audio recorders, and various sensors, including GPS, Bluetooth, and biometric devices. These diverse collections of personal information are potentially very valuable, but will only be so if significant information can be reliably retrieved from them. HDMs differ from the traditional document collections for which existing search technologies have been developed, since users may have poor recollection of the contents or even the existence of stored items. Additionally, HDM data is highly heterogeneous and unstructured, making it difficult to form search queries. We believe that a Personal Information Management (PIM) system which exploits the context of information capture, and potentially of earlier refinding, can be valuable in effective retrieval from an HDM. We report an investigation into how individuals perform searches of their personal information, and use the outcome of this study to develop an information retrieval (IR) framework for HDM search incorporating the context of document capture. We then describe the creation of a pilot HDM test collection, and initial experiments in retrieval from this collection. Results from these experiments indicate that the use of context data can significantly improve the efficiency of retrieving partially recalled items from an HDM.

    Using semantic indexing to improve searching performance in web archives

    The sheer volume of electronic documents being published on the Web can be overwhelming for users if the searching aspect is not properly addressed. This problem is particularly acute inside archives and repositories containing large collections of web resources or, more precisely, web pages and other web objects. Using the existing search capabilities in web archives, results can be compromised because of the size of the data, content heterogeneity and changes in scientific terminologies and meanings. During the course of this research, we will explore whether semantic web technologies, particularly ontology-based annotation and retrieval, could improve precision in search results in multi-disciplinary web archives.

    We Could, but Should We? Ethical Considerations for Providing Access to GeoCities and Other Historical Digital Collections

    We live in an era in which the ways that we can make sense of our past are evolving as more artifacts from that past become digital. At the same time, the responsibilities of traditional gatekeepers who have negotiated the ethics of historical data collection and use, such as librarians and archivists, are increasingly being sidelined by the system builders who decide whether and how to provide access to historical digital collections, often without sufficient reflection on the ethical issues at hand. It is our aim to better prepare system builders to grapple with these issues. This paper focuses discussions around one such digital collection from the dawn of the web, asking what sorts of analyses can and should be conducted on archival copies of the GeoCities web hosting platform that dates to 1994. This research was supported by the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the US National Science Foundation (grants 1618695 and 1704369), the Andrew W. Mellon Foundation, Start Smart Labs, and Compute Canada.

    BlogForever D5.2: Implementation of Case Studies

    This document presents the internal and external testing results for the BlogForever case studies. The evaluation of the BlogForever implementation process is tabulated under the most relevant themes and aspects obtained within the testing processes. The case studies provide useful feedback on the sustainability of the platform in terms of potential users’ needs, and information on the possible long-term impact.

    Automatic text searching for personal photos

    This demonstration presents the MediAssist prototype system for organisation of personal digital photo collections based on contextual information, such as time and location of image capture, and content-based analysis, such as face detection and recognition. This metadata is used directly to identify photos which match specified attributes, and also to create text surrogates for photos, allowing for text-based queries of photo collections without relying on manual annotation. MediAssist illustrates our research into digital photo management, showing how a combination of automatically extracted context and content-based information, together with user annotation and traditional text indexing techniques, facilitates efficient searching of personal photo collections.

    Automatic tagging and geotagging in video collections and communities

    Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition and the data set released. For each task, a reference algorithm that was used within MediaEval 2010 is presented, and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information including user-generated metadata, speech recognition transcripts, audio, and visual features.

    Venturing into the labyrinth: the information retrieval challenge of human digital memories

    Advances in digital capture and storage technologies mean that it is now possible to capture and store one’s entire life experiences in a Human Digital Memory (HDM). However, these vast personal archives are of little benefit if an individual cannot locate and retrieve significant items from them. While HDMs potentially offer exciting opportunities to support users in their activities by providing access to information stored from previous experiences, we believe that the features of HDM datasets present new research challenges for information retrieval which must be addressed if these possibilities are to be realised. Specifically, we postulate that effective retrieval from HDMs must exploit the rich sources of context data which can be captured and associated with items stored within them. Users’ memories of experiences stored within their memory archive will often be linked to these context features. We suggest how such contextual metadata can be exploited within the retrieval process.

    Reply With: Proactive Recommendation of Email Attachments

    Email responses often contain items, such as a file or a hyperlink to an external document, that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these items as part of their response, the attachable item is already present in their inbox or sent folder. A modern email client can proactively retrieve relevant attachable items from the user's past emails based on the context of the current conversation, and recommend them for inclusion, to reduce the time and effort involved in composing the response. In this paper, we propose a weakly supervised learning framework for recommending attachable items to the user. As email search systems are commonly available, we constrain the recommendation task to formulating effective search queries from the context of the conversations. The query is submitted to an existing IR system to retrieve relevant items for attachment. We also present a novel strategy for generating labels from an email corpus, without the need for manual annotations, that can be used to train and evaluate the query formulation model. In addition, we describe a deep convolutional neural network that demonstrates satisfactory performance on this query formulation task when evaluated on the publicly available Avocado dataset and a proprietary dataset of internal emails obtained through an employee participation program. Comment: CIKM2017. Proceedings of the 26th ACM International Conference on Information and Knowledge Management. 201