45,830 research outputs found
Information access tasks and evaluation for personal lifelogs
Emerging personal lifelog (PL) collections contain permanent digital records of information associated with individuals’ daily lives. This can include materials such as emails received and sent, web content and other documents with which they have interacted, photographs, videos and music experienced passively or created, logs of phone calls and text messages, and also personal and contextual data such as location (e.g. via GPS sensors), persons and objects present (e.g. via Bluetooth) and physiological state (e.g. via biometric sensors). PLs can be collected by individuals over very extended periods, potentially running to many years. Such archives have many potential applications including helping individuals recover partial forgotten information, sharing experiences with friends or family, telling the story of one’s life, clinical applications for the memory impaired, and fundamental psychological investigations of memory. The Centre for Digital Video Processing (CDVP) at Dublin City University is currently engaged in the collection and exploration of applications of large PLs. We are collecting rich archives of daily life including textual and visual materials, and contextual context data. An important part of this work is to consider how the effectiveness of our ideas can be measured in terms of metrics and experimental design. While these studies have considerable similarity with traditional evaluation activities in areas such as information retrieval and summarization, the characteristics of PLs mean that new challenges and questions emerge. We are currently exploring the issues through a series of pilot studies and questionnaires. Our initial results indicate that there are many research questions to be explored and that the relationships between personal memory, context and content for these tasks is complex and fascinating
Applying contextual memory cues for retrieval from personal information archives
Advances in digital technologies for information capture
combined with massive increases in the capacity of digital
storage media mean that it is now possible to capture and store one’s entire life experiences in a Human Digital Memory (HDM). Information can be captured from a myriad of personal information devices including desktop computers, PDAs, digital cameras, video and audio recorders, and various sensors, including GPS, Bluetooth, and biometric devices. These diverse collections of personal information are potentially very valuable, but will only be so if significant information can be reliably retrieved from them. HDMs differ from traditional document collections for which existing search technologies have been developed since users may have poor recollection of contents or even the existence of stored items. Additionally HDM data is highly heterogeneous and unstructured, making it difficult to form search queries. We believe that a Personal Information Management (PIM) system which exploits the context of information capture, and potentially of earlier refinding, can be valuable in effective retrieval from an
HDM. We report an investigation into how individuals
perform searches of their personal information, and use
the outcome of this study to develop an information retrieval (IR) framework for HDM search incorporating the context of document capture. We then describe the creation of a pilot HDM test collection, and initial experiments in retrieval from this collection. Results from these experiments indicate that use of context data can be significantly beneficial to increasing the efficient retrieval of partially recalled items from an HDM
Using semantic indexing to improve searching performance in web archives
The sheer volume of electronic documents being published on the Web can be overwhelming for users if the searching aspect is not properly addressed. This problem is particularly acute inside archives and repositories containing large collections of web resources or, more precisely, web pages and other web objects. Using the existing search capabilities in web archives, results can be compromised because of the size of data, content heterogeneity and changes in scientific terminologies and meanings. During the course of this research, we will explore whether semantic web technologies, particularly ontology-based annotation and retrieval, could improve precision in search results in multi-disciplinary web archives
We Could, but Should We? Ethical Considerations for Providing Access to GeoCities and Other Historical Digital Collections
We live in an era in which the ways that we can make sense of our past are evolving as more artifacts from that past become digital. At the same time, the responsibilities of traditional gatekeepers who have negotiated the ethics of historical data collection and use, such as librarians and archivists, are increasingly being sidelined by the system builders who decide whether and how to provide access to historical digital collections, often without sufficient reflection on the ethical issues at hand. It is our aim to better prepare system builders to grapple with these issues. This paper focuses discussions around one such digital collection from the dawn of the web, asking what sorts of analyses can and should be conducted on archival copies of the GeoCities web hosting platform that dates to 1994.This research was supported by the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the US National Science Foundation (grants 1618695 and 1704369), the Andrew W. Mellon Foundation, Start Smart Labs, and Compute Canada
BlogForever D5.2: Implementation of Case Studies
This document presents the internal and external testing results for the BlogForever case studies. The evaluation of the BlogForever implementation process is tabulated under the most relevant themes and aspects obtained within the testing processes. The case studies provide relevant feedback for the sustainability of the platform in terms of potential users’ needs and relevant information on the possible long term impact
Automatic text searching for personal photos
This demonstration presents the MediAssist prototype system for organisation of personal digital photo collections based on contextual information, such as time and location of image capture, and content-based analysis, such as face detection and recognition. This metadata is used directly for identification of photos which match specified attributes, and also to create text surrogates for photos, allowing for text-based queries of photo collections without relying on manual annotation. MediAssist illustrates our research into digital photo management, showing how a combination of automatically extracted context and content-based information, together with user annotation and
traditional text indexing techniques, facilitates efficient searching of personal photo collections
Automatic tagging and geotagging in video collections and communities
Automatically generated tags and geotags hold great promise
to improve access to video collections and online communi-
ties. We overview three tasks offered in the MediaEval 2010
benchmarking initiative, for each, describing its use scenario, definition and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010 and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information including user-generated metadata, speech recognition transcripts, audio, and visual features
Venturing into the labyrinth: the information retrieval challenge of human digital memories
Advances in digital capture and storage technologies mean
that it is now possible to capture and store one’s entire life experiences in a Human Digital Memory (HDM). However,
these vast personal archives are of little benefit if an individual cannot locate and retrieve significant items from
them. While potentially offering exciting opportunities to
support a user in their activities by providing access to information stored from previous experiences, we believe that the features of HDM datasets present new research challenges for information retrieval which must be addressed if these possibilities are to be realised. Specifically we postulate that effective retrieval from HDMs must exploit the rich sources of context data which can be captured and associated with items stored within them. User’s memories
of experiences stored within their memory archive will often
be linked to these context features. We suggest how such
contextual metadata can be exploited within the retrieval
process
Reply With: Proactive Recommendation of Email Attachments
Email responses often contain items-such as a file or a hyperlink to an
external document-that are attached to or included inline in the body of the
message. Analysis of an enterprise email corpus reveals that 35% of the time
when users include these items as part of their response, the attachable item
is already present in their inbox or sent folder. A modern email client can
proactively retrieve relevant attachable items from the user's past emails
based on the context of the current conversation, and recommend them for
inclusion, to reduce the time and effort involved in composing the response. In
this paper, we propose a weakly supervised learning framework for recommending
attachable items to the user. As email search systems are commonly available,
we constrain the recommendation task to formulating effective search queries
from the context of the conversations. The query is submitted to an existing IR
system to retrieve relevant items for attachment. We also present a novel
strategy for generating labels from an email corpus---without the need for
manual annotations---that can be used to train and evaluate the query
formulation model. In addition, we describe a deep convolutional neural network
that demonstrates satisfactory performance on this query formulation task when
evaluated on the publicly available Avocado dataset and a proprietary dataset
of internal emails obtained through an employee participation program.Comment: CIKM2017. Proceedings of the 26th ACM International Conference on
Information and Knowledge Management. 201
- …