94 research outputs found

    Refinding is Not Finding Again

    A challenging problem for Internet users today is how to refind information that they have seen before. We believe that finding and refinding are different user activities and require different types of support. The problem of how to find information on the web is studied extensively: new search algorithms, support for natural language queries, and innovative document indexing techniques are common topics in information retrieval research; visualizations of documents and task support for finding are topics in human-computer interaction. But refinding has only recently begun to receive attention. In this article, we present evidence to support the claim that information refinding is a different activity from information finding. We present results that show how refinding differs from finding and suggest ways to improve web information seeking tools and designs to support refinding information.

    Scraping SERPs for Archival Seeds: It Matters When You Start

    Event-based collections are often started with a web search, but the search results you find on Day 1 may not be the same as those you find on Day 7. In this paper, we consider collections that originate from extracting URIs (Uniform Resource Identifiers) from Search Engine Result Pages (SERPs). Specifically, we seek to provide insight about the retrievability of URIs of news stories found on Google, and to answer two main questions: first, can one "refind" the same URI of a news story (for the same query) from Google after a given time? Second, what is the probability of finding a story on Google over a given period of time? To answer these questions, we issued seven queries to Google every day for over seven months (2017-05-25 to 2018-01-12) and collected links from the first five SERPs to generate seven collections for each query. The queries represent public interest stories: "healthcare bill," "manchester bombing," "london terrorism," "trump russia," "travel ban," "hurricane harvey," and "hurricane irma." We tracked each URI in all collections over time to estimate the discoverability of URIs from the first five SERPs. Our results showed that the daily average rate at which stories were replaced on the default Google SERP ranged from 0.21 to 0.54, and the weekly rate from 0.39 to 0.79, suggesting the fast replacement of older stories by newer stories. The probability of finding the same URI of a news story one day after its initial appearance on the SERP ranged from 0.34 to 0.44. After a week, the probability of finding the same news stories diminishes rapidly to between 0.01 and 0.11.
    Our findings suggest that, due to the difficulty in retrieving the URIs of news stories from Google, collection building that originates from search engines should begin as soon as possible in order to capture the first stages of events, and should persist in order to capture the evolution of the events... Comment: This is an extended version of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018) full paper: https://doi.org/10.1145/3197026.3197056. Some of the figure numbers have changed.

    A Special Topics Course on Personal Information Management

    Personal Information Management (PIM) is an important emerging area of study in Computer Science and Information Systems. During the Spring of 2006, we offered a special topics course in PIM at Virginia Tech. This paper presents some motivation for why studying PIM is important, the goals for the course, some sample material from the course, and a few student evaluations. The paper presents in detail an activity called "Day in the Life of My Information" that resulted in an interesting experience from both educational and research points of view.

    Creating a data collection for evaluating rich speech retrieval

    We describe the development of a test collection for the investigation of speech retrieval beyond identification of relevant content. This collection focuses on satisfying user information needs for queries associated with specific types of speech acts. The collection is based on an archive of Internet video from an Internet video sharing platform (blip.tv), and was provided by the MediaEval benchmarking initiative. A crowdsourcing approach was used to identify segments in the video data which contain speech acts, to create a description of the video containing the act, and to generate search queries designed to refind this speech act. We describe and reflect on our experiences with crowdsourcing this test collection using the Amazon Mechanical Turk platform. We highlight the challenges of constructing this dataset, including the selection of the data source, the design of the crowdsourcing task, and the specification of queries and relevant items.

    What makes re-finding information difficult? A study of email re-finding

    Re-finding information that has been seen or accessed before is a task which can be relatively straightforward, but often it can be extremely challenging, time-consuming and frustrating. Little is known, however, about what makes one re-finding task harder or easier than another. We performed a user study to learn about the contextual factors that influence users' perception of task difficulty in the context of re-finding email messages. Twenty-one participants were issued re-finding tasks to perform on their own personal collections. The participants' responses to questions about the tasks, combined with demographic data and collection statistics for the experimental population, provide a rich basis to investigate the variables that can influence the perception of difficulty. A logistic regression model was developed to examine the relationships between variables and determine whether any factors were associated with perceived task difficulty. The model reveals strong relationships between difficulty and the time elapsed since a message was read, remembering when the sought-after email was sent, remembering other recipients of the email, the experience of the user and the user's filing strategy. We discuss what these findings mean for the design of re-finding interfaces and future re-finding research.

    Searching with Tags: Do Tags Help Users Find Things?

    This study examines whether tags can be useful in the process of information retrieval. Participants searched a social bookmarking tool specialising in academic articles (CiteULike) and an online journal database (PubMed). Participant actions were recorded using screen capture software, and participants were asked to describe their search process. Users did make use of tags in their search process, as a guide to searching and as hyperlinks to potentially useful articles. However, users also made use of controlled vocabularies in the journal database to locate useful search terms, and of links to related articles supplied by the database.

    Building and exploiting context on the web

    [no abstract]

    Towards task-based personal information management evaluations

    Personal Information Management (PIM) is a rapidly growing area of research concerned with how people store, manage and re-find information. A feature of PIM research is that many systems have been designed to help users manage and re-find information, but very few have been evaluated. This has been noted by several scholars and explained by the difficulties involved in performing PIM evaluations. These difficulties are that people re-find information from within unique personal collections; that researchers know little about the tasks that cause people to re-find information; and that personal information raises numerous privacy issues. In this paper we aim to facilitate PIM evaluations by addressing each of these difficulties. In the first part, we present a diary study of information re-finding tasks. The study examines the kinds of tasks that require users to re-find information and produces a taxonomy of re-finding tasks for email messages and web pages. In the second part, we propose a task-based evaluation methodology based on our findings and examine the feasibility of the approach using two different methods of task creation.