    A competitive environment for exploratory query expansion

    Most information workers query digital libraries many times a day. Yet people have little opportunity to hone their skills in a controlled environment, or to compare their performance with others in an objective way. Conversely, although search engine logs record how users evolve queries, they lack crucial information about the user's intent. This paper describes an environment for exploratory query expansion that pits users against each other and lets them compete, and practice, in their own time and on their own workstation. The system captures query evolution behavior on predetermined information-seeking tasks. It is publicly available, and the code is open source so that others can set up their own competitive environments.
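
    As a rough illustration of how such an environment could compare players objectively, the sketch below scores each query a user issues against predetermined relevance judgments for the task. The function names, the precision-at-k scoring and the stub search function are assumptions made for illustration, not the system described in the paper.

```python
# Hypothetical sketch: score an evolving query session against the relevance
# judgments of a predetermined task, so two users can be compared objectively.

def precision_at_k(ranked_doc_ids, relevant_ids, k=10):
    """Fraction of the top-k results that are known to be relevant for the task."""
    top = ranked_doc_ids[:k]
    return sum(1 for d in top if d in relevant_ids) / max(len(top), 1)

def score_query_evolution(query_log, search_fn, relevant_ids):
    """query_log: the queries a user issued, in order.
    search_fn: returns a ranked list of document ids for a query string.
    Returns one score per query, making improvement over the session visible."""
    return [precision_at_k(search_fn(q), relevant_ids, k=10) for q in query_log]

if __name__ == "__main__":
    judged_relevant = {"d3", "d7", "d9"}
    stub_search = lambda q: ["d1", "d3", "d7"] if "expanded" in q else ["d1", "d2"]
    print(score_query_evolution(["jaguar", "jaguar expanded query"], stub_search, judged_relevant))
```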

    Towards quantifying the impact of non-uniform information access in collaborative information retrieval

    The majority of research into Collaborative Information Retrieval (CIR) has assumed uniform information access and visibility between collaborators. However, in a number of real-world scenarios (e.g. security or healthcare), information access is not uniform across all collaborators in a team. This can be referred to as Multi-Level Collaborative Information Retrieval (MLCIR). To the best of our knowledge, there has not yet been any systematic investigation of the effect of MLCIR on search outcomes. To address this shortcoming, in this paper we present the results of a simulated evaluation conducted over 4 different non-uniform information access scenarios and 3 different collaborative search strategies. Results indicate that there is some tolerance to removing access to the collection and that there may not always be a negative impact on performance. We also highlight how different access scenarios and search strategies affect search outcomes.
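
    A minimal sketch of how such a simulation might be set up is given below, assuming non-uniform access is modelled by filtering the collection per collaborator and that the team's results are merged round-robin; both choices, and all names, are illustrative rather than the paper's actual protocol.

```python
# Illustrative simulation of Multi-Level CIR: each collaborator searches a
# restricted view of the collection and the team merges the ranked lists.

def restricted_view(collection, allowed_ids):
    """The documents a collaborator's access level permits them to see."""
    return {doc_id: text for doc_id, text in collection.items() if doc_id in allowed_ids}

def simple_search(view, query):
    """Toy ranking: order documents by the number of query terms they contain."""
    terms = query.lower().split()
    scored = ((sum(t in text.lower() for t in terms), doc_id) for doc_id, text in view.items())
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]

def merge_round_robin(ranked_lists):
    """One simple collaborative merge: interleave the team's results, dropping duplicates."""
    merged, seen = [], set()
    for rank in range(max((len(r) for r in ranked_lists), default=0)):
        for ranked in ranked_lists:
            if rank < len(ranked) and ranked[rank] not in seen:
                seen.add(ranked[rank])
                merged.append(ranked[rank])
    return merged
```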

    A graph based approach to estimating lexical cohesion

    Traditionally, information retrieval systems rank documents according to the query terms they contain. However, even if a document contains all query terms, this does not guarantee that it is relevant to the query. The query terms may occur together in the same document but be used in different contexts, expressing separate topics. Lexical cohesion is a characteristic of natural language texts that can be used to determine whether the query terms are used in the same context in a document. In this paper we use a graph-based approach to capture term contexts and estimate the level of lexical cohesion in a document. To evaluate the performance of our system, we compare it against two benchmark systems using three TREC document collections.
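
    The following sketch illustrates the general intuition of a term graph, assuming nodes are terms and edges link terms that co-occur within a sliding window; query terms used in the same context should then lie close together in the graph. It illustrates the idea only, not the estimator evaluated in the paper.

```python
# Sketch: build a term co-occurrence graph and use graph distance between
# query terms as a rough signal of lexical cohesion within a document.

from collections import defaultdict
from itertools import combinations

def build_term_graph(tokens, window=5):
    """Connect every pair of distinct terms that appear in the same window."""
    graph = defaultdict(set)
    for i in range(len(tokens)):
        for a, b in combinations(set(tokens[i:i + window]), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def graph_distance(graph, source, target):
    """Breadth-first search distance between two terms (None if unreachable)."""
    frontier, seen, depth = {source}, {source}, 0
    while frontier:
        if target in frontier:
            return depth
        frontier = {n for t in frontier for n in graph[t]} - seen
        seen |= frontier
        depth += 1
    return None
```

    A low average pairwise distance between the query terms in such a graph would then be read as a sign that they are used in the same context.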

    Lexical cohesion and term proximity in document ranking

    We demonstrate effective new methods of document ranking based on lexical cohesive relationships between query terms. The proposed methods rely solely on the lexical relationships between the original query terms and do not involve query expansion or relevance feedback. Two types of lexical cohesive relationship information between query terms are used in document ranking: a short-distance collocation relationship between query terms, and a long-distance relationship determined by the collocation of query terms with other words. The methods are evaluated on TREC corpora and show improvements over baseline systems.
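
    As a hedged sketch of the short-distance idea, the function below computes the length of the smallest token window covering all query terms, a classic proximity feature; a document where this span is short would be preferred. The sliding-window implementation and the feature itself are illustrative, not the ranking formulas proposed in the paper.

```python
# Proximity feature sketch: length of the shortest window of tokens that
# contains every query term at least once (None if a term is missing).

from collections import Counter

def minimal_span(tokens, query_terms):
    terms = set(query_terms)
    hits = [(i, t) for i, t in enumerate(tokens) if t in terms]
    if {t for _, t in hits} != terms:
        return None
    counts, left, best = Counter(), 0, None
    for pos, term in hits:
        counts[term] += 1
        while len(counts) == len(terms):          # window covers all query terms
            span = pos - hits[left][0] + 1
            best = span if best is None else min(best, span)
            left_term = hits[left][1]
            counts[left_term] -= 1
            if counts[left_term] == 0:
                del counts[left_term]
            left += 1
    return best

# Example: the span is 2 even though the terms also occur far apart.
print(minimal_span("a x x x b a".split(), {"a", "b"}))
```

    A ranking function could then, for instance, boost documents in inverse proportion to this span.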

    INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned

    Microblogging platforms such as Twitter are increasingly used for on-line client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of the task was to help a user understand a tweet by providing a short explanatory summary (500 words). This summary should be built automatically using resources such as Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. Over the four years the task ran, results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best-performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, part-of-speech (POS) analysis, anaphora detection, a content diversity measure, and sentence reordering. This paper provides a full summary report on the four-year task. While the yearly overviews focused on system results, here we provide a detailed report on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important outcome of the four-year competition, we also describe the open-access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we also mention the lessons we learned that are worth considering when designing such a task.
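
    The sketch below mirrors only the general pipeline shape described above (retrieve passages, score and select sentences, assemble a context within a 500-word budget). The overlap-based sentence scoring is a deliberately naive stand-in for the NLP components the best systems used, and all names are illustrative.

```python
# Simplified tweet-contextualization pipeline: select sentences from retrieved
# Wikipedia passages until a ~500-word summary budget is reached.

import re

def score_sentence(sentence, tweet_terms):
    """Naive relevance score: word overlap between the sentence and the tweet."""
    return len(set(re.findall(r"\w+", sentence.lower())) & tweet_terms)

def contextualize(tweet, passages, word_budget=500):
    tweet_terms = set(re.findall(r"\w+", tweet.lower()))
    sentences = [s.strip() for p in passages
                 for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    ranked = sorted(sentences, key=lambda s: score_sentence(s, tweet_terms), reverse=True)
    summary, used = [], 0
    for sentence in ranked:
        words = len(sentence.split())
        if used + words > word_budget:
            break
        summary.append(sentence)
        used += words
    return " ".join(summary)
```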

    Beyond Traditional Collaborative Search: Understanding the Effect of Awareness on Multi-Level Collaborative Information Retrieval

    Although there has been a great deal of research into Collaborative Information Retrieval (CIR) and Collaborative Information Seeking (CIS), the majority has assumed that team members have the same level of unrestricted access to the underlying information. However, observations from different domains (e.g. healthcare and business) suggest that collaboration sometimes involves people with differing levels of access to the underlying information. This type of scenario has been referred to as Multi-Level Collaborative Information Retrieval (MLCIR). To the best of our knowledge, no studies have investigated the effect of awareness, an existing CIR/CIS concept, on MLCIR. To address this gap, we conducted two separate user studies using a total of 5 different collaborative search interfaces and 3 information access scenarios. A number of Information Retrieval (IR), CIS and CIR evaluation metrics, as well as questionnaires, were used to compare the interfaces. Design interviews were also conducted after the evaluations to obtain qualitative feedback from participants. Results suggested that query properties such as time spent on a query, query popularity and query effectiveness could allow users to obtain information about the team's search performance and implicitly suggest better queries without disclosing sensitive data. In addition, having access to a history of the viewed, relevant and bookmarked documents that team members have in common could provide a similar positive effect to the query properties. It was also found that being able to easily identify different team members and their actions is important for users in MLCIR. Based on our findings, we provide design recommendations to help develop new CIR and MLCIR interfaces.
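
    A minimal sketch of the kind of query-level awareness signal discussed above is given below: per-query statistics (popularity, time spent, an effectiveness proxy) that could be shared across access levels without exposing the documents themselves. The field names and the effectiveness proxy are assumptions for illustration, not the instruments used in the studies.

```python
# Sketch: aggregate shareable per-query statistics from a team's search log
# instead of exposing the (possibly restricted) documents behind them.

from dataclasses import dataclass

@dataclass
class QueryEvent:
    user: str
    query: str
    seconds_spent: float
    results_viewed: int
    marked_relevant: int

def query_properties(events):
    """Per-query popularity, total time and a simple effectiveness proxy."""
    stats = {}
    for e in events:
        s = stats.setdefault(e.query, {"popularity": 0, "time": 0.0, "viewed": 0, "marked": 0})
        s["popularity"] += 1
        s["time"] += e.seconds_spent
        s["viewed"] += e.results_viewed
        s["marked"] += e.marked_relevant
    for s in stats.values():
        s["effectiveness"] = s["marked"] / s["viewed"] if s["viewed"] else 0.0
    return stats
```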

    Mining Meaning from Wikipedia

    Wikipedia is a goldmine of information, not just for its many readers but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval; using it for information extraction; and using it as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.
    Comment: An extensive survey of re-using information in Wikipedia in natural language processing, information retrieval and extraction, and ontology building. Accepted for publication in the International Journal of Human-Computer Studies.

    Veebi otsingumootorid ja vajadus keeruka informatsiooni järele (Web search engines and the need for complex information)

    The electronic version of the dissertation does not include the publications.
    Web search engines have become the primary means of finding information on the Internet. Along with their growing popularity, their use has expanded from simple look-ups to rather complex information needs, and academic interest in search has likewise started to shift from analyzing simple query-and-response patterns to examining more sophisticated activities covering longer time spans. Current search tools do not support those activities as well as they support simple look-up tasks. In particular, support for aggregating search results from multiple queries, taking into account the discoveries made, and synthesizing them into a newly compiled document is only at its beginning, which motivates researchers to develop new tools for supporting such information-seeking tasks. In this dissertation I present the results of empirical research focused on evaluating search engines and on developing a theoretical model of the complex search process that can be used to better support this special kind of search with existing search tools. The sub-goals were to: (a) develop a model of complex search; (b) create metrics for the complex search model; (c) distinguish complex search tasks from simple look-ups and determine whether they can be measured, identifying simple metrics that describe their complexity; (d) analyze how differently users behave when performing complex search tasks with web search engines; (e) study the correlation between people's everyday web usage habits and their search performance; (f) examine how well people estimate in advance the difficulty of a search task and the effort it requires; and (g) assess the effect of gender and age on search performance. It is not the goal of the thesis to implement a new search technology; therefore, performance benchmarks against established systems such as question answering systems are not part of this work. I present a model that decomposes complex Web search tasks into a measurable, three-step process. I show the innate characteristics of complex search tasks that make them distinguishable from their less complex counterparts, and describe an experimentation method for carrying out user studies of complex search. I demonstrate the main steps taken during the development and implementation of the Search-Logger study framework (the technical manifestation of the aforementioned method) for conducting search user studies, and present the results of user studies carried out with this approach. Finally, I present the development and application of the ATMS (awareness-task-monitor-share) model to improve the support for complex search needs in current Web search engines.
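
    A minimal sketch, in the spirit of the session logging and result aggregation described above, is shown below: queries, clicked results and saved snippets are recorded per task and later compiled into a single document. All class and field names are illustrative and do not reproduce the actual Search-Logger framework or the ATMS model.

```python
# Sketch: log a multi-query search session per task and aggregate the saved
# discoveries into one newly compiled document.

from dataclasses import dataclass, field
from typing import List

@dataclass
class LoggedQuery:
    query: str
    clicked_urls: List[str] = field(default_factory=list)
    saved_snippets: List[str] = field(default_factory=list)

@dataclass
class SearchSession:
    task_id: str
    queries: List[LoggedQuery] = field(default_factory=list)

    def compile_document(self) -> str:
        """Synthesize the snippets gathered across all queries into one note."""
        parts = [f"Task: {self.task_id}"]
        for q in self.queries:
            parts.append(f"Query: {q.query}")
            parts.extend(f"  - {snippet}" for snippet in q.saved_snippets)
        return "\n".join(parts)
```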