61 research outputs found

    An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric

    Full text link
    Many evaluation metrics have been defined to evaluate the effectiveness of ad-hoc retrieval and search result diversification systems. However, it is often unclear which evaluation metric should be used to analyze the performance of retrieval systems given a specific task. Axiomatic analysis is an informative mechanism for understanding the fundamentals of metrics and their suitability for particular scenarios. In this paper, we define a constraint-based axiomatic framework to study the suitability of existing metrics in search result diversification scenarios. The analysis informed the definition of Rank-Biased Utility (RBU) -- an adaptation of the well-known Rank-Biased Precision metric -- that takes into account redundancy and the user effort associated with the inspection of documents in the ranking. Our experiments over standard diversity evaluation campaigns show that the proposed metric captures quality criteria reflected by different metrics, making it suitable in the absence of knowledge about particular features of the scenario under study. Comment: Original version: 10 pages. Preprint of full paper to appear at SIGIR'18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, July 8-12, 2018, Ann Arbor, MI, USA. ACM, New York, NY, US
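
    For orientation, here is a minimal sketch of Rank-Biased Precision (RBP), the metric that RBU adapts. The persistence value and gains are illustrative; RBU's redundancy and effort terms from the paper are not reproduced here.

```python
# Minimal sketch of Rank-Biased Precision (RBP), which RBU adapts.
# gains: per-rank relevance gains in [0, 1]; p: user persistence parameter.
def rbp(gains, p=0.8):
    """Expected utility under RBP's geometric browsing model."""
    return (1 - p) * sum(g * p ** i for i, g in enumerate(gains))

# Example: relevant documents at ranks 1 and 3 of a five-document ranking.
print(rbp([1.0, 0.0, 1.0, 0.0, 0.0], p=0.8))  # ≈ 0.328
```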

    Contextual question answering for the health domain

    Get PDF
    Studies have shown that natural language interfaces such as question answering and conversational systems allow information to be accessed and understood more easily by users who are unfamiliar with the nuances of the delivery mechanisms (e.g., keyword-based search engines) or have limited literacy in certain domains (e.g., unable to comprehend health-related content due to terminology barriers). In particular, the increasing use of the web for health information prompts us to re-examine our existing delivery mechanisms. We present enquireMe, a contextual question answering system that allows lay users to obtain responses about a wide range of health topics by expressing their information needs vaguely at the start and gradually refining them over the course of an interaction session using natural language. enquireMe allows users to engage in 'conversations' about their health concerns, a process that can be therapeutic in itself. The system uses community-driven question-answer pairs from the web together with a decay model to deliver the top-scoring answers as responses to the users' unrestricted inputs. We evaluated enquireMe using benchmark data from WebMD and TREC to assess the accuracy of system-generated answers. Despite the absence of complex knowledge acquisition and deep language processing, enquireMe is comparable to state-of-the-art question answering systems such as START, as well as the interactive systems from TREC.
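
    As an illustration only, the sketch below shows one way a decay model over conversation turns could weight candidate answers: overlap with recent turns counts more than overlap with older ones. The overlap scorer and decay rate are assumptions, not enquireMe's actual scoring function.

```python
# Illustrative only: decay-weighted answer scoring over conversation turns.
# The term-overlap scorer and decay rate are assumptions for this sketch.
def score_answer(answer_terms, turns, decay=0.5):
    """Score an answer by term overlap with each turn, downweighting older turns."""
    score = 0.0
    for age, turn_terms in enumerate(reversed(turns)):  # age 0 = most recent turn
        overlap = len(set(answer_terms) & set(turn_terms))
        score += (decay ** age) * overlap
    return score

turns = [["headache", "causes"], ["headache", "treatment", "medicine"]]
print(score_answer(["medicine", "headache", "dosage"], turns))  # 2.0 + 0.5 * 1 = 2.5
```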

    The history of information retrieval research

    Get PDF
    This paper describes a brief history of the research and development of information retrieval systems, starting with the creation of electromechanical searching devices and moving to the early adoption of computers to search for items that are relevant to a user's query. The advances achieved by information retrieval researchers from the 1950s through to the present day are detailed next, focusing on the process of locating relevant information. The paper closes with speculation on where the future of information retrieval lies.

    Evaluating multi-query sessions

    Full text link

    Predicting re-finding activity and difficulty

    Get PDF
    In this study, we address the problem of identifying whether users are attempting to re-find information and estimating the level of difficulty of the re-finding task. We propose to consider task information (e.g., multiple queries and click information) rather than queries alone. Our resulting prediction models are shown to be significantly more accurate (by 2%) than the current state of the art. While past research assumes that the user's previous search history is available to the prediction model, we examine whether re-finding detection is possible without access to this information. Our evaluation indicates that such detection is possible, but more challenging. We further describe the first predictive model for detecting re-finding difficulty, showing it to be significantly better than existing approaches for detecting general search difficulty.
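
    As an illustration of using session-level (task) features rather than query-only features, the sketch below feeds a few placeholder features into a generic classifier. The feature set and model are assumptions, not the paper's own.

```python
# Illustrative only: session-level (task) features feeding a re-finding classifier.
# The features and model here are placeholders, not those used in the paper.
from sklearn.linear_model import LogisticRegression

def session_features(queries, clicked_urls, past_urls):
    """Turn one search session into a small feature vector."""
    revisit_clicks = sum(url in past_urls for url in clicked_urls)
    return [
        len(queries),                                 # queries issued in the session
        len(clicked_urls),                            # clicks in the session
        revisit_clicks / max(len(clicked_urls), 1),   # fraction of previously seen URLs
    ]

# Toy training data: one re-finding session (1) and one new-information session (0).
X = [session_features(["acm sigir 2018"], ["sigir.org"], {"sigir.org"}),
     session_features(["python csv", "read csv python"], ["docs.python.org"], set())]
y = [1, 0]
model = LogisticRegression().fit(X, y)
print(model.predict([session_features(["acm sigir"], ["sigir.org"], {"sigir.org"})]))
```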

    Recherche d'information dynamique pour domaines complexes (Dynamic information retrieval for complex domains)

    Get PDF
    In this thesis, we address dynamic information retrieval in complex domains. Its goal is to include the user in the loop: the user can interact with the system by highlighting relevant passages and indicating their degree of importance according to his or her interests. In information retrieval, complex domains can be defined as text corpora in which it is difficult to find information using a general query. For example, a user searching for the impacts of the Ebola virus during the 2014-2015 crisis in Africa could be interested in different aspects of the virus (economic, public health, etc.). Our objective is to model these different aspects and to diversify the documents presented, so as to cover as many of the user's interests as possible. In this thesis, we explore different methods for diversifying results. We study the impact of named entities and keywords contained in the passages returned as user feedback, in order to build a new query that refines the user's initial search by finding the terms most relevant to what was highlighted. Since the interaction relies only on the knowledge acquired during the search, and this interaction is short because the user does not want a long annotation phase, we chose to model the corpus beforehand using word embeddings, which contextualise words and extend the search to terms similar to the initial query. A dynamic retrieval approach must also be able to find a stopping point. This stopping point must strike a balance between too little and too much information, in order to find a good compromise between relevance and coverage of interests.
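
    As a hedged illustration of the embedding-based expansion described above, the sketch below adds the terms nearest to the query centroid in embedding space. The toy vectors and vocabulary are placeholders; in practice the embeddings would be trained on the target corpus beforehand.

```python
# Illustrative only: expanding a query with its nearest neighbours in embedding space.
# The tiny vocabulary and 3-dimensional vectors below are placeholders.
import numpy as np

embeddings = {
    "ebola":    np.array([0.9, 0.1, 0.0]),
    "virus":    np.array([0.8, 0.2, 0.1]),
    "outbreak": np.array([0.7, 0.3, 0.0]),
    "economy":  np.array([0.0, 0.9, 0.2]),
}

def expand(query_terms, k=2):
    """Add the k terms most similar (by cosine) to the centroid of the query."""
    centroid = np.mean([embeddings[t] for t in query_terms], axis=0)
    def cosine(v):
        return float(v @ centroid / (np.linalg.norm(v) * np.linalg.norm(centroid)))
    candidates = [t for t in embeddings if t not in query_terms]
    ranked = sorted(candidates, key=lambda t: cosine(embeddings[t]), reverse=True)
    return list(query_terms) + ranked[:k]

print(expand(["ebola"]))  # ['ebola', 'virus', 'outbreak']
```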

    QWERTY: The effects of typing on web search behavior

    Get PDF
    Typing is a common form of query input for search engines and other information retrieval systems; we therefore investigate the relationship between typing behavior and search interactions. The search process is interactive and typically requires entering one or more queries, and assessing both summaries from Search Engine Result Pages and the underlying documents, to ultimately satisfy some information need. Under the Search Economic Theory model of interactive information retrieval, differences in query costs will result in changes to search behavior. We investigate how differences in query inputs themselves may relate to Search Economic Theory by conducting a lab-based experiment to observe how text entries influence subsequent search interactions. Our results indicate that with faster typing speeds, more queries are entered per session, while both query lengths and assessment times are lower.

    Effective summarisation for search engines

    Get PDF
    Users of information retrieval (IR) systems issue queries to find information in large collections of documents. Nearly all IR systems return answers in the form of a list of results, where each entry typically consists of the title of the underlying document, a link, and a short query-biased summary of the document's content called a snippet. As retrieval systems typically return a mixture of relevant and non-relevant answers, the role of the snippet is to guide users to identify those documents that are likely to be good answers and to ignore those that are less useful. This thesis focuses on techniques to improve the generation and evaluation of query-biased summaries for informational requests, where users typically need to inspect several documents to fulfil their information needs. We investigate the following issues: how users construct query-biased summaries, and how this compares with current automatic summarisation methods; how query expansion can be applied to sentence-level ranking to improve the quality of query-biased summaries; and how to evaluate these summarisation approaches using sentence-level relevance data. First, through an eye-tracking study, we investigate the way in which users select information from documents when they are asked to construct a query-biased summary in response to a given search request. Our analysis indicates that user behaviour differs from the assumptions of current state-of-the-art query-biased summarisation approaches; a major cause of this difference was vocabulary mismatch, a common IR problem. The thesis then examines query expansion techniques to improve the selection of candidate relevant sentences and to reduce the vocabulary mismatch observed in the previous study. We employ a Cranfield-based methodology to quantitatively assess sentence ranking methods based on sentence-level relevance assessments available in the TREC Novelty track, in line with previous work. We study two aspects of sentence-level evaluation in this track. First, whether sentences that have been judged on relevance, as in the TREC Novelty track, can also be considered indicative; that is, useful as part of a query-biased summary and in guiding users to make correct document selections. By conducting a crowdsourcing experiment, we find that relevance and indicativeness agree around 73% of the time. Second, during our evaluations we discovered a bias whereby longer sentences were more likely to be judged as relevant. We therefore propose a novel evaluation of sentence ranking methods that aims to isolate this sentence-length bias. Using our enhanced evaluation method, we find that query expansion can effectively assist in the selection of short sentences. We conclude our investigation with a second study examining the effectiveness of query expansion in query-biased summarisation methods for end users. Our results indicate that participants significantly prefer query-biased summaries aided by expansion techniques, approximately 60% of the time, for summaries composed of short and medium-length sentences. We suggest that our findings can inform the generation and display of query-biased summaries in IR systems such as search engines.
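
    As an illustration of query-biased sentence ranking with an expanded query and a simple guard against length bias, the sketch below uses a generic overlap score; it is not the thesis's own summarisation method.

```python
# Illustrative only: query-biased sentence ranking with an expanded query and a
# square-root length normalisation as a simple guard against length bias.
def rank_sentences(sentences, query_terms, expansion_terms):
    """Order sentences by length-normalised overlap with the expanded query."""
    expanded = set(query_terms) | set(expansion_terms)
    def score(sentence):
        tokens = sentence.lower().replace(".", "").split()
        return len(expanded & set(tokens)) / (len(tokens) ** 0.5)
    return sorted(sentences, key=score, reverse=True)

sents = ["Aspirin relieves headache pain quickly.",
         "The study enrolled volunteers over several months for unrelated reasons."]
print(rank_sentences(sents, ["headache", "relief"], ["pain", "aspirin"]))
```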

    Crowdsourcing Relevance: Two Studies on Assessment

    Get PDF
    Crowdsourcing has become an alternative approach to collecting relevance judgments at large scale. In this thesis, we focus on specific aspects related to time, scale, and agreement. First, we address the time factor in gathering relevance labels: we study how much time judges need to assess documents. We conduct a series of four experiments which unexpectedly reveal that introducing time limitations leads to benefits in terms of the quality of the results. Furthermore, we discuss strategies aimed at determining the right amount of time to give workers for relevance assessment, in order both to guarantee high-quality results and to save the valuable resources of time and money. We then explore the application of magnitude estimation, a psychophysical scaling technique for the measurement of sensation, to relevance assessment. We conduct a large-scale user study across 18 TREC topics, collecting more than 50,000 magnitude estimation judgments, which turn out to be overall rank-aligned with ordinal judgments made by expert relevance assessors. We discuss the benefits, the reliability of the judgments collected, and the competitiveness in terms of assessor cost. We also report some preliminary results on the agreement among judges. The results of crowdsourcing experiments are often affected by noise that can be ascribed to a lack of agreement among workers. This aspect should be considered, as it can affect the reliability of the gathered relevance labels as well as the overall repeatability of the experiments.
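
    As a hedged illustration of how magnitude estimation judgments from different workers can be made comparable before aggregation, the sketch below applies geometric-mean normalisation, a common choice for this kind of data; the thesis's exact normalisation procedure may differ.

```python
# Illustrative only: geometric-mean normalisation of magnitude estimation scores,
# so that workers using different personal scales become comparable.
import math

def normalise(worker_scores):
    """Divide each score by the worker's geometric mean (i.e. centre in log space)."""
    geo_mean = math.exp(sum(math.log(s) for s in worker_scores) / len(worker_scores))
    return [s / geo_mean for s in worker_scores]

print(normalise([10, 50, 100]))  # worker using a wide scale
print(normalise([1, 5, 10]))     # worker using a narrow scale: same pattern after normalising
```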