    On-line Metasearch, Pooling, and System Evaluation

    This thesis presents a unified method for the simultaneous solution of three problems in Information Retrieval: metasearch (the fusion of ranked lists returned by retrieval systems to elicit improved performance), efficient system evaluation (the accurate evaluation of retrieval systems with small numbers of relevance judgements), and pooling or "active sample selection" (the selection of documents for manual judgement in order to develop sample pools of high precision or pools suitable for assessing system quality). The thesis establishes a unified theoretical framework for addressing these three problems and naturally generalizes their solution to the on-line context by incorporating feedback in the form of relevance judgements. The algorithm, Rankhedge for on-line retrieval, metasearch, and system evaluation, is the first to address these three problems simultaneously and to generalize their solution to the on-line context. Optimality of the Rankhedge algorithm is developed via Bayesian and maximum entropy interpretations. Results of the algorithm prove to be significantly superior to previous methods when tested over a range of TREC (Text REtrieval Conference) data. In the absence of feedback, the technique equals or exceeds the performance of benchmark metasearch algorithms such as CombMNZ and Condorcet, and it then improves dramatically on this performance during the on-line metasearch process. In addition, the technique generates pools of documents which include more relevant documents and produce more accurate system evaluations than previous techniques. The thesis includes an information-theoretic examination of the original Hedge algorithm as well as its adaptation to the context of ranked lists. The work also addresses the concept of information-theoretic similarity within the Rankhedge context and presents a method for decorrelating the predictor set to improve worst-case performance. Finally, an information-theoretically optimal method for probabilistic "active sampling" is presented, with possible application to a broad range of practical and theoretical contexts.
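
    Rankhedge builds on Freund and Schapire's Hedge (multiplicative-weights) algorithm. As a rough illustration of the on-line loop described above, the sketch below fuses ranked lists with Hedge weights, selects documents for judgment, and reweights systems from the feedback; the Borda-style fusion, the rank-based loss, and all parameter choices here are assumptions for the example, not the thesis's actual formulation.

    ```python
    import numpy as np

    def hedge_metasearch(ranked_lists, judge, beta=0.9, budget=50):
        """Illustrative Hedge-style on-line metasearch with relevance feedback.

        ranked_lists: one ranking per system, each a list of doc ids (best first).
        judge: feedback oracle, doc_id -> 1 (relevant) or 0 (non-relevant).
        Returns the documents selected for judgment, in selection order.
        """
        n = len(ranked_lists)
        weights = np.ones(n) / n
        universe = {d for lst in ranked_lists for d in lst}
        worst = max(len(lst) for lst in ranked_lists)
        rank = [{d: r for r, d in enumerate(lst)} for lst in ranked_lists]

        judged, selected = {}, []
        for _ in range(min(budget, len(universe))):
            # Fuse the lists: weighted Borda-style score over unjudged documents.
            def fused(d):
                return sum(w * (worst - rank[i].get(d, worst))
                           for i, w in enumerate(weights))
            doc = max((d for d in universe if d not in judged), key=fused)
            rel = judge(doc)
            judged[doc] = rel
            selected.append(doc)
            # Loss in [0, 1]: a system loses for ranking a non-relevant
            # document high, or a relevant document low.
            pos = np.array([rank[i].get(doc, worst) for i in range(n)]) / worst
            loss = (1.0 - pos) if rel == 0 else pos
            weights *= beta ** loss        # multiplicative Hedge update
            weights /= weights.sum()
        return selected
    ```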

    Department of Computer Science Activity 1998-2004

    This report summarizes much of the research and teaching activity of the Department of Computer Science at Dartmouth College between late 1998 and late 2004. The material for this report was collected as part of the final report for NSF Institutional Infrastructure award EIA-9802068, which funded equipment and technical staff during that six-year period. This equipment and staff supported essentially all of the department's research activity during that period.

    When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections

    This is the peer reviewed version of the following article: David E. Losada, Javier Parapar and Alvaro Barreiro (2019) When to Stop Making Relevance Judgments? A Study of Stopping Methods for Building Information Retrieval Test Collections. Journal of the Association for Information Science and Technology, 70 (1), 49-60, which has been published in final form at https://doi.org/10.1002/asi.24077. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.

    In information retrieval evaluation, pooling is a well-known technique to extract a sample of documents to be assessed for relevance. Given the pooled documents, a number of studies have proposed different prioritization methods to adjudicate documents for judgment. These methods follow different strategies to reduce the assessment effort. However, there is no clear guidance on how many relevance judgments are required for creating a reliable test collection. In this article we investigate and further develop methods to determine when to stop making relevance judgments. We propose a highly diversified set of stopping methods and provide a comprehensive analysis of the usefulness of the resulting test collections. Some of the stopping methods introduced here combine innovative estimates of recall with time series models used in financial trading. Experimental results on several representative collections show that some stopping methods can reduce the assessment effort by up to 95% and still produce a robust test collection. We demonstrate that the reduced set of judgments can be reliably employed to compare search systems using disparate effectiveness metrics such as Average Precision, NDCG, P@100, and Rank Biased Precision. With all these measures, the correlations found between full pool rankings and reduced pool rankings are very high.

    This work received financial support from (i) the "Ministerio de Economía y Competitividad" of the Government of Spain and FEDER Funds under the research project TIN2015-64282-R, (ii) Xunta de Galicia (project GPC 2016/035), and (iii) Xunta de Galicia "Consellería de Cultura, Educación e Ordenación Universitaria" and the European Regional Development Fund (ERDF) through the following 2016-2019 accreditations: ED431G/01 ("Centro singular de investigación de Galicia") and ED431G/08.
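
    The article's own stopping methods (including the recall estimates combined with time-series models) are not reproduced here, but a minimal sketch of the general family, stop judging once an estimated recall crosses a target, might look as follows; the discovery-rate estimator, window size, and threshold are illustrative assumptions.

    ```python
    def judge_with_stopping(prioritized_docs, judge, target_recall=0.95, window=100):
        """Illustrative stopping rule for pool assessment.

        prioritized_docs: pooled documents, ordered by an adjudication method.
        judge: assessor oracle, doc -> 1 (relevant) or 0.
        Stops when a crude recall estimate, extrapolated from the recent
        discovery rate, reaches the target.
        """
        qrels, found, recent = {}, 0, []
        for i, doc in enumerate(prioritized_docs, start=1):
            rel = judge(doc)
            qrels[doc] = rel
            found += rel
            recent.append(rel)
            if len(recent) > window:
                recent.pop(0)
            # Assume the current discovery rate holds for the remaining pool.
            remaining = len(prioritized_docs) - i
            est_total = found + (sum(recent) / len(recent)) * remaining
            if est_total > 0 and found / est_total >= target_recall:
                break
        return qrels
    ```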

    Reducing Reliance on Relevance Judgments for System Comparison by Using Expectation-Maximization

    Automatic performance evaluation of information retrieval systems using data fusion

    Cataloged from PDF version of article.

    The empirical investigation of the effectiveness of information retrieval systems (search engines) requires a test collection composed of a set of documents, a set of query topics, and a set of relevance judgments indicating which documents are relevant to which topics. Human relevance judgments are expensive and subjective. In addition, databases and user interests change quickly, so there is a great need for an automatic way of evaluating the performance of search engines. Furthermore, recent studies show that differences in human relevance assessments do not affect the relative performance of information retrieval systems. Based on these observations, this thesis proposes data fusion as a replacement for human relevance judgments, introduces an automatic evaluation method, and provides a comprehensive statistical assessment of it with several Text REtrieval Conference (TREC) systems, showing that the method's results correlate positively and significantly with actual human-based evaluations. The major contributions of this thesis are: (1) an automatic information retrieval performance evaluation method that uses data fusion algorithms for the first time in the literature, (2) system selection methods for data fusion aiming at even higher correlation between automatic and human-based results, and (3) several practical implications stemming from the fact that the automatic precision values are strongly correlated with those of actual information retrieval systems.

    Nuray, Rabia. M.S.
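
    CombMNZ is a standard fusion method; the sketch below illustrates the general idea of using a fused list as pseudo-relevance judgments to score individual systems. The min-max normalization, pool depth, and precision measure are assumptions for the example, not necessarily the thesis's exact configuration.

    ```python
    from collections import defaultdict

    def combmnz(runs):
        """CombMNZ fusion: sum of min-max normalized scores, multiplied by
        the number of runs that retrieved the document."""
        fused, hits = defaultdict(float), defaultdict(int)
        for run in runs:                        # run: list of (doc_id, score)
            scores = [s for _, s in run]
            lo, hi = min(scores), max(scores)
            for doc, s in run:
                fused[doc] += (s - lo) / (hi - lo) if hi > lo else 1.0
                hits[doc] += 1
        return sorted(fused, key=lambda d: fused[d] * hits[d], reverse=True)

    def pseudo_precision(run, all_runs, depth=50):
        """Evaluate one run against pseudo-qrels: the top of the CombMNZ
        fusion of all runs is treated as the relevant set."""
        pseudo_rel = set(combmnz(all_runs)[:depth])
        return sum(doc in pseudo_rel for doc, _ in run[:depth]) / depth
    ```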

    Filtering News from Document Streams: Evaluation Aspects and Modeled Stream Utility

    Events like hurricanes, earthquakes, or accidents can impact a large number of people. Not only are people in the immediate vicinity of the event affected, but concerns about their well-being are shared by the local government and well-wishers across the world. The latest information about news events could be of use to government and aid agencies in order to make informed decisions on providing necessary support, security, and relief. The general public receives news updates via dedicated news feeds or broadcasts, and lately, via social media services like Facebook or Twitter. Retrieving the latest information about newsworthy events from the world-wide web is thus of importance to a large section of society. As new content on a multitude of topics is continuously being published on the web, specific event-related information needs to be filtered from the resulting stream of documents. In this thesis, we present a user-centric evaluation measure for evaluating systems that filter news-related information from document streams. Our proposed evaluation measure, Modeled Stream Utility (MSU), models users accessing information from a stream of sentences produced by a news update filtering system. The user model allows for simulating a large number of users with different characteristic stream browsing behavior. Through simulation, MSU estimates the utility of a system for an average user browsing a stream of sentences. Our results show that system performance is sensitive to a user population's stream browsing behavior and that existing evaluation metrics correspond to very specific types of user behavior. To evaluate systems that filter sentences from a document stream, we need a set of judged sentences. This judged set is a subset of all the sentences returned by all systems, and is typically constructed by pooling together the highest-quality sentences, as determined by the respective system-assigned scores for each sentence. Sentences in the pool are manually assessed, and the resulting set of judged sentences is then used to compute system performance metrics. In this thesis, we investigate the effect that including duplicates of judged sentences in the judged set has on system performance evaluation. We also develop an alternative pooling methodology that, given the MSU user model, selects sentences for pooling based on the probability of a sentence being read by modeled users. Our research lays the foundation for interesting future work on utilizing user models in different aspects of the evaluation of stream filtering systems. The MSU measure enables the incorporation of different user models. Furthermore, the applicability of MSU could be extended through calibration based on user behavior.
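
    MSU's precise user model and parameters are defined in the thesis; the sketch below only illustrates the simulation pattern it describes, estimating a population-average utility by simulating many users with characteristic browsing behavior. The persistence-based session model, gain/cost function, and all constants are assumptions for the example.

    ```python
    import random

    def user_session_utility(stream, persistence=0.8, sessions=5, seed=0):
        """Illustrative simulation of one user browsing a sentence stream.

        stream: list of (gain, words) per sentence, in arrival order.
        The user reads in sessions, moving to the next sentence with
        probability `persistence` (an RBP-like browsing model), and pays
        an assumed small cost per word read.
        """
        rng = random.Random(seed)
        utility, pos = 0.0, 0
        for _ in range(sessions):
            while pos < len(stream):
                gain, words = stream[pos]
                utility += gain - 0.01 * words
                pos += 1
                if rng.random() > persistence:   # user abandons this session
                    break
        return utility

    def modeled_stream_utility(stream, n_users=1000):
        """Average utility over a simulated population of users."""
        return sum(user_session_utility(stream, seed=u)
                   for u in range(n_users)) / n_users
    ```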

    Increasing the Efficiency of High-Recall Information Retrieval

    The goal of high-recall information retrieval (HRIR) is to find all, or nearly all, relevant documents while maintaining reasonable assessment effort. Achieving high recall is a key problem in applications such as electronic discovery, systematic review, and the construction of test collections for information retrieval tasks. State-of-the-art HRIR systems commonly rely on iterative relevance feedback, in which human assessors continually assess machine-learning-selected documents. The relevance of the assessed documents is then fed back to the machine learning model to improve its ability to select the next set of potentially relevant documents for assessment. In many instances, thousands of human assessments might be required to achieve high recall, and these assessments represent the main cost of such HRIR applications. Their effectiveness in achieving high recall is therefore limited by their reliance on human input when assessing the relevance of documents. In this thesis, we test different methods to improve the effectiveness and efficiency of finding relevant documents using a state-of-the-art HRIR system. With regard to effectiveness, we try to build a machine-learned model that retrieves relevant documents more accurately. For efficiency, we try to help human assessors make relevance assessments more easily and quickly via our HRIR system. Furthermore, we try to establish a stopping criterion for the assessment process so as to avoid excessive assessment. In particular, we hypothesize that the total assessment effort to achieve high recall can be reduced by using shorter document excerpts (e.g., extractive summaries) in place of full documents for the assessment of relevance, together with a high-recall retrieval system based on continuous active learning (CAL). To test this hypothesis, we implemented a high-recall retrieval system based on a state-of-the-art implementation of CAL, which could display either full documents or short document excerpts for relevance assessment. A search engine was also integrated into our system to give assessors the option of conducting interactive search and judging. We conducted a simulation study and, separately, a 50-person controlled user study to test our hypothesis. The results of the simulation study show that judging even a single extracted sentence for relevance feedback may be adequate for CAL to achieve high recall. The results of the controlled user study confirmed that human assessors were able to find a significantly larger number of relevant documents within limited time when they used the system with paragraph-length document excerpts as opposed to full documents. In addition, we found that allowing participants to compose and execute their own search queries did not improve their ability to find relevant documents and, by some measures, impaired performance. Moreover, integrating sampling methods with active learning can yield accurate estimates of the number of relevant documents, and thus avoid excessive assessment.
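
    The thesis builds on a state-of-the-art CAL implementation; the loop below shows only the generic shape of continuous active learning with relevance feedback. The TF-IDF features, logistic-regression learner, presumed-negative sampling, and batch size are assumptions for this sketch, not the system actually evaluated.

    ```python
    import numpy as np
    import scipy.sparse as sp
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def cal_loop(docs, seed_query, judge, batch=10, max_judgments=500):
        """Generic continuous active learning (CAL) loop.

        docs: list of document texts.  judge: assessor oracle, index -> 1/0.
        Returns {doc_index: relevance} for every judged document.
        """
        vec = TfidfVectorizer(sublinear_tf=True)
        X_all = vec.fit_transform(docs + [seed_query])
        X, q = X_all[:-1], X_all[-1]
        rng = np.random.default_rng(0)
        judged = {}
        while len(judged) < max_judgments:
            unjudged = [i for i in range(len(docs)) if i not in judged]
            if not unjudged:
                break
            # Train on the seed query (synthetic positive), all judgments so
            # far, and a sample of random documents as presumed negatives.
            neg = rng.choice(unjudged, size=min(100, len(unjudged)), replace=False)
            X_train = sp.vstack([q] + [X[i] for i in judged] + [X[i] for i in neg])
            y_train = [1] + list(judged.values()) + [0] * len(neg)
            clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
            # Have the assessor judge the highest-scoring unjudged documents.
            scores = clf.decision_function(X)
            for i in sorted(unjudged, key=lambda i: -scores[i])[:batch]:
                judged[i] = judge(i)
        return judged
    ```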

    Relevance Prediction in Information Retrieval Systems (Predição de relevância em sistemas de recuperação de informação)

    Advisor: Anderson de Rezende Rocha. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.

    Abstract: In today's connected world, Information Retrieval (IR) has become one of the most ubiquitous problems, being part of many modern applications. Among all the challenges in designing IR systems, how to evaluate their performance is ever-present. Offline evaluation, however, is mostly limited to benchmarking and comparison of different systems, which has pushed a growing interest in predicting, at query time, the performance of an IR system. Query Performance Prediction (QPP) is the name given to the problem of estimating the quality of the results retrieved by an IR system in response to a query. In the past few years, this problem has received much attention, especially from the text retrieval community. Yet, QPP is still limited, as it is only an indirect way of estimating the performance of IR systems. In this thesis, we investigate formulating the QPP problem as a relevance prediction one: the task of predicting, for a specific top-k, which results of a query are relevant to it, according to some existing relevance reference. Though remarkably challenging, relevance prediction is not only a more natural way of predicting performance but also one with significantly more applications. In this thesis, we present three families of relevance prediction approaches: statistical, learning, and sequence labeling. All methods within those families are evaluated concerning their effectiveness in several content-based image retrieval experiments, covering several large-scale datasets and retrieval settings. The experiments in this thesis show that it is feasible to perform relevance prediction for k values as large as 30, with minimal information about the underlying IR system, and efficiently enough to be performed at query time. The thesis concludes by offering some potential paths for improving the current results, as well as directions for future research in this particular field.

    Doctorate in Computer Science. Grant 168326/2017-5, CAPES, CNPq.
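
    The thesis's three families of predictors (statistical, learning, and sequence labeling) are not reproduced here; the sketch below only illustrates how the task itself can be framed as per-position classification of the top-k results, using score-shape features. The feature set and classifier are assumptions for the example.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def topk_features(scores, k):
        """Features for each of the top-k results of one ranked list (assumes
        at least k+1 scores): the score, its gap to the next result, and its
        value relative to the top score."""
        s = np.asarray(scores[:k + 1], dtype=float)
        return np.array([[s[i], s[i] - s[i + 1], s[i] / (s[0] + 1e-9)]
                         for i in range(k)])

    def train_relevance_predictor(score_lists, relevance_lists, k=30):
        """Fit a per-position relevance classifier from queries whose top-k
        relevance labels are known."""
        X = np.vstack([topk_features(s, k) for s in score_lists])
        y = np.concatenate([r[:k] for r in relevance_lists])
        return RandomForestClassifier(n_estimators=100).fit(X, y)

    def predict_topk_relevance(model, scores, k=30):
        """Predict a 0/1 relevance label for each of the top-k results."""
        return model.predict(topk_features(scores, k))
    ```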

    Digital Media and Textuality: From Creation to Archiving

    Due to computers' ability to combine different semiotic modes, texts are no longer composed exclusively of static images and mute words. How have digital media changed the way we write and read? What methods of textual and data analysis have emerged? How do we rescue digital artifacts from obsolescence? And how can digital media be used or taught inside classrooms? These and other questions are addressed in this volume, which assembles contributions by artists, writers, scholars and editors. They offer a multiperspectival view on the way digital media have changed our notion of textuality.

    Digital Media and Textuality

    Due to computers' ability to combine different semiotic modes, texts are no longer composed exclusively of static images and mute words. How have digital media changed the way we write and read? What methods of textual and data analysis have emerged? How do we rescue digital artifacts from obsolescence? And how can digital media be used or taught inside classrooms? These and other questions are addressed in this volume, which assembles contributions by artists, writers, scholars and editors such as Dene Grigar, Sandy Baldwin, Carlos Reis, and Frieder Nake. They offer a multiperspectival view on the way digital media have changed our notion of textuality.