
    A Comparison of Retrieval Models using Term Dependencies


    Automatic methods for low-cost evaluation and position-aware models for neural information retrieval

    An information retrieval (IR) system helps people consume huge amounts of data, so both the construction and the evaluation of such systems are important. However, two difficulties arise: the overwhelmingly large number of query-document pairs to judge, which makes IR evaluation a laborious manual task; and the complicated patterns to be modeled, owing to the non-symmetric, heterogeneous relationship between a query and a document, where interaction patterns such as term dependency and proximity have been shown to be useful yet are non-trivial for a single IR model to encode. In this thesis we address both difficulties, from the perspectives of IR evaluation and of retrieval modeling respectively, by reducing manual cost with automatic methods, by investigating the use of crowdsourcing for collecting preference judgments, and by proposing novel neural retrieval models.

    In particular, to cope with the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed that picks out a small subset of representative documents for manual judgment and then predicts labels for the remaining query-document pairs; furthermore, a language-model-based cascade measure framework is developed to evaluate novelty and diversity, using the content of the labeled documents to mitigate incomplete labels. In addition, we attempt to make preference judgments practically usable by empirically investigating properties of such judgments when collected via crowdsourcing, and by proposing a novel judgment mechanism that trades off judgment quality against the number of judgments required. Finally, to model various complicated patterns in a single retrieval model, and inspired by recent advances in deep learning, we develop novel neural IR models that incorporate patterns such as term dependency, query term proximity, density of relevance, and query coverage in one model. We demonstrate their superior performance through evaluations on different datasets.
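    The selective labeling idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes query-document pairs are represented as feature vectors, clusters them, asks an assessor to judge only one representative pair per cluster, and propagates that judgment within the cluster. The function name judge_manually, the use of k-means, and the cluster count are illustrative assumptions, not the thesis' actual method.

```python
# Minimal sketch of selective labeling for low-cost IR evaluation.
# Assumptions: feature vectors per query-document pair, k-means clustering,
# and a judge_manually callback standing in for a human assessor.
import numpy as np
from sklearn.cluster import KMeans

def selective_labels(features, judge_manually, n_clusters=20):
    """features: (n_pairs, n_features) array for one query's candidate documents.
    judge_manually(i): returns a relevance label for pair index i.
    Returns predicted labels for all pairs while judging only n_clusters of them."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    assignments = km.fit_predict(features)
    labels = np.empty(len(features), dtype=int)
    for c in range(n_clusters):
        members = np.where(assignments == c)[0]
        # Judge the member closest to the cluster centre (the "representative")
        # and copy its label to the rest of the cluster.
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        representative = members[int(np.argmin(dists))]
        labels[members] = judge_manually(representative)
    return labels
```

    In the thesis the judged subset additionally feeds a prediction step for the remaining pairs; here the propagation is simply "copy the representative's label within its cluster", which is the crudest form that step could take.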

    Efficient and effective retrieval using Higher-Order proximity models

    Get PDF
    Information retrieval systems are widely used to retrieve documents that are relevant to a user's information need. Systems that leverage proximity heuristics to estimate the relevance of a document have been shown to be effective. However, the computational cost of proximity-based models is rarely considered, which is an important concern over large-scale document collections. Large collections also make collection-based evaluation challenging, since only a small number of documents can be judged within a limited budget. Effectiveness, efficiency, and reliable evaluation are interdependent components that should all be considered when developing a good retrieval system, and this thesis contributes to each of the three aspects.

    Many proximity-based retrieval models are effective, but it is also important to extract proximity features efficiently, especially for models using higher-order proximity statistics. We therefore propose a one-pass algorithm based on the PlaneSweep approach and demonstrate that it reduces the cost of capturing the full dependency relations of a query, regardless of the input representation. Although the proposed methods capture higher-order proximity features efficiently, the trade-offs between effectiveness and efficiency when using proximity-based models remain largely unexplored. We consider different variants of proximity statistics and demonstrate that using local proximity statistics achieves an improved trade-off between effectiveness and efficiency.

    Another important aspect of IR is reliable system comparison. We conduct a series of experiments that explore the interaction between pooling depth and evaluation depth, the interaction between evaluation metrics and evaluation depth, and the correlation between different evaluation metrics. We show that different evaluation configurations on large test collections, where only a limited number of relevance labels are available, can lead to different conclusions about system comparisons, and we demonstrate the pitfalls of choosing an arbitrary evaluation depth irrespective of the metrics employed and the pooling depth of the test collection. We then provide suggestions on evaluation configurations for reliable comparison of retrieval systems on large test collections.

    On such collections a shallow judgment pool is often employed because judging budgets are limited, which may lead to an imprecise evaluation of system performance, especially when a deep evaluation metric is used. We therefore propose a framework for estimating deep-metric scores from shallow judgment pools: starting from an initial shallow pool, rank-level estimators predict the effectiveness gain at each rank, and an optimization framework built on these rank-level estimates yields a more precise score estimate.
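    As a rough illustration of how higher-order proximity statistics can be extracted in a single pass, the sketch below merges the query terms' position lists and sweeps over them once, computing the width of the smallest span that covers every query term and counting how many positions end such a span within a width threshold. This is only a simplified stand-in for the PlaneSweep-based one-pass algorithm described above; the function name, the choice of statistics, and the max_width threshold are illustrative assumptions.

```python
# Minimal sketch: one sweep over the merged term-position stream of a document
# yields two higher-order proximity features for a query.
def proximity_features(positions_per_term, max_width=8):
    """positions_per_term: {term: sorted list of positions in the document}.
    Returns (min_cover, n_tight_covers): the width of the smallest span containing
    every query term at least once, and the number of positions at which a
    covering span of width <= max_width ends."""
    # Merge all postings into one position-ordered stream of (position, term) events.
    events = sorted((pos, term)
                    for term, positions in positions_per_term.items()
                    for pos in positions)
    needed = len(positions_per_term)
    last_seen = {}                      # latest position of each term seen so far
    min_cover = float("inf")
    n_tight_covers = 0
    for pos, term in events:
        last_seen[term] = pos
        if len(last_seen) == needed:    # every query term seen at least once
            width = pos - min(last_seen.values()) + 1
            min_cover = min(min_cover, width)
            if width <= max_width:
                n_tight_covers += 1
    return min_cover, n_tight_covers

# Example: positions of two query terms in one document.
print(proximity_features({"information": [3, 40], "retrieval": [4, 17, 52]}))
# -> (2, 1): the tightest covering span has width 2 (positions 3-4), and one
#    covering span ends within the max_width threshold.
```

    The sweep visits each posting once after the initial merge, so the cost grows with the number of query-term occurrences rather than with the document length, which is the property the one-pass approach is after.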