84,132 research outputs found

    Parsimonious Language Models for a Terabyte of Text

    Get PDF
    The aims of this paper are twofold. Our first aim\ud is to compare results of the earlier Terabyte tracks\ud to the Million Query track. We submitted a number\ud of runs using different document representations\ud (such as full-text, title-fields, or incoming\ud anchor-texts) to increase pool diversity. The initial\ud results show broad agreement in system rankings\ud over various measures on topic sets judged at both\ud Terabyte and Million Query tracks, with runs using\ud the full-text index giving superior results on\ud all measures, but also some noteworthy upsets.\ud Our second aim is to explore the use of parsimonious\ud language models for retrieval on terabyte-scale\ud collections. These models are smaller thus\ud more efficient than the standard language models\ud when used at indexing time, and they may also improve\ud retrieval performance. We have conducted\ud initial experiments using parsimonious models in\ud combination with pseudo-relevance feedback, for\ud both the Terabyte and Million Query track topic\ud sets, and obtained promising initial results

    Rhetorical relations for information retrieval

    Full text link
    Typically, every part in most coherent text has some plausible reason for its presence, some function that it performs to the overall semantics of the text. Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts of a text are linked to each other. Knowledge about this socalled discourse structure has been applied successfully to several natural language processing tasks. This work studies the use of rhetorical relations for Information Retrieval (IR): Is there a correlation between certain rhetorical relations and retrieval performance? Can knowledge about a document's rhetorical relations be useful to IR? We present a language model modification that considers rhetorical relations when estimating the relevance of a document to a query. Empirical evaluation of different versions of our model on TREC settings shows that certain rhetorical relations can benefit retrieval effectiveness notably (> 10% in mean average precision over a state-of-the-art baseline)

    Unsupervised, Efficient and Semantic Expertise Retrieval

    Get PDF
    We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a statistically significant improved ranking over vector space and generative models in most cases, matching the performance of supervised methods on various benchmarks. That is, by using solely text we can do as well as methods that work with external evidence and/or relevance feedback. A contrastive analysis of rankings produced by discriminative and generative approaches shows that they have complementary strengths due to the ability of the unsupervised discriminative model to perform semantic matching.Comment: WWW2016, Proceedings of the 25th International Conference on World Wide Web. 201

    Beyond English text: Multilingual and multimedia information retrieval.

    Get PDF
    Non
    • …
    corecore