5 research outputs found

    The SPIRIT collection: an overview of a large web collection

    Get PDF
    A large scale collection of web pages has been essential for research in information retrieval and related areas. This paper provides an overview of a large web collection used in the SPIRIT project for the design and testing of spatially-aware retrieval systems. Several statistics are derived and presented to show the characteristics of the collection

    Wikipedia-Based Semantic Enhancements for Information Nugget Retrieval

    Get PDF
    When the objective of an information retrieval task is to return a nugget rather than a document, query terms that exist in a document often will not be used in the most relevant nugget in the document for the query. In this thesis a new method of query expansion is proposed based on the Wikipedia link structure surrounding the most relevant articles selected either automatically or by human assessors for the query. Evaluated with the Nuggeteer automatic scoring software, which we show to have a high correlation with human assessor scores for the ciQA 2006 topics, an increase in the F-scores is found from the TREC Complex Interactive Question Answering task when integrating this expansion into an already high-performing baseline system. In addition, the method for finding synonyms using Wikipedia is evaluated using more common synonym detection tasks

    The Impact of Corpus Size on Question Answering Performance

    No full text
    Using our question answering system, questions from the TREC 2001 evaluation were executed over a series of Web data collections, with the sizes of the collections increasing from 25 gigabytes up to nearly a terabyte
    corecore