
    Query expansion with terms selected using lexical cohesion analysis of documents

    We present new methods of query expansion using terms that form lexical cohesive links between the contexts of distinct query terms in documents (i.e., words surrounding the query terms in text). The link-forming terms (link-terms) and the short snippets of text surrounding them are evaluated in both interactive and automatic query expansion (QE). We explore the effectiveness of snippets in providing context in interactive query expansion, and compare query expansion from snippets vs. whole documents, and query expansion following snippet selection vs. full document relevance judgements. The evaluation, conducted on the HARD track data of TREC 2005, suggests that there are considerable advantages in using link-terms and their surrounding short text snippets in QE compared to terms selected from the full texts of documents. © 2006 Elsevier Ltd. All rights reserved.

    Exploring sentence level query expansion in language modeling based information retrieval

    We introduce two novel methods for query expansion in information retrieval (IR). The basis of these methods is to add the most similar sentences extracted from pseudo-relevant documents to the original query. The first method adds a fixed number of sentences to the original query, the second a progressively decreasing number of sentences. We evaluate these methods on the English and Bengali test collections from the FIRE workshops. The major findings of this study are that: i) performance is similar for both English and Bengali; ii) employing a smaller context (similar sentences) yields a considerably higher mean average precision (MAP) compared to extracting terms from full documents (up to 5.9% improvement in MAP for English and 10.7% for Bengali compared to standard Blind Relevance Feedback (BRF)); iii) using a variable number of sentences for query expansion performs better and shows less variance in the best MAP for different parameter settings; iv) query expansion based on sentences can improve performance even for topics with low initial retrieval precision where standard BRF fails.

    Wikipedia-Based Semantic Enhancements for Information Nugget Retrieval

    When the objective of an information retrieval task is to return a nugget rather than a document, query terms that exist in a document will often not appear in the nugget of that document that is most relevant to the query. In this thesis a new method of query expansion is proposed based on the Wikipedia link structure surrounding the most relevant articles selected either automatically or by human assessors for the query. Evaluated with the Nuggeteer automatic scoring software, which we show to have a high correlation with human assessor scores for the ciQA 2006 topics, the method yields an increase in F-scores on the TREC Complex Interactive Question Answering task when this expansion is integrated into an already high-performing baseline system. In addition, the method for finding synonyms using Wikipedia is evaluated using more common synonym detection tasks.

    A Digital Library Approach to the Reconstruction of Ancient Sunken Ships

    Throughout the ages, countless shipwrecks have left behind a rich historical and technological legacy. In this context, nautical archaeologists study the remains of these boats and ships and the cultures that created and used them. Ship reconstruction can be seen as an incomplete jigsaw reconstruction problem. Therefore, I hypothesize that a computational approach based on digital libraries can enhance the reconstruction of a composite object (ship) from fragmented, incomplete, and damaged pieces (timbers and ship remains). This dissertation describes a framework for enabling the integration of textual and visual information pertaining to wooden vessels from sources in multiple languages. Linking related pieces of information relies on query expansion and improving relevance. This is accomplished with the implementation of an algorithm that derives relationships from terms in a specialized glossary, combining them with properties and concepts expressed in an ontology. The main archaeological sources used in this dissertation are data generated from a 17th-century Portuguese ship, the Pepper Wreck, complemented with information obtained from other documented and studied shipwrecks. Shipbuilding treatises spanning from the late 16th to the 19th centuries provide textual sources along with various illustrations. Additional visual materials come from a repository of photographs and drawings documenting numerous underwater excavations and surveys. The ontology is based on a rich database of archaeological information compiled by Mr. Richard Steffy. The original database was analyzed and transformed into an ontological representation in RDF-OWL. Its creation followed an iterative methodology which included numerous revisions by nautical archaeologists. Although this ontology does not claim to be a final version, it provides a robust conceptualization. The proposed approach is evaluated by measuring the usefulness of the glossary and the ontology. Evaluation results show improvements in query expansion across languages based on Blind Relevance Feedback using the glossary as the query expansion collection. Similarly, contextualization was also improved by using the ontology for categorizing query results. These results suggest that related external sources can be exploited to better contextualize information in a particular domain. Given the characteristics of the materials in nautical archaeology, the framework proposed in this dissertation can be adapted and extended to other domains.