Query expansion with terms selected using lexical cohesion analysis of documents
Cataloged from PDF version of article. We present new methods of query expansion using terms that form lexical cohesive links between the contexts of distinct query terms in documents (i.e., words surrounding the query terms in text). The link-forming terms (link-terms) and short snippets of text surrounding them are evaluated in both interactive and automatic query expansion (QE). We explore the effectiveness of snippets in providing context in interactive query expansion, compare query expansion from snippets vs. whole documents, and query expansion following snippet selection vs. full document relevance judgements. The evaluation, conducted on the HARD track data of TREC 2005, suggests that there are considerable advantages in using link-terms and their surrounding short text snippets in QE compared to terms selected from the full text of documents. (C) 2006 Elsevier Ltd. All rights reserved.
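As a rough illustration of the link-term idea in the abstract above, the sketch below collects words that occur in the context windows of at least two distinct query terms. It is only a minimal sketch under stated assumptions: a lexical link is reduced to exact word repetition, the context is a fixed token window, and the names `link_terms` and `window` are hypothetical. The abstract does not give the authors' actual cohesion analysis, which this does not reproduce.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_terms(query_terms, doc, window=3):
    """Return words found near at least two distinct query terms.

    Assumption: a "link" is plain repetition of the same word in the
    contexts of different query terms. No stopword filtering is applied.
    """
    tokens = tokenize(doc)
    query = {t.lower() for t in query_terms}
    # Map each candidate word to the set of query terms it appears near.
    seen_near = defaultdict(set)
    for i, tok in enumerate(tokens):
        if tok not in query:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for neighbour in tokens[lo:hi]:
            if neighbour not in query:
                seen_near[neighbour].add(tok)
    return sorted(t for t, near in seen_near.items() if len(near) >= 2)


doc = ("The flood damaged the bridge. Engineers repaired the bridge "
       "after new insurance rules.")
print(link_terms(["flood", "insurance"], doc, window=3))
```

In this toy document, "bridge" falls within three tokens of both "flood" and "insurance", so it is the only candidate returned. A wider window would also admit function words, which is why a real system would filter stopwords.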
Exploring sentence level query expansion in language modeling based information retrieval
We introduce two novel methods for query expansion in information retrieval (IR). The basis of these methods is to add the most similar sentences extracted from pseudo-relevant documents to the original query. The first method adds a fixed number of sentences to the original query, the second a progressively decreasing number of sentences. We evaluate these methods on the English and Bengali test collections from the FIRE workshops. The major findings of this study are that: i) performance is similar for both English and Bengali; ii) employing a smaller context (similar sentences) yields a considerably higher mean average precision (MAP) compared to extracting terms from full documents (up to 5.9% improvement in MAP for English and 10.7% for Bengali compared to standard Blind Relevance Feedback (BRF)); iii) using a variable number of sentences for query expansion performs better and shows less variance in the best MAP for different parameter settings; iv) query expansion based on sentences can improve performance even for topics with low initial retrieval precision where standard BRF fails.
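The two sentence-level expansion variants described in the abstract above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the similarity function, so cosine similarity over raw term counts is assumed here, and the "progressively decreasing" schedule is assumed to drop by one sentence per rank. The function names and the parameter `k` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def split_sentences(doc):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def top_sentences(query, doc, k):
    """Return the k sentences of doc most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(split_sentences(doc),
                    key=lambda s: cosine(q, Counter(tokenize(s))),
                    reverse=True)
    return ranked[:k]


def expand_query(query, ranked_docs, k, decreasing=False):
    """Add similar sentences from pseudo-relevant documents to the query.

    decreasing=False: every document contributes k sentences (fixed variant).
    decreasing=True: the document at rank r contributes max(k - r, 0)
    sentences (an assumed linear schedule for the variable variant).
    """
    terms = tokenize(query)
    for rank, doc in enumerate(ranked_docs):
        n = max(k - rank, 0) if decreasing else k
        for sentence in top_sentences(query, doc, n):
            terms.extend(tokenize(sentence))
    return Counter(terms)


docs = [
    "Rivers flood in spring. Flood warnings help towns. Cats sleep a lot.",
    "Flood insurance is costly. Dogs bark.",
]
fixed = expand_query("flood warnings", docs, k=2)
variable = expand_query("flood warnings", docs, k=2, decreasing=True)
print(fixed["dogs"], variable["dogs"])
```

In this toy run the fixed variant takes two sentences from the second document and so pulls in the off-topic "Dogs bark." sentence. The decreasing variant takes only one sentence from that lower-ranked document and avoids it.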
Wikipedia-Based Semantic Enhancements for Information Nugget Retrieval
When the objective of an information retrieval task is to return a nugget rather than a document, query terms that exist in a document often will not be used in the most relevant nugget in the document for the query. In this thesis a new method of query expansion is proposed based on the Wikipedia link structure surrounding the most relevant articles selected either automatically or by human assessors for the query. Evaluated with the Nuggeteer automatic scoring software, which we show to correlate highly with human assessor scores for the ciQA 2006 topics, the method increases F-scores on the TREC Complex Interactive Question Answering task when integrated into an already high-performing baseline system. In addition, the method for finding synonyms using Wikipedia is evaluated using more common synonym detection tasks.
A Digital Library Approach to the Reconstruction of Ancient Sunken Ships
Throughout the ages, countless shipwrecks have left behind a rich historical and
technological legacy. In this context, nautical archaeologists study the remains of these
boats and ships and the cultures that created and used them. Ship reconstruction can be
seen as an incomplete jigsaw reconstruction problem. Therefore, I hypothesize that a
computational approach based on digital libraries can enhance the reconstruction of a
composite object (ship) from fragmented, incomplete, and damaged pieces (timbers and
ship remains).
This dissertation describes a framework for enabling the integration of textual
and visual information pertaining to wooden vessels from sources in multiple languages.
Linking related pieces of information relies on query expansion and improving
relevance. This is accomplished with the implementation of an algorithm that derives
relationships from terms in a specialized glossary, combining them with properties and
concepts expressed in an ontology.
The main archaeological sources used in this dissertation are data generated from
a 17th-century Portuguese ship, the Pepper Wreck, complemented with information
obtained from other documented and studied shipwrecks. Shipbuilding treatises
spanning from the late 16th to the 19th centuries provide textual sources along with
various illustrations. Additional visual materials come from a repository of photographs
and drawings documenting numerous underwater excavations and surveys.
The ontology is based on a rich database of archaeological information compiled
by Mr. Richard Steffy. The original database was analyzed and transformed into an
ontological representation in RDF-OWL. Its creation followed an iterative methodology
which included numerous revisions by nautical archaeologists. Although this ontology
does not claim to be a final version, it provides a robust conceptualization.
The proposed approach is evaluated by measuring the usefulness of the glossary
and the ontology. Evaluation results show improvements in query expansion across
languages based on Blind Relevance Feedback using the glossary as the query expansion
collection. Similarly, contextualization was also improved by using the ontology for
categorizing query results. These results suggest that related external sources can be
exploited to better contextualize information in a particular domain. Given the
characteristics of the materials in nautical archaeology, the framework proposed in this
dissertation can be adapted and extended to other domains.