2,408 research outputs found
The State-of-the-arts in Focused Search
The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for relevant results to a user’s topic of interest has gone beyond search for domain or type specific documents to more focused result (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange. It helps focused search to be carried out at different granularities of a structured document with XML markups. This report aims at reviewing the state-of-the-arts in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It is concluded with highlight of open problems
A Database Approach to Content-based XML retrieval
This paper describes a rst prototype system for content-based retrieval from XML data. The system's design supports both XPath queries and complex information retrieval queries based on a language modelling approach to information retrieval. Evaluation using the INEX benchmark shows that it is beneficial if the system is biased to retrieve large XML fragments over small fragments
Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database
This paper investigates the impact of three approaches to XML retrieval:
using Zettair, a full-text information retrieval system; using eXist, a native
XML database; and using a hybrid system that takes full article answers from
Zettair and uses eXist to extract elements from those articles. For the
content-only topics, we undertake a preliminary analysis of the INEX 2003
relevance assessments in order to identify the types of highly relevant
document components. Further analysis identifies two complementary sub-cases of
relevance assessments ("General" and "Specific") and two categories of topics
("Broad" and "Narrow"). We develop a novel retrieval module that for a
content-only topic utilises the information from the resulting answer list of a
native XML database and dynamically determines the preferable units of
retrieval, which we call "Coherent Retrieval Elements". The results of our
experiments show that -- when each of the three systems is evaluated against
different retrieval scenarios (such as different cases of relevance
assessments, different topic categories and different choices of evaluation
metrics) -- the XML retrieval systems exhibit varying behaviour and the best
performance can be reached for different values of the retrieval parameters. In
the case of INEX 2003 relevance assessments for the content-only topics, our
newly developed hybrid XML retrieval system is substantially more effective
than either Zettair or eXist, and yields a robust and a very effective XML
retrieval.Comment: Postprint version. The editor version can be accessed through the DO
Rhetorical relations for information retrieval
Typically, every part in most coherent text has some plausible reason for its
presence, some function that it performs to the overall semantics of the text.
Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts
of a text are linked to each other. Knowledge about this socalled discourse
structure has been applied successfully to several natural language processing
tasks. This work studies the use of rhetorical relations for Information
Retrieval (IR): Is there a correlation between certain rhetorical relations and
retrieval performance? Can knowledge about a document's rhetorical relations be
useful to IR? We present a language model modification that considers
rhetorical relations when estimating the relevance of a document to a query.
Empirical evaluation of different versions of our model on TREC settings shows
that certain rhetorical relations can benefit retrieval effectiveness notably
(> 10% in mean average precision over a state-of-the-art baseline)
Enhancing Content-And-Structure Information Retrieval using a Native XML Database
Three approaches to content-and-structure XML retrieval are analysed in this
paper: first by using Zettair, a full-text information retrieval system; second
by using eXist, a native XML database, and third by using a hybrid XML
retrieval system that uses eXist to produce the final answers from likely
relevant articles retrieved by Zettair. INEX 2003 content-and-structure topics
can be classified in two categories: the first retrieving full articles as
final answers, and the second retrieving more specific elements within articles
as final answers. We show that for both topic categories our initial hybrid
system improves the retrieval effectiveness of a native XML database. For
ranking the final answer elements, we propose and evaluate a novel retrieval
model that utilises the structural relationships between the answer elements of
a native XML database and retrieves Coherent Retrieval Elements. The final
results of our experiments show that when the XML retrieval task focusses on
highly relevant elements our hybrid XML retrieval system with the Coherent
Retrieval Elements module is 1.8 times more effective than Zettair and 3 times
more effective than eXist, and yields an effective content-and-structure XML
retrieval
A multi-layered Bayesian network model for structured document retrieval
New standards in document representation, like for example SGML, XML, and MPEG-7, compel Information Retrieval to design and implement models and tools to index, retrieve and present documents according to the given document structure. The paper presents the design of an Information Retrieval system for multimedia structured documents, like for example journal articles, e-books, and MPEG-7 videos. The system is based on Bayesian Networks, since this class of mathematical models enable to represent and quantify the relations between the structural components of the document. Some preliminary results on the system implementation are also presented
- …