44 research outputs found
Sound ranking algorithms for XML search
Ranking algorithms for XML should reflect the actual combined content and structure constraints of queries, while at the same time producing equal rankings for queries that are semantically equal. Ranking algorithms that produce different rankings for queries that are semantically equal are easily detected by tests on large databases: We call such algorithms not sound. We report the behavior of different approaches to ranking content-and-structure queries on pairs of queries for which we expect equal ranking results from the query semantics. We show that most of these approaches are not sound. Of the remaining approaches, only 3 adhere to the W3C XQuery Full-Text standard
PFTijah: text search in an XML database system
This paper introduces the PFTijah system, a text search system that is integrated with an XML/XQuery database management system. We present examples of its use, we explain some of the system internals, and discuss plans for future work. PFTijah is part of the open source release of MonetDB/XQuery
Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness
In this paper we present a systematic analysis of document
retrieval using unstructured and structured queries within
the score region algebra (SRA) structured retrieval framework. The behavior of diÂźerent retrieval models, namely
Boolean, tf.idf, GPX, language models, and Okapi, is tested
using the transparent SRA framework in our three-level structured retrieval system called TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation.
The analysis is performed on a numerous experiments
evaluated on TREC and CLEF collections, using manually
generated unstructured and structured queries. Unstructured queries range from the short title queries to long title
+ description + narrative queries. For generating structured
queries we exploit the knowledge of the document structure
and the content used to semantically describe or classify
documents. We show that such structured information can
be utilized in retrieval engines to give more precise answers to user queries then when using unstructured queries
A database approach to information retrieval:The remarkable relationship between language models and region models
In this report, we unify two quite distinct approaches to information retrieval: region models and language models. Region models were developed for structured document retrieval. They provide a well-defined behaviour as well as a simple query language that allows application developers to rapidly develop applications. Language models are particularly useful to reason about the ranking of search results, and for developing new ranking approaches. The unified model allows application developers to define complex language modeling approaches as logical queries on a textual database. We show a remarkable one-to-one relationship between region queries and the language models they represent for a wide variety of applications: simple ad-hoc search, cross-language retrieval, video retrieval, and web search
XML and Context: Structural Features Relevant to Search Tasks
We describe ongoing research into the relationship between search tasks and information retrieval strategies with respect to the use of structural information. We define a classification of search tasks in structured document collections and analyse the relevance of different structural features regarding each of these tasks for the INEX collection. The results presented show important differences in relevance of different features such as size and type of the components regarding informational and resource tasks
A survey on tree matching and XML retrieval
International audienceWith the increasing number of available XML documents, numerous approaches for retrieval have been proposed in the literature. They usually use the tree representation of documents and queries to process them, whether in an implicit or explicit way. Although retrieving XML documents can be considered as a tree matching problem between the query tree and the document trees, only a few approaches take advantage of the algorithms and methods proposed by the graph theory. In this paper, we aim at studying the theoretical approaches proposed in the literature for tree matching and at seeing how these approaches have been adapted to XML querying and retrieval, from both an exact and an approximate matching perspective. This study will allow us to highlight theoretical aspects of graph theory that have not been yet explored in XML retrieval