XPath: Looking Forward
The location path language XPath is of particular importance for XML applications, since it is a core component of many XML processing standards such as XSLT and XQuery. In this paper, based on the axis symmetry of XPath, equivalences of XPath 1.0 location paths involving reverse axes, such as ancestor and preceding, are established. These equivalences are used as rewriting rules in an algorithm that transforms location paths with reverse axes into equivalent reverse-axis-free ones. Location paths without reverse axes, as generated by the presented rewriting algorithm, enable efficient SAX-like streamed data processing of XPath.
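As an illustration of the kind of equivalence the abstract describes, the sketch below compares `descendant::a/parent::b` (which uses the reverse axis `parent`) with the forward-only path `descendant-or-self::b[child::a]`. The sample document and the function names `with_reverse_axis` and `forward_only` are assumptions made for this example, and the equivalence is shown only as a plausible instance, not necessarily as a rule stated in the paper. Both paths are evaluated on an in-memory tree here; the example checks that they select the same nodes, not that the second one is streamed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the context node is the root element <doc>.
DOC = "<doc><b><a/><c/></b><b><c/></b><d><a/></d><b><b><a/></b></b></doc>"

def with_reverse_axis(root):
    """Evaluate descendant::a/parent::b. Needs a child -> parent map,
    i.e. access to nodes that were already passed in document order."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    seen, result = set(), []
    for a in root.iter("a"):
        p = parent_of.get(a)
        if p is not None and p.tag == "b" and id(p) not in seen:
            seen.add(id(p))
            result.append(p)
    return result

def forward_only(root):
    """Evaluate descendant-or-self::b[child::a]. Only looks downwards
    from each candidate node, so no parent map is required."""
    return [b for b in root.iter("b") if b.find("a") is not None]

root = ET.fromstring(DOC)
same = {id(e) for e in with_reverse_axis(root)} == {id(e) for e in forward_only(root)}
print("paths select the same nodes:", same)
```

The results are compared as sets because the reverse-axis version collects parents in the order of their `a` children, which need not be document order.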
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.
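The idea of matching keywords and paths in one step can be sketched with a toy path summary that stores, for every distinct label path, the keywords occurring directly under it. This is a greatly simplified illustration and not the CADG data structure itself; the names `build_guide` and `match` and the sample documents are assumptions made for this example, and text that follows a child element (tail text) is ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_guide(docs):
    """Map each distinct label path to the set of keywords found in
    text directly under that path (a toy, content-annotated summary)."""
    guide = defaultdict(set)

    def visit(elem, path):
        path = path + (elem.tag,)
        guide[path].update((elem.text or "").lower().split())
        for child in elem:
            visit(child, path)

    for xml in docs:
        visit(ET.fromstring(xml), ())
    return guide

def match(guide, tag, keyword):
    """Return label paths ending in `tag` whose text contains `keyword`.
    Structure and content are checked together in one lookup pass."""
    return sorted("/".join(path) for path, words in guide.items()
                  if path[-1] == tag and keyword in words)

# Hypothetical two-document collection.
docs = [
    "<letter><subject>car insurance</subject><body>hello</body></letter>",
    "<memo><subject>lunch</subject></memo>",
]
guide = build_guide(docs)
print(match(guide, "subject", "insurance"))
```

Because the keyword sets live on the summary itself, paths that cannot contain the keyword are discarded without touching the documents.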
Visual exploration and retrieval of XML document collections with the generic system X2
This article reports on the XML retrieval system X2, which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2, which distinguishes it from other visual query systems for XML, is that it supports various levels of detail in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed.
Indexed Tree Matching with Complete Answer Representations
This paper picks up the Tree Matching approach to integrate the paradigm of structured documents into the field of Information Retrieval. The concept of Tree Matching is extended by the notion of complete answer representations (CARs), which makes it possible to avoid the combinatorial explosion in the number of solutions (and thus in complexity). An algorithm is presented that combines a class of Tree Matching problems with index-based search and returns a CAR in linear time. 1 Introduction In recent years, considerable effort has been put into the integration of structured documents into the field of Information Retrieval. Various systems have been proposed, providing facilities to query a set of documents on the structure and content level (see [NBY96] and [Loe94] for surveys). These queries provide the user with a powerful instrument: she may, for instance, query for all letters containing a subject line somehow related to the term "insurance". She can select out of a document pool..
DAG Matching Techniques for Information Retrieval on Structured Documents
With the establishment of international standards for document representation such as SGML, ODA, or XML, attention in Information Retrieval has shifted to representation models and query languages that make active use of both the logical structure and the contents of the documents in a document database. At the same time, representation of structure has become more and more important in other types of databases as well. Among several related approaches, Kilpeläinen's Tree Matching is one of the most expressive and intuitive formalisms for querying databases with tree-structured entities. However, in its original formulation it leaves aside most of the problems that arise in real-life applications of Information Retrieval. In this paper we extend Tree Matching to DAG Matching and suggest various techniques that should be useful when using the formalism in a practical IR system. In particular we suggest a representation of answers that can cope with the potentially huge number of entities in..
Implementing Constraint Solvers: Theory and Practice
Our research is based on Constraint Handling Rules (CHR), a powerful language for writing constraint solvers. We investigate confluence of CHR programs. This property guarantees that a CHR program will always compute the same result for a given set of constraints, independent of the order in which rules are applied. We give a decidable, sufficient and necessary syntactic condition for confluence. Finally, we present an application that uses CHR to offer rent advice: the city government of Munich regularly publishes a booklet called the "Mietspiegel" (MS). The MS basically contains a verbal description of an expert system. It allows the estimated fair rent of a flat to be calculated. With our computerized version, "The Munich Rent Advisor", we extended the functionality and applicability of the MS so that the user need not answer all questions of the form. The key to computing with partial information was to use constraint technology.
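The confluence property described above can be illustrated with a small simulation. The sketch below is not CHR and not the solver from the paper; it mimics a single simplification rule, written in CHR-like notation as `min(X) \ min(Y) <=> X =< Y | true`, on a plain Python list and applies it to randomly chosen pairs. The names `step` and `solve` and the sample values are assumptions made for this example.

```python
import random

# Rule being simulated (CHR-like notation, for orientation only):
#   min(X) \ min(Y) <=> X =< Y | true
# i.e. if the store holds min(X) and min(Y) with X =< Y, drop min(Y).

def step(store, rng):
    """Apply the rule once to a randomly chosen applicable pair.
    Returns False when no pair is applicable (the store is final)."""
    pairs = [(i, j)
             for i in range(len(store))
             for j in range(len(store))
             if i != j and store[i] <= store[j]]
    if not pairs:
        return False
    _, j = rng.choice(pairs)
    del store[j]
    return True

def solve(values, seed):
    """Run the rule to exhaustion with a seed-dependent application order."""
    rng = random.Random(seed)
    store = list(values)
    while step(store, rng):
        pass
    return store

# Different seeds give different rule application orders.
results = {tuple(solve([7, 3, 9, 3, 5], seed)) for seed in range(50)}
print(results)
```

Whatever order the pairs are chosen in, the store always ends with the single smallest value, which is the behaviour confluence guarantees for this rule.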