33 research outputs found

    XPath: Looking Forward

    The location path language XPath is of particular importance for XML applications since it is a core component of many XML processing standards such as XSLT or XQuery. In this paper, based on the axis symmetry of XPath, equivalences of XPath 1.0 location paths involving reverse axes, such as ancestor and preceding, are established. These equivalences are used as rewriting rules in an algorithm for transforming location paths with reverse axes into equivalent reverse-axis-free ones. Location paths without reverse axes, as generated by the presented rewriting algorithm, enable efficient SAX-like streamed data processing of XPath.
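The kind of equivalence described in this abstract can be illustrated with a minimal sketch. It assumes a toy in-memory tree and hypothetical element names (`catalog`, `item`, `name`, `price`), and it evaluates the axes with pre/post-order ranks rather than with a real XPath engine. It compares the reverse-axis path `/descendant::price/preceding::name` with the forward-only path `/descendant::name[following::price]`; it is not the paper's rewriting algorithm, only a check of one such pair on one tree.

```python
class Node:
    """Element node of a toy document tree (hypothetical, for illustration)."""
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)
        self.pre = self.post = -1

def number(root):
    """Assign pre-order and post-order ranks; return nodes in document order."""
    nodes, counter = [], [0]
    def visit(n):
        n.pre = len(nodes)
        nodes.append(n)
        for child in n.children:
            visit(child)
        n.post = counter[0]
        counter[0] += 1
    visit(root)
    return nodes

def preceding(x, nodes):
    # Before x in document order, excluding ancestors of x.
    return [n for n in nodes if n.pre < x.pre and n.post < x.post]

def following(x, nodes):
    # After x in document order, excluding descendants of x.
    return [n for n in nodes if n.pre > x.pre and n.post > x.post]

def reverse_axis_query(nodes):
    """/descendant::price/preceding::name (uses the reverse axis 'preceding')."""
    hits = {n.pre for p in nodes if p.name == "price"
            for n in preceding(p, nodes) if n.name == "name"}
    return sorted(hits)

def forward_only_query(nodes):
    """/descendant::name[following::price] (forward axes only)."""
    return sorted(n.pre for n in nodes if n.name == "name"
                  and any(f.name == "price" for f in following(n, nodes)))

# The last 'name' is an ancestor of a 'price', so neither query selects it.
ROOT = Node("catalog",
            Node("item", Node("name"), Node("price")),
            Node("item", Node("name")),
            Node("name", Node("price")))
NODES = number(ROOT)

print(reverse_axis_query(NODES))
print(forward_only_query(NODES))
```

Both queries return the same pre-order ranks on this tree. The forward-only form can decide each `name` by looking only at nodes that arrive later in document order, which is what makes it a fit for a single-pass, SAX-like evaluation.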

    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various kinds of queries and documents, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.
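The contrast between independent matching followed by a join and simultaneous matching can be sketched with a toy index. Everything here is an assumption made for illustration: the sample data, the function names, and the nested-dictionary layout, which stands in for a precomputed content/structure join and is not the CADG's actual data structure.

```python
from collections import defaultdict

# Hypothetical toy collection: one (label path, text) pair per text-bearing element.
DOCS = [
    ("/book/title", "xml indexing"),
    ("/book/author/name", "smith"),
    ("/article/title", "query languages"),
    ("/article/author/name", "jones"),
    ("/article/abstract", "xml query processing"),
]

def build_indexes(docs):
    path_index = defaultdict(set)     # label path -> element ids (structure summary)
    keyword_index = defaultdict(set)  # keyword -> element ids (inverted file)
    combined = defaultdict(lambda: defaultdict(set))  # keyword -> path -> element ids
    for elem_id, (path, text) in enumerate(docs):
        path_index[path].add(elem_id)
        for word in text.split():
            keyword_index[word].add(elem_id)
            combined[word][path].add(elem_id)
    return path_index, keyword_index, combined

def separate_then_join(path_index, keyword_index, suffix, word):
    """Match structure and content independently, then intersect the results."""
    by_path = set()
    for path, ids in path_index.items():
        if path.endswith(suffix):
            by_path |= ids
    return sorted(by_path & keyword_index.get(word, set()))

def simultaneous(combined, suffix, word):
    """Inspect only the paths already known to contain the keyword."""
    hits = set()
    for path, ids in combined.get(word, {}).items():
        if path.endswith(suffix):
            hits |= ids
    return sorted(hits)

path_index, keyword_index, combined = build_indexes(DOCS)

# Query: elements whose path ends in /title and whose text contains "xml".
print(separate_then_join(path_index, keyword_index, "/title", "xml"))
print(simultaneous(combined, "/title", "xml"))
```

Both functions return the same element ids. The second never touches paths in which the keyword does not occur and needs no set intersection at query time, which is the intuition behind moving the join into the index.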

    Visual exploration and retrieval of XML document collections with the generic system X2

    This article reports on the XML retrieval system X2 which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2 which distinguishes it from other visual query systems for XML is that it supports various levels of detail in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed.

    Indexed Tree Matching with Complete Answer Representations

    This paper picks up the Tree Matching approach to integrate the paradigm of structured documents into the field of Information Retrieval. The concept of Tree Matching is extended by the notion of complete answer representations (CARs), which makes it possible to avoid the combinatorial explosion in the number of solutions (and thus complexity). An algorithm is presented that combines a class of Tree Matching problems with index-based search and returns a CAR in linear time.

    1 Introduction

    In recent years, considerable effort has been put into the integration of structured documents into the field of Information Retrieval. Various systems have been proposed, providing facilities to query a set of documents on structure and content level (see [NBY96] and [Loe94] for surveys). These queries provide the user with a powerful instrument: she may query, for instance, for all letters containing a subject line somehow related to the term "insurance". She can select out of a document pool..

    DAG Matching Techniques for Information Retrieval on Structured Documents

    With the establishment of international standards for document representation like SGML, ODA, or XML, attention in Information Retrieval has shifted to representation models and query languages that make active use of both the logical structure and the contents of the documents in a document database. At the same time, representation of structure has become more and more important in other types of databases as well. Among several related approaches, Kilpeläinen's Tree Matching is one of the most expressive and intuitive formalisms for querying databases with tree-structured entities. However, in its original formulation it leaves aside most of the problems that arise in real-life applications of Information Retrieval. In this paper we extend Tree Matching to DAG Matching and suggest various techniques that should be useful when using the formalism in a practical IR system. In particular we suggest a representation of answers that can cope with the potentially huge number of entities in..

    Implementing Constraint Solvers: Theory and Practice

    Our research is based on Constraint Handling Rules (CHR), a powerful language for writing constraint solvers. We investigate confluence of CHR programs. This property guarantees that a CHR program will always compute the same result for a given set of constraints, independent of which rules are applied. We give a decidable, sufficient and necessary syntactic condition for confluence. Finally, we present an application utilizing CHR that offers rent advice: the city government of Munich regularly publishes a booklet called the "Mietspiegel" (MS). The MS basically contains a verbal description of an expert system. It allows one to calculate the estimated fair rent of a flat. With our computerized version, "The Munich Rent Advisor", we extended the functionality and applicability of the MS so that the user need not answer all questions of the form. The key to computing with partial information was to use constraint technology.