146 research outputs found

    Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data

    As data are increasingly modeled as graphs to express complex relationships, the tree pattern query on graph-structured data has become an important type of query in real-world applications. Most practical query languages, such as XQuery and SPARQL, support logical expressions using logical-AND/OR/NOT operators to define structural constraints of tree patterns. In this paper, (1) we propose generalized tree pattern queries (GTPQs) over graph-structured data, which fully support propositional logic of structural constraints. (2) We make a thorough study of fundamental problems including satisfiability, containment and minimization, and analyze the computational complexity and the decision procedures of these problems. (3) We propose a compact graph representation of intermediate results and a pruning approach to reduce the size of intermediate results and the number of join operations, two factors that often impair the efficiency of traditional algorithms for evaluating tree pattern queries. (4) We present an efficient algorithm for evaluating GTPQs using 3-hop as the underlying reachability index. (5) Experiments on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our algorithm, which is from several times to orders of magnitude faster than state-of-the-art algorithms in terms of evaluation time, even for traditional tree pattern queries with only conjunctive operations. Comment: 16 pages

    Fast and Tiny Structural Self-Indexes for XML

    XML document markup is highly repetitive and therefore compresses well using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees have been used before as synopses for structural XPath queries. Here a fully-fledged index over such grammars is presented. The index allows arbitrary tree algorithms to be executed with a slow-down that is comparable to the space improvement. More interestingly, certain algorithms execute much faster over the index (because no decompression occurs). For example, for structural XPath count queries, evaluating over the index is faster than previous XPath implementations, often by two orders of magnitude. The index also allows XML results (including texts) to be serialized faster than previous systems, by a factor of ca. 2-3. This is due to efficient copy handling of grammar repetitions, and because materialization is totally avoided. In order to compare with twig join implementations, we implemented a materializer which writes out pre-order numbers of result nodes, and show its competitiveness. Comment: 13 pages

    Reasoning & Querying – State of the Art

    Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF

    A survey on tree matching and XML retrieval

    With the increasing number of available XML documents, numerous approaches for retrieval have been proposed in the literature. They usually use the tree representation of documents and queries to process them, whether implicitly or explicitly. Although retrieving XML documents can be considered as a tree matching problem between the query tree and the document trees, only a few approaches take advantage of the algorithms and methods proposed by graph theory. In this paper, we study the theoretical approaches proposed in the literature for tree matching and examine how these approaches have been adapted to XML querying and retrieval, from both an exact and an approximate matching perspective. This study allows us to highlight theoretical aspects of graph theory that have not yet been explored in XML retrieval.

    Accelerating data retrieval steps in XML documents


    A node partitioning strategy for optimising the performance of XML queries

    For ease of communication between heterogeneous systems, the eXtensible Markup Language (XML) has been widely adopted as a data storage format. However, XML query processing presents issues both in terms of query performance and updatability. Thus, many are choosing to shred XML data into relational databases in order to benefit from their mature technology. The problem with this approach is that (often complex and time-consuming) data transformation processes are required to transform XML data to relational tables and vice versa. Additionally, many of the benefits of XML data can be lost during these processes. In this dissertation, we present a process that partitions nodes within an XML document into disjoint subsets. Briefly, as there are fewer partitions than there are nodes, a more efficient join operation can be performed between partitions, thus reducing the number of inefficient node comparisons. The number and size of partitions vary depending on the structure and layout of the XML document, and the number of partitions impacts query performance. Therefore, we also provide a partition classification process, which significantly reduces the number of partitions because each partition class represents many equivalent partitions within the XML document. In this dissertation, we demonstrate that our approach outperforms similar approaches for a large subset of XML queries by eliminating complex join operations (where possible) during the query process.

    A High Performance XML Querying Architecture

    Data exchange on the Internet plays an essential role in electronic business (e-business). A recent trend in e-business is to create distributed databases to facilitate data exchange. In most cases, the distributed databases are developed by integrating existing systems, which may use different database models and run on different hardware and/or software platforms. This heterogeneity may cause many difficulties. A solution to these difficulties is XML (the Extensible Markup Language), which is becoming the dominant language for exchanging data on the Internet. To develop XML systems for practical applications, developers have to address performance issues. In this paper, we describe a new XML querying architecture that can be used to build high performance systems. Experiments indicate that the architecture performs better than Oracle XML DB, which is one of the most commonly used commercial DBMSs for XML.

    Clustering-based Labelling Scheme - A Hybrid Approach for Efficient Querying and Updating XML Documents

    Extensible Markup Language (XML) has become a dominant technology for transferring data through the worldwide web. XML labelling schemes play a key role in handling XML data efficiently and robustly, and many such schemes have been proposed. However, these labelling schemes have limitations and shortcomings. The aim of this research was therefore to investigate the existing XML labelling schemes and their limitations in order to address the efficiency of XML query performance. This thesis investigated the existing labelling schemes and classified them into three categories based on certain criteria, in order to identify their limitations and challenges. Based on the outcomes of this investigation, this thesis proposed a state-of-the-art labelling scheme, called the clustering-based labelling scheme, to resolve or improve key limitations such as the efficiency of XML query processing, the labelling of XML nodes, and the cost of XML updates. This thesis argued that using certain existing labelling schemes to label nodes, together with clustering-based techniques, can improve the efficiency of querying and of labelling nodes. Theoretically, the proposed scheme is based on dividing the nodes of an XML document into clusters. Two existing labelling schemes, the Dewey and LLS labelling schemes, were selected for labelling these clusters and their nodes. Subsequently, the proposed scheme was designed and implemented. In addition, the Dewey and LLS labelling schemes were implemented for the purpose of evaluating the proposed scheme. Four experiments were then designed in order to test the proposed scheme against the Dewey and LLS labelling schemes. The results of these experiments suggest that the proposed scheme achieved better results than the Dewey and LLS schemes. Consequently, the research hypothesis was accepted overall with a few exceptions, and the proposed scheme showed an improvement in performance and in all the targeted features and aspects.