50 research outputs found

    A Prime Number Approach to Matching an XML Twig Pattern including Parent-Child Edges

    Get PDF
    Twig pattern matching is a core operation in XML query processing because it is how all the occurrences of a twig pattern in an XML document are found. In the past decade, many algorithms have been proposed to perform twig pattern matching. They rely on labelling schemes to determine relationships between elements corresponding to query nodes in constant time. In this paper, a new algorithm TwigStackPrime is proposed, which is an improvement to TwigStack (Bruno et al., 2002). To reduce the memory consumption and computation overhead of twig pattern matching algorithms when Parent-Child (P-C) edges are involved, TwigStackPrime efficiently filters out a tremendous number of irrelevant elements by introducing a new labelling scheme, called Child Prime Label (CPL). Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of CPL over the previous indexing and querying techniques. The experimental results show that the new technique has a superior performance to the previous approaches

    Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data

    Full text link
    As data are increasingly modeled as graphs for expressing complex relationships, the tree pattern query on graph-structured data becomes an important type of queries in real-world applications. Most practical query languages, such as XQuery and SPARQL, support logical expressions using logical-AND/OR/NOT operators to define structural constraints of tree patterns. In this paper, (1) we propose generalized tree pattern queries (GTPQs) over graph-structured data, which fully support propositional logic of structural constraints. (2) We make a thorough study of fundamental problems including satisfiability, containment and minimization, and analyze the computational complexity and the decision procedures of these problems. (3) We propose a compact graph representation of intermediate results and a pruning approach to reduce the size of intermediate results and the number of join operations -- two factors that often impair the efficiency of traditional algorithms for evaluating tree pattern queries. (4) We present an efficient algorithm for evaluating GTPQs using 3-hop as the underlying reachability index. (5) Experiments on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our algorithm, from several times to orders of magnitude faster than state-of-the-art algorithms in terms of evaluation time, even for traditional tree pattern queries with only conjunctive operations.Comment: 16 page

    Solving the intractable problem: optimal performance for worst case scenarios in XML twig pattern matching

    Get PDF
    In the history of databases, eXtensible Markup Language (XML) has been thought of as the standard format to store and exchange semi-structured data. With the advent of IoT, XML technologies can play an important role in addressing the issue of processing a massive amount of data generated from heterogeneous devices. As the number and complexity of such datasets increases there is a need for algorithms which are able to index and retrieve XML data efficiently even for complex queries. In this context twig pattern matching , finding all occurrences of a twig pattern query (TPQ), is a core operation in XML query processing. Until now holistic joins have been considered the state-of-the-art TPQ processing algorithms, but they fail to guarantee an optimal evaluation except at the expense of excessive storage costs which limit their scope in large datasets. In this article, we introduce a new approach which significantly outperforms earlier methods in terms of both the size of the intermediate storage and query running time. The approach presented here uses Child Prime Labels (Alsubai & North, 2018) to improve the filtering phase of bottom-up twig matching algorithms and a novel algorithm which avoids the use of stacks, thus improving TPQs processing efficiency. Several experiments were conducted on common benchmarks such as DBLP, XMark and TreeBank datasets to study the performance of the new approach. Multiple analyses on a range of twig pattern queries are presented to demonstrate the statistical significance of the improvements

    Strategies and Approaches for Generating Identical Extensive XML Tree Instances

    Get PDF
    In recent years, XML has become the de facto internet wire language. Data may be organized and given context with the use of XML. A well-organized document facilitates the transformation of raw data into actionable intelligence. In B2B1 applications, the XML data is sent and created. This implies the need for fast query processing on XML data. The processing of XML tree sample queries (XTPQ) that provide an efficient response (also known as sample matching) is a topic of active study in the XML database field.DOM (Parser) may be used to transform an XML document into a tree representation. Extensible Markup Language (XML) query languages like XPath and XQuery use tree samples (twigs) to express query results.XML query processing focuses mostly on effectively locating all instances of twig 1 samples inside an XML database. Numerous techniques for matching such tree samples have been presented in recent years. In this study, we survey recent developments in XTPQ processing. This summary will begin by introducing several algorithms for twig sample matching and then go on to provide some background on holistic techniques to process XTPQ

    TwigStackPrime: A Novel Twig Join Algorithm Based on Prime Numbers

    Get PDF
    The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to utilize the specific characteristics of XML documents. A labelling scheme is fundamental to processing XML queries efficiently. They are used to determine structural relationships between elements corresponding to query nodes in twig pattern queries (TPQs). This article presents a design and implementation of a new indexing technique which exploits the property of prime numbers to identify Parent-Child (P-C) relationships in TPQs during query evaluation. The Child Prime Label (CPL, for short) approach can be efficiently incorporated within the existing labelling schemes. Here, we propose a novel twig matching algorithm based on the well known TwigStack algorithm [3], which applies the CPL approach and focuses on reducing the overhead of storing useless elements and performing unnecessary join operations. Our performance evaluation demonstrates that the new algorithm significantly outperforms the previous approaches

    Child Prime Label Approaches to Evaluate XML Structured Queries

    Get PDF
    The adoption of the eXtensible Markup Language (XML) as the standard format to store and exchange semi-structure data has been gaining momentum. The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to retrieve XML data efficiently. Due to the importance of twig pattern matching in XML retrieval systems, finding all matching occurrences of a tree pattern query in an XML document is often considered as a specific task for XML databases as well as a core operation in XML query processing. This thesis presents a design and implementation of a new indexing technique, called the Child Prime Label (CPL) which exploits the property of prime numbers to identify Parent-Child (P-C) edges in twig pattern queries (TPQs) during query evaluation. The CPL approach can be incorporated efficiently within the existing labelling schemes. The major contributions of this thesis can be seen as a set of novel twig matching algorithms which apply the CPL approach and focus on reducing the overhead of storing useless elements and performing unnecessary computations during the output enumeration. The research presented here is the first to provide an efficient and general solution for TPQs containing ordering constraints and positional predicates specified by the XML query languages. To evaluate the CPL approaches, the holistic model was implemented as an experimental prototype in which the approaches proposed are compared against state-of-the-art holistic twig algorithms. Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of the CPL approaches over the previous indexing and querying methods. The experimental results demonstrate the validity and improvements of the new algorithms over other related methods on common various subclasses of TPQs. Moreover, the scalability tests reveal that the new algorithms are more suitable for processing large XML datasets

    Efficient processing of XML twig pattern matching

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Efficient processing of multiple XML twig queries

    Get PDF
    Master'sMASTER OF SCIENC
    corecore