A Prime Number Approach to Matching an XML Twig Pattern including Parent-Child Edges
Twig pattern matching is a core operation in XML query processing because it is how all the occurrences
of a twig pattern in an XML document are found. In the past decade, many algorithms have been proposed
to perform twig pattern matching. They rely on labelling schemes to determine relationships between
elements corresponding to query nodes in constant time. In this paper, a new algorithm TwigStackPrime is
proposed, which is an improvement to TwigStack (Bruno et al., 2002). To reduce the memory consumption and
computation overhead of twig pattern matching algorithms when Parent-Child (P-C) edges are involved, TwigStackPrime
efficiently filters out a tremendous number of irrelevant elements by introducing a new labelling
scheme, called Child Prime Label (CPL). Extensive performance studies on various real-world and artificial
datasets were conducted to demonstrate the significant improvement of CPL over the previous indexing and
querying techniques. The experimental results show that the new technique outperforms the
previous approaches.
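As a rough illustration of the constant-time relationship tests that the abstract attributes to labelling schemes, the sketch below assumes the (start, end, level) region encoding commonly used by TwigStack-style algorithms. The `Label` type and the function names are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Label(NamedTuple):
    start: int  # position of the opening tag in document order
    end: int    # position of the closing tag
    level: int  # depth in the tree, with the root at level 1


def is_ancestor(a: Label, d: Label) -> bool:
    """True if the region of `a` strictly contains the region of `d`."""
    return a.start < d.start and d.end < a.end


def is_parent(p: Label, c: Label) -> bool:
    """A parent is an ancestor exactly one level above the child."""
    return is_ancestor(p, c) and c.level == p.level + 1


# Labels for the document <a><b><c/></b></a> (assumed numbering).
a = Label(1, 6, 1)
b = Label(2, 5, 2)
c = Label(3, 4, 3)
```

Both tests use only a few integer comparisons, which is why the check is independent of document size. Note that the parent test can only be applied once both elements have been read, which is the cost that a P-C filter such as CPL aims to reduce.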
Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data
As data are increasingly modeled as graphs for expressing complex
relationships, the tree pattern query on graph-structured data becomes an
important type of query in real-world applications. Most practical query
languages, such as XQuery and SPARQL, support logical expressions using
logical-AND/OR/NOT operators to define structural constraints of tree patterns.
In this paper, (1) we propose generalized tree pattern queries (GTPQs) over
graph-structured data, which fully support propositional logic of structural
constraints. (2) We make a thorough study of fundamental problems including
satisfiability, containment and minimization, and analyze the computational
complexity and the decision procedures of these problems. (3) We propose a
compact graph representation of intermediate results and a pruning approach to
reduce the size of intermediate results and the number of join operations --
two factors that often impair the efficiency of traditional algorithms for
evaluating tree pattern queries. (4) We present an efficient algorithm for
evaluating GTPQs using 3-hop as the underlying reachability index. (5)
Experiments on both real-life and synthetic data sets demonstrate the
effectiveness and efficiency of our algorithm, from several times to orders of
magnitude faster than state-of-the-art algorithms in terms of evaluation time,
even for traditional tree pattern queries with only conjunctive operations.
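To make the idea of propositional structural constraints concrete, here is a deliberately simplified sketch. It is not the GTPQ algorithm from the paper: a plain breadth-first search stands in for the 3-hop reachability index, the constraint is attached to a single query root only, and the tuple-based formula encoding and all names are assumptions made for illustration.

```python
from collections import deque


def reachable(graph, src):
    """Nodes reachable from src via one or more edges (plain BFS)."""
    seen, queue = set(), deque(graph.get(src, ()))
    while queue:
        n = queue.popleft()
        if n not in seen:
            seen.add(n)
            queue.extend(graph.get(n, ()))
    return seen


def holds(formula, labels_reached):
    """Evaluate a nested AND/OR/NOT formula over 'reaches label X' atoms."""
    op = formula[0]
    if op == "reach":
        return formula[1] in labels_reached
    if op == "not":
        return not holds(formula[1], labels_reached)
    if op == "and":
        return all(holds(f, labels_reached) for f in formula[1:])
    if op == "or":
        return any(holds(f, labels_reached) for f in formula[1:])
    raise ValueError(f"unknown operator: {op}")


def match(graph, label, root_label, formula):
    """Return nodes labelled root_label whose descendants satisfy formula."""
    out = []
    for node in graph:
        if label[node] != root_label:
            continue
        reached = {label[n] for n in reachable(graph, node)}
        if holds(formula, reached):
            out.append(node)
    return out


# Toy data graph: adjacency lists plus one label per node.
graph = {1: [2, 3], 2: [], 3: [], 4: [5, 6], 5: [], 6: [], 7: [8], 8: []}
label = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B", 6: "D", 7: "A", 8: "C"}

# "A that reaches a B AND (reaches a C OR does NOT reach a D)"
formula = ("and", ("reach", "B"), ("or", ("reach", "C"), ("not", ("reach", "D"))))
result = match(graph, label, "A", formula)
```

In this toy graph only node 1 qualifies: node 4 reaches a B but also a D and no C, and node 7 reaches no B at all.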
Solving the intractable problem: optimal performance for worst case scenarios in XML twig pattern matching
In the history of databases, eXtensible Markup Language (XML) has been thought of as the standard format to store and exchange semi-structured data. With the advent of IoT, XML technologies can play an important role in addressing the issue of processing a massive amount of data generated from heterogeneous devices. As the number and complexity of such datasets increase, there is a need for algorithms which are able to index and retrieve XML data efficiently even for complex queries. In this context, twig pattern matching, that is, finding all occurrences of a twig pattern query (TPQ), is a core operation in XML query processing. Until now, holistic joins have been considered the state-of-the-art TPQ processing algorithms, but they fail to guarantee an optimal evaluation except at the expense of excessive storage costs, which limit their scope in large datasets. In this article, we introduce a new approach which significantly outperforms earlier methods in terms of both the size of the intermediate storage and query running time. The approach presented here uses Child Prime Labels (Alsubai & North, 2018) to improve the filtering phase of bottom-up twig matching algorithms, together with a novel algorithm which avoids the use of stacks, thus improving TPQ processing efficiency. Several experiments were conducted on common benchmarks such as the DBLP, XMark and TreeBank datasets to study the performance of the new approach. Multiple analyses on a range of twig pattern queries are presented to demonstrate the statistical significance of the improvements.
Strategies and Approaches for Generating Identical Extensive XML Tree Instances
In recent years, XML has become the de facto internet wire language. Data may be organized and given context with the use of XML. A well-organized document facilitates the transformation of raw data into actionable intelligence. In B2B applications, XML data is created and exchanged. This implies the need for fast query processing on XML data. The processing of XML tree sample queries (XTPQ) that provide an efficient response (also known as sample matching) is a topic of active study in the XML database field. A DOM parser may be used to transform an XML document into a tree representation. Extensible Markup Language (XML) query languages like XPath and XQuery use tree samples (twigs) to express query results. XML query processing focuses mostly on effectively locating all instances of twig samples inside an XML database. Numerous techniques for matching such tree samples have been presented in recent years. In this study, we survey recent developments in XTPQ processing. This summary will begin by introducing several algorithms for twig sample matching and then go on to provide some background on holistic techniques to process XTPQ.
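The two steps mentioned above, parsing a document into a tree and locating instances of a twig, can be sketched with Python's standard `xml.etree.ElementTree` module. This is a naive recursive baseline, not one of the holistic algorithms the survey covers, and it reports only the elements that match the twig root rather than full match tuples. The `QNode` class and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class QNode:
    tag: str
    # Edge from the parent query node: "child" (/) or "descendant" (//).
    axis: str = "child"
    children: list = field(default_factory=list)


def satisfies(elem, q):
    """True if the twig rooted at query node q can be embedded at elem."""
    if elem.tag != q.tag:
        return False
    for qc in q.children:
        if qc.axis == "child":
            candidates = list(elem)
        else:
            # All proper descendants of elem.
            candidates = [e for e in elem.iter() if e is not elem]
        if not any(satisfies(c, qc) for c in candidates):
            return False
    return True


def twig_roots(root, q):
    """All elements in document order that match the root of the twig."""
    return [e for e in root.iter() if satisfies(e, q)]


doc = ET.fromstring(
    "<lib>"
    "<book><title/><author><name/></author></book>"
    "<book><title/></book>"
    "<book><info><author><name/></author></info></book>"
    "</lib>"
)

# Twig for the XPath-like pattern book[title][.//name].
query = QNode("book", children=[QNode("title", "child"),
                                QNode("name", "descendant")])
hits = twig_roots(doc, query)
```

Only the first `book` has both a `title` child and a `name` descendant. The repeated subtree scans in `satisfies` are the kind of redundant work that holistic twig joins are designed to avoid.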
TwigStackPrime: A Novel Twig Join Algorithm Based on Prime Numbers
The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to utilize the specific characteristics of XML documents. Labelling schemes are fundamental to processing XML queries efficiently. They are used to determine structural relationships between elements corresponding to query nodes in twig pattern queries (TPQs). This article presents the design and implementation of a new indexing technique which exploits the property of prime numbers to identify Parent-Child (P-C) relationships in TPQs during query evaluation. The Child Prime Label (CPL, for short) approach can be efficiently incorporated within the existing labelling schemes. Here, we propose a novel twig matching algorithm based on the well-known TwigStack algorithm [3], which applies the CPL approach and focuses on reducing the overhead of storing useless elements and performing unnecessary join operations. Our performance evaluation demonstrates that the new algorithm significantly outperforms the previous approaches.
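The abstract does not spell out how a Child Prime Label is computed. One plausible reading, shown here purely as an assumption, is that each distinct tag name is assigned a prime and every element stores the product of the primes of its distinct child tags, so that a divisibility test tells whether the element has a child with a given tag. All function names below are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a few tags)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def assign_tag_primes(root):
    """Give each distinct tag a prime, in order of first appearance."""
    gen = primes()
    tag_prime = {}
    for elem in root.iter():
        if elem.tag not in tag_prime:
            tag_prime[elem.tag] = next(gen)
    return tag_prime


def child_prime_label(elem, tag_prime):
    """Product of the primes of the element's distinct child tags."""
    label = 1
    for tag in {child.tag for child in elem}:
        label *= tag_prime[tag]
    return label


def has_child_with_tag(elem, tag, tag_prime):
    """Divisibility test: does elem have at least one child named tag?"""
    return child_prime_label(elem, tag_prime) % tag_prime[tag] == 0


doc = ET.fromstring("<a><b><c/></b><d/><b/></a>")
tag_prime = assign_tag_primes(doc)
```

Under this reading, the root `a` gets the label 3 × 7 = 21 from its child tags `b` and `d`. An `a` element can then be discarded for a query edge `a/c` without reading any `c` elements, because 21 is not divisible by the prime 5 assigned to `c`. In practice the label would be computed once at indexing time and stored alongside the element's existing label rather than recomputed per test.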
Child Prime Label Approaches to Evaluate XML Structured Queries
The adoption of the eXtensible Markup Language (XML) as the standard format to store and exchange semi-structured data has been gaining momentum. The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to retrieve XML data efficiently. Due to the importance of twig pattern matching in XML retrieval systems, finding all matching occurrences of a tree pattern query in an XML document is often considered a specific task for XML databases as well as a core operation in XML query processing. This thesis presents the design and implementation of a new indexing technique, called the Child Prime Label (CPL), which exploits the property of prime numbers to identify Parent-Child (P-C) edges in twig pattern queries (TPQs) during query evaluation. The CPL approach can be incorporated efficiently within the existing labelling schemes. The major contributions of this thesis are a set of novel twig matching algorithms which apply the CPL approach and focus on reducing the overhead of storing useless elements and performing unnecessary computations during the output enumeration. The research presented here is the first to provide an efficient and general solution for TPQs containing ordering constraints and positional predicates specified by the XML query languages. To evaluate the CPL approaches, the holistic model was implemented as an experimental prototype in which the proposed approaches are compared against state-of-the-art holistic twig algorithms. Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of the CPL approaches over the previous indexing and querying methods. The experimental results demonstrate the validity and improvements of the new algorithms over other related methods on various common subclasses of TPQs. Moreover, the scalability tests reveal that the new algorithms are more suitable for processing large XML datasets.