6 research outputs found

Solving the intractable problem: optimal performance for worst case scenarios in XML twig pattern matching

Author: Alsubai S.
North S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/11/2020

In the history of databases, eXtensible Markup Language (XML) has been regarded as the standard format for storing and exchanging semi-structured data. With the advent of IoT, XML technologies can play an important role in processing the massive amounts of data generated by heterogeneous devices. As the number and complexity of such datasets increase, there is a need for algorithms that can index and retrieve XML data efficiently, even for complex queries. In this context, twig pattern matching, that is, finding all occurrences of a twig pattern query (TPQ), is a core operation in XML query processing. Until now, holistic joins have been considered the state-of-the-art TPQ processing algorithms, but they fail to guarantee an optimal evaluation except at the expense of excessive storage costs, which limits their scope on large datasets. In this article, we introduce a new approach which significantly outperforms earlier methods in terms of both the size of the intermediate storage and query running time. The approach uses Child Prime Labels (Alsubai & North, 2018) to improve the filtering phase of bottom-up twig matching algorithms, together with a novel algorithm which avoids the use of stacks, thus improving TPQ processing efficiency. Several experiments were conducted on common benchmarks such as the DBLP, XMark and TreeBank datasets to study the performance of the new approach. Multiple analyses on a range of twig pattern queries are presented to demonstrate the statistical significance of the improvements.

White Rose Research Online

TwigStackPrime: A Novel Twig Join Algorithm Based on Prime Numbers

Author: Alsubai S.
North S.D.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to utilize the specific characteristics of XML documents. A labelling scheme is fundamental to processing XML queries efficiently: labels are used to determine structural relationships between the elements corresponding to query nodes in twig pattern queries (TPQs). This article presents the design and implementation of a new indexing technique which exploits the properties of prime numbers to identify Parent-Child (P-C) relationships in TPQs during query evaluation. The Child Prime Label (CPL, for short) approach can be efficiently incorporated within existing labelling schemes. Here, we propose a novel twig matching algorithm based on the well-known TwigStack algorithm [3], which applies the CPL approach and focuses on reducing the overhead of storing useless elements and performing unnecessary join operations. Our performance evaluation demonstrates that the new algorithm significantly outperforms the previous approaches.
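The abstract above does not spell out how a prime-based label identifies Parent-Child relationships. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes each distinct tag name gets its own prime and that an element's label is the product of the primes of its distinct child tags, so a divisibility test answers "does this element have a child with tag t?". The names `Node`, `assign_cpl` and `has_child_with_tag` are hypothetical.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small tag vocabulary)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


class Node:
    """A bare-bones XML element: a tag name plus child elements (hypothetical type)."""

    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)
        self.cpl = 1  # assumed label: product of the primes of distinct child tags


def assign_cpl(root):
    """Assign one prime per distinct tag, then label every node. Returns the tag->prime map."""
    tag_prime = {}
    gen = primes()
    nodes = []
    stack = [root]
    while stack:  # first pass: collect nodes and give each new tag a prime
        node = stack.pop()
        nodes.append(node)
        if node.tag not in tag_prime:
            tag_prime[node.tag] = next(gen)
        stack.extend(node.children)
    for node in nodes:  # second pass: multiply the primes of distinct child tags
        for tag in {child.tag for child in node.children}:
            node.cpl *= tag_prime[tag]
    return tag_prime


def has_child_with_tag(node, tag, tag_prime):
    """Parent-Child filter: true iff the prime for `tag` divides the node's label."""
    p = tag_prime.get(tag)
    return p is not None and node.cpl % p == 0
```

Under these assumptions, a matcher could discard an element as soon as its label is not divisible by the prime of a required child tag, without scanning the children themselves. Labels grow with the number of distinct child tags, which Python's arbitrary-precision integers absorb but a fixed-width implementation would have to handle.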

Crossref

White Rose Research Online

A survey on tree matching and XML retrieval

Author: Cyril Laitang
Hamamache Kheddouci
Karen Pinel-Sauvagnat
Lei Ning
Mohammed Amin Tahraoui
Mohand Boughanem
Publication venue: 'Elsevier BV'
Publication date: 01/05/2013

With the increasing number of available XML documents, numerous approaches for retrieval have been proposed in the literature. They usually use the tree representation of documents and queries to process them, whether implicitly or explicitly. Although retrieving XML documents can be considered a tree matching problem between the query tree and the document trees, only a few approaches take advantage of the algorithms and methods proposed in graph theory. In this paper, we study the theoretical approaches proposed in the literature for tree matching and examine how these approaches have been adapted to XML querying and retrieval, from both an exact and an approximate matching perspective. This study allows us to highlight theoretical aspects of graph theory that have not yet been explored in XML retrieval.

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Hal - Université Grenoble Alpes

Open Archive Toulouse Archive Ouverte

HAL

Hal-Diderot

Twig Pattern Search in XML Database

Author: Zou Leping
Publication venue: 'University of Winnipeg'
Publication date: 03/03/2011

Current search engines rank results by popularity. However, the most popular topics are not always what a user wants: millions of people have millions of different preferences. The main challenge is therefore how to dig information out of the enormous database of the Internet according to each person's preferences. In computer science, such a preference is a pattern, and we call this task "Twig Pattern Search". Unlike index methods, which split a query into several sub-queries and then stitch the results together to produce the final answers, twig pattern search uses tree structures as the basic unit of a query to avoid expensive join operations. We present an efficient algorithm for the tree mapping problem in XML databases. Given a target tree T and a pattern tree Q, the algorithm can find all the embeddings of Q in T in O(|D||Q|) time, where D is the largest data stream associated with a node of Q.

Master of Science in Applied Computer Science
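To make the tree mapping problem concrete, here is a naive sketch of embedding a pattern tree Q in a target tree T. It is not the thesis's O(|D||Q|) algorithm; it is a brute-force recursive check, and it assumes every pattern edge is an ancestor-descendant edge and that only tag names must agree. The names `Tree`, `matches` and `find_roots` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    """A labelled tree node, used for both the target T and the pattern Q."""
    tag: str
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every proper descendant of `node` in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def matches(t, q):
    """True if pattern node q embeds at data node t.

    Assumption: each pattern edge is ancestor-descendant, so every child of q
    must embed at some proper descendant of t.
    """
    if t.tag != q.tag:
        return False
    return all(any(matches(d, qc) for d in descendants(t)) for qc in q.children)


def find_roots(t, q):
    """Return all data nodes (t itself or below) where the root of q embeds."""
    hits = [t] if matches(t, q) else []
    hits.extend(d for d in descendants(t) if matches(d, q))
    return hits
```

This version revisits the same subtrees many times, which is the kind of repeated work that stream-based twig algorithms are designed to avoid. It returns only the nodes matching the pattern root, not full embeddings.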

WinnSpace Repository

Using semantics in XML query processing

Author: WU HUAYU
Publication date: 21/03/2011

Ph.D, Doctor of Philosophy

ScholarBank@NUS

Child Prime Label Approaches to Evaluate XML Structured Queries

Author: Alsubai Shtwai
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 05/03/2018

The adoption of the eXtensible Markup Language (XML) as the standard format to store and exchange semi-structured data has been gaining momentum. The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to retrieve XML data efficiently. Because twig pattern matching is so important in XML retrieval systems, finding all matching occurrences of a tree pattern query in an XML document is often considered both a specific task for XML databases and a core operation in XML query processing. This thesis presents the design and implementation of a new indexing technique, called the Child Prime Label (CPL), which exploits the properties of prime numbers to identify Parent-Child (P-C) edges in twig pattern queries (TPQs) during query evaluation. The CPL approach can be incorporated efficiently within existing labelling schemes. The major contributions of this thesis are a set of novel twig matching algorithms which apply the CPL approach and focus on reducing the overhead of storing useless elements and performing unnecessary computations during output enumeration. The research presented here is the first to provide an efficient and general solution for TPQs containing ordering constraints and positional predicates specified by the XML query languages. To evaluate the CPL approaches, the holistic model was implemented as an experimental prototype in which the proposed approaches are compared against state-of-the-art holistic twig algorithms. Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of the CPL approaches over previous indexing and querying methods. The experimental results demonstrate the validity of the new algorithms and their improvements over related methods on various common subclasses of TPQs. Moreover, the scalability tests reveal that the new algorithms are more suitable for processing large XML datasets.

White Rose E-theses Online