
A Join Index for XML Data Warehouses

Author: Aouiche Kamel
Darmont Jérôme
Mahboubi Hadj
Publication date: 01/01/2008

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native-XML database management systems (DBMSs) currently offer limited performance, so ways to optimize them must be investigated. In this paper, we propose a new join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminates join operations while preserving the information contained in the original warehouse. A theoretical study and experimental results demonstrate the efficiency of our join index. They also show that native-XML DBMSs can compete with XML-compatible relational DBMSs when warehousing and analyzing XML data.
Comment: 2008 International Conference on Information Resources Management (Conf-IRM 08), Niagara Falls, Canada (2008)

arXiv.org e-Print Archive

HAL Descartes

AIS Electronic Library (AISeL)
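To make the join-index idea in the entry above concrete, here is a minimal Python sketch. It assumes a hypothetical layout (a `facts` document whose facts reference members of a separate `store` dimension document by id); the element names, `build_join_index`, and `total_amount_for_city` are illustrative and are not the paper's schema or code. The join is computed once and materialized into a single document, so an aggregate query becomes a single scan with no join.

```python
import xml.etree.ElementTree as ET

# Hypothetical warehouse layout (not the paper's exact schema): one document
# for facts, one per dimension; facts reference dimension members by id.
FACTS = """<facts>
  <fact store="s1" amount="10"/>
  <fact store="s2" amount="25"/>
  <fact store="s1" amount="5"/>
</facts>"""

STORES = """<dimension name="store">
  <member id="s1" city="Lyon"/>
  <member id="s2" city="Paris"/>
</dimension>"""


def build_join_index(facts_xml: str, dim_xml: str, ref_attr: str) -> ET.Element:
    """Materialize the fact-dimension join once, copying each fact's
    referenced dimension member into a single combined document."""
    dim = ET.fromstring(dim_xml)
    members = {m.get("id"): m for m in dim.iter("member")}
    index = ET.Element("joinIndex")
    for fact in ET.fromstring(facts_xml).iter("fact"):
        entry = ET.SubElement(index, "fact", dict(fact.attrib))
        member = members[fact.get(ref_attr)]
        # Embed the member under an element named after the dimension,
        # so later queries never need to consult the dimension document.
        ET.SubElement(entry, dim.get("name"), dict(member.attrib))
    return index


def total_amount_for_city(index: ET.Element, city: str) -> int:
    """Aggregate over the join index in one scan, with no join."""
    return sum(
        int(f.get("amount"))
        for f in index.iter("fact")
        # The embedded dimension element is named "store" in this example.
        if f.find(f"store[@city='{city}']") is not None
    )


idx = build_join_index(FACTS, STORES, "store")
print(total_amount_for_city(idx, "Lyon"))  # 15
```

The trade-off this illustrates is space for time: dimension attributes are duplicated in every fact entry, in exchange for join-free query evaluation.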

Fast and Tiny Structural Self-Indexes for XML

Author: Maneth Sebastian
Sebastian Tom
Publication date: 27/12/2010

XML document markup is highly repetitive and therefore compresses well with dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees have previously been used as synopses for structural XPath queries. Here, a fully fledged index over such grammars is presented. The index can execute arbitrary tree algorithms with a slow-down comparable to its space improvement. More interestingly, certain algorithms execute much faster over the index because no decompression occurs. For example, structural XPath count queries evaluated over the index are faster than previous XPath implementations, often by two orders of magnitude. The index also serializes XML results (including texts) faster than previous systems, by a factor of about 2 to 3. This is due to efficient copy handling of grammar repetitions, and because materialization is avoided entirely. To compare with twig join implementations, we implemented a materializer that writes out the pre-order numbers of result nodes, and we show that it is competitive.
Comment: 13 pages

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server
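The entry above notes that repetitive markup compresses well as a DAG or grammar and that some algorithms, such as counting, run directly on the compressed form. The Python sketch below shows only the simpler DAG case, not the paper's grammar-based index. It hash-conses identical tag subtrees (ignoring text) and counts elements with memoization, so each shared subtree is evaluated once instead of being decompressed. The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache


def to_dag(elem: ET.Element, table: dict) -> int:
    """Hash-cons the tag structure (text ignored): identical subtrees
    receive the same id, so repeated markup is stored once."""
    key = (elem.tag, tuple(to_dag(child, table) for child in elem))
    return table.setdefault(key, len(table))


def count_tag(root_id: int, nodes: dict, tag: str) -> int:
    """Count elements named `tag` directly on the DAG. Memoization means
    each shared subtree is evaluated once and never expanded."""
    @lru_cache(maxsize=None)
    def count(node_id: int) -> int:
        label, children = nodes[node_id]
        return int(label == tag) + sum(count(c) for c in children)
    return count(root_id)


# Hypothetical, highly repetitive document: 1000 identical <book> subtrees.
doc = ET.fromstring("<lib>" + "<book><title/><author/><author/></book>" * 1000 + "</lib>")
table: dict = {}
root_id = to_dag(doc, table)
nodes = {node_id: key for key, node_id in table.items()}

print(len(list(doc.iter())), len(nodes))  # 4001 4
print(count_tag(root_id, nodes, "author"))  # 2000
```

The tree has 4001 elements, but the DAG has only 4 distinct nodes. The root still lists 1000 child references; collapsing such runs is where grammar compression goes beyond a DAG.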

Efficient Creation and Incremental Maintenance of the HOPI Index for Complex XML Document Collections

Author: Anja Theobald
Gerhard Weikum
Ralf Schenkel
Publication date: 01/01/2005

The HOPI index, a connection index for XML documents based on the concept of a 2-hop cover, provides space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in XML search engines. This paper presents enhanced algorithms for building HOPI, shows how to augment the index with distance information, and discusses incremental index maintenance. Our experiments show substantial improvements over the existing divide-and-conquer algorithm for index creation, low space overhead for including distance information in the index, and efficient updates.

CiteSeerX

MPG.PuRe
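A 2-hop cover, the concept underlying HOPI in the entry above, gives each node v two label sets, L_out(v) and L_in(v), so that u reaches v exactly when L_out(u) and L_in(v) share a "center" node. The Python sketch below is a simplified greedy construction written for illustration. It is not HOPI's algorithm, which uses more refined greedy and divide-and-conquer strategies and also handles distances. Each round it picks the center whose full ancestors-by-descendants block covers the most still-uncovered reachable pairs. The graph and all names are hypothetical.

```python
from itertools import product


def closure(graph):
    """Reflexive-transitive reachability: reach[u] = every node u can reach."""
    reach = {}
    for start in graph:
        seen, stack = {start}, [start]
        while stack:
            for nxt in graph[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        reach[start] = seen
    return reach


def greedy_two_hop(graph):
    """Build L_out/L_in labels so that u reaches v iff L_out[u] & L_in[v].
    Simplified greedy: each round picks the center w whose full
    ancestors x descendants block covers the most uncovered pairs."""
    reach = closure(graph)
    ancestors = {w: {u for u in graph if w in reach[u]} for w in graph}
    uncovered = {(u, v) for u in graph for v in reach[u] if u != v}
    l_out = {v: set() for v in graph}
    l_in = {v: set() for v in graph}
    while uncovered:
        best = max(graph, key=lambda w: sum(
            (u, v) in uncovered for u in ancestors[w] for v in reach[w]))
        for u in ancestors[best]:
            l_out[u].add(best)
        for v in reach[best]:
            l_in[v].add(best)
        uncovered -= set(product(ancestors[best], reach[best]))
    return l_out, l_in


def reaches(u, v, l_out, l_in):
    """Reachability test: intersect two label sets, no graph traversal."""
    return u == v or bool(l_out[u] & l_in[v])


# Hypothetical element graph: parent-child edges, plus d -> e standing in
# for an IDREF-style link between elements.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": ["e"], "e": ["f"], "f": []}
l_out, l_in = greedy_two_hop(GRAPH)
print(reaches("b", "f", l_out, l_in), reaches("c", "d", l_out, l_in))  # True False
```

The loop always terminates: any uncovered pair (u, v) is covered by choosing u itself as the center, so every round covers at least one new pair.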

Efficient Processing of Multiple XML Twig Queries

Author: LIU HUANZHANG
Publication date: 24/05/2007

Master's thesis (Master of Science)

ScholarBank@NUS

DescribeX: A Framework for Exploring and Querying XML Web Collections

Author: Rizzolo Flavio
Publication date: 01/01/2008

This thesis introduces DescribeX, a framework for describing arbitrarily complex XML summaries of web collections that supports more efficient evaluation of XPath workloads. DescribeX permits the declarative description of document structure using all axes and language constructs in XPath, and generalizes many of the XML indexing and summarization approaches in the literature. DescribeX supports the construction of heterogeneous summaries in which different document elements sharing a common structure can be declaratively defined and refined by means of path regular expressions on axes, or axis path regular expressions (AxPREs). DescribeX can significantly help in understanding both the structure of complex, heterogeneous XML collections and the behaviour of XPath queries evaluated on them. Experimental results demonstrate the scalability of DescribeX summary refinements and stabilizations (the key enablers for tailoring summaries) on multi-gigabyte web collections. A comparative study suggests that a DescribeX summary created from a given workload can yield query evaluation times orders of magnitude shorter than those obtained with existing summaries. DescribeX's lightweight approach of combining summaries with a file-at-a-time XPath processor can be a highly competitive alternative, in terms of performance, to conventional fully fledged XML query engines that provide DB-like functionality such as security, transaction processing, and native storage.
Comment: PhD thesis, University of Toronto, 2008, 163 pages

arXiv.org e-Print Archive

CiteSeerX

CERN Document Server
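One classic structural summary that, per the entry above, DescribeX generalizes is the label-path partition. It groups elements by their root-to-node tag path, so a simple path query is answered by looking up summary nodes instead of traversing documents. The Python sketch below builds such a partition. It does not reproduce DescribeX's AxPRE syntax or its refinement machinery, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def label_path_summary(root: ET.Element) -> dict:
    """Partition elements by their incoming label path (root-to-node tags).
    Each key is one summary node; its value is the node's extent."""
    extents = defaultdict(list)

    def walk(elem: ET.Element, path: tuple) -> None:
        path = path + (elem.tag,)
        extents[path].append(elem)
        for child in elem:
            walk(child, path)

    walk(root, ())
    return extents


DOC = ET.fromstring("""<site>
  <people><person><name>Ann</name></person><person><name>Bo</name></person></people>
  <items><item><name>Lamp</name></item></items>
</site>""")
summary = label_path_summary(DOC)

# /site/people/person/name: a single summary lookup.
person_names = [e.text for e in summary[("site", "people", "person", "name")]]
# //name: collect the extents of every summary node whose path ends in "name".
names_anywhere = [e.text for path, ext in summary.items() if path[-1] == "name" for e in ext]

print(len(summary), person_names)  # 7 ['Ann', 'Bo']
print(names_anywhere)  # ['Ann', 'Bo', 'Lamp']
```

Here the 9 elements collapse into 7 summary nodes. On real, repetitive web collections the gap between document size and summary size is far larger, which is what makes summary-based evaluation cheap.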

A Query Optimization Technique Using Graph-Structural Information in Relational RDF Stores

Author: 김기성
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2014

Doctoral dissertation, Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2014. Advisor: 김형주.
As the size of Resource Description Framework (RDF) graphs has grown rapidly, SPARQL query processing on large-scale RDF graphs has become increasingly challenging. For efficient SPARQL query processing, handling intermediate results is the most crucial element, because it generally involves many join operators. To address this problem, we propose a triple filtering method that exploits the graph-structural information of RDF data. We design the RDF Path index (RP-index) and the RDF Graph index (RG-index) for triple filtering. These two indices use the path information and the graph information of the RDF graph, respectively. However, both indices suffer from a size problem because the number of indexed patterns grows exponentially. We address this by indexing only the path and graph patterns that are effective for triple filtering. Triple filtering is performed very efficiently by a relational operator called the RDF Filter (RFLT), with little overhead compared to the original query processing.
Through comprehensive experiments on large-scale RDF datasets, we demonstrate that our approaches can effectively and efficiently reduce the number of redundant intermediate results and improve query performance.

A Query Optimization Technique using Graph-Structural Information in Relational RDF Stores
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Our Contributions
  1.3 Outline
Chapter 2 Related Work
  2.1 RDF Stores
    2.1.1 Summary of Existing Methods of Relation-based RDF Stores
    2.1.2 Overview of RDF-3X
  2.2 Handling the Intermediate Results
  2.3 Path-based and Graph Indices
  2.4 Frequent Graph Pattern Mining
Chapter 3 Preliminaries
  3.1 RDF and SPARQL
  3.2 Path and Graph Pattern
    3.2.1 Incoming Predicate Path
    3.2.2 k-neighborhood Subgraph
  3.3 Candidate Vertex Set
Chapter 4 R3F: RDF Triple Filtering Framework using RP-index
  4.1 Motivating Example
  4.2 Overall Process of R3F
  4.3 RP-index Definition
    4.3.1 Physical Structure of RP-index
    4.3.2 Discriminative and Frequent Predicate Paths
    4.3.3 Reverse Predicate
    4.3.4 Handling Other Types of Queries
    4.3.5 Determining RP-index Parameters
  4.4 Processing Triple Filtering
    4.4.1 RFLT Operator
  4.5 Generating an Execution Plan with RFLT Operators
    4.5.1 Filtering Effect of Vlists
    4.5.2 Cardinality of RFLT Operator
    4.5.3 Generating an Execution Plan
  4.6 RP-index Building
    4.6.1 Complexity of building RP-index
    4.6.2 Parallel Building Methods
    4.6.3 Incremental Maintenance
  4.7 Experimental Results
    4.7.1 RP-index Size
    4.7.2 Query Evaluation Performance
    4.7.3 Incremental Maintenance of RP-index
Chapter 5 RG-index: RDF Triple Filtering using the Graph Index
  5.1 Motivating Example
  5.2 Design of RG-index
    5.2.1 Physical Structure of RG-index
  5.3 Handling the Size Problem of RG-index
    5.3.1 Discriminative Patterns
    5.3.2 Frequent Patterns
  5.4 Building RG-index
    5.4.1 Overview of gSpan
    5.4.2 RDF Graph Pattern Mining using gSpan
    5.4.3 Complexity of building RG-index
  5.5 Triple Filtering using RG-index
    5.5.1 Generating an Execution Plan with RFLT Operators
  5.6 Experimental Results
    5.6.1 RG-index Size
    5.6.2 Query Evaluation Performance
    5.6.3 Index Building Time
Chapter 6 Conclusion and Future Work
  6.1 Future Work
Appendices
  Chapter A Related Open Source Projects
    A.1 RDF-3X
    A.2 gSpan
  Chapter B Data Structure of RP-index and RG-index
    B.1 RP-index
    B.2 RG-index
  Chapter C Query Sets

Degree: Doctor

SNU Open Repository and Archive
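The thesis above filters triples using indexed incoming predicate paths before joins run. The Python sketch below shows the general idea under heavy simplifying assumptions. It indexes every incoming predicate path up to a fixed length, with no discriminative or frequent pattern selection and no reverse predicates. The `filter_triples` helper is only a hypothetical stand-in for the thesis's RFLT operator, and the triples are invented. For the query pattern `?x worksFor ?c . ?c locatedIn ?city`, the variable `?c` must have an incoming `worksFor` edge, so the `locatedIn` input can be shrunk before the join.

```python
from collections import defaultdict

# Hypothetical RDF triples (subject, predicate, object).
TRIPLES = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "initech"),
    ("acme", "locatedIn", "paris"),
    ("initech", "locatedIn", "austin"),
    ("paris", "locatedIn", "france"),
    ("carol", "livesIn", "paris"),
]


def build_path_index(triples, max_len):
    """Map each incoming predicate path (tuple of predicates, length <= max_len)
    to the set of vertices at which such a path ends."""
    incoming = defaultdict(list)  # object -> [(subject, predicate)]
    for s, p, o in triples:
        incoming[o].append((s, p))

    def paths_into(v, length):
        # All predicate sequences of exactly `length` edges ending at v.
        if length == 0:
            return {()}
        result = set()
        for s, p in incoming[v]:
            for prefix in paths_into(s, length - 1):
                result.add(prefix + (p,))
        return result

    index = defaultdict(set)
    vertices = {s for s, _, _ in triples} | {o for _, _, o in triples}
    for v in vertices:
        for length in range(1, max_len + 1):
            for path in paths_into(v, length):
                index[path].add(v)
    return index


def filter_triples(triples, predicate, subject_candidates):
    """Keep only `predicate` triples whose subject is a candidate vertex,
    shrinking the input to later joins (hypothetical stand-in for RFLT)."""
    return [t for t in triples if t[1] == predicate and t[0] in subject_candidates]


index = build_path_index(TRIPLES, max_len=2)
candidates_c = index[("worksFor",)]  # ?c must have an incoming worksFor edge
filtered = filter_triples(TRIPLES, "locatedIn", candidates_c)
print(sorted(candidates_c))  # ['acme', 'initech']
print(filtered)  # [('acme', 'locatedIn', 'paris'), ('initech', 'locatedIn', 'austin')]
```

The `("paris", "locatedIn", "france")` triple is discarded before any join because `paris` has no incoming `worksFor` edge. That is the kind of redundant intermediate result the thesis aims to avoid. The path-length bound keeps enumeration finite even on cyclic graphs, but the number of paths can still grow exponentially with the bound, which is the size problem the abstract describes.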