18 research outputs found
A Join Index for XML Data Warehouses
XML data warehouses form an interesting basis for decision-support
applications that exploit complex data. However, native-XML database management
systems (DBMSs) currently bear limited performances and it is necessary to
research for ways to optimize them. In this paper, we propose a new join index
that is specifically adapted to the multidimensional architecture of XML
warehouses. It eliminates join operations while preserving the information
contained in the original warehouse. A theoretical study and experimental
results demonstrate the efficiency of our join index. They also show that
native XML DBMSs can compete with XML-compatible, relational DBMSs when
warehousing and analyzing XML data.Comment: 2008 International Conference on Information Resources Management
(Conf-IRM 08), Niagra Falls : Canada (2008
Fast and Tiny Structural Self-Indexes for XML
XML document markup is highly repetitive and therefore well compressible
using dictionary-based methods such as DAGs or grammars. In the context of
selectivity estimation, grammar-compressed trees were used before as synopsis
for structural XPath queries. Here a fully-fledged index over such grammars is
presented. The index allows to execute arbitrary tree algorithms with a
slow-down that is comparable to the space improvement. More interestingly,
certain algorithms execute much faster over the index (because no decompression
occurs). E.g., for structural XPath count queries, evaluating over the index is
faster than previous XPath implementations, often by two orders of magnitude.
The index also allows to serialize XML results (including texts) faster than
previous systems, by a factor of ca. 2-3. This is due to efficient copy
handling of grammar repetitions, and because materialization is totally
avoided. In order to compare with twig join implementations, we implemented a
materializer which writes out pre-order numbers of result nodes, and show its
competitiveness.Comment: 13 page
Efficient creation and incremental maintenance of the hopi index for complex xml document collections
The HOPI index, a connection index for XML documents based on the concept of a 2–hop cover, provides space – and time–efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in XML search engines. This paper presents enhanced algorithms for building HOPI, shows how to augment the index with distance information, and discusses incremental index maintenance. Our experiments show substantial improvements over the existing divide-and-conquer algorithm for index creation, low space overhead for including distance information in the index, and efficient updates
DescribeX: A Framework for Exploring and Querying XML Web Collections
This thesis introduces DescribeX, a powerful framework that is capable of
describing arbitrarily complex XML summaries of web collections, providing
support for more efficient evaluation of XPath workloads. DescribeX permits the
declarative description of document structure using all axes and language
constructs in XPath, and generalizes many of the XML indexing and summarization
approaches in the literature. DescribeX supports the construction of
heterogeneous summaries where different document elements sharing a common
structure can be declaratively defined and refined by means of path regular
expressions on axes, or axis path regular expression (AxPREs). DescribeX can
significantly help in the understanding of both the structure of complex,
heterogeneous XML collections and the behaviour of XPath queries evaluated on
them.
Experimental results demonstrate the scalability of DescribeX summary
refinements and stabilizations (the key enablers for tailoring summaries) with
multi-gigabyte web collections. A comparative study suggests that using a
DescribeX summary created from a given workload can produce query evaluation
times orders of magnitude better than using existing summaries. DescribeX's
light-weight approach of combining summaries with a file-at-a-time XPath
processor can be a very competitive alternative, in terms of performance, to
conventional fully-fledged XML query engines that provide DB-like functionality
such as security, transaction processing, and native storage.Comment: PhD thesis, University of Toronto, 2008, 163 page
관계형 RDF 저장소에서 그래프 구조적 정보를 사용한 질의 최적화 기법
학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2014. 2. 김형주.As the size of Resource Description Framework (RDF) graphs has grown rapidly, SPARQL query processing on the large-scale RDF graph has become a more challenging problem. For efficient SPARQL query processing, the handling of the intermediate results is the most crucial element because it generally involves many join operators. In order to address this problem, we ropose the triple filtering method that exploits the graph-structural information of RDF data. We design the RDF Path index (RP-index) and the RDF Graph index (RGindex) for the triple filtering. These two indices uses the path information and the graph information of the RDF graph, respectively. However, these indices have the size problem due to the exponential number of the indexed patterns.
We address the size problem by indexing only effective the path and graph patterns for the triple filtering. The triple filtering is performed very efficiently by a relational operator called the RDF Filter (RFLT) with little overhead compared
to the original query processing. Through comprehensive experiments on large-scale RDF datasets, we demonstrate that our approaches can effectively and efficiently reduce the number of redundant intermediate results and
improve the query performance.A Query Optimization Technique using Graph-Structural Information in Relational RDF Stores
Chapter 1 Introduction 1
1.1 Research Motivation 3
1.2 Our Contributions 6
1.3 Outline 11
Chapter 2 Related Work 13
2.1 RDF Stores 13
2.1.1 Summary of Existing Methods of Relation-based RDF Stores 16
2.1.2 Overview of RDF-3X 18
2.2 Handling the Intermediate Results 20
2.3 Path-based and Graph Indices 21
2.4 Frequent Graph Pattern Mining 23
Chapter 3 Preliminaries 25
3.1 RDF and SPARQL 25
3.2 Path and Graph Pattern 29
3.2.1 Incoming Predicate Path 29
3.2.2 k-neighborhood Subgraph 30
3.3 Candidate Vertex Set 31
Chapter 4 R3F: RDF Triple Filtering Framework using RP-index 35
4.1 Motivating Example 35
4.2 Overall Process of R3F 37
4.3 RP-index Definition 38
4.3.1 Physical Structure of RP-index 39
4.3.2 Discriminative and Frequent Predicate Paths 40
4.3.3 Reverse Predicate 42
4.3.4 Handling Other Types of Queries 45
4.3.5 Determining RP-index Parameters 46
4.4 Processing Triple Filtering 47
4.4.1 RFLT Operator 47
4.5 Generating an Execution Plan with RFLT Operators 52
4.5.1 Filtering Effect of Vlists 54
4.5.2 Cardinality of RFLT Operator 55
4.5.3 Generating an Execution Plan 57
4.6 RP-index Building 59
4.6.1 Complexity of building RP-index 63
4.6.2 Parallel Building Methods 63
4.6.3 Incremental Maintenance 65
4.7 Experimental Results 68
4.7.1 RP-index Size 70
4.7.2 Query Evaluation Performance 73
4.7.3 Incremental Maintenance of RP-index 78
Chapter 5 RG-index: RDF Triple Filtering using the Graph Index 87
5.1 Motivating Example 87
5.2 Design of RG-index 90
5.2.1 Physical Structure of RG-index 92
5.3 Handling the Size Problem of RG-index 96
5.3.1 Discriminative Patterns 96
5.3.2 Frequent Patterns 97
5.4 Building RG-index 98
5.4.1 Overview of gSpan 98
5.4.2 RDF Graph Pattern Mining using gSpan 99
5.4.3 Complexity of building RG-index 106
5.5 Triple Filtering using RG-index 106
5.5.1 Generating an Execution Plan with RFLT Operators 107
5.6 Experimental Results 109
5.6.1 RG-index Size 111
5.6.2 Query Evaluation Performance 112
5.6.3 Index Building Time 116
Chapter 6 Conclusion and Future Work 119
6.1 Future Work 120
Appendices 125
Chapter A Related Open Source Projects 125
A.1 RDF-3X 125
A.2 gSpan 129
Chapter B Data Structure of RP-index and RG-index 133
B.1 RP-index 133
B.2 RG-index 134
Chapter C Query Sets 137Docto