Persistent Data Structures for Incremental Join Indices
Join indices are used in relational databases to make join operations faster. Join indices essentially materialise the results of join operations and so accrue maintenance cost, which makes them more suitable for use cases where modifications are rare and joins are performed frequently. To lower the maintenance cost, it is preferable to update existing indices incrementally.
The use of persistent data structures for join indices was explored. The motivation for this research was the ability of persistent data structures to construct multiple partially different versions of the same data structure memory-efficiently. This is useful because different versions of join indices can exist simultaneously due to the use of multi-version concurrency control (MVCC) in a database. The techniques used in the Relaxed Radix Balanced Trees (RRB-Trees) persistent data structure were found promising, but none of the popular implementations were found directly suitable for the use case.
This exploration was done in the context of a particular proprietary embedded in-memory columnar multidimensional database called FastormDB, developed by RELEX Solutions. This focused the research on Java Virtual Machine (JVM) based data structures, as FastormDB is implemented in Java. Multiple persistent data structures made for the thesis and ones from Scala, Clojure and Paguro were evaluated with Java Microbenchmark Harness (JMH) and Java Object Layout (JOL) based benchmarks, and their results were analysed via visualisations.
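The version-sharing property described in this abstract can be illustrated with a minimal Java sketch of path copying, the basic technique behind persistent trees. This is not an RRB-Tree and not FastormDB code. The class `PersistentTree`, its fields, and the reading of `key` as a join key and `row` as a row id are hypothetical names chosen only for illustration.

```java
// Hypothetical illustration: an immutable binary search tree updated by path copying.
// Each update returns a new version; nodes off the modified path are shared.
final class PersistentTree {
    final int key;                     // assumed to stand for a join key
    final int row;                     // assumed to stand for a matching row id
    final PersistentTree left, right;  // child subtrees, never mutated after construction

    PersistentTree(int key, int row, PersistentTree left, PersistentTree right) {
        this.key = key;
        this.row = row;
        this.left = left;
        this.right = right;
    }

    // Returns a new version containing (key -> row); the version `t` is left untouched.
    // Only the nodes on the path from the root to the key are copied.
    static PersistentTree put(PersistentTree t, int key, int row) {
        if (t == null) return new PersistentTree(key, row, null, null);
        if (key < t.key) return new PersistentTree(t.key, t.row, put(t.left, key, row), t.right);
        if (key > t.key) return new PersistentTree(t.key, t.row, t.left, put(t.right, key, row));
        return new PersistentTree(key, row, t.left, t.right);
    }

    // Looks up a key in one particular version; returns null when the key is absent.
    static Integer get(PersistentTree t, int key) {
        while (t != null) {
            if (key < t.key) t = t.left;
            else if (key > t.key) t = t.right;
            else return t.row;
        }
        return null;
    }
}
```

Under MVCC, a reader holding an older root keeps seeing its own version while a writer publishes a new root, and the two versions share every subtree the update did not touch.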
Associative access in persistent object stores : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Sciences in Information Systems at Massey University
Page 276 missing from original copy. The overall aim of the thesis is to study associative access in a Persistent Object Store (POS) providing necessary object storage and retrieval capabilities to an Object Oriented Database System (OODBS) (Delis, Kanitkar & Kollios, 1998 cited in Kirchberg & Tretiakov, 2002). Associative access in an OODBS often includes navigational access to referenced or referencing objects of the object being accessed (Kim, Kim, & Dale, 1989). The thesis reviews several existing approaches proposed to support associative and navigational access in an OODBS. It was found that the existing approaches proposed for associative access could not perform well when queries involve multiple paths or inheritance hierarchies. The thesis studies how associative access can be supported in a POS regardless of the paths or inheritance hierarchies involved in a query. The thesis proposes extensions to a model of a POS such that approaches that are proposed for navigational access can be used to support associative access in the extended POS. The extensions include (1) approaches to cluster storage objects in a POS on their storage classes or values of attributes, and (2) approaches to distinguish references between storage objects in a POS based on criteria such as reference types (inheritance and association), storage classes of referenced or referencing storage objects, and reference names. The thesis implements the Matrix-Index Coding (MIC) approach with the extended POS using several coding techniques. The implementation demonstrates that (1) a model of a POS extended by the proposed extensions is capable of supporting associative access in an OODBS and (2) the MIC implemented with the extended POS can support a query that requires associative access in an OODBS and involves multiple paths or inheritance hierarchies.
The implementation also provides proof of the concepts suggested by Kirchberg & Tretiakov (2002) that (1) the MIC can be made independent of a coding technique, and (2) data compression techniques should be considered as appropriate alternatives to implement the MIC because they could reduce the storage size required.
Efficient Processing of Spatial Joins Using R-Trees
Abstract: In this paper, we show that spatial joins are very suitable to be processed on a parallel hardware platform. The parallel system is equipped with a so-called shared virtual memory which is well-suited for the design and implementation of parallel spatial join algorithms. We start with an algorithm that consists of three phases: task creation, task assignment and parallel task execution. In order to reduce CPU- and I/O-cost, the three phases are processed in a fashion that preserves spatial locality. Dynamic load balancing is achieved by splitting tasks into smaller ones and reassigning some of the smaller tasks to idle processors. In an experimental performance comparison, we identify the advantages and disadvantages of several variants of our algorithm. The most efficient one shows an almost optimal speed-up under the assumption that the number of disks is sufficiently large. Topics: spatial database systems, parallel database systems
Combining Indexing Schemes to Accelerate Querying XML on Content and Structure
This paper presents the advantages of combining multiple document representation schemes for query processing of XML queries on content and structure. We show how extending the Text Region approach [2] with the main features of the Binary Relation approach developed in [8] leads to a considerable speed-up in the processing of the XPath location steps. We detail how, by using the combined scheme, we reduce the number of structural joins used to process the XPath steps, while simultaneously limiting the amount of memory usage. We discuss optimisation strategies enabled by the new 'combined representation scheme'. Experiments comparing the efficiency of alternative query processing strategies on a subset of the queries used at INEX 2003 (the Initiative for the Evaluation of XML Retrieval [4]) demonstrate a favourable performance for the combined indexing scheme.
On the Cost of Transitive Closures in Relational Databases
We consider the question of taking transitive closures on top of pure relational systems (Sybase and Ingres in this case). We developed three kinds of transitive closure programs: one using a stored procedure to simulate a built-in transitive closure operator, one using the C language embedded with SQL statements to simulate the iterated execution of the transitive closure operation, and one using Floyd's matrix algorithm to compute the transitive closure of an input graph. By comparing and analyzing the respective performances of their different versions in terms of elapsed time spent on taking the transitive closure, we identify some of the bottlenecks that arise when defining the transitive closure operator on top of existing relational systems. The main purpose of the work is to estimate the costs of taking transitive closures on top of relational systems, isolate the different cost factors (such as logging, network transmission cost, etc.), and identify some necessary enhancements to existing relational systems in order to support the transitive closure operation efficiently. We argue that relational databases should be augmented with efficient transitive closure operators if such queries are made frequently.
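The matrix-based variant mentioned in this abstract can be sketched in Java as a Warshall-style closure over a boolean adjacency matrix. The abstract does not give the authors' exact formulation, so this is an assumption about the general technique, and the class name `TransitiveClosure` and method `closure` are hypothetical.

```java
// Hypothetical illustration: Warshall-style transitive closure of a directed graph.
// adj[i][j] == true means there is an edge from vertex i to vertex j.
final class TransitiveClosure {
    // Returns a new matrix where reach[i][j] is true when j is reachable from i
    // by a path of one or more edges. The input matrix is not modified.
    static boolean[][] closure(boolean[][] adj) {
        int n = adj.length;
        boolean[][] reach = new boolean[n][];
        for (int i = 0; i < n; i++) {
            reach[i] = adj[i].clone();  // copy each row so the caller's matrix stays intact
        }
        // Allow vertex k as an intermediate step, one k at a time.
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                if (!reach[i][k]) continue;  // i cannot reach k, so k adds nothing for row i
                for (int j = 0; j < n; j++) {
                    if (reach[k][j]) reach[i][j] = true;
                }
            }
        }
        return reach;
    }
}
```

The triple loop runs in O(n^3) time entirely in memory, whereas the SQL-based variants described in the abstract iterate joins inside the database and therefore also pay the logging and network costs the paper isolates.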
On the Selection of Optimal Index Configuration in OO Databases
An operation in object-oriented databases gives rise to the processing of a path. Several database operations may result in the same path. The authors address the problem of optimal index configuration for a single path. As is shown, an optimal index configuration for a path can be achieved by splitting the path into subpaths and by indexing each subpath with the optimal index organization. The authors present an algorithm which is able to select an optimal index configuration for a given path. The authors consider a limited number of existing indexing techniques (simple index, inherited index, nested inherited index, multi-index, and multi-inherited index), but the principles of the algorithm remain the same when more indexing techniques are added.