Multi-level bitmap indexes for flash memory storage
Due to their low access latency, high read speed, and power-efficient operation, flash memory storage devices are rapidly emerging as an attractive alternative to traditional magnetic storage devices. However, tests show that the most efficient indexing methods are not able to take advantage of flash memory storage devices. In this paper, we present a set of multi-level bitmap indexes that can effectively take advantage of flash storage devices. These indexing methods use coarsely binned indexes to answer queries approximately, and then use finely binned indexes to refine the answers. Our new methods read significantly less data at the expense of an increased disk access count, thus taking full advantage of the improved read speed and low access latency of flash devices. To demonstrate the advantage of these new indexes, we measure their performance on a number of storage systems using a standard data warehousing benchmark called the Set Query Benchmark. We observe that multi-level strategies on flash drives are up to 3 times faster than traditional indexing strategies on magnetic disk drives.
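The coarse-then-fine refinement the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bin edges, set-based "bitmaps", and the candidate check on boundary bins are all invented for the example.

```python
# Sketch of a two-level binned bitmap index: coarse bins give a fast
# approximate answer; only boundary bins are refined with fine bins,
# so far less data is read per query. All names are illustrative.

def build_bitmaps(values, bin_edges):
    """One bitmap (here: a set of row ids) per bin between consecutive edges."""
    bitmaps = [set() for _ in range(len(bin_edges) - 1)]
    for row, v in enumerate(values):
        for b in range(len(bin_edges) - 1):
            if bin_edges[b] <= v < bin_edges[b + 1]:
                bitmaps[b].add(row)
                break
    return bitmaps

def range_query(values, coarse_edges, fine_edges, lo, hi):
    """Rows with lo <= value < hi, refining coarse boundary bins via fine bins."""
    coarse = build_bitmaps(values, coarse_edges)  # built once in practice
    fine = build_bitmaps(values, fine_edges)
    result = set()
    for b, rows in enumerate(coarse):
        b_lo, b_hi = coarse_edges[b], coarse_edges[b + 1]
        if lo <= b_lo and b_hi <= hi:        # bin fully inside: take whole bitmap
            result |= rows
        elif b_hi <= lo or hi <= b_lo:       # bin fully outside: skip
            continue
        else:                                # boundary bin: refine with fine bins
            for f, frows in enumerate(fine):
                f_lo, f_hi = fine_edges[f], fine_edges[f + 1]
                if lo <= f_lo and f_hi <= hi:
                    result |= frows & rows
                else:
                    # edge fine bin: check candidates against raw values
                    result |= {r for r in frows & rows if lo <= values[r] < hi}
    return result
```

The trade-off the abstract mentions is visible here: boundary bins cost extra lookups (more accesses), but fully covered bins are answered from small bitmaps alone (less data read), which suits the low access latency of flash.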
A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance
This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information in the major background field of Database Management Systems (DBMS), and provides an overview of the techniques currently in use amongst the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly-accessible nodes on the Semantic Web, storage and querying of the data must be cheaper and faster than it is currently. Experiences from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics.
Column Imprints: A Secondary Index Structure
Large scale data warehouses rely heavily on secondary indexes, such as bitmaps and B-trees, to limit access to slow IO devices. However, with the advent of large main memory systems, cache-conscious secondary indexes are needed to also improve the transfer bandwidth between memory and CPU. In this paper, we introduce column imprints, a simple but efficient cache-conscious secondary index. A column imprint is a collection of many small bit vectors, each indexing the data points of a single cacheline. An imprint is used during query evaluation to limit data access and thus minimize memory traffic. The compression for imprints is CPU-friendly and exploits the empirical observation that data often exhibits local clustering or partial ordering as a side-effect of the construction process. Most importantly, column imprint compression remains effective and robust even in the case of unclustered data, where other state-of-the-art solutions fail. We conducted an extensive experimental evaluation to assess the applicability and the performance impact of column imprints. The storage overhead, when experimenting with real-world datasets, is just a few percent over the size of the columns being indexed. The evaluation time for over 40,000 range queries of varying selectivity revealed the efficiency of the proposed index.
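The per-cacheline bit vectors the abstract describes can be sketched as below. This is an illustrative toy, not the paper's data structure: the cacheline width, bin edges, and function names are assumptions made for the example.

```python
# Sketch of a column imprint: one small bitmask per "cacheline" of
# values, with bit b set if any value in that cacheline falls into
# value bin b. A range scan consults the masks to skip cachelines
# whose bins cannot intersect the predicate. Sizes are illustrative.

CACHELINE = 8                 # values per cacheline (illustrative)
BINS = [0, 25, 50, 75, 100]   # bin edges -> 4 value bins

def bin_of(v):
    for b in range(len(BINS) - 1):
        if BINS[b] <= v < BINS[b + 1]:
            return b
    return len(BINS) - 2      # clamp out-of-range values to the last bin

def build_imprints(column):
    imprints = []
    for i in range(0, len(column), CACHELINE):
        mask = 0
        for v in column[i:i + CACHELINE]:
            mask |= 1 << bin_of(v)
        imprints.append(mask)
    return imprints

def range_scan(column, imprints, lo, hi):
    """Row ids with lo <= value < hi, touching only cachelines whose
    imprint mask intersects the query's bin mask."""
    qmask = 0
    for b in range(len(BINS) - 1):
        if BINS[b] < hi and BINS[b + 1] > lo:   # bin overlaps [lo, hi)
            qmask |= 1 << b
    hits = []
    for c, mask in enumerate(imprints):
        if mask & qmask:                        # otherwise skip the cacheline
            base = c * CACHELINE
            for j, v in enumerate(column[base:base + CACHELINE]):
                if lo <= v < hi:
                    hits.append(base + j)
    return hits
```

Because each mask summarizes exactly one cacheline, a skipped mask means the cacheline is never brought into the CPU cache at all, which is the memory-traffic saving the abstract emphasizes.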
Oracle Cost Based Optimizer Correlations
Database systems use optimizers on queries to select execution pathways that are supposed to provide optimal performance. The Oracle database version of this technology is called the Cost Based Optimizer (CBO). Researchers have studied whether Oracle optimizer estimates could be correlated to execution speeds with a high degree of confidence, but have found correlating optimizer cost estimates with actual execution speed to be problematic and unreliable. If possible, however, such correlations would be helpful to developers who are tasked with query creation and optimization. Although much has been written on databases, the academic literature on optimizers was sparse. To fill the gap, this researcher developed a quantitative research methodology to test query optimization on an Oracle 11g database. Correlations between cached, non-cached, partitioned and non-partitioned table structures and indexes were performed. The findings suggest that confident correlations between optimizer cost estimates and execution speeds are not yet possible. Suggestions for further research were provided.
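The kind of analysis this study describes, testing whether optimizer cost estimates track measured execution times, typically comes down to a correlation coefficient over paired measurements. A minimal sketch follows; the sample cost and timing figures are invented for illustration and do not come from the study.

```python
# Sketch of correlating optimizer cost estimates with measured execution
# times using Pearson's r. A value of r near +1 would mean costs track
# times well; the study found no such reliable relationship.
import statistics

def pearson_r(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

est_cost = [120, 340, 90, 560, 210]   # optimizer cost units (invented)
exec_ms = [45, 300, 150, 210, 95]     # measured run times in ms (invented)
r = pearson_r(est_cost, exec_ms)      # weak/unstable r motivates the finding
```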
B+-tree Index Optimization by Exploiting Internal Parallelism of Flash-based Solid State Drives
Previous research addressed the potential problems of the hard-disk-oriented design of DBMSs for flashSSDs. In this paper, we focus on exploiting the potential benefits of flashSSDs. First, we examine the internal parallelism of flashSSDs by running benchmarks on various flashSSDs. Then, we suggest algorithm-design principles for making the best use of that internal parallelism. We present a new I/O request concept, called psync I/O, that can exploit the internal parallelism of flashSSDs in a single process. Based on these ideas, we introduce B+-tree optimization methods that utilize internal parallelism. By integrating the results of these methods, we present a B+-tree variant, PIO B-tree. We confirmed that each optimization method substantially enhances index performance. Consequently, PIO B-tree improved the B+-tree's insert performance by a factor of up to 16.3, while improving point-search performance by a factor of 1.2. The range search of PIO B-tree was up to 5 times faster than that of the B+-tree. Moreover, PIO B-tree outperformed other flash-aware indexes in various synthetic workloads. We also confirmed that PIO B-tree outperforms the B+-tree in index traces collected inside the PostgreSQL DBMS with the TPC-C benchmark.
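The core idea, a single process issuing a batch of page reads at once so the SSD's internal channels can serve them in parallel, can be sketched as follows. This is not the paper's psync I/O implementation: the fake page table and `read_page` stand-in are assumptions made so the sketch is runnable.

```python
# Sketch of batched leaf-page reads for a range search: instead of
# fetching leaf pages one at a time, all reads for the range are
# submitted together, letting the flashSSD's internal parallelism
# serve them concurrently. read_page() stands in for a real pread().
from concurrent.futures import ThreadPoolExecutor

PAGES = {p: list(range(p * 4, p * 4 + 4)) for p in range(16)}  # fake leaf pages

def read_page(page_id):
    return PAGES[page_id]        # a real index would read the page from the SSD

def range_search_parallel(first_page, last_page, workers=8):
    """Fetch all leaf pages of the range concurrently, then merge in order."""
    page_ids = range(first_page, last_page + 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        leaves = pool.map(read_page, page_ids)   # reads issued as one batch
    return [key for leaf in leaves for key in leaf]
```

On a magnetic disk this batching buys little, since the single head serializes the reads; on a flashSSD the outstanding requests can hit different channels and dies at once, which is the internal parallelism the paper exploits.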
Adaptive Merging on Phase Change Memory
Indexing is a well-known database technique used to facilitate data access and speed up query processing. Nevertheless, the construction and modification of indexes are very expensive. In traditional approaches, all records in the database table are covered equally by the index. This is not always effective, since some records may be queried very often and others never. To avoid this problem, adaptive merging has been introduced. The key idea is to create the index adaptively and incrementally, as a side-product of query processing. As a result, the database table is indexed partially, depending on the query workload. This paper addresses the problem of adaptive merging for phase change memory (PCM). The most important characteristics of this memory type are limited write endurance and high write latency. As a consequence, adaptive merging must be reconsidered from scratch. We solve this problem in two steps. First, we apply several PCM optimization techniques to the traditional adaptive merging approach. We show that the proposed method (eAM) outperforms the traditional approach by 60%. After that, we introduce a framework for adaptive merging (PAM) and a new PCM-optimized index. It further improves system performance by 20% for databases where search queries interleave with data modifications.
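The "index as a side-product of query processing" idea can be sketched in a few lines. This is a deliberately simplified illustration of adaptive merging's premise, not the eAM or PAM method: the class name and the single sorted run are assumptions made for the example.

```python
# Sketch of adaptive indexing: the index is built lazily, as a
# side-effect of queries. Each range query moves the keys it touches
# from the unsorted table into a sorted run, so frequently queried
# ranges become fully indexed while cold ranges stay unindexed.
import bisect

class AdaptiveIndex:
    def __init__(self, table):
        self.unsorted = list(table)   # not-yet-indexed records
        self.indexed = []             # keys already adopted, kept sorted

    def range_query(self, lo, hi):
        # 1. Answer from the indexed part with cheap binary searches.
        i = bisect.bisect_left(self.indexed, lo)
        j = bisect.bisect_left(self.indexed, hi)
        result = self.indexed[i:j]
        # 2. Scan the shrinking unsorted part; adopt matches into the index.
        adopted = [v for v in self.unsorted if lo <= v < hi]
        self.unsorted = [v for v in self.unsorted if not (lo <= v < hi)]
        for v in adopted:
            bisect.insort(self.indexed, v)
        return sorted(result + adopted)
```

A repeated query over the same range hits only the sorted run on its second execution. On PCM, the catch the paper addresses is step 2: every adoption is a write, and PCM's limited write endurance and high write latency make such incremental reorganization costly unless it is redesigned.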