23,825 research outputs found
LogBase: A Scalable Log-structured Database System in the Cloud
Numerous applications such as financial transactions (e.g., stock trading)
are write-heavy in nature. The shift from reads to writes in web applications
has also been accelerating in recent years. Write-ahead-logging is a common
approach for providing recovery capability while improving performance in most
storage systems. However, the separation of log and application data incurs
write overheads observed in write-heavy environments and hence adversely
affects the write throughput and recovery time in the system. In this paper, we
introduce LogBase - a scalable log-structured database system that adopts
log-only storage for removing the write bottleneck and supporting fast system
recovery. LogBase is designed to be dynamically deployed on commodity clusters
to take advantage of elastic scaling property of cloud environments. LogBase
provides in-memory multiversion indexes for supporting efficient access to data
maintained in the log. LogBase also supports transactions that bundle read and
write operations spanning across multiple records. We implemented the proposed
system and compared it with HBase and a disk-based log-structured
record-oriented system modeled after RAMCloud. The experimental results show
that LogBase is able to provide sustained write throughput, efficient data
access out of the cache, and effective system recovery.Comment: VLDB201
B+-tree Index Optimization by Exploiting Internal Parallelism of Flash-based Solid State Drives
Previous research addressed the potential problems of the hard-disk oriented
design of DBMSs of flashSSDs. In this paper, we focus on exploiting potential
benefits of flashSSDs. First, we examine the internal parallelism issues of
flashSSDs by conducting benchmarks to various flashSSDs. Then, we suggest
algorithm-design principles in order to best benefit from the internal
parallelism. We present a new I/O request concept, called psync I/O that can
exploit the internal parallelism of flashSSDs in a single process. Based on
these ideas, we introduce B+-tree optimization methods in order to utilize
internal parallelism. By integrating the results of these methods, we present a
B+-tree variant, PIO B-tree. We confirmed that each optimization method
substantially enhances the index performance. Consequently, PIO B-tree enhanced
B+-tree's insert performance by a factor of up to 16.3, while improving
point-search performance by a factor of 1.2. The range search of PIO B-tree was
up to 5 times faster than that of the B+-tree. Moreover, PIO B-tree
outperformed other flash-aware indexes in various synthetic workloads. We also
confirmed that PIO B-tree outperforms B+-tree in index traces collected inside
the Postgresql DBMS with TPC-C benchmark.Comment: VLDB201
Pregelix: Big(ger) Graph Analytics on A Dataflow Engine
There is a growing need for distributed graph processing systems that are
capable of gracefully scaling to very large graph datasets. Unfortunately, this
challenge has not been easily met due to the intense memory pressure imposed by
process-centric, message passing designs that many graph processing systems
follow. Pregelix is a new open source distributed graph processing system that
is based on an iterative dataflow design that is better tuned to handle both
in-memory and out-of-core workloads. As such, Pregelix offers improved
performance characteristics and scaling properties over current open source
systems (e.g., we have seen up to 15x speedup compared to Apache Giraph and up
to 35x speedup compared to distributed GraphLab), and makes more effective use
of available machine resources to support Big(ger) Graph Analytics
- …