LogBase: A Scalable Log-structured Database System in the Cloud
Numerous applications such as financial transactions (e.g., stock trading)
are write-heavy in nature. The shift from reads to writes in web applications
has also been accelerating in recent years. Write-ahead-logging is a common
approach for providing recovery capability while improving performance in most
storage systems. However, the separation of the log from the application data
incurs additional write overhead, which is especially pronounced in write-heavy
environments and hence adversely affects both write throughput and recovery time. In this paper, we
introduce LogBase - a scalable log-structured database system that adopts
log-only storage for removing the write bottleneck and supporting fast system
recovery. LogBase is designed to be dynamically deployed on commodity clusters
to take advantage of the elastic scaling property of cloud environments. LogBase
provides in-memory multiversion indexes for supporting efficient access to data
maintained in the log. LogBase also supports transactions that bundle read and
write operations spanning multiple records. We implemented the proposed
system and compared it with HBase and a disk-based log-structured
record-oriented system modeled after RAMCloud. The experimental results show
that LogBase is able to provide sustained write throughput, efficient data
access out of the cache, and effective system recovery.
Comment: VLDB201
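The core idea of log-only storage is that the log itself is the database: each write is a single sequential append, and an in-memory multiversion index maps keys to offsets in the log, so recovery is just a sequential scan that rebuilds the index. The following is a minimal sketch of that idea under assumptions of our own (the class `LogStore` and its method names are illustrative, not LogBase's actual API):

```python
# Sketch of log-only storage with an in-memory multiversion index.
# The append-only list stands in for sequential disk writes.

class LogStore:
    def __init__(self):
        self.log = []    # append-only log of (key, version, value) entries
        self.index = {}  # key -> list of (version, log offset), oldest first

    def put(self, key, value, version):
        # A write is one sequential append: the data IS the log,
        # so there is no separate write-ahead log to maintain.
        offset = len(self.log)
        self.log.append((key, version, value))
        self.index.setdefault(key, []).append((version, offset))

    def get(self, key, version=None):
        versions = self.index.get(key)
        if not versions:
            return None
        if version is None:
            _, offset = versions[-1]  # latest version
        else:
            # Newest version at or before the requested snapshot version.
            candidates = [(v, o) for v, o in versions if v <= version]
            if not candidates:
                return None
            _, offset = max(candidates)
        return self.log[offset][2]

    def recover(self):
        # Recovery = one sequential scan of the log to rebuild the index.
        self.index = {}
        for offset, (key, version, _) in enumerate(self.log):
            self.index.setdefault(key, []).append((version, offset))
```

The multiversion index is what lets reads at an older version coexist with new writes, and `recover()` shows why recovery is fast: no redo/undo logic, only an index rebuild.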
Distributed computation of persistent homology
Persistent homology is a popular and powerful tool for capturing topological
features of data. Advances in algorithms for computing persistent homology have
reduced the computation time drastically -- as long as the algorithm does not
exhaust the available memory. Following up on a recently presented parallel
method for persistence computation on shared memory systems, we demonstrate
that a simple adaptation of the standard reduction algorithm leads to a variant
for distributed systems. Our algorithmic design ensures that the data is
distributed over the nodes without redundancy; this permits the computation of
much larger instances than on a single machine. Moreover, we observe that the
parallelism at least compensates for the overhead caused by communication
between nodes, and often even speeds up the computation compared to sequential
and even parallel shared memory algorithms. In our experiments, we were able to
compute the persistent homology of filtrations with more than a billion (10^9)
elements within seconds on a cluster with 32 nodes using less than 10GB of
memory per node.
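The "standard reduction algorithm" the abstract adapts works column by column on a boundary matrix over Z/2: each column is repeatedly XORed with an earlier reduced column sharing the same lowest nonzero row until its lowest row is unique or the column is zero; the (lowest row, column) pairs are the persistence pairs. A minimal sequential sketch (function name and the column-as-set representation are our assumptions; the distributed variant in the paper partitions these columns across nodes):

```python
# Standard column reduction for persistent homology over Z/2.
# Each column of the boundary matrix is a set of row indices;
# XOR of columns is the symmetric difference of sets.

def reduce_boundary_matrix(columns):
    """Reduce a Z/2 boundary matrix in place.

    columns: list of sets of row indices, one per simplex, in
    filtration order. Returns the persistence pairs (birth, death).
    """
    low_to_col = {}  # lowest row index -> column already reduced to that low
    pairs = []
    for j, col in enumerate(columns):
        col = set(col)
        while col:
            low = max(col)  # lowest nonzero entry (largest row index)
            if low not in low_to_col:
                break
            # Add (mod 2) the earlier reduced column with the same low.
            col ^= columns[low_to_col[low]]
        columns[j] = col
        if col:
            low = max(col)
            low_to_col[low] = j
            pairs.append((low, j))  # simplex `low` is born, killed by `j`
    return pairs
```

For a filtered triangle (vertices 0, 1, 2; edges 3 = {0,1}, 4 = {1,2}, 5 = {0,2}; triangle 6 = {3,4,5}), the reduction pairs edge 3 with vertex 1, edge 4 with vertex 2, and the triangle with edge 5, leaving vertex 0 and a zeroed column 5 as the expected unpaired/cycle columns.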