Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.
Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 2008
Byzantine Fault Tolerance for Nondeterministic Applications
All practical applications contain some degree of nondeterminism. When such
applications are replicated to achieve Byzantine fault tolerance (BFT), their
nondeterministic operations must be controlled to ensure replica consistency.
To the best of our knowledge, only the most simplistic types of replica
nondeterminism have been dealt with. Furthermore, a systematic approach to
handling common types of nondeterminism has been lacking. In this paper, we propose
a classification of common types of replica nondeterminism with respect to the
requirement of achieving Byzantine fault tolerance, and describe the design and
implementation of the core mechanisms necessary to handle such nondeterminism
within a Byzantine fault tolerance framework.
Comment: To appear in the proceedings of the 3rd IEEE International Symposium
on Dependable, Autonomic and Secure Computing, 2007
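To make the idea concrete, here is a minimal sketch of one standard way to control replica nondeterminism (not necessarily the paper's exact mechanism): the primary pre-determines nondeterministic inputs such as timestamps and random seeds, and every replica executes with the agreed values. The class and function names are illustrative assumptions.

```python
import random
import time

# Minimal sketch: the primary captures nondeterministic inputs up front
# and ships them with the request, so every replica computes the same
# result. Names here are illustrative, not from the paper.

class NondetRecord:
    """Pre-determined nondeterministic values agreed on before execution."""
    def __init__(self, timestamp: float, rng_seed: int):
        self.timestamp = timestamp
        self.rng_seed = rng_seed

def primary_propose() -> NondetRecord:
    # The primary samples the nondeterministic values once...
    return NondetRecord(timestamp=time.time(), rng_seed=random.getrandbits(64))

def execute_on_replica(record: NondetRecord, amount: int) -> tuple:
    # ...and every replica (primary included) uses the agreed values
    # instead of calling time()/random() locally.
    rng = random.Random(record.rng_seed)
    fee = rng.randint(1, 10)  # deterministic given the shared seed
    return (record.timestamp, amount + fee)

record = primary_propose()
r1 = execute_on_replica(record, 100)
r2 = execute_on_replica(record, 100)
assert r1 == r2  # replicas stay consistent despite "random" inputs
```

In a BFT setting the proposed values would themselves be agreed on (and sanity-checked) by the replicas before use, since a faulty primary could propose bogus ones; the sketch omits that step.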
Experience with the Open Source based implementation for ATLAS Conditions Data Management System
Conditions Data in high energy physics experiments is frequently seen as
all data needed for reconstruction besides the event data itself. This
includes all sorts of slowly evolving data like detector alignment, calibration
and robustness, and data from the detector control system. Also, every Conditions
Data Object is associated with a time interval of validity and a version.
Besides that, it is quite often useful to tag collections of Conditions Data
Objects altogether. These issues have already been investigated and a data
model has been proposed and used for different implementations based on
commercial DBMSs, both at CERN and for the BaBar experiment. The special case
of the ATLAS complex trigger that requires online access to calibration and
alignment data poses new challenges that have to be met using a flexible and
customizable solution more in line with Open Source components. Motivated by
the ATLAS challenges, we have developed an alternative implementation based on
an Open Source RDBMS. Several issues were investigated and will be described
in this paper:
- The best way to map the conditions data model onto the relational model,
considering the queries foreseen as most frequent.
- The clustering model best suited to address the scalability problem.
- The extensive tests that were performed.
The very promising results from these tests are attracting attention from
the HEP community and driving further developments.
Comment: 8 pages, 4 figures, 3 tables, conference
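As a rough illustration of the mapping questions listed above, here is a minimal sketch of an interval-of-validity table in SQLite; the schema, column, and folder names are assumptions for illustration, not the actual ATLAS design. The query shown is the pattern foreseen as most frequent: fetch the object valid at a given time under a given tag.

```python
import sqlite3

# Illustrative mapping of Conditions Data Objects to a relational table:
# each object carries an interval of validity [since, until), a version,
# and a tag grouping a coherent set of objects. Schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE conditions (
        folder   TEXT,      -- e.g. 'detector/alignment'
        since    INTEGER,   -- start of validity (run number or timestamp)
        until    INTEGER,   -- end of validity
        version  INTEGER,
        tag      TEXT,
        payload  BLOB
    )
""")
conn.execute("CREATE INDEX idx_iov ON conditions (folder, tag, since, until)")

conn.executemany(
    "INSERT INTO conditions VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("detector/alignment", 0,   100, 1, "PROD", b"v1 constants"),
        ("detector/alignment", 100, 200, 2, "PROD", b"v2 constants"),
    ],
)

# The query foreseen as most frequent: the object valid at time T for a tag.
T = 150
row = conn.execute(
    """SELECT version, payload FROM conditions
       WHERE folder = ? AND tag = ? AND since <= ? AND ? < until
       ORDER BY version DESC LIMIT 1""",
    ("detector/alignment", "PROD", T, T),
).fetchone()
print(row)  # (2, b'v2 constants')
```

The composite index on (folder, tag, since, until) reflects the clustering question raised in the list above: lookups by folder, tag, and validity point should avoid full scans.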
Centrally Banked Cryptocurrencies
Current cryptocurrencies, starting with Bitcoin, build a decentralized
blockchain-based transaction ledger, maintained through proofs-of-work that
also generate a monetary supply. Such decentralization has benefits, such as
independence from national political control, but also significant limitations
in terms of scalability and computational cost. We introduce RSCoin, a
cryptocurrency framework in which central banks maintain complete control over
the monetary supply, but rely on a distributed set of authorities, or
mintettes, to prevent double-spending. While monetary policy is centralized,
RSCoin still provides strong transparency and auditability guarantees. We
demonstrate, both theoretically and experimentally, the benefits of a modest
degree of centralization, such as the elimination of wasteful hashing and a
scalable system for avoiding double-spending attacks.
Comment: 15 pages, 4 figures, 2 tables. In Proceedings of NDSS 2016
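The double-spending check performed by mintettes can be sketched in a few lines; this is a drastic simplification of RSCoin's actual two-phase protocol (it omits signatures, sharding of coins across authorities, and the audit log), with illustrative names.

```python
# Toy sketch of mintette-side double-spend prevention. A drastic
# simplification of RSCoin's two-phase protocol; names are illustrative.

class Mintette:
    def __init__(self):
        self.spent = set()  # coin ids this authority has seen consumed

    def endorse(self, coin_id: str, tx_id: str) -> bool:
        # Refuse to endorse a coin that was already spent. In the real
        # protocol, tx_id and the decision would go into a signed,
        # auditable log rather than a bare set.
        if coin_id in self.spent:
            return False
        self.spent.add(coin_id)
        return True

def submit_tx(mintettes, coin_id, tx_id):
    # A transaction needs endorsements from a majority of the
    # authorities responsible for its input coins.
    votes = sum(m.endorse(coin_id, tx_id) for m in mintettes)
    return votes > len(mintettes) // 2

authorities = [Mintette() for _ in range(3)]
print(submit_tx(authorities, "coin-42", "tx-1"))  # True: first spend accepted
print(submit_tx(authorities, "coin-42", "tx-2"))  # False: double spend rejected
```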
On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems
A new emerging class of parallel database management systems (DBMS) is
designed to take advantage of the partitionable workloads of on-line
transaction processing (OLTP) applications. Transactions in these systems are
optimized to execute to completion on a single node in a shared-nothing cluster
without needing to coordinate with other nodes or use expensive concurrency
control measures. But some OLTP applications cannot be partitioned such that
all of their transactions execute within a single partition in this manner.
These distributed transactions access data not stored within their local
partitions and subsequently require more heavy-weight concurrency control
protocols. Further difficulties arise when the transaction's execution
properties, such as the number of partitions it may need to access or whether
it will abort, are not known beforehand. The DBMS could mitigate these
performance issues if it were provided with additional information about
transactions. Thus, in this paper we present a Markov model-based approach for
automatically selecting which optimizations a DBMS could use, namely (1) more
efficient concurrency control schemes, (2) intelligent scheduling, (3) reduced
undo logging, and (4) speculative execution. To evaluate our techniques, we
implemented our models and integrated them into a parallel, main-memory OLTP
DBMS to show that we can improve the performance of applications with diverse
workloads.
Comment: VLDB201
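As a toy illustration of the approach (not the paper's actual models), the sketch below trains a Markov model over a transaction's execution states from observed traces and uses it to predict which partitions a new invocation will likely touch, which is the information the four optimizations above depend on. All states, traces, and the threshold are invented.

```python
# Minimal sketch: a Markov chain over a stored procedure's execution
# states, used to predict which partitions a transaction will touch so
# the DBMS can pick cheaper concurrency control. Transition counts are
# invented here; a real system estimates them from workload traces.
from collections import defaultdict

class TxnMarkovModel:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, path):
        # Train on an observed execution path: a sequence of
        # (statement, partition) states.
        for a, b in zip(path, path[1:]):
            self.counts[a][b] += 1

    def predict_partitions(self, start, threshold=0.5):
        # Follow the most likely transitions while the cumulative path
        # probability stays above the threshold, collecting partitions.
        partitions, state, prob = set(), start, 1.0
        while state in self.counts:
            nxt = max(self.counts[state], key=self.counts[state].get)
            total = sum(self.counts[state].values())
            prob *= self.counts[state][nxt] / total
            if prob < threshold:
                break
            state = nxt
            partitions.add(state[1])
        return partitions

model = TxnMarkovModel()
model.observe([("begin", 0), ("read_item", 1), ("update_stock", 1), ("commit", 1)])
model.observe([("begin", 0), ("read_item", 1), ("update_stock", 2), ("commit", 2)])
print(model.predict_partitions(("begin", 0)))  # e.g. {1}
```

If the predicted set is a single partition, the DBMS can route the transaction there and skip distributed concurrency control; a prediction that the transaction will likely abort could similarly trigger reduced undo logging or speculative execution.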
Learning a Partitioning Advisor with Deep Reinforcement Learning
Commercial data analytics products such as Microsoft Azure SQL Data Warehouse
or Amazon Redshift provide ready-to-use scale-out database solutions for
OLAP-style workloads in the cloud. While the provisioning of a database cluster
is usually fully automated by cloud providers, customers typically still have
to make important design decisions that were traditionally made by the
database administrator, such as selecting partitioning schemes.
In this paper we introduce a learned partitioning advisor for analytical
OLAP-style workloads based on Deep Reinforcement Learning (DRL). The main idea
is that a DRL agent learns its decisions based on experience by monitoring the
rewards for different workloads and partitioning schemes. We evaluate our
learned partitioning advisor in an experimental evaluation with different
database schemata and workloads of varying complexity. In this evaluation, we
show that our advisor not only finds partitionings that outperform existing
approaches to automated partitioning design, but also adjusts easily to
different deployments. This is especially important in cloud setups, where
customers can easily migrate their cluster to a new set of (virtual)
machines.
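To give a flavor of the learning loop (a bandit-style simplification, not the paper's DRL architecture), the sketch below lets an agent try partitioning actions, observe a reward derived from workload cost, and gradually prefer schemes that co-partition joined tables. The tables, schemes, and cost model are all invented for illustration.

```python
import random
from collections import defaultdict

# Bandit-style sketch of a learned partitioning advisor; the cost model
# below is a fake stand-in for actually running the workload on a cluster.
TABLES = ["orders", "lineitem"]
SCHEMES = ["hash(pk)", "hash(fk)", "replicate"]
ACTIONS = [(t, s) for t in TABLES for s in SCHEMES]

def run_workload(partitioning):
    # Assumption: the real system executes the workload and measures
    # runtime; this toy model rewards co-partitioning lineitem with
    # orders on the join key and penalizes replicating a write-heavy table.
    cost = 100.0
    if partitioning["lineitem"] == "hash(fk)" and partitioning["orders"] == "hash(pk)":
        cost -= 60.0   # join becomes local to each partition
    if partitioning["orders"] == "replicate":
        cost += 20.0   # replication overhead on writes
    return -cost       # reward = negative cost

value = defaultdict(float)       # running value estimate per action
partitioning = {t: "hash(pk)" for t in TABLES}
epsilon, alpha = 0.2, 0.3

for episode in range(500):
    if random.random() < epsilon:            # explore
        action = random.choice(ACTIONS)
    else:                                    # exploit best-known action
        action = max(ACTIONS, key=lambda a: value[a])
    table, scheme = action
    partitioning[table] = scheme
    reward = run_workload(partitioning)
    value[action] += alpha * (reward - value[action])

print(partitioning)  # tends toward orders=hash(pk), lineitem=hash(fk)
```

The paper's agent replaces the value table with a deep network over a featurized workload and schema, which is what lets it generalize to unseen deployments rather than memorizing one configuration.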