Cache Serializability: Reducing Inconsistency in Edge Transactions
Read-only caches are widely used in cloud infrastructures to reduce access
latency and load on backend databases. Operators view coherent caches as
impractical at genuinely large scale, and many client-facing caches are updated
asynchronously by best-effort pipelines. Existing solutions that
support cache consistency are inapplicable to this scenario since they require
a round trip to the database on every cache transaction.
Existing incoherent cache technologies are oblivious to transactional data
access, even if the backend database supports transactions. We propose T-Cache,
a novel caching policy for read-only transactions in which inconsistency is
tolerable (won't cause safety violations) but undesirable (has a cost). T-Cache
improves cache consistency despite asynchronous and unreliable communication
between the cache and the database. We define cache-serializability, a variant
of serializability that is suitable for incoherent caches, and prove that with
unbounded resources T-Cache implements this new specification. With limited
resources, T-Cache allows the system manager to choose a trade-off between
performance and consistency.
Our evaluation shows that T-Cache detects many inconsistencies with only
nominal overhead. We use synthetic workloads to demonstrate the efficacy of
T-Cache when data accesses are clustered and its adaptive reaction to workload
changes. With workloads based on the real-world topologies, T-Cache detects
43-70% of the inconsistencies and increases the rate of consistent transactions
by 33-58%.Comment: Ittay Eyal, Ken Birman, Robbert van Renesse, "Cache Serializability:
Reducing Inconsistency in Edge Transactions," Distributed Computing Systems
(ICDCS), IEEE 35th International Conference on, June~29 2015--July~2 201
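To make the consistency-detection idea concrete, here is a minimal Python sketch in which each cached entry carries the version it was written at plus the versions it depended on, and a read-only transaction is flagged when those versions disagree. The data structures and names are hypothetical simplifications, not the paper's actual T-Cache protocol.

from dataclasses import dataclass, field

@dataclass
class Entry:
    value: object
    version: int                              # version at which this value was written
    deps: dict = field(default_factory=dict)  # key -> version the writer depended on

class Cache:
    def __init__(self):
        self.entries = {}

    def put(self, key, value, version, deps=None):
        self.entries[key] = Entry(value, version, dict(deps or {}))

    def read_only_txn(self, keys):
        """Return (values, consistent) for a read-only cache transaction."""
        read = {k: self.entries[k] for k in keys if k in self.entries}
        consistent = True
        for entry in read.values():
            for dep_key, dep_version in entry.deps.items():
                # the transaction read an older version than a co-read entry relied on
                if dep_key in read and read[dep_key].version < dep_version:
                    consistent = False
        return {k: e.value for k, e in read.items()}, consistent

# Example: x was written by a transaction that read y at version 7,
# but the cache still holds y at version 5 -> inconsistency detected.
c = Cache()
c.put("y", 10, version=5)
c.put("x", 99, version=8, deps={"y": 7})
print(c.read_only_txn(["x", "y"]))   # ({'x': 99, 'y': 10}, False)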
Rethinking serializable multiversion concurrency control
Multi-versioned database systems have the potential to significantly increase
the amount of concurrency in transaction processing because they can avoid
read-write conflicts. Unfortunately, the increase in concurrency usually comes
at the cost of transaction serializability. If a database user requests full
serializability, modern multi-versioned systems significantly constrain
read-write concurrency among conflicting transactions and employ expensive
synchronization patterns in their design. In main-memory multi-core settings,
these additional constraints are so burdensome that multi-versioned systems are
often significantly outperformed by single-version systems.
We propose Bohm, a new concurrency control protocol for main-memory
multi-versioned database systems. Bohm guarantees serializable execution while
ensuring that reads never block writes. In addition, Bohm does not require
reads to perform any book-keeping whatsoever, thereby avoiding the overhead of
tracking reads via contended writes to shared memory. This leads to excellent
scalability and performance in multi-core settings. Bohm has all the above
characteristics without performing validation-based concurrency control.
Instead, it is pessimistic, and is therefore not prone to excessive aborts in
the presence of contention. An experimental evaluation shows that Bohm performs
well in both high contention and low contention settings, and is able to
dramatically outperform state-of-the-art multi-versioned systems despite
maintaining the full set of serializability guarantees.
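The property that reads never block writes rests on the standard multiversion read rule: a transaction with timestamp ts sees, for each key, the latest version written at or before ts. A minimal sketch of that rule follows; it illustrates only the read rule, not Bohm's two-phase design, and all names are hypothetical.

import bisect

class MVStore:
    def __init__(self):
        self.versions = {}   # key -> ([write timestamps], [values]), both sorted by timestamp

    def write(self, key, ts, value):
        tss, vals = self.versions.setdefault(key, ([], []))
        i = bisect.bisect_right(tss, ts)
        tss.insert(i, ts)
        vals.insert(i, value)

    def read(self, key, ts):
        tss, vals = self.versions.get(key, ([], []))
        i = bisect.bisect_right(tss, ts) - 1      # latest version written at or before ts
        return vals[i] if i >= 0 else None

s = MVStore()
s.write("a", ts=10, value="v10")
s.write("a", ts=20, value="v20")
print(s.read("a", ts=15))   # v10: the reader ignores the later write and never blocks it
print(s.read("a", ts=25))   # v20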
Building Scalable and Consistent Distributed Databases Under Conflicts
Distributed databases, which rely on redundant and distributed storage across multiple
servers, are able to provide mission-critical data management services at large scale. Parallelism
is the key to the scalability of distributed databases, but concurrent queries having
conflicts may block or abort each other when strong consistency is enforced using rigorous
concurrency control protocols. This thesis studies the techniques of building scalable distributed
databases under strong consistency guarantees even in the face of high contention
workloads. The techniques proposed in this thesis share a common idea, conflict mitigation:
mitigating conflicts by rescheduling operations within the concurrency control protocol in the
first place, rather than resolving conflicts after they arise. Using this idea, concurrent queries
under conflicts can be executed with high parallelism. This thesis explores this idea on
both databases that support serializable ACID (atomicity, consistency, isolation, durability)
transactions and eventually consistent NoSQL systems.
First, the epoch-based concurrency control (ECC) technique is proposed in ALOHA-KV,
a new distributed key-value store that supports high-performance read-only and write-only
distributed transactions. ECC demonstrates that concurrent serializable distributed
transactions can be processed in parallel with low overhead even under high contention.
With ECC, a new atomic commitment protocol is developed that requires only an amortized
single round trip for a distributed write-only transaction to commit in the absence of failures.
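The epoch idea can be illustrated with a small sketch: write-only transactions are queued and applied as a batch in a write epoch, while read-only transactions run against the state left by the last completed epoch. This is a hypothetical single-node simplification, not the ALOHA-KV implementation.

class EpochStore:
    def __init__(self):
        self.state = {}      # committed state visible to read epochs
        self.pending = []    # write-only transactions queued for the next write epoch

    def submit_write_txn(self, writes):
        """writes: dict of key -> value; deferred to the next write epoch."""
        self.pending.append(writes)

    def run_write_epoch(self):
        """Apply all queued write-only transactions in a fixed order, so the
        result is equivalent to running them serially in that order."""
        for writes in self.pending:
            self.state.update(writes)
        self.pending.clear()

    def read_only_txn(self, keys):
        """Reads observe only states produced by whole epochs."""
        return {k: self.state.get(k) for k in keys}

store = EpochStore()
store.submit_write_txn({"x": 1, "y": 1})
store.submit_write_txn({"x": 2})
store.run_write_epoch()
print(store.read_only_txn(["x", "y"]))   # {'x': 2, 'y': 1}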
Second, a novel paradigm of serializable distributed transaction processing is developed
to extend ECC with read-write transaction processing support. This paradigm uses a
newly proposed database operator, the functor: a placeholder for the value of a key that
can be computed asynchronously, in parallel with the functor computations of the
same or other transactions. Functor-enabled ECC achieves more fine-grained concurrency
control than transaction-level concurrency control, and it never aborts transactions due
to read-write or write-write conflicts; transactions may still fail due to logic errors or
constraint violations, all while guaranteeing serializability.
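The functor notion is close to a future: a write installs a placeholder for a key's value, the value is computed asynchronously, and dependent work can proceed meanwhile. A minimal Python sketch of that placeholder idea follows; it is an illustration under that analogy, not the thesis's actual operator.

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)
store = {}   # key -> Future that will hold the key's value

def functor_write(key, compute):
    """Install a placeholder for the key now; compute its value in the background."""
    store[key] = pool.submit(compute)

def read(key):
    """Resolve the placeholder, waiting only if its value is not ready yet."""
    return store[key].result()

# A later write can reference x through its placeholder without waiting
# for x to be materialized first.
functor_write("x", lambda: 41)
functor_write("y", lambda: read("x") + 1)
print(read("y"))   # 42
pool.shutdown()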
Lastly, this thesis explores consistency in the eventually consistent system Apache
Cassandra, investigating a form of consistency violation referred to as "consistency
spikes". This investigation shows that the consistency spikes exhibited by Cassandra are
strongly correlated with garbage collection, particularly the "stop-the-world" phase in the
Java virtual machine. Thus, delaying read operations artificially at servers immediately
after a garbage collection pause can virtually eliminate these spikes.
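A sketch of the mitigation, with a hypothetical hook for detecting the end of a pause (Cassandra itself is not shown):

import time

READ_DELAY_AFTER_GC = 0.05   # 50 ms window, tunable
last_gc_end = 0.0            # hypothetical hook: set when a stop-the-world pause ends

def serve_read(handler):
    since_gc = time.time() - last_gc_end
    if since_gc < READ_DELAY_AFTER_GC:
        time.sleep(READ_DELAY_AFTER_GC - since_gc)   # artificial delay right after a pause
    return handler()

# Simulate a GC pause that just ended, then serve a slightly delayed read.
last_gc_end = time.time()
print(serve_read(lambda: "value"))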
Altogether, these techniques allow distributed databases to provide a scalable and
consistent storage service.
OLTP on Hardware Islands
Modern hardware is abundantly parallel and increasingly heterogeneous. The
numerous processing cores have non-uniform access latencies to the main memory
and to the processor caches, which causes variability in the communication
costs. Unfortunately, database systems mostly assume that all processing cores
are the same and that microarchitecture differences are not significant enough
to appear in critical database execution paths. As we demonstrate in this
paper, however, hardware heterogeneity does appear in the critical path and
conventional database architectures achieve suboptimal and, even worse,
unpredictable performance. We perform a detailed performance analysis of OLTP
deployments in servers with multiple cores per CPU (multicore) and multiple
CPUs per server (multisocket). We compare different database deployment
strategies where we vary the number and size of independent database instances
running on a single server, from a single shared-everything instance to
fine-grained shared-nothing configurations. We quantify the impact of
non-uniform hardware on various deployments by (a) examining how efficiently
each deployment uses the available hardware resources and (b) measuring the
impact of distributed transactions and skewed requests on different workloads.
Finally, we argue in favor of shared-nothing deployments that are topology- and
workload-aware and take advantage of fast on-chip communication between islands
of cores on the same socket.
Comment: VLDB 2012
Scalable and dynamically balanced shared-everything OLTP with physiological partitioning
Scaling the performance of shared-everything transaction processing systems to highly parallel multicore hardware remains a challenge for database system designers. Recent proposals alleviate locking and logging bottlenecks in the system, leaving page latching as the next potential problem. To tackle the page latching problem, we propose physiological partitioning (PLP). PLP applies logical-only partitioning, maintaining the desired properties of shared-everything designs, and introduces a multi-rooted B+Tree index structure (MRBTree) that enables partitioning of accesses at the physical page level. Logical partitioning and MRBTrees together ensure that all accesses to a given index page come from a single thread and, hence, can be entirely latch free; an extended design makes heap page accesses thread private as well. Moreover, MRBTrees offer an infrastructure for easy repartitioning and allow us to have a lightweight dynamic load balancing mechanism (DLB) on top of PLP. Profiling a PLP prototype running on different multicore machines shows that it acquires 85% and 68% fewer contentious critical sections, respectively, than an optimized conventional design and one based on logical-only partitioning. PLP also improves performance by up to almost 50% over the existing systems, while DLB enhances the system with rapid and robust behavior in both detecting and handling load imbalance.
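The latch-free property follows from routing: each worker thread exclusively owns a key range and its sub-tree, so its page accesses never race. A minimal Python sketch of that routing idea follows; names are hypothetical, and the MRBTree internals and DLB mechanism are not modeled.

import threading, queue, bisect

class Partition(threading.Thread):
    """Owns one key range; the only thread that ever touches its index pages."""
    def __init__(self):
        super().__init__(daemon=True)
        self.index = {}              # stand-in for this partition's sub-tree
        self.requests = queue.Queue()
        self.start()

    def run(self):
        while True:
            op, key, val, reply = self.requests.get()
            if op == "put":
                self.index[key] = val          # single-threaded access: no page latch needed
                reply.put(True)
            else:
                reply.put(self.index.get(key))

class Router:
    def __init__(self, boundaries):             # e.g. [0, 100, 200] -> three partitions
        self.bounds = boundaries
        self.parts = [Partition() for _ in boundaries]

    def _owner(self, key):
        return self.parts[bisect.bisect_right(self.bounds, key) - 1]

    def _request(self, op, key, val=None):
        reply = queue.Queue()
        self._owner(key).requests.put((op, key, val, reply))
        return reply.get()

    def put(self, key, val):
        return self._request("put", key, val)

    def get(self, key):
        return self._request("get", key)

r = Router([0, 100, 200])
r.put(42, "a"); r.put(150, "b")
print(r.get(42), r.get(150))   # a b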
Providing Freshness for Cached Data in Unstructured Peer-to-Peer Systems
Replication is a popular technique for increasing data availability and improving performance in peer-to-peer systems. Maintaining freshness of replicated data is challenging due to the high cost of update management. While updates have been studied in structured networks, they have been neglected in unstructured networks. We therefore confront the problem of maintaining fresh replicas of data in unstructured peer-to-peer networks. We propose techniques that leverage path replication to support efficient lazy updates and provide freshness for cached data in these systems using only local knowledge. In addition, we show that locally available information may be used to provide additional guarantees of freshness at an acceptable cost to performance. Through performance simulations based on both synthetic and real-world workloads from big data environments, we demonstrate the effectiveness of our approach.
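As a rough illustration of serving cached data from local knowledge only, the sketch below refreshes a replica lazily from its upstream peer once its locally tracked age exceeds a bound. This is a generic freshness check, not the paper's specific technique; all names are hypothetical.

import time

class Replica:
    def __init__(self, upstream, max_age=30.0):
        self.upstream = upstream     # peer (or origin) this copy was replicated from
        self.max_age = max_age       # locally chosen freshness bound, in seconds
        self.value = None
        self.fetched_at = 0.0

    def get(self):
        if self.value is None or time.time() - self.fetched_at > self.max_age:
            self.value = self.upstream()     # lazy refresh along the replication path
            self.fetched_at = time.time()
        return self.value

origin = lambda: "fresh-value"
replica = Replica(origin, max_age=30.0)
print(replica.get())   # fetched from upstream the first time
print(replica.get())   # served locally while still considered fresh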
Enhancing concurrency in distributed transactional memory through commutativity.
Distributed software transactional memory is an emerging, alternative concurrency control model for distributed systems promising to alleviate the difficulties of lock-based distributed synchronization. We consider the multi-versioning (MV) model to avoid unnecessary aborts. MV schemes inherently guarantee commits of read-only transactions, but limit the concurrency of write transactions. In this paper we propose CRF (Commutative Requests First), a new scheduler tailored for enhancing concurrency of write transactions. CRF relies on the notion of commutative transactions, namely conflicting transactions that leave the state of the shared data-set consistent even if validated and committed concurrently. CRF is responsible for detecting conflicts among commutative and non-commutative write transactions and then scheduling them according to the execution state. We assess the effectiveness of the approach through an extensive evaluation of a full implementation of CRF. The tests reveal that CRF improves throughput over a state-of-the-art DTM solution.
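The scheduler's core test is whether two conflicting write transactions commute, i.e., whether either commit order leaves the shared state the same. A minimal sketch of such a test for a toy operation model follows; the operation model and names are hypothetical, not the CRF implementation.

def commute(op1, op2):
    """Order-insensitive pairs of operations on the same key."""
    return op1[0] == "incr" and op2[0] == "incr"     # delta adds commute; blind puts do not

def transactions_commute(t1, t2):
    """t1, t2: dict of key -> ('incr', delta) or ('put', value)."""
    shared = t1.keys() & t2.keys()
    return all(commute(t1[k], t2[k]) for k in shared)

a = {"x": ("incr", 1)}
b = {"x": ("incr", 5)}
c = {"x": ("put", 0)}
print(transactions_commute(a, b))   # True: may be validated and committed concurrently
print(transactions_commute(a, c))   # False: must be serialized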
Data Management Solutions for Tackling Big Data Variety
Variety is one of the three defining characteristics of Big Data, the others being Volume and Velocity. There are several aspects of this data variety: diversity in data formats (text, video, audio) and structure (relational, graph, etc.), variety in access methodologies (OLTP, OLAP), and distribution heterogeneity within the workloads (read-heavy, high contention). Data management solutions for modern-day applications need to tackle this variety. This dissertation provides an understanding of the challenges associated with the different elements of variety, and proposes several solutions for efficiently handling its various aspects.
First, the dissertation studies the challenges related to variety in data structure and access methodologies, and the resultant heterogeneity at the data infrastructure level. Applications now employ several data-processing engines with different underlying representations, like row, column, graph, etc., to process their data. We propose Janus, which introduces a novel data-movement pipeline that enables the use of different representations to support both high throughput of transactions and diverse analytics, while still ensuring consistent real-time analytics in a scale-out setting. Janus partitions the data at different representations, and allows distributed transactions and diverse partitioning strategies at the representations. Then, we propose Typhon and Cerberus, which define and enforce consistency semantics for application data spread across representations.
Second, this dissertation proposes solutions for handling distribution heterogeneity within the workloads. Workloads can have skewed distribution in terms of operation type, data access, or temporal variation. We propose strongly-consistent quorum reads for Raft-like consensus protocols, which can be utilized to scale read-heavy workloads (see the sketch after this abstract). For supporting high-contention transaction workloads, we integrate an existing dynamic timestamp allocation based concurrency control mechanism in a distributed OLTP setting, and analyze its performance.
Third, we study IoT applications, which have to deal with both the physical heterogeneity of the sensors and diverse data-processing demands. We propose a multi-representation based architecture catering to IoT applications, and also present the initial design of M-stream, a computation framework for enabling integration and monitoring of uncertain data from multiple sensors.
Through analysis, illustrative examples, and extensive evaluation of the proposed protocols, this dissertation demonstrates that the proposed solutions can be employed for efficiently handling the different aspects of variety in data-intensive applications.
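For the quorum-read technique mentioned above, a common intuition is that a committed write has reached a majority of replicas, so reading from any majority and taking the value with the highest commit index returns fresh data. The sketch below shows that rule in simplified form; real protocols must also handle entries that are replicated but not yet known to be committed, and the API here is hypothetical rather than the dissertation's exact protocol.

def quorum_read(replicas, key):
    n = len(replicas)
    quorum = n // 2 + 1
    # each response is (commit_index, value) for the key
    responses = [r.committed_entry(key) for r in replicas[:quorum]]
    return max(responses)[1]     # value attached to the highest commit index seen

class Replica:
    def __init__(self, commit_index, value):
        self._entry = (commit_index, value)
    def committed_entry(self, key):
        return self._entry

replicas = [Replica(7, "v7"), Replica(9, "v9"), Replica(9, "v9"),
            Replica(8, "v8"), Replica(7, "v7")]
print(quorum_read(replicas, "k"))   # v9: freshest committed value seen by the quorum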
Fast Distributed Transactions for Partitioned Database Systems
Many distributed storage systems achieve high data access throughput via partitioning and replication, each system with its own advantages and tradeoffs. In order to achieve high scalability, however, today's systems generally reduce transactional support, disallowing single transactions from spanning multiple partitions. Calvin is a practical transaction scheduling and data replication layer that uses a deterministic ordering guarantee to significantly reduce the normally prohibitive contention costs associated with distributed transactions. Unlike previous deterministic database system prototypes, Calvin supports disk-based storage, scales near-linearly on a cluster of commodity machines, and has no single point of failure. By replicating transaction inputs rather than effects, Calvin is also able to support multiple consistency levels, including Paxos-based strong consistency across geographically distant replicas, at no cost to transactional throughput.
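Calvin's determinism can be pictured as: agree on the transaction inputs and their order first, then have every replica execute that order deterministically, so replicas converge without coordinating on effects. Below is a minimal single-node sketch of that replay idea; it is an illustration only, not Calvin's actual sequencer, scheduler, or lock manager.

def sequence(batch):
    """Stand-in for the agreement layer: the agreed-upon order is the batch order."""
    return list(enumerate(batch))

def execute(ordered_txns, state):
    for seq_no, txn in ordered_txns:     # execute strictly in the agreed sequence order
        txn(state)
    return state

def replica(batch):
    """Every replica that receives the same input batch reaches the same state."""
    return execute(sequence(batch), {"x": 0, "y": 0})

t1 = lambda s: s.update(x=s["x"] + 1)
t2 = lambda s: s.update(y=s["x"] * 10)

print(replica([t1, t2]))   # {'x': 1, 'y': 10}
print(replica([t1, t2]))   # identical result on every replica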