Improving OLTP Scalability Using Speculative Lock Inheritance
Transaction processing workloads provide ample request-level concurrency which highly parallel architectures can exploit. However, the resulting heavy utilization of core database services also causes resource contention within the database engine itself and limits scalability. Meanwhile, many database workloads consist of short transactions which each access only a few database records, often with stringent response time requirements. Performance of these short transactions is determined largely by the amount of overhead the database engine imposes for services such as logging, locking, and transaction management. This paper highlights the negative scalability impact of database locking, an effect which is especially severe for short transactions running on highly concurrent multicore hardware. We propose and evaluate Speculative Lock Inheritance (SLI), a technique where hot database locks pass directly from transaction to transaction, bypassing the lock manager bottleneck. We implement SLI in the Shore-MT storage manager and show that lock inheritance fundamentally improves scalability by decoupling the number of simultaneous requests for popular locks from the number of threads in the system, eliminating contention within the lock manager even as core counts continue to increase. We achieve this effect with only minor changes to the lock manager and without changes to consistency or other application-visible effects.
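The core idea of the abstract above, that a committing transaction hands a hot lock directly to a queued successor so the successor never re-enters the lock manager, can be sketched as follows. This is an illustrative toy, not Shore-MT's actual lock-manager API; the class and method names are assumptions.

```python
from collections import deque

class InheritableLock:
    """Toy lock whose holder can pass ownership straight to the next
    waiter at commit time, skipping a full lock-manager acquire."""

    def __init__(self):
        self.holder = None
        self.waiters = deque()

    def acquire(self, txn):
        # Slow path: go through the "lock manager" (here, a queue).
        if self.holder is None:
            self.holder = txn
            return True
        self.waiters.append(txn)
        return False  # caller must wait

    def release_with_inheritance(self):
        # Fast path at commit: hand the lock directly to the next
        # waiter instead of releasing it to the free pool, so a hot
        # lock never becomes a contended entry in the lock manager.
        self.holder = self.waiters.popleft() if self.waiters else None
        return self.holder

lock = InheritableLock()
lock.acquire("T1")                        # T1 owns the lock
lock.acquire("T2")                        # T2 queues behind T1
new_owner = lock.release_with_inheritance()
print(new_owner)                          # T2 inherited the lock
```

The point of the hand-off is that the number of trips through the lock manager for a popular lock no longer grows with the number of threads, which matches the decoupling claim in the abstract.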
Toward Scalable Transaction Processing -- Evolution of Shore-MT
Designing scalable transaction processing systems on modern multicore hardware has been a challenge for almost a decade. The typical characteristics of transaction processing workloads lead to a high degree of unbounded communication on multicores for conventional system designs. In this tutorial, we first present a systematic way of eliminating scalability bottlenecks of a transaction processing system, based on minimizing unbounded communication. Then, we show several techniques that apply the presented methodology to minimize logging-, locking-, and latching-related bottlenecks of transaction processing systems. In parallel, we demonstrate the internals of the Shore-MT storage manager and how they have evolved over the years in terms of scalability on multicore hardware through such techniques. We also show how to use Shore-MT with the various design options it offers through its sophisticated application layer, Shore-Kits, and its simple Metadata Frontend.
Applying HTM to an OLTP System: No Free Lunch
Transactional memory is a promising way of implementing efficient synchronization mechanisms for multicore processors. Intel's introduction of hardware transactional memory (HTM) into its Haswell line of processors marks an important step toward mainstream availability of transactional memory. Transaction processing systems require execution of dozens of critical sections to ensure isolation among threads, which makes them one of the target applications for exploiting HTM. In this study, we quantify the opportunities and limitations of directly applying HTM to an existing OLTP system that uses fine-grained synchronization. Our target is Shore-MT, a modern multithreaded transactional storage manager that uses a variety of fine-grained synchronization mechanisms to provide scalability on multicore processors. We find that HTM can improve performance of the TATP workload by 13-17% when applied judiciously. However, attempting to replace all synchronization reduces performance compared to the baseline due to the high percentage of aborts caused by the limitations of the current HTM implementation. Copyright 2015 ACM.
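The usual pattern for applying HTM to existing fine-grained latches, and the reason aborts hurt in the study above, is lock elision with a fallback path: retry the critical section speculatively a bounded number of times, then take the plain lock, since HTM offers no forward-progress guarantee. The sketch below only simulates the hardware transaction with a callback that may raise an abort (real code would use RTM intrinsics such as `_xbegin`/`_xend`); all names here are illustrative assumptions, not Shore-MT code.

```python
import threading

class TxnAbort(Exception):
    """Stands in for a hardware transaction abort (conflict, capacity, ...)."""

MAX_RETRIES = 3

def elided_critical_section(body, fallback_lock, attempt_txn):
    # Speculative path: try the critical section a few times under (simulated)
    # HTM. An aborted attempt has no side effects, mirroring hardware rollback.
    for _ in range(MAX_RETRIES):
        try:
            return attempt_txn(body)
        except TxnAbort:
            continue  # retry speculation
    # Fallback path: give up on speculation and take the ordinary lock,
    # which is what makes a high abort rate erase HTM's benefit.
    with fallback_lock:
        return body()

counter = {"n": 0}

def body():
    counter["n"] += 1
    return counter["n"]

def always_aborts(body):
    # Simulates a transaction that keeps aborting, e.g. because its
    # working set exceeds the HTM capacity; body is never run here.
    raise TxnAbort

result = elided_critical_section(body, threading.Lock(), always_aborts)
print(result)  # 1: the update still happened, via the fallback lock
```

When aborts dominate, every critical section pays for the failed speculative attempts before taking the lock anyway, which is consistent with the slowdown the paper reports for replacing all synchronization.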
OLTP on Hardware Islands
Modern hardware is abundantly parallel and increasingly heterogeneous. The numerous processing cores have non-uniform access latencies to the main memory and to the processor caches, which causes variability in communication costs. Unfortunately, database systems mostly assume that all processing cores are the same and that microarchitecture differences are not significant enough to appear in critical database execution paths. As we demonstrate in this paper, however, hardware heterogeneity does appear in the critical path, and conventional database architectures achieve suboptimal and, even worse, unpredictable performance. We perform a detailed performance analysis of OLTP deployments in servers with multiple cores per CPU (multicore) and multiple CPUs per server (multisocket). We compare different database deployment strategies, varying the number and size of independent database instances running on a single server, from a single shared-everything instance to fine-grained shared-nothing configurations. We quantify the impact of non-uniform hardware on various deployments by (a) examining how efficiently each deployment uses the available hardware resources and (b) measuring the impact of distributed transactions and skewed requests on different workloads. Finally, we argue in favor of shared-nothing deployments that are topology- and workload-aware and take advantage of fast on-chip communication between islands of cores on the same socket.
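A minimal sketch of the topology-aware, shared-nothing idea argued for above: data is partitioned so that each "island" (the cores of one socket) owns a slice of the keys, and a transaction is cheap only when every key it touches lives on one island. The partitioning scheme and names are illustrative assumptions, not the paper's implementation.

```python
NUM_ISLANDS = 4  # e.g. one shared-nothing instance per socket

def island_of(key):
    # Illustrative partitioning: modulo over integer keys. A real
    # deployment would also pin each instance's threads and buffer
    # pool to its socket's NUMA node.
    return key % NUM_ISLANDS

def classify(txn_keys):
    """A transaction is 'local' if every key it touches lives on one
    island (fast on-chip communication only); otherwise it needs a
    distributed commit that crosses the slower socket interconnect."""
    islands = {island_of(k) for k in txn_keys}
    return "local" if len(islands) == 1 else "distributed"

print(classify([0, 4, 8]))  # local: all keys map to island 0
print(classify([0, 1]))     # distributed: spans islands 0 and 1
```

The trade-off the paper measures falls out of this classification: coarser islands make more transactions local but share more within an instance, while finer shared-nothing partitions raise the fraction of distributed transactions, especially under skew.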
How to Stop Under-Utilization and Love Multicores
Designing scalable transaction processing systems on modern hardware has been a challenge for almost a decade. Hardware trends oblige software to overcome three major challenges to system scalability: (1) exploiting the abundant thread-level parallelism provided by multicores, (2) achieving predictably efficient execution despite the variability in communication latencies among cores on multisocket multicores, and (3) taking advantage of aggressive micro-architectural features. In this tutorial, we shed light on these three challenges and survey recent proposals to alleviate them. First, we present a systematic way of eliminating scalability bottlenecks based on minimizing unbounded communication, and show several techniques that apply the presented methodology to minimize bottlenecks in major components of transaction processing systems. Then, we analyze the problems that arise from the non-uniform nature of communication latencies on modern multisockets and ways to address them for systems that already scale well on multicores. Finally, we examine the sources of under-utilization within a modern processor and present insights and techniques to better exploit the micro-architectural resources of a processor by improving cache locality at the right level.
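A recurring instance of the "minimize unbounded communication" methodology mentioned above is replacing a single shared counter, whose update traffic grows with the number of threads, with per-core slices that are combined only on read. The sketch below is a hypothetical illustration of that pattern, not code from any of the surveyed systems.

```python
class PerCoreCounter:
    """Each thread increments only its own slot, so increments never
    contend on a shared cache line; communication per update is bounded
    regardless of thread count. Only the (rare) read aggregates."""

    def __init__(self, nslots):
        self.slots = [0] * nslots

    def add(self, slot, delta=1):
        self.slots[slot] += delta  # touched by exactly one thread

    def read(self):
        return sum(self.slots)     # aggregate across slots on demand

c = PerCoreCounter(nslots=8)
for slot in range(8):       # simulate 8 threads...
    for _ in range(100):    # ...each doing 100 increments
        c.add(slot)
print(c.read())             # 800
```

The design trades a slightly more expensive read for updates whose cost does not grow with core count, which is exactly the bounded-communication property the tutorial's methodology asks for.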
Rethinking serializable multiversion concurrency control
Multi-versioned database systems have the potential to significantly increase the amount of concurrency in transaction processing because they can avoid read-write conflicts. Unfortunately, the increase in concurrency usually comes at the cost of transaction serializability. If a database user requests full serializability, modern multi-versioned systems significantly constrain read-write concurrency among conflicting transactions and employ expensive synchronization patterns in their design. In main-memory multi-core settings, these additional constraints are so burdensome that multi-versioned systems are often significantly outperformed by single-version systems.

We propose Bohm, a new concurrency control protocol for main-memory multi-versioned database systems. Bohm guarantees serializable execution while ensuring that reads never block writes. In addition, Bohm does not require reads to perform any book-keeping whatsoever, thereby avoiding the overhead of tracking reads via contended writes to shared memory. This leads to excellent scalability and performance in multi-core settings. Bohm has all of the above characteristics without performing validation-based concurrency control. Instead, it is pessimistic, and is therefore not prone to excessive aborts in the presence of contention. An experimental evaluation shows that Bohm performs well in both high-contention and low-contention settings, and is able to dramatically outperform state-of-the-art multi-versioned systems despite maintaining the full set of serializability guarantees.
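The "reads never block writes, and reads do no bookkeeping" property above can be illustrated with a toy multi-version record: writers append timestamped versions, while a reader merely scans for the newest version visible at its snapshot timestamp, writing nothing to shared state. This is a schematic sketch of multiversioning in general, not Bohm's actual protocol, which additionally pre-assigns version placeholders in a separate concurrency-control phase.

```python
class MVRecord:
    """One logical record as a list of (write_ts, value) versions,
    appended in timestamp order. Writers append; readers only scan."""

    def __init__(self, initial):
        self.versions = [(0, initial)]

    def write(self, ts, value):
        # A writer installs a new version; it never waits for readers.
        self.versions.append((ts, value))

    def read(self, snapshot_ts):
        # A reader returns the newest version with write_ts <= its
        # snapshot. No locks taken, no read set recorded: the zero
        # bookkeeping that avoids contended writes to shared memory.
        visible = [v for ts, v in self.versions if ts <= snapshot_ts]
        return visible[-1]

rec = MVRecord("a")
rec.write(ts=5, value="b")        # a writer at timestamp 5
old = rec.read(snapshot_ts=3)     # reader snapshotted before the write
new = rec.read(snapshot_ts=9)     # reader snapshotted after it
print(old, new)                   # a b
```

Because each reader consults only immutable versions chosen by its own timestamp, readers and writers scale independently, which is the behavior the abstract attributes to Bohm in multi-core settings.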