Search CORE

105 research outputs found

Resizable, Scalable, Concurrent Hash Tables

Author: McKenney Paul E.
Triplett Josh
Walpole Jonathan
Publication venue: PDXScholar
Publication date: 01/06/2011
Field of study

We present algorithms for shrinking and expanding a hash table while allowing concurrent, wait-free, linearly scalable lookups. These resize algorithms allow the hash table to maintain constant-time performance as the number of entries grows, and reclaim memory as the number of entries decreases, without delaying or disrupting readers. We implemented our algorithms in the Linux kernel, to test their performance and scalability. Benchmarks show lookup scalability improved 125x over readerwriter locking, and 56% over the current state-of-the-art for Linux, with no performance degradation for lookups during a resize. To achieve this performance, this hash table implementation uses a new concurrent programming methodology known as relativistic programming. In particular, we make use of an existing synchronization primitive which waits for all current readers to finish, with little to no reader overhead; careful use of this primitive allows ordering of updates without read-side synchronization or memory barriers

PDXScholar (Portland State University)

Crafting Concurrent Data Structures

Author: Liu Yujie
Publication venue: Lehigh Preserve
Publication date
Field of study

Concurrent data structures lie at the heart of modern parallel programs. The design and implementation of concurrent data structures can be challenging due to the demand for good performance (low latency and high scalability) and strong progress guarantees. In this dissertation, we enrich the knowledge of concurrent data structure design by proposing new implementations, as well as general techniques to improve the performance of existing ones.The first part of the dissertation present an unordered linked list implementation that supports nonblocking insert, remove, and lookup operations. The algorithm is based on a novel ``enlist\u27\u27 technique that greatly simplifies the task of achieving wait-freedom. The value of our technique is also demonstrated in the creation of other wait-free data structures such as stacks and hash tables.The second data structure presented is a nonblocking hash table implementation which solves a long-standing design challenge by permitting the hash table to dynamically adjust its size in a nonblocking manner. Additionally, our hash table offers strong theoretical properties such as supporting unbounded memory. In our algorithm, we introduce a new ``freezable set\u27\u27 abstraction which allows us to achieve atomic migration of keys during a resize. The freezable set abstraction also enables highly efficient implementations which maximally exploit the processor cache locality. In experiments, we found our lock-free hash table performs consistently better than state-of-the-art implementations, such as the split-ordered list.The third data structure we present is a concurrent priority queue called the ``mound\u27\u27. Our implementations include nonblocking and lock-based variants. The mound employs randomization to reduce contention on concurrent insert operations, and decomposes a remove operation into smaller atomic operations so that multiple remove operations can execute in parallel within a pipeline. In experiments, we show that the mound can provide excellent latency at low thread counts.Lastly, we discuss how hardware transactional memory (HTM) can be used to accelerate existing nonblocking concurrent data structure implementations. We propose optimization techniques that can significantly improve the performance (1.5x to 3x speedups) of a variety of important concurrent data structures, such as binary search trees and hash tables. The optimizations also preserve the strong progress guarantees of the original implementations

Lehigh University: Lehigh Preserve

Non-Blocking Dynamic Unbounded Graphs with Worst-Case Amortized Bounds

Author: Chatterjee Bapi
Manogna Komma
Peri Sathya
Sa Muktikanta
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 25th International Conference on Principles of Distributed Systems (OPODIS 2021)
Publication date: 01/01/2022
Field of study

Dagstuhl Research Online Publication Server

A Concurrency and Time Centered Framework for Certification of Autonomous Space Systems

Author: Dechev Damian
Publication venue
Publication date
Field of study

Future space missions, such as Mars Science Laboratory, suggest the engineering of some of the most complex man-rated autonomous software systems. The present process-oriented certification methodologies are becoming prohibitively expensive and do not reach the level of detail of providing guidelines for the development and validation of concurrent software. Time and concurrency are the most critical notions in an autonomous space system. In this work we present the design and implementation of the first concurrency and time centered framework for product-oriented software certification of autonomous space systems. To achieve fast and reliable concurrent interactions, we define and apply the notion of Semantically Enhanced Containers (SEC). SECs are data structures that are designed to provide the flexibility and usability of the popular ISO C++ STL containers, while at the same time they are hand-crafted to guarantee domain-specific policies, such as conformance to a given concurrency model. The application of nonblocking programming techniques is critical to the implementation of our SEC containers. Lock-free algorithms help avoid the hazards of deadlock, livelock, and priority inversion, and at the same time deliver fast and scalable performance. Practical lock-free algorithms are notoriously difficult to design and implement and pose a number of hard problems such as ABA avoidance, high complexity, portability, and meeting the linearizability correctness requirements. This dissertation presents the design of the first lock-free dynamically resizable array. Our approach o ers a set of practical, portable, lock-free, and linearizable STL vector operations and a fast and space effcient implementation when compared to the alternative lock- and STM-based techniques. Currently, the literature does not offer an explicit analysis of the ABA problem, its relation to the most commonly applied nonblocking programming techniques, and the possibilities for its detection and avoidance. Eliminating the hazards of ABA is left to the ingenuity of the software designer. We present a generic and practical solution to the fundamental ABA problem for lock-free descriptor-based designs. To enable our SEC container with the property of validating domain-specific invariants, we present Basic Query, our expression template-based library for statically extracting semantic information from C++ source code. The use of static analysis allows for a far more efficient implementation of our nonblocking containers than would have been otherwise possible when relying on the traditional run-time based techniques. Shared data in a real-time cyber-physical system can often be polymorphic (as is the case with a number of components part of the Mission Data System's Data Management Services). The use of dynamic cast is important in the design of autonomous real-time systems since the operation allows for a direct representation of the management and behavior of polymorphic data. To allow for the application of dynamic cast in mission critical code, we validate and improve a methodology for constant-time dynamic cast that shifts the complexity of the operation to the compiler's static checker. In a case study that demonstrates the applicability of the programming and validation techniques of our certification framework, we show the process of verification and semantic parallelization of the Mission Data System's (MDS) Goal Networks. MDS provides an experimental platform for testing and development of autonomous real-time flight applications

Texas A&M Repository

Runtime latency detection and analysis

Author: Dagenais Michel R.
Desfossez Julien
Desnoyers Mathieu
Publication venue: 'Wiley'
Publication date: 18/02/2016
Field of study

Detecting latency-related problems in production environments is usually carried out at the application level with custom instrumentation. This is enough to detect high latencies in instrumented applications but does not provide all the information required to understand the source of the latency and is dependent on manually deployed instrumentation. The abnormal latencies usually start in the operating system kernel because of contention on physical resources or locks. Hence, ﬁnding the root cause of a latency may require a kernel trace. This trace can easily represent hundreds of thousands of events per second. In this paper, we propose and evaluate a methodology, efﬁcient algorithms, and concurrent data structures to detect and analyze latency problems that occur at the kernel level. We introduce a new kernel-based approach that enables developers and administrators to efﬁciently track latency problems in production and trigger actions when abnormal conditions are detected. The result of this study is a working scalable latency tracker and an efﬁcient approach to perform stateful tracing in production

Crossref

PolyPublie

CPHASH: A cache-partitioned hash table

Author: Kaashoek M. Frans
Metreveli Zviad
Zeldovich Nickolai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/11/2011
Field of study

CPHash is a concurrent hash table for multicore processors. CPHash partitions its table across the caches of cores and uses message passing to transfer lookups/inserts to a partition. CPHash's message passing avoids the need for locks, pipelines batches of asynchronous messages, and packs multiple messages into a single cache line transfer. Experiments on a 80-core machine with 2 hardware threads per core show that CPHash has ~1.6x higher throughput than a hash table implemented using fine-grained locks. An analysis shows that CPHash wins because it experiences fewer cache misses and its cache misses are less expensive, because of less contention for the on-chip interconnect and DRAM. CPServer, a key/value cache server using CPHash, achieves ~5% higher throughput than a key/value cache server that uses a hash table with fine-grained locks, but both achieve better throughput and scalability than memcached. The throughput of CPHash and CPServer also scale near-linearly with the number of cores.Quanta Computer (Firm)National Science Foundation (U.S.). (Award 915164

CiteSeerX

DSpace@MIT

Crossref