Scalable Range Locks for Scalable Address Spaces and Beyond
Range locks are a synchronization construct designed to provide concurrent
access to multiple threads (or processes) to disjoint parts of a shared
resource. Originally conceived in the file system context, range locks are
gaining increasing interest in the Linux kernel community seeking to alleviate
bottlenecks in the virtual memory management subsystem. The existing
implementation of range locks in the kernel, however, uses an internal spin
lock to protect the underlying tree structure that keeps track of acquired and
requested ranges. This spin lock becomes a point of contention on its own when
the range lock is frequently acquired. Furthermore, where and exactly how
specific (refined) ranges can be locked remains an open question.
In this paper, we make two independent, but related contributions. First, we
propose an alternative approach for building range locks based on linked lists.
The lists are easy to maintain in a lock-less fashion, and in fact, our range
locks do not use any internal locks in the common case. Second, we show how the
range of the lock can be refined in the mprotect operation through a
speculative mechanism. This refinement, in turn, allows concurrent execution of
mprotect operations on non-overlapping memory regions. We implement our new
algorithms and demonstrate their effectiveness in user-space and kernel-space,
achieving up to 9x speedup compared to the stock version of the Linux
kernel. Beyond the virtual memory management subsystem, we discuss other
applications of range locks in parallel software. As a concrete example, we
show how range locks can be used to facilitate the design of scalable
concurrent data structures, such as skip lists.
Comment: 17 pages, 9 figures, Eurosys 202
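The idea of a range lock described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the paper's lists are maintained without internal locks, whereas this sketch guards its list with a `threading.Condition` for simplicity. The class name `RangeLock` and its methods are hypothetical, and ranges are assumed to be half-open intervals `[start, end)`.

```python
import threading


class RangeLock:
    """Exclusive range lock over half-open intervals [start, end).

    Illustrative only: the list of held ranges is protected by an
    internal condition variable, which the paper's design avoids.
    """

    def __init__(self):
        self._held = []                     # (start, end) pairs currently locked
        self._cond = threading.Condition()  # guards _held

    def _overlaps(self, start, end):
        # Two half-open ranges overlap when each starts before the other ends.
        return any(s < end and start < e for s, e in self._held)

    def try_acquire(self, start, end):
        """Lock [start, end) if no held range overlaps it; never blocks."""
        if start >= end:
            raise ValueError("empty range")
        with self._cond:
            if self._overlaps(start, end):
                return False
            self._held.append((start, end))
            return True

    def acquire(self, start, end):
        """Block until [start, end) can be locked."""
        if start >= end:
            raise ValueError("empty range")
        with self._cond:
            while self._overlaps(start, end):
                self._cond.wait()
            self._held.append((start, end))

    def release(self, start, end):
        """Unlock a range previously locked with the same bounds."""
        with self._cond:
            self._held.remove((start, end))
            self._cond.notify_all()
```

With this sketch, threads locking `[0, 10)` and `[10, 20)` proceed concurrently, while a request for `[5, 15)` has to wait until both are released.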
Modern Concurrency Techniques: An Exploration
In this thesis, we investigate some of the options programmers have when writing a concurrent program. We explore the use of manually created threads, thread pools, actors, and Software Transactional Memory. We use these techniques to implement case studies of various kinds: a video game, a physical simulation, an image-processing application, and a concurrent data structure. Throughout these case studies, we notice a common thread: concurrency, applied correctly, can improve the performance of a program, but the correct application may not be readily apparent. Concurrency is an important tool in the toolbox of the modern programmer, especially with the rise of multi-core architectures and the increasing prevalence of distributed systems. And like any tool, it is important to understand how and when to use it.
A compiler approach to scalable concurrent program design
The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to
separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined
abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The
transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same
transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support.
The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This
toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory
multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial
applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics.
Implementing atomic actions in Ada 95
Atomic actions are an important dynamic structuring technique that aids the construction of fault-tolerant concurrent systems. Although they were developed some years ago, none of the well-known commercially available programming languages directly support their use. This paper summarizes software fault tolerance techniques for concurrent systems, evaluates the Ada 95 programming language from the perspective of its support for software fault tolerance, and shows how Ada 95 can be used to implement software fault tolerance techniques. In particular, it shows how packages, protected objects, requeue, exceptions, asynchronous transfer of control, tagged types, and controlled types can be used as building blocks from which to construct atomic actions with forward and backward error recovery, which are resilient to deserter tasks and task abortion.
Healing replicas in a software component replication system
Dissertation submitted for the degree of Master in Computer Engineering
Replication is a key technique for improving the performance, availability, and fault tolerance
of systems. Replicated systems exist in different settings, from large geo-replicated
cloud systems to replicated databases running on multi-core machines. One
feature that is often important is a mechanism to verify that replica contents remain in sync despite any problem that may occur, e.g. silent bugs that corrupt service state.
Traditional techniques for summarizing service state require that the internal service state be exactly the same after executing the same set of operations. However, for many applications this does not hold, especially if operations are allowed to execute in different orders or if different implementations are used in different replicas.
In this work we propose a new approach for summarizing and recovering the state of
a replicated service. Our approach is based on a novel data structure, Scalable Counting
Bloom Filter. This data structure combines the ideas of Counting Bloom Filters and Scalable Bloom Filters to create a Bloom Filter variant that allows both delete operations and growth of the structure, thus adapting to the size of any service state.
We propose an approach that uses this data structure to summarize the state of a replicated service while allowing concurrent operations to execute. We further propose a strategy to recover replicas in a replicated system and describe how to implement our proposed solution in two in-memory databases: H2 and HSQL. The evaluation results show that our approach can compute the same summary when executing the same set of operations in both databases, thus allowing our solution to be used in diverse replication scenarios. The results also show that additional work on performance optimization is necessary to make our solution practical.
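The combination of counting and scalable Bloom filters described in this abstract might look roughly like the sketch below. It is an assumption about how the two ideas compose, not the dissertation's actual design: counters replace bits so that items can be deleted, and a new, larger stage is appended once the newest stage reaches its capacity. All class names, parameters, and default values are hypothetical.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with integer counters, so items can be removed."""

    def __init__(self, num_counters, num_hashes, capacity):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.capacity = capacity  # items this stage accepts before growth
        self.count = 0            # items currently stored in this stage

    def _indexes(self, item):
        # Double hashing: derive k counter indexes from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        size = len(self.counters)
        return [(h1 + i * h2) % size for i in range(self.num_hashes)]

    def add(self, item):
        for i in self._indexes(item):
            self.counters[i] += 1
        self.count += 1

    def contains(self, item):
        return all(self.counters[i] > 0 for i in self._indexes(item))

    def remove(self, item):
        for i in self._indexes(item):
            self.counters[i] -= 1
        self.count -= 1


class ScalableCountingBloomFilter:
    """Chain of counting filters; a larger stage is added when one fills."""

    def __init__(self, initial_counters=1024, num_hashes=4, capacity=16, growth=2):
        self.growth = growth
        self.stages = [CountingBloomFilter(initial_counters, num_hashes, capacity)]

    def add(self, item):
        last = self.stages[-1]
        if last.count >= last.capacity:
            # Newest stage is full: append a stage that is `growth` times larger.
            last = CountingBloomFilter(len(last.counters) * self.growth,
                                       last.num_hashes,
                                       last.capacity * self.growth)
            self.stages.append(last)
        last.add(item)

    def contains(self, item):
        return any(stage.contains(item) for stage in self.stages)

    def remove(self, item):
        # Assumes the item was added earlier; a false positive in a newer
        # stage could make this decrement the wrong stage's counters.
        for stage in reversed(self.stages):
            if stage.contains(item):
                stage.remove(item)
                return True
        return False
```

As with any Bloom filter, `contains` can report false positives but never false negatives for items that were added and not removed.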
A Template for Implementing Fast Lock-free Trees Using HTM
Algorithms that use hardware transactional memory (HTM) must provide a
software-only fallback path to guarantee progress. The design of the fallback
path can have a profound impact on performance. If the fallback path is allowed
to run concurrently with hardware transactions, then hardware transactions must
be instrumented, adding significant overhead. Otherwise, hardware transactions
must wait for any processes on the fallback path, causing concurrency
bottlenecks, or move to the fallback path. We introduce an approach that
combines the best of both worlds. The key idea is to use three execution paths:
an HTM fast path, an HTM middle path, and a software fallback path, such that
the middle path can run concurrently with each of the other two. The fast path
and fallback path do not run concurrently, so the fast path incurs no
instrumentation overhead. Furthermore, fast path transactions can move to the
middle path instead of waiting or moving to the software path. We demonstrate
our approach by producing an accelerated version of the tree update template of
Brown et al., which can be used to implement fast lock-free data structures
based on down-trees. We used the accelerated template to implement two
lock-free trees: a binary search tree (BST), and an (a,b)-tree (a
generalization of a B-tree). Experiments show that, with 72 concurrent
processes, our accelerated (a,b)-tree performs between 4.0x and 4.2x as many
operations per second as an implementation obtained using the original tree
update template.
Lock-free Concurrent Data Structures
Concurrent data structures are the data sharing side of parallel programming.
Data structures give the means to the program to store data, but also provide
operations to the program to access and manipulate these data. These operations
are implemented through algorithms that have to be efficient. In the sequential
setting, data structures are crucially important for the performance of the
respective computation. In the parallel programming setting, their importance
becomes more crucial because of the increased use of data and resource sharing
for utilizing parallelism.
The first and main goal of this chapter is to provide a sufficient background
and intuition to help the interested reader to navigate in the complex research
area of lock-free data structures. The second goal is to offer the programmer
familiarity with the subject that will allow her to use truly concurrent methods.
Comment: To appear in "Programming Multi-core and Many-core Computing
Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and
Distributed Computin
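As a rough illustration of the kind of structure this chapter covers, the sketch below shows the compare-and-swap retry loop of a Treiber-style stack, a classic lock-free design. Python exposes no hardware compare-and-swap, so the hypothetical `AtomicRef` class emulates one with a small internal lock; the sketch therefore shows the shape of the retry loop only and is not itself lock-free. All names are assumptions made for illustration.

```python
import threading


class AtomicRef:
    """Emulated atomic reference; the lock stands in for a hardware CAS."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        # Install `new` only if the current value is still `expected`.
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    __slots__ = ("value", "next")

    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node


class TreiberStack:
    """Stack whose push and pop retry a CAS on the head pointer."""

    def __init__(self):
        self._head = AtomicRef(None)

    def push(self, value):
        while True:
            old = self._head.get()
            # Retry if another thread changed the head in the meantime.
            if self._head.compare_and_set(old, Node(value, old)):
                return

    def pop(self):
        while True:
            old = self._head.get()
            if old is None:
                return None  # empty stack
            if self._head.compare_and_set(old, old.next):
                return old.value
```

Each operation reads the head, prepares its update privately, and publishes it with a single compare-and-swap, retrying when another thread got there first.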