A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue
We present a new lock-free multiple-producer, multiple-consumer (MPMC) FIFO queue design that is scalable and, unlike existing high-performance queues, very memory efficient. Moreover, the design is ABA-safe and requires neither an external memory allocator nor a safe memory reclamation technique, both of which other scalable designs typically need. In fact, the queue itself can be used for object allocation and reclamation, as in data pools. On the most contended hot spots of the algorithm we use FAA (fetch-and-add), a specialized instruction that scales better than CAS (compare-and-set). However, unlike prior attempts with FAA, our queue is both lock-free and linearizable.
We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues, which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and that its performance is often comparable to, or exceeds, that of state-of-the-art scalable algorithms.
A Concurrency-Agnostic Protocol for Multi-Paradigm Concurrent Debugging Tools
Today's complex software systems combine high-level concurrency models. Each
model is used to solve a specific set of problems. Unfortunately, debuggers
support only the low-level notions of threads and shared memory, forcing
developers to reason about these notions instead of the high-level concurrency
models they chose.
This paper proposes a concurrency-agnostic debugger protocol that decouples
the debugger from the concurrency models employed by the target application. As
a result, the underlying language runtime can define custom breakpoints,
stepping operations, and execution events for each concurrency model it
supports, and a debugger can expose them without having to be specifically
adapted.
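The decoupling described above can be illustrated with a minimal sketch in which the runtime advertises its breakpoints and stepping operations as plain data and the debugger only renders what it receives. Every name here (`Operation`, `Runtime`, `Debugger`, the tags, and the operation identifiers) is a hypothetical choice for illustration and is not taken from the paper's protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """One debugging operation advertised by the runtime (hypothetical shape)."""
    name: str        # identifier the debugger sends back to the runtime
    label: str       # text shown in the debugger UI
    kind: str        # "breakpoint" or "step"
    applies_to: str  # tag of the program element the operation is valid for


class Runtime:
    """Stands in for a language runtime that knows its concurrency models."""

    def __init__(self, operations):
        self._operations = list(operations)

    def advertise(self):
        # Sent to the debugger once, when a session starts.
        return list(self._operations)


class Debugger:
    """Contains no concurrency-specific logic; it filters advertised data."""

    def __init__(self, runtime):
        self._operations = runtime.advertise()

    def menu(self, tag, kind):
        # Labels to offer for a program element carrying the given tag.
        return [op.label for op in self._operations
                if op.applies_to == tag and op.kind == kind]


# Assumed example data: one actor-style and one CSP-style operation set.
runtime = Runtime([
    Operation("msg-receiver-bp", "Break when message is received",
              "breakpoint", "message-send"),
    Operation("step-to-next-turn", "Step to next turn",
              "step", "message-send"),
    Operation("channel-opposite-bp", "Break on opposite channel end",
              "breakpoint", "channel-write"),
])
debugger = Debugger(runtime)
print(debugger.menu("message-send", "breakpoint"))
```

In this sketch, supporting a further concurrency model means adding entries to the runtime's list; the `Debugger` class stays untouched.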
We evaluated the generality of the protocol by applying it to SOMns, a
Newspeak implementation, which supports a diversity of concurrency models
including communicating sequential processes, communicating event loops,
threads and locks, fork/join parallelism, and software transactional memory. We
implemented 21 breakpoints and 20 stepping operations for these concurrency
models. None of them required changes to the debugger. Furthermore, we
visualize all concurrent interactions independently of a specific concurrency
model. To show that tooling for a specific concurrency model is possible, we
visualize actor turns and message sends separately.
Comment: International Symposium on Dynamic Language
Processor Allocation for Optimistic Parallelization of Irregular Programs
Optimistic parallelization is a promising approach for the parallelization of
irregular algorithms: potentially interfering tasks are launched dynamically,
and the runtime system detects conflicts between concurrent activities,
aborting and rolling back conflicting tasks. However, parallelism in irregular
algorithms is very complex. In a regular algorithm like dense matrix
multiplication, the amount of parallelism can usually be expressed as a
function of the problem size, so it is reasonably straightforward to determine
how many processors should be allocated to execute a regular algorithm of a
certain size (this is called the processor allocation problem). In contrast,
parallelism in irregular algorithms can be a function of input parameters, and
the amount of parallelism can vary dramatically during the execution of the
irregular algorithm. Therefore, the processor allocation problem for irregular
algorithms is very difficult.
In this paper, we describe the first systematic strategy for addressing this
problem. Our approach is based on a construct called the conflict graph, which
(i) provides insight into the amount of parallelism that can be extracted from
an irregular algorithm, and (ii) can be used to address the processor
allocation problem for irregular algorithms. We show that this problem is
related to a generalization of the unfriendly seating problem and, by extending
Turán's theorem, we obtain a worst-case class of problems for optimistic
parallelization, which we use to derive a lower bound on the exploitable
parallelism. Finally, using some theoretically derived properties and some
experimental facts, we design a quick and stable control strategy for solving
the processor allocation problem heuristically.
Comment: 12 pages, 3 figures, extended version of SPAA 2011 brief announcemen
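To make the conflict-graph idea concrete, here is a minimal sketch, assuming tasks are numbered 0..n-1 and each conflict is an undirected edge between two tasks. A set of tasks that can run together without rollback is an independent set of that graph, and the classical form of Turán's theorem guarantees one of size at least n / (d + 1), where d is the average degree. This uses the classical bound only, not the paper's extension, and the function names and the greedy selection rule are illustrative assumptions rather than the authors' control strategy.

```python
def turan_bound(n_tasks, conflicts):
    """Classical Turan lower bound n / (d + 1) on the largest conflict-free set.

    Assumes `conflicts` lists each undirected edge once, with no self-loops.
    """
    if n_tasks == 0:
        return 0.0
    avg_degree = 2 * len(conflicts) / n_tasks
    return n_tasks / (avg_degree + 1)


def greedy_conflict_free_set(n_tasks, conflicts):
    """Pick tasks greedily, always taking the one with the fewest live conflicts."""
    neighbours = {task: set() for task in range(n_tasks)}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    remaining = set(range(n_tasks))
    chosen = []
    while remaining:
        # Ties are broken by the lower task id so the result is deterministic.
        task = min(remaining, key=lambda t: (len(neighbours[t] & remaining), t))
        chosen.append(task)
        # The chosen task and everything it conflicts with leave the pool.
        remaining -= neighbours[task] | {task}
    return chosen


# Assumed example: six tasks whose conflicts form a ring.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
bound = turan_bound(6, ring)
batch = greedy_conflict_free_set(6, ring)
print(bound, batch)
```

On the ring, the bound guarantees two conflict-free tasks and the greedy pass finds three, which hints at how the number of processors worth allocating depends on the shape of the conflict graph and not only on the task count.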