6,193 research outputs found
Using consistent subcuts for detecting stable properties
We present a general protocol for detecting whether a property holds in a distributed system, where the property is a member of a subclass of stable properties we call the locally stable properties. Our protocol is based on a decentralized method for constructing a maximal subset of the local states that are mutually consistent, which in turn is based on a weakened version of vectored time stamps. The structure of our protocol lends itself to refinement, and we demonstrate its utility by deriving some specialized property-detection protocols, including two previously known protocols that are known to be effective
Dynamic sharing of a multiple access channel
In this paper we consider the mutual exclusion problem on a multiple access
channel. Mutual exclusion is one of the fundamental problems in distributed
computing. In the classic version of this problem, n processes perform a
concurrent program which occasionally triggers some of them to use shared
resources, such as memory, communication channel, device, etc. The goal is to
design a distributed algorithm to control entries and exits to/from the shared
resource in such a way that in any time there is at most one process accessing
it. We consider both the classic and a slightly weaker version of mutual
exclusion, called ep-mutual-exclusion, where for each period of a process
staying in the critical section the probability that there is some other
process in the critical section is at most ep. We show that there are channel
settings, where the classic mutual exclusion is not feasible even for
randomized algorithms, while ep-mutual-exclusion is. In more relaxed channel
settings, we prove an exponential gap between the makespan complexity of the
classic mutual exclusion problem and its weaker ep-exclusion version. We also
show how to guarantee fairness of mutual exclusion algorithms, i.e., that each
process that wants to enter the critical section will eventually succeed
Improvements in Hardware Transactional Memory for GPU Architectures
In the multi-core CPU world, transactional memory (TM)has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and it is used by programmers to speed-up their applications thanks to its low latency. Prior work from the authors proposed a lightweight hardware TM (HTM) support based in the local memory, modifying the SIMT execution model and adding a conflict detection mechanism. An efficient implementation of these features is key in order to provide an effective synchronization mechanism at the local memory level.
After a quick description of the main features of our HTM design for GPU local memory, in this work we gather together a number of proposals designed with the aim of improving those mechanisms with high impact on performance. Firstly, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Secondly, the conflict detection mechanism is optimized depending on application characteristics, such us the read/write sets, the probability of conflict between transactions and the existence of read-only transactions. As these features can be present in hardware simultaneously, it is a task of the compiler and runtime to determine which ones are more important for a given application. This work includes a discussion on the analysis to be done in order to choose the best configuration solution.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Dynamic Race Prediction in Linear Time
Writing reliable concurrent software remains a huge challenge for today's
programmers. Programmers rarely reason about their code by explicitly
considering different possible inter-leavings of its execution. We consider the
problem of detecting data races from individual executions in a sound manner.
The classical approach to solving this problem has been to use Lamport's
happens-before (HB) relation. Until now HB remains the only approach that runs
in linear time. Previous efforts in improving over HB such as causally-precedes
(CP) and maximal causal models fall short due to the fact that they are not
implementable efficiently and hence have to compromise on their race detecting
ability by limiting their techniques to bounded sized fragments of the
execution. We present a new relation weak-causally-precedes (WCP) that is
provably better than CP in terms of being able to detect more races, while
still remaining sound. Moreover it admits a linear time algorithm which works
on the entire execution without having to fragment it.Comment: 22 pages, 8 figures, 1 algorithm, 1 tabl
Parallel discrete event simulation: A shared memory approach
With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models
- …