5,252 research outputs found

    Dynamic deadlock verification for general barrier synchronisation

    Get PDF
    We present Armus, a dynamic verification tool for deadlock detection and avoidance specialised in barrier synchronisation. Barriers are used to coordinate the execution of groups of tasks, and serve as a building block of parallel computing. Our tool verifies more barrier synchronisation patterns than current state-of-the-art. To improve the scalability of verification, we introduce a novel eventbased representation of concurrency constraints, and a graph-based technique for deadlock analysis. The implementation is distributed and fault-tolerant, and can verify X10 and Java programs. To formalise the notion of barrier deadlock, we introduce a core language expressive enough to represent the three most widespread barrier synchronisation patterns: group, split-phase, and dynamic membership. We propose a graph analysis technique that selects from two alternative graph representations: the Wait-For Graph, that favours programs with more tasks than barriers; and the State Graph, optimised for programs with more barriers than tasks. We prove that finding a deadlock in either representation is equivalent, and that the verification algorithm is sound and complete with respect to the notion of deadlock in our core language. Armus is evaluated with three benchmark suites in local and distributed scenarios. The benchmarks show that graph analysis with automatic graph-representation selection can record a 7-fold execution increase versus the traditional fixed graph representation. The performance measurements for distributed deadlock detection between 64 processes show negligible overheads

    Parallel discrete event simulation: A shared memory approach

    Get PDF
    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models

    An energy-efficient memory unit for clustered microarchitectures

    Get PDF
    Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent challenges of clustered memory units and shows how these can be overcome. Clustered memory pipelines work well with the late allocation of load/store queue entries and physically unordered queues. Yet this approach has characteristic problems such as queue overflows and allocation patterns that lead to deadlocks. We propose techniques to solve each of these problems and show that a distributed memory unit can offer significant energy savings and speedups over a centralized unit. For instance, compared to a centralized cache with a load/store queue of 64/24 entries, our four-cluster distributed memory unit with load/store queues of 16/8 entries each consumes 31 percent less energy and performs 4,7 percent better on SPECint and consumes 36 percent less energy and performs 7 percent better for SPECfp.Peer ReviewedPostprint (author's final draft

    To boldly go:an occam-π mission to engineer emergence

    Get PDF
    Future systems will be too complex to design and implement explicitly. Instead, we will have to learn to engineer complex behaviours indirectly: through the discovery and application of local rules of behaviour, applied to simple process components, from which desired behaviours predictably emerge through dynamic interactions between massive numbers of instances. This paper describes a process-oriented architecture for fine-grained concurrent systems that enables experiments with such indirect engineering. Examples are presented showing the differing complex behaviours that can arise from minor (non-linear) adjustments to low-level parameters, the difficulties in suppressing the emergence of unwanted (bad) behaviour, the unexpected relationships between apparently unrelated physical phenomena (shown up by their separate emergence from the same primordial process swamp) and the ability to explore and engineer completely new physics (such as force fields) by their emergence from low-level process interactions whose mechanisms can only be imagined, but not built, at the current time
    corecore