16,809 research outputs found

    Boosting Multi-Core Reachability Performance with Shared Hash Tables

    Get PDF
    This paper focuses on data structures for multi-core reachability, which is a key component in model checking algorithms and other verification methods. A cornerstone of an efficient solution is the storage of visited states. In related work, static partitioning of the state space was combined with thread-local storage and resulted in reasonable speedups, but left open whether improvements are possible. In this paper, we present a scaling solution for shared state storage which is based on a lockless hash table implementation. The solution is specifically designed for the cache architecture of modern CPUs. Because model checking algorithms impose loose requirements on the hash table operations, their design can be streamlined substantially compared to related work on lockless hash tables. Still, an implementation of the hash table presented here has dozens of sensitive performance parameters (bucket size, cache line size, data layout, probing sequence, etc.). We analyzed their impact and compared the resulting speedups with related tools. Our implementation outperforms two state-of-the-art multi-core model checkers (SPIN and DiVinE) by a substantial margin, while placing fewer constraints on the load balancing and search algorithms.Comment: preliminary repor

    Best-first heuristic search for multicore machines

    Get PDF
    To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals

    SKIRT: hybrid parallelization of radiative transfer simulations

    Full text link
    We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modeling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behavior of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.Comment: 21 pages, 20 figure

    Towards Practical Graph-Based Verification for an Object-Oriented Concurrency Model

    Get PDF
    To harness the power of multi-core and distributed platforms, and to make the development of concurrent software more accessible to software engineers, different object-oriented concurrency models such as SCOOP have been proposed. Despite the practical importance of analysing SCOOP programs, there are currently no general verification approaches that operate directly on program code without additional annotations. One reason for this is the multitude of partially conflicting semantic formalisations for SCOOP (either in theory or by-implementation). Here, we propose a simple graph transformation system (GTS) based run-time semantics for SCOOP that grasps the most common features of all known semantics of the language. This run-time model is implemented in the state-of-the-art GTS tool GROOVE, which allows us to simulate, analyse, and verify a subset of SCOOP programs with respect to deadlocks and other behavioural properties. Besides proposing the first approach to verify SCOOP programs by automatic translation to GTS, we also highlight our experiences of applying GTS (and especially GROOVE) for specifying semantics in the form of a run-time model, which should be transferable to GTS models for other concurrent languages and libraries.Comment: In Proceedings GaM 2015, arXiv:1504.0244

    Locking Local Oscillator Phase to the Atomic Phase via Weak Measurement

    Full text link
    We propose a new method to reduce the frequency noise of a Local Oscillator (LO) to the level of white phase noise by maintaining (not destroying by projective measurement) the coherence of the ensemble pseudo-spin of atoms over many measurement cycles. This scheme uses weak measurement to monitor the phase in Ramsey method and repeat the cycle without initialization of phase and we call, "atomic phase lock (APL)" in this paper. APL will achieve white phase noise as long as the noise accumulated during dead time and the decoherence are smaller than the measurement noise. A numerical simulation confirms that with APL, Allan deviation is averaged down at a maximum rate that is proportional to the inverse of total measurement time, tau^-1. In contrast, the current atomic clocks that use projection measurement suppress the noise only down to the level of white frequency, in which case Allan deviation scales as tau^-1/2. Faraday rotation is one of the possible ways to realize weak measurement for APL. We evaluate the strength of Faraday rotation with 171Yb+ ions trapped in a linear rf-trap and discuss the performance of APL. The main source of the decoherence is a spontaneous emission induced by the probe beam for Faraday rotation measurement. One can repeat the Faraday rotation measurement until the decoherence become comparable to the SNR of measurement. We estimate this number of cycles to be ~100 cycles for a realistic experimental parameter.Comment: 18 pages, 7 figures, submitted to New Journal of Physic
    corecore