16,809 research outputs found
Boosting Multi-Core Reachability Performance with Shared Hash Tables
This paper focuses on data structures for multi-core reachability, which is a
key component in model checking algorithms and other verification methods. A
cornerstone of an efficient solution is the storage of visited states. In
related work, static partitioning of the state space was combined with
thread-local storage and resulted in reasonable speedups, but left open whether
improvements are possible. In this paper, we present a scaling solution for
shared state storage which is based on a lockless hash table implementation.
The solution is specifically designed for the cache architecture of modern
CPUs. Because model checking algorithms impose loose requirements on the hash
table operations, their design can be streamlined substantially compared to
related work on lockless hash tables. Still, an implementation of the hash
table presented here has dozens of sensitive performance parameters (bucket
size, cache line size, data layout, probing sequence, etc.). We analyzed their
impact and compared the resulting speedups with related tools. Our
implementation outperforms two state-of-the-art multi-core model checkers (SPIN
and DiVinE) by a substantial margin, while placing fewer constraints on the
load balancing and search algorithms.Comment: preliminary repor
Best-first heuristic search for multicore machines
To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals
SKIRT: hybrid parallelization of radiative transfer simulations
We describe the design, implementation and performance of the new hybrid
parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which
has been used extensively for modeling the continuum radiation of dusty
astrophysical systems including late-type galaxies and dusty tori. The hybrid
scheme combines distributed memory parallelization, using the standard Message
Passing Interface (MPI) to communicate between processes, and shared memory
parallelization, providing multiple execution threads within each process to
avoid duplication of data structures. The synchronization between multiple
threads is accomplished through atomic operations without high-level locking
(also called lock-free programming). This improves the scaling behavior of the
code and substantially simplifies the implementation of the hybrid scheme. The
result is an extremely flexible solution that adjusts to the number of
available nodes, processors and memory, and consequently performs well on a
wide variety of computing architectures.Comment: 21 pages, 20 figure
Towards Practical Graph-Based Verification for an Object-Oriented Concurrency Model
To harness the power of multi-core and distributed platforms, and to make the
development of concurrent software more accessible to software engineers,
different object-oriented concurrency models such as SCOOP have been proposed.
Despite the practical importance of analysing SCOOP programs, there are
currently no general verification approaches that operate directly on program
code without additional annotations. One reason for this is the multitude of
partially conflicting semantic formalisations for SCOOP (either in theory or
by-implementation). Here, we propose a simple graph transformation system (GTS)
based run-time semantics for SCOOP that grasps the most common features of all
known semantics of the language. This run-time model is implemented in the
state-of-the-art GTS tool GROOVE, which allows us to simulate, analyse, and
verify a subset of SCOOP programs with respect to deadlocks and other
behavioural properties. Besides proposing the first approach to verify SCOOP
programs by automatic translation to GTS, we also highlight our experiences of
applying GTS (and especially GROOVE) for specifying semantics in the form of a
run-time model, which should be transferable to GTS models for other concurrent
languages and libraries.Comment: In Proceedings GaM 2015, arXiv:1504.0244
Locking Local Oscillator Phase to the Atomic Phase via Weak Measurement
We propose a new method to reduce the frequency noise of a Local Oscillator
(LO) to the level of white phase noise by maintaining (not destroying by
projective measurement) the coherence of the ensemble pseudo-spin of atoms over
many measurement cycles. This scheme uses weak measurement to monitor the phase
in Ramsey method and repeat the cycle without initialization of phase and we
call, "atomic phase lock (APL)" in this paper. APL will achieve white phase
noise as long as the noise accumulated during dead time and the decoherence are
smaller than the measurement noise. A numerical simulation confirms that with
APL, Allan deviation is averaged down at a maximum rate that is proportional to
the inverse of total measurement time, tau^-1. In contrast, the current atomic
clocks that use projection measurement suppress the noise only down to the
level of white frequency, in which case Allan deviation scales as tau^-1/2.
Faraday rotation is one of the possible ways to realize weak measurement for
APL. We evaluate the strength of Faraday rotation with 171Yb+ ions trapped in a
linear rf-trap and discuss the performance of APL. The main source of the
decoherence is a spontaneous emission induced by the probe beam for Faraday
rotation measurement. One can repeat the Faraday rotation measurement until the
decoherence become comparable to the SNR of measurement. We estimate this
number of cycles to be ~100 cycles for a realistic experimental parameter.Comment: 18 pages, 7 figures, submitted to New Journal of Physic
- …