Computational Aspects of Asynchronous CA
This work studies some aspects of the computational power of fully
asynchronous cellular automata (ACA). We deal with some notions of simulation
between ACA and Turing Machines. In particular, we characterize the updating
sequences specifying which are "universal", i.e., allowing a (specific family
of) ACA to simulate any TM on any input. We also consider the computational
cost of such simulations.
Intrinsic universality and the computational power of self-assembly
This short survey of recent work in tile self-assembly discusses the use of
simulation to classify and separate the computational and expressive power of
self-assembly models. The journey begins with the result that there is a single
universal tile set that, with proper initialization and scaling, simulates any
tile assembly system. This universal tile set exhibits something stronger than
Turing universality: it captures the geometry and dynamics of any simulated
system. From there we find that there is no such tile set in the
noncooperative, or temperature 1, model, proving it weaker than the full tile
assembly model. In the two-handed or hierarchical model, where large assemblies
can bind together in one step, we encounter an infinite set of infinite
hierarchies, each with strictly increasing simulation power. Towards the end of
our trip, we find one tile to rule them all: a single rotatable, flippable
polygonal tile that can simulate any tile assembly system. It seems this could
be the beginning of a much longer journey, so directions for future work are
suggested. Comment: In Proceedings MCU 2013, arXiv:1309.104
Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems
We evaluate optimized parallel sparse matrix-vector operations for several
representative application areas on widespread multicore-based cluster
configurations. First the single-socket baseline performance is analyzed and
modeled with respect to basic architectural properties of standard multicore
chips. Beyond the single node, the performance of parallel sparse matrix-vector
operations is often limited by communication overhead. Starting from the
observation that nonblocking MPI is not able to hide communication cost using
standard MPI implementations, we demonstrate that explicit overlap of
communication and computation can be achieved by using a dedicated
communication thread, which may run on a virtual core. Moreover, we identify
performance benefits of hybrid MPI/OpenMP programming due to improved load
balancing even without explicit communication overlap. We compare performance
results for pure MPI, the widely used "vector-like" hybrid programming
strategies, and explicit overlap on a modern multicore-based cluster and a Cray
XE6 system. Comment: 16 pages, 10 figures
Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming
We evaluate optimized parallel sparse matrix-vector operations for two
representative application areas on widespread multicore-based cluster
configurations. First the single-socket baseline performance is analyzed and
modeled with respect to basic architectural properties of standard multicore
chips. Going beyond the single node, parallel sparse matrix-vector operations
often suffer from an unfavorable communication-to-computation ratio. Starting
from the observation that nonblocking MPI is not able to hide communication
cost using standard MPI implementations, we demonstrate that explicit overlap
of communication and computation can be achieved by using a dedicated
communication thread, which may run on a virtual core. We compare our approach
to pure MPI and the widely used "vector-like" hybrid programming strategy. Comment: 12 pages, 6 figures
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find improved load balancing during
runtime and automatic parallelism discovery improving efficiency using the
advanced semantics for Exascale computing. Comment: 11 figures
Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system
A number of representation schemes have been presented for use within
learning classifier systems, ranging from binary encodings to neural networks.
This paper presents results from an investigation into using discrete and fuzzy
dynamical system representations within the XCSF learning classifier system. In
particular, asynchronous random Boolean networks are used to represent the
traditional condition-action production system rules in the discrete case and
asynchronous fuzzy logic networks in the continuous-valued case. It is shown
to be possible to use self-adaptive, open-ended evolution to design an ensemble of
such dynamical systems within XCSF to solve a number of well-known test
problems.
Efficient Parallel Algorithm for Statistical Ion Track Simulations in Crystalline Materials
We present an efficient parallel algorithm for statistical Molecular Dynamics
simulations of ion tracks in solids. The method is based on the Rare Event
Enhanced Domain following Molecular Dynamics (REED-MD) algorithm, which has
been successfully applied to studies of, e.g., ion implantation into
crystalline semiconductor wafers. We discuss the strategies for parallelizing
the method, and we settle on a host-client type polling scheme in which
multiple asynchronous processors are continuously fed to the host, which, in
turn, distributes the resulting feed-back information to the clients. This
real-time feed-back consists of, e.g., cumulative damage information or
statistics updates necessary for the cloning in the rare event algorithm. We
finally demonstrate the algorithm for radiation effects in a nuclear oxide
fuel, and we show the balanced parallel approach with high parallel efficiency
in multiple processor configurations. Comment: 17 pages, seven figures, four tables