Improving Simulations of Spiking Neural P Systems in NVIDIA CUDA GPUs: CuSNP
Spiking neural P systems (in short, SN P systems) are parallel models of
computations inspired by the spiking (firing) of biological neurons. In SN P systems, neurons
function as spike processors and are placed on nodes of a directed graph. Synapses,
the connections between neurons, are represented by arcs or directed edges in the graph.
Not only do SN P systems have parallel semantics (i.e. neurons operate in parallel), but
their structure as directed graphs allows them to be represented as vectors or matrices.
Such representations allow the use of linear algebra operations for simulating the
evolution of the system configurations, i.e. computations. In this work, we continue the
implementations of SN P systems with delays, i.e. a delay is associated with the sending
of a spike from a neuron to its neighbouring neurons. Our implementation is based on
a modified representation of SN P systems as vectors and matrices for SN P systems
without delays. We use massively parallel processors known as graphics processing units
(in short, GPUs) from NVIDIA. For experimental validation, we use SN P systems implementing
generalized sorting networks. We report a speedup, i.e. the ratio of the running time of
the sequential simulator to that of the parallel simulator, of up to approximately 51
times for a 512-size input to the sorting network.
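The matrix representation described above can be illustrated with a minimal sketch (plain Python, hypothetical two-neuron rule set; the function and matrix names are illustrative, not taken from CuSNP): the next configuration is obtained as C' = C + s · M, where C counts spikes per neuron, s marks which rules fire, and row k of M records how rule k consumes spikes in its own neuron and sends spikes along synapses.

```python
def step(config, spiking, M):
    """One transition of an SN P system without delays: config' = config + spiking . M."""
    n = len(config)
    nxt = list(config)
    for k, fired in enumerate(spiking):
        if fired:
            for j in range(n):
                nxt[j] += M[k][j]
    return nxt

# Toy system: neuron 0 has rule r0 (consume 1 spike, send 1 spike to neuron 1);
# neuron 1 has rule r1 (consume 1 spike, send 1 spike back to neuron 0).
M = [
    [-1, +1],   # r0: -1 spike in neuron 0, +1 spike arriving at neuron 1
    [+1, -1],   # r1: +1 spike arriving at neuron 0, -1 spike in neuron 1
]
config = [2, 0]                    # neuron 0 starts with 2 spikes
config = step(config, [1, 0], M)   # only r0 fires this step
print(config)                      # [1, 1]
```

Because each step is a vector–matrix operation, the GPU implementation can map it onto highly parallel linear-algebra kernels.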
Learning Parallel Computations with ParaLab
In this paper, we present the ParaLab teachware system, which can be used for learning the parallel computation methods. ParaLab provides the tools for simulating the multiprocessor computational systems with various network topologies, for carrying out the computational experiments in the simulation mode, and for evaluating the efficiency of the parallel computation methods. The visual presentation of the parallel computations taking place in the computational experiments is the key feature of the system. ParaLab can be used for the laboratory training within various teaching courses in the field of parallel, distributed, and supercomputer computations
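The efficiency evaluation such a teachware system performs rests on two textbook metrics (this is not ParaLab's actual API, only the standard definitions): speedup S_p = T_1 / T_p and efficiency E_p = S_p / p for p processors.

```python
def speedup(t_serial, t_parallel):
    """Speedup S_p: serial running time over parallel running time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Efficiency E_p: speedup normalized by the processor count p."""
    return speedup(t_serial, t_parallel) / p

# E.g. a computation taking 100 s on one processor and 30 s on 4 processors:
print(round(speedup(100.0, 30.0), 2))        # 3.33
print(round(efficiency(100.0, 30.0, 4), 2))  # 0.83
```

Plotting these metrics against p for different topologies is exactly the kind of experiment a simulation-mode laboratory exercise would run.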
A Lower Bound Technique for Communication in BSP
Communication is a major factor determining the performance of algorithms on
current computing systems; it is therefore valuable to provide tight lower
bounds on the communication complexity of computations. This paper presents a
lower bound technique for the communication complexity in the bulk-synchronous
parallel (BSP) model of a given class of DAG computations. The derived bound is
expressed in terms of the switching potential of a DAG, that is, the number of
permutations that the DAG can realize when viewed as a switching network. The
proposed technique yields tight lower bounds for the fast Fourier transform
(FFT), and for any sorting and permutation network. A stronger bound is also
derived for the periodic balanced sorting network, by applying this technique
to suitable subnetworks. Finally, we demonstrate that the switching potential
captures communication requirements even in computational models different from
BSP, such as the I/O model and the LPRAM
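The switching potential can be made concrete on a toy network (a sketch under my own construction, not code from the paper): view the DAG as stages of 2x2 switches, enumerate every switch setting, and count the distinct input-output permutations realized.

```python
from itertools import product

def apply_stage(perm, pairs, settings):
    """Apply one stage of 2x2 switches: swap each pair whose setting bit is 1."""
    out = list(perm)
    for (i, j), s in zip(pairs, settings):
        if s:
            out[i], out[j] = out[j], out[i]
    return out

def switching_potential(stages, n):
    """Count distinct permutations realizable over all switch settings."""
    perms = set()
    n_switches = sum(len(stage) for stage in stages)
    for bits in product([0, 1], repeat=n_switches):
        perm = list(range(n))
        k = 0
        for stage in stages:
            perm = apply_stage(perm, stage, bits[k:k + len(stage)])
            k += len(stage)
        perms.add(tuple(perm))
    return len(perms)

# A 4-input, 2-stage butterfly-style network: stage 1 switches (0,1) and (2,3),
# stage 2 switches (0,2) and (1,3).
stages = [[(0, 1), (2, 3)], [(0, 2), (1, 3)]]
print(switching_potential(stages, 4))  # 16, out of 4! = 24 possible permutations
```

The gap between the 16 realizable permutations and the 24 possible ones is what a lower-bound argument of this kind exploits: a network with small switching potential cannot route every permutation cheaply.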
Online Permutation Routing in Partitioned Optical Passive Star Networks
This paper establishes the state of the art in both deterministic and
randomized online permutation routing in the POPS network. Indeed, we show that
any permutation can be routed online on a POPS network either with
deterministic slots, or, with high probability, with
randomized slots, where constant
. When , which we claim to be the
"interesting" case, the randomized algorithm is exponentially faster than any
other algorithm in the literature, both deterministic and randomized ones. This
is true in practice as well. Indeed, experiments show that it outperforms its
rivals even starting from as small a network as a POPS(2,2), and the gap grows
exponentially with the size of the network. We can also show that, under proper
hypothesis, no deterministic algorithm can asymptotically match its
performance
NBODY6++GPU: Ready for the gravitational million-body problem
Accurate direct N-body simulations help to obtain detailed information
about the dynamical evolution of star clusters. They also enable comparisons
with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a
well-known direct N-body code for star clusters, and NBODY6++ is the extended
version designed for large particle number simulations by supercomputers. We
present NBODY6++GPU, an optimized version of NBODY6++ with hybrid
parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large
direct N-body simulations, and in particular to solve the million-body
problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as
well as the first results from a simulation of a realistic globular cluster
initially containing a million particles. For million-body simulations,
NBODY6++GPU is times faster than NBODY6 with 320 CPU cores and 32
NVIDIA K20X GPUs. With this computing cluster specification, the simulations of
million-body globular clusters including primordial binaries require
about an hour per half-mass crossing time.
Comment: 13 pages, 9 figures, 3 tables
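The O(N^2) kernel that such codes offload to GPUs is direct force summation. A minimal sketch (plain Python, G = 1, with a softening parameter eps of my choosing; this illustrates the direct-summation idea, not NBODY6++GPU's actual kernel):

```python
def accelerations(pos, mass, eps=1e-4):
    """Direct-summation accelerations: a_i = sum_{j != i} m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^{3/2}.
    G = 1; eps is a softening length that avoids the singularity at zero separation."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc

# Two unit masses one unit apart attract each other with |a| close to 1:
a = accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
print(round(a[0][0], 3))  # 1.0 (softened)
```

Every pair interaction is independent, which is why this inner loop parallelizes so well across GPU threads, MPI ranks, and SIMD lanes (AVX/SSE).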