A Hierarchical Architecture with Parallel Comunication for Implementing P Systems
Membrane systems are computationally equivalent to Turing machines. However, their distributed and
massively parallel nature yields polynomial solutions, as opposed to traditional non-polynomial ones.
So far, research on implementing membrane systems has not yet reached the massively
parallel character of this computational model. The best published approaches have achieved a distributed
architecture called “partially parallel evolution with partially parallel communication”, in which several
membranes are allocated to each processor, proxies are used to communicate with membranes allocated to
different processors, and a policy of access control to the communications is mandatory. These approaches
obtain processor-level parallelism in the application of evolution rules and in the internal communication among
membranes allocated inside each processor. Even so, the external communications needed among membranes
placed on different processors share a common communication line and are therefore
sequential.
In this work, we present a new hierarchical architecture that achieves external communication parallelism among
processors and substantially increases parallelization in the application of evolution rules and internal
communications. Consequently, the time needed for each evolution step is reduced. As a result, this new
distributed hierarchical architecture comes close to the massively parallel character required by the model.
A Parallel Monte Carlo Code for Simulating Collisional N-body Systems
We present a new parallel code for computing the dynamical evolution of
collisional N-body systems with up to N~10^7 particles. Our code is based on
the Henon Monte Carlo method for solving the Fokker-Planck equation, and
makes assumptions of spherical symmetry and dynamical equilibrium. The
principal algorithmic developments involve optimizing data structures, and the
introduction of a parallel random number generation scheme, as well as a
parallel sorting algorithm, required to find nearest neighbors for interactions
and to compute the gravitational potential. The new algorithms we introduce
along with our choice of decomposition scheme minimize communication costs and
ensure optimal distribution of data and workload among the processing units.
The implementation uses the Message Passing Interface (MPI) library for
communication, which makes it portable to many different supercomputing
architectures. We validate the code by calculating the evolution of clusters
with initial Plummer distribution functions up to core collapse with the number
of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find
that our results are in good agreement with self-similar core-collapse
solutions, and the core collapse times generally agree with expectations from
the literature. Also, we observe good total energy conservation, within less
than 0.04% throughout all simulations. We analyze the performance of the code,
and demonstrate near-linear scaling of the runtime with the number of
processors up to 64 processors for N=10^5, 128 for N=10^6 and 256 for N=10^7.
The runtime reaches a saturation with the addition of more processors beyond
these limits which is a characteristic of the parallel sorting algorithm. The
resulting maximum speedups we achieve are approximately 60x, 100x, and 220x,
respectively.
Comment: 53 pages, 13 figures, accepted for publication in ApJ Supplement
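As a rough reading of the scaling figures quoted above, parallel efficiency can be estimated as speedup divided by processor count. The sketch below uses only the approximate speedups and processor limits stated in the abstract; the names `reported` and `parallel_efficiency` are illustrative and do not come from the paper.

```python
# Approximate (processors, speedup) pairs as quoted in the abstract.
reported = {
    "N=10^5": (64, 60.0),
    "N=10^6": (128, 100.0),
    "N=10^7": (256, 220.0),
}


def parallel_efficiency(processors, speedup):
    """Efficiency = achieved speedup / ideal (linear) speedup."""
    return speedup / processors


efficiencies = {
    label: parallel_efficiency(p, s) for label, (p, s) in reported.items()
}

for label, eff in efficiencies.items():
    print(f"{label}: efficiency ~ {eff:.2f}")
```

Because the quoted speedups are themselves approximate, the resulting efficiencies (roughly 0.94, 0.78 and 0.86) are only indicative.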
Networks of Evolutionary Processors: Java Implementation of a Threaded Processor
This paper focuses on a parallel Java implementation of a processor defined in a Network of
Evolutionary Processors. The processor description is based on JDom, which provides a complete, Java-based
solution for accessing, manipulating, and outputting XML data from Java code. Communication among different
processors, needed to obtain a fully functional simulation of a Network of Evolutionary Processors, will be treated in future work.
A thread-safe model of processors performs all parallel operations, such as rules and filters. Non-deterministic
behavior of processors is achieved with a thread for each rule and for each filter (input and output). Different
results of a processor evolution are shown.
A Parallel Tree-SPH code for Galaxy Formation
We describe a new implementation of a parallel Tree-SPH code aimed at
simulating galaxy formation and evolution. The code has been parallelized using
SHMEM, a Cray proprietary library to handle communications between the 256
processors of the Silicon Graphics T3E massively parallel supercomputer hosted
by the Cineca Supercomputing Center (Bologna, Italy). The code combines the
Smoothed Particle Hydrodynamics (SPH) method to solve hydro-dynamical equations
with the popular Barnes and Hut (1986) tree-code to perform the gravity calculation
with N log N scaling, and it is based on the scalar Tree-SPH code developed by
Carraro et al. (1998) [MNRAS 297, 1021]. Parallelization is achieved by distributing
particles among processors according to a work-load criterion. Benchmarks of the code, in
terms of load-balance and scalability, are analyzed and critically
discussed against the adiabatic collapse of an isothermal gas sphere test using
20,000 particles on 8 processors. The code is balanced at better than the 95%
level. As the number of processors increases, the load-balance slightly worsens.
The deviation from perfect scalability at increasing number of processors is
almost negligible up to 32 processors. Finally we present a simulation of the
formation of an X-ray galaxy cluster in a flat cold dark matter cosmology,
using 200,000 particles and 32 processors, and compare our results with Evrard
(1988) P3M-SPH simulations. Additionally, we have incorporated radiative cooling,
star formation, feedback from supernovae of type II and Ia, stellar winds and UV
flux from massive stars, and an algorithm to follow the chemical enrichment of
the interstellar medium. Simulations with some of these ingredients are also
presented.
Comment: 19 pages, 14 figures, accepted for publication in MNRA
A taxonomy of parallel sorting
TR 84-601
In this paper, we propose a taxonomy of parallel sorting that includes a broad range of array
and file sorting algorithms. We analyze the evolution of research on parallel sorting, from the
earliest sorting networks to the shared memory algorithms and the VLSI sorters. In the context
of sorting networks, we describe two fundamental parallel merging schemes - the odd-even and
the bitonic merge. Sorting algorithms have been derived from these merging algorithms for parallel
computers where processors communicate through interconnection networks such as the perfect
shuffle, the mesh and a number of other sparse networks. After describing the network sorting
algorithms, we show that, with a shared memory model of parallel computation, faster algorithms
have been derived from parallel enumeration sorting schemes, where keys are first ranked and
then rearranged according to their rank.
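The enumeration (rank) sorting idea mentioned above can be sketched as follows: every key is ranked independently, then moved to the position given by its rank. This is a minimal sequential sketch, not the report's algorithm. The function name and the index-based tie-break are assumptions made here, and in a shared-memory setting each rank computation could be assigned to a separate processor.

```python
def enumeration_sort(keys):
    """Sort by ranking every key, then placing each key at its rank."""
    n = len(keys)
    # Rank of keys[i] = number of keys that must come before it.
    # Ties are broken by original index so that all ranks are distinct.
    # Each rank depends only on the input, so ranks can be computed in parallel.
    ranks = [
        sum(
            1
            for j in range(n)
            if keys[j] < keys[i] or (keys[j] == keys[i] and j < i)
        )
        for i in range(n)
    ]
    # Rearrangement phase: every key is written to its own output slot.
    out = [None] * n
    for i, rank in enumerate(ranks):
        out[rank] = keys[i]
    return out


result = enumeration_sort([3, 1, 2, 1])
```

The ranking phase performs n comparisons per key, so the sequential cost is quadratic; the scheme only pays off when the comparisons are spread across many processors.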
Improving Simulations of Spiking Neural P Systems in NVIDIA CUDA GPUs: CuSNP
Spiking neural P systems (in short, SN P systems) are parallel models of
computation inspired by the spiking (firing) of biological neurons. In SN P systems, neurons
function as spike processors and are placed on nodes of a directed graph. Synapses,
the connections between neurons, are represented by arcs or directed edges in the graph.
Not only do SN P systems have parallel semantics (i.e. neurons operate in parallel), but
their structure as directed graphs allows them to be represented as vectors or matrices.
Such representations allow the use of linear algebra operations for simulating the
evolution of the system configurations, i.e. computations. In this work, we continue the
implementation of SN P systems with delays, i.e. a delay is associated with the sending
of a spike from a neuron to its neighbouring neurons. Our implementation is based on
a modified representation of SN P systems as vectors and matrices for SN P systems
without delays. We use massively parallel processors known as graphics processing units
(in short, GPUs) from NVIDIA. For experimental validation, we use SN P systems implementing
generalized sorting networks. We report a speedup, i.e. the ratio of the
running time of the sequential simulator to that of the parallel simulator, of up to approximately 51
times for a 512-size input to the sorting network.
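To illustrate how a vector and matrix representation lets linear algebra drive the simulation, the sketch below assumes one common formulation for systems without delays: the next configuration is the current configuration plus the spiking vector multiplied by a transition matrix. The two-neuron system, the function `step` and the variable names are hypothetical; the paper's modified representation for delays is not reproduced here.

```python
def step(config, spiking, transition):
    """Return the next configuration C' = C + S * M.

    config:     spikes currently held by each neuron (length = neurons)
    spiking:    1 if the rule fires in this step, else 0 (length = rules)
    transition: one row per rule, one column per neuron; an entry is the
                number of spikes the rule adds to (or removes from) a neuron
    """
    n_neurons = len(config)
    delta = [
        sum(spiking[r] * transition[r][j] for r in range(len(spiking)))
        for j in range(n_neurons)
    ]
    return [c + d for c, d in zip(config, delta)]


# Hypothetical system: neuron 0 and neuron 1 each have one rule that
# consumes one spike and sends one spike to the other neuron.
transition = [
    [-1, 1],  # rule 0 (in neuron 0)
    [1, -1],  # rule 1 (in neuron 1)
]
config = [1, 0]   # neuron 0 holds one spike
spiking = [1, 0]  # only rule 0 is applicable and fires

next_config = step(config, spiking, transition)
```

Each column of the product can be computed independently, which is the kind of data parallelism that maps naturally onto GPU threads.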
Distributed models in P-Systems architectures to reduce computation time
Membrane systems are computationally equivalent to Turing machines. However, their distributed and massively parallel nature yields polynomial solutions, as opposed to traditional non-polynomial ones. It is therefore very important to develop dedicated hardware and software implementations that exploit these two features of membrane systems. In distributed implementations of P systems, a communication bottleneck problem has arisen: when the number of membranes grows, the network gets congested. The purpose of distributed architectures is to reach a compromise between the massively parallel character of the system and the time needed for an evolution step, i.e. the transition from one configuration of the system to the next, thereby solving the communication bottleneck problem. The goal of this paper is twofold. First, we survey in a systematic and uniform way the main results on how membranes can be placed on processors in order to obtain a software/hardware simulation of P systems in a distributed environment. Second, we improve some results about the membrane dissolution problem, prove that it is connected, and discuss the possibility of simulating this property in the distributed model. All this yields an improvement in the implementation of system parallelism, since it increases the parallelism of the external communication among processors. The proposed ideas improve on previous architectures in tackling the communication bottleneck problem, for example by reducing the total time of an evolution step, increasing the number of membranes that can run on a processor, and reducing the number of processors.
A Fast Potential and Self-Gravity Solver for Non-Axisymmetric Disks
Disk self-gravity could play an important role in the dynamic evolution of
interaction between disks and embedded protoplanets. We have developed a fast
and accurate solver to calculate the disk potential and disk self-gravity
forces for disk systems on a uniform polar grid. Our method follows closely the
method given by Chan et al. (2006), in which an FFT in the azimuthal direction
is performed and a direct integral approach in the frequency domain in the
radial direction is implemented on a uniform polar grid. This method can be
very effective for disks with vertical structures that depend only on the disk
radius, achieving the same computational efficiency as for zero-thickness
disks. We describe how to parallelize the solver efficiently on distributed
parallel computers. We propose a mode-cutoff procedure to reduce the parallel
communication cost and achieve nearly linear scalability for a large number of
processors. For comparison, we have also developed a particle-based fast
tree-code to calculate the self-gravity of the disk system with vertical
structure. The numerical results show that our direct integral method is at
least two orders of magnitude faster than our optimized tree-code approach.
Comment: 8 figures, accepted to ApJ