
    A Hierarchical Architecture with Parallel Comunication for Implementing P Systems

    Membrane systems are computationally equivalent to Turing machines. However, their distributed and massively parallel nature yields polynomial-time solutions to problems whose traditional solutions are non-polynomial. Current research on implementing membrane systems has not yet reached the massively parallel character of this computational model. The best published approaches have achieved a distributed architecture called “partially parallel evolution with partially parallel communication”, in which several membranes are allocated to each processor, proxies are used to communicate with membranes allocated to different processors, and a policy of access control to the communications is mandatory. These approaches obtain parallelism among processors in the application of evolution rules and in the internal communication among membranes allocated inside each processor. However, external communications, which are needed for communication among membranes placed on different processors, share a common communication line and are therefore sequential. In this work, we present a new hierarchical architecture that achieves external communication parallelism among processors and substantially increases parallelization in the application of evolution rules and internal communications. Consequently, the time needed for each evolution step is reduced. As a result, this new distributed hierarchical architecture comes close to the massively parallel character required by the model.

    A Parallel Monte Carlo Code for Simulating Collisional N-body Systems

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N~10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and assumes spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and introducing a parallel random number generation scheme, as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce, along with our choice of decomposition scheme, minimize communication costs and ensure optimal distribution of data and workload among the processing units. The implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse, with the number of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core collapse times generally agree with expectations from the literature. We also observe good total energy conservation, to within 0.04% throughout all simulations. We analyze the performance of the code and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N=10^5, 128 for N=10^6 and 256 for N=10^7. Beyond these limits the runtime saturates as more processors are added, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60x, 100x, and 220x, respectively. Comment: 53 pages, 13 figures, accepted for publication in ApJ Supplement

    Networks of Evolutionary Processors: Java Implementation of a Threaded Processor

    This paper focuses on a parallel Java implementation of a processor defined in a Network of Evolutionary Processors. The processor description is based on JDom, which provides a complete, Java-based solution for accessing, manipulating, and outputting XML data from Java code. Communication among different processors, needed to obtain a fully functional simulation of a Network of Evolutionary Processors, will be treated in future work. A thread-safe model of processors performs all parallel operations, such as rules and filters. The non-deterministic behavior of processors is achieved with one thread for each rule and for each filter (input and output). Different results of a processor evolution are shown.

    A Parallel Tree-SPH code for Galaxy Formation

    We describe a new implementation of a parallel Tree-SPH code designed to simulate galaxy formation and evolution. The code has been parallelized using SHMEM, a Cray proprietary library, to handle communications between the 256 processors of the Silicon Graphics T3E massively parallel supercomputer hosted by the Cineca Supercomputing Center (Bologna, Italy). The code combines the Smoothed Particle Hydrodynamics (SPH) method for solving the hydrodynamical equations with the popular Barnes and Hut (1986) tree-code, which performs the gravity calculation with N log N scaling, and it is based on the scalar Tree-SPH code developed by Carraro et al. (1998) [MNRAS 297, 1021]. Parallelization is achieved by distributing particles among processors according to a work-load criterion. Benchmarks of the code, in terms of load balance and scalability, are analyzed and critically discussed against the adiabatic collapse of an isothermal gas sphere test using 20,000 particles on 8 processors. The code is balanced at more than the 95% level. As the number of processors increases, the load balance slightly worsens. The deviation from perfect scalability with increasing number of processors is almost negligible up to 32 processors. Finally, we present a simulation of the formation of an X-ray galaxy cluster in a flat cold dark matter cosmology, using 200,000 particles and 32 processors, and compare our results with Evrard (1988) P3M-SPH simulations. Additionally, we have incorporated radiative cooling, star formation, feedback from SNe of type II and Ia, stellar winds and UV flux from massive stars, and an algorithm to follow the chemical enrichment of the interstellar medium. Simulations with some of these ingredients are also presented. Comment: 19 pages, 14 figures, accepted for publication in MNRA

    A taxonomy of parallel sorting

    TR 84-601. In this paper, we propose a taxonomy of parallel sorting that includes a broad range of array and file sorting algorithms. We analyze the evolution of research on parallel sorting, from the earliest sorting networks to the shared memory algorithms and the VLSI sorters. In the context of sorting networks, we describe two fundamental parallel merging schemes: the odd-even merge and the bitonic merge. Sorting algorithms have been derived from these merging algorithms for parallel computers where processors communicate through interconnection networks such as the perfect shuffle, the mesh and a number of other sparse networks. After describing the network sorting algorithms, we show that, with a shared memory model of parallel computation, faster algorithms have been derived from parallel enumeration sorting schemes, where keys are first ranked and then rearranged according to their rank.
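
    The two ideas named in this abstract, bitonic merging and enumeration (rank) sorting, can be illustrated with a minimal sequential sketch. It is not taken from the report: the function names are hypothetical, the bitonic version assumes the input length is a power of two, and the loops that a parallel machine would run concurrently are simply executed one after another.

```python
def bitonic_merge(seq, ascending=True):
    """Sort a bitonic sequence (length assumed to be a power of two)."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    lo, hi = list(seq[:half]), list(seq[half:])
    # Each compare-exchange below is independent of the others,
    # so a sorting network performs all of them in one parallel step.
    for i in range(half):
        if (lo[i] > hi[i]) == ascending:
            lo[i], hi[i] = hi[i], lo[i]
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)


def bitonic_sort(seq, ascending=True):
    """Sort by building a bitonic sequence from two oppositely sorted halves."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    first = bitonic_sort(seq[:half], True)    # ascending half
    second = bitonic_sort(seq[half:], False)  # descending half
    return bitonic_merge(first + second, ascending)


def enumeration_sort(keys):
    """Rank every key, then place each key at the position given by its rank."""
    n = len(keys)
    ranks = [0] * n
    # Each rank depends only on the input, so with shared memory
    # all ranks could be computed at the same time.
    for i in range(n):
        ranks[i] = sum(
            1
            for j in range(n)
            if keys[j] < keys[i] or (keys[j] == keys[i] and j < i)  # ties by index
        )
    out = [None] * n
    for i, r in enumerate(ranks):
        out[r] = keys[i]
    return out
```

    Breaking ties by index in `enumeration_sort` gives every key a distinct rank, so no two keys are written to the same output slot.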

    Improving Simulations of Spiking Neural P Systems in NVIDIA CUDA GPUs: CuSNP

    Spiking neural P systems (in short, SN P systems) are parallel models of computation inspired by the spiking (firing) of biological neurons. In SN P systems, neurons function as spike processors and are placed on nodes of a directed graph. Synapses, the connections between neurons, are represented by arcs or directed edges in the graph. Not only do SN P systems have parallel semantics (i.e. neurons operate in parallel), but their structure as directed graphs allows them to be represented as vectors or matrices. Such representations allow the use of linear algebra operations for simulating the evolution of the system configurations, i.e. computations. In this work, we continue the implementations of SN P systems with delays, i.e. a delay is associated with the sending of a spike from a neuron to its neighbouring neurons. Our implementation is based on a modified representation of SN P systems as vectors and matrices for SN P systems without delays. We use massively parallel processors known as graphics processing units (in short, GPUs) from NVIDIA. For experimental validation, we use SN P systems implementing generalized sorting networks. We report a speedup, i.e. the ratio of the running time of the sequential simulator to that of the parallel simulator, of up to approximately 51 times for a 512-size input to the sorting network.
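
    The vector and matrix representation mentioned above can be sketched for the simpler case without delays. The sketch below is an illustration under stated assumptions, not the CuSNP implementation: it uses the commonly cited update in which the next configuration equals the current configuration plus the spiking vector multiplied by a rule-by-neuron transition matrix. The two-neuron system, the function name `step`, and the hand-picked spiking vector are all hypothetical.

```python
def step(config, spiking, matrix):
    """Return config + spiking * matrix (a vector-matrix product).

    config  : number of spikes held by each neuron
    spiking : one 0/1 entry per rule, 1 if the rule fires in this step
    matrix  : matrix[i][j] is the net change in neuron j when rule i fires
    """
    n_neurons = len(config)
    n_rules = len(spiking)
    # Every output entry is independent of the others, which is what
    # makes this update a natural fit for one GPU thread per neuron.
    return [
        config[j] + sum(spiking[i] * matrix[i][j] for i in range(n_rules))
        for j in range(n_neurons)
    ]


# Hypothetical system: rule 0 lives in neuron 0, consumes one spike and
# sends one spike to neuron 1; rule 1 does the reverse.
M = [
    [-1, +1],
    [+1, -1],
]

c0 = [1, 0]                 # neuron 0 starts with one spike
c1 = step(c0, [1, 0], M)    # only rule 0 is applicable
c2 = step(c1, [0, 1], M)    # only rule 1 is applicable
```

    Choosing which rules fire (the spiking vector) is left to the caller here; a full simulator would derive it from each rule's applicability condition and, in the delayed case, from additional per-neuron timing state.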

    Distributed models in P-Systems architectures to reduce computation time

    Membrane systems are computationally equivalent to Turing machines. However, their distributed and massively parallel nature yields polynomial-time solutions to problems whose traditional solutions are non-polynomial. It is therefore very important to develop dedicated hardware and software implementations that exploit these two features of membrane systems. In distributed implementations of P systems, a communication bottleneck problem has arisen: as the number of membranes grows, the network becomes congested. The purpose of distributed architectures is to reach a compromise between the massively parallel character of the system and the evolution step time needed to move from one configuration of the system to the next, thereby solving the communication bottleneck problem. The goal of this paper is twofold. First, we survey in a systematic and uniform way the main results on how membranes can be placed on processors in order to obtain a software/hardware simulation of P systems in a distributed environment. Second, we improve some results about the membrane dissolution problem, prove that it is connected, and discuss the possibility of simulating this property in the distributed model. All this improves the implementation of the system's parallelism, since it increases the parallelism of the external communication among processors. The proposed ideas improve on previous architectures for tackling the communication bottleneck problem in several ways: reduction of the total time of an evolution step, an increase in the number of membranes that can run on a processor, and reduction of the number of processors.

    A Fast Potential and Self-Gravity Solver for Non-Axisymmetric Disks

    Disk self-gravity could play an important role in the dynamic evolution of the interaction between disks and embedded protoplanets. We have developed a fast and accurate solver to calculate the disk potential and disk self-gravity forces for disk systems on a uniform polar grid. Our method closely follows the method given by Chan et al. (2006), in which an FFT is performed in the azimuthal direction and a direct integral approach in the frequency domain is implemented in the radial direction on a uniform polar grid. This method can be very effective for disks with vertical structures that depend only on the disk radius, achieving the same computational efficiency as for zero-thickness disks. We describe how to parallelize the solver efficiently on distributed parallel computers. We propose a mode-cutoff procedure to reduce the parallel communication cost and achieve nearly linear scalability for a large number of processors. For comparison, we have also developed a particle-based fast tree-code to calculate the self-gravity of the disk system with vertical structure. The numerical results show that our direct integral method is at least two orders of magnitude faster than our optimized tree-code approach. Comment: 8 figures, accepted to ApJ