Improving Simulations of Spiking Neural P Systems in NVIDIA CUDA GPUs: CuSNP
Spiking neural P systems (in short, SN P systems) are parallel models of
computations inspired by the spiking (firing) of biological neurons. In SN P systems, neurons
function as spike processors and are placed on nodes of a directed graph. Synapses,
the connections between neurons, are represented by arcs or directed edges in the graph.
Not only do SN P systems have parallel semantics (i.e. neurons operate in parallel), but
their structure as directed graphs allows them to be represented as vectors or matrices.
Such representations allow the use of linear algebra operations for simulating the
evolution of the system configurations, i.e. computations. In this work, we continue the
implementations of SN P systems with delays, i.e. a delay is associated with the sending
of a spike from a neuron to its neighbouring neurons. Our implementation is based on
a modified representation of SN P systems as vectors and matrices for SN P systems
without delays. We use massively parallel processors known as graphics processing units
(in short, GPUs) from NVIDIA. For experimental validation, we use SN P systems implementing
generalized sorting networks. We report a speedup, i.e. the ratio of the running time of
the sequential simulator to that of the parallel simulator, of up to approximately 51
times for a 512-size input to the sorting network.
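The matrix representation described above can be illustrated with a minimal sketch (plain Python, hypothetical two-neuron rule set; the function and matrix names are illustrative, not taken from CuSNP): the next configuration is obtained as C' = C + s · M, where C counts spikes per neuron, s marks which rules fire, and row k of M records how rule k consumes spikes in its own neuron and sends spikes along synapses.

```python
def step(config, spiking, M):
    """One transition of an SN P system without delays: config' = config + spiking . M."""
    n = len(config)
    nxt = list(config)
    for k, fired in enumerate(spiking):
        if fired:
            for j in range(n):
                nxt[j] += M[k][j]
    return nxt

# Toy system: neuron 0 has rule r0 (consume 1 spike, send 1 spike to neuron 1);
# neuron 1 has rule r1 (consume 1 spike, send 1 spike back to neuron 0).
M = [
    [-1, +1],   # r0: -1 spike in neuron 0, +1 spike arriving at neuron 1
    [+1, -1],   # r1: +1 spike arriving at neuron 0, -1 spike in neuron 1
]
config = [2, 0]                    # neuron 0 starts with 2 spikes
config = step(config, [1, 0], M)   # only r0 fires this step
print(config)                      # [1, 1]
```

Because each step is a vector–matrix operation, the GPU implementation can map it onto highly parallel linear-algebra kernels.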
Learning Parallel Computations with ParaLab
In this paper, we present the ParaLab teachware system, which can be used for learning the parallel computation methods. ParaLab provides the tools for simulating the multiprocessor computational systems with various network topologies, for carrying out the computational experiments in the simulation mode, and for evaluating the efficiency of the parallel computation methods. The visual presentation of the parallel computations taking place in the computational experiments is the key feature of the system. ParaLab can be used for the laboratory training within various teaching courses in the field of parallel, distributed, and supercomputer computations
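The efficiency evaluation such a teachware system performs rests on two textbook metrics (this is not ParaLab's actual API, only the standard definitions): speedup S_p = T_1 / T_p and efficiency E_p = S_p / p for p processors.

```python
def speedup(t_serial, t_parallel):
    """Speedup S_p: serial running time over parallel running time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Efficiency E_p: speedup normalized by the processor count p."""
    return speedup(t_serial, t_parallel) / p

# E.g. a computation taking 100 s on one processor and 30 s on 4 processors:
print(round(speedup(100.0, 30.0), 2))        # 3.33
print(round(efficiency(100.0, 30.0, 4), 2))  # 0.83
```

Plotting these metrics against p for different topologies is exactly the kind of experiment a simulation-mode laboratory exercise would run.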
A Lower Bound Technique for Communication in BSP
Communication is a major factor determining the performance of algorithms on
current computing systems; it is therefore valuable to provide tight lower
bounds on the communication complexity of computations. This paper presents a
lower bound technique for the communication complexity in the bulk-synchronous
parallel (BSP) model of a given class of DAG computations. The derived bound is
expressed in terms of the switching potential of a DAG, that is, the number of
permutations that the DAG can realize when viewed as a switching network. The
proposed technique yields tight lower bounds for the fast Fourier transform
(FFT), and for any sorting and permutation network. A stronger bound is also
derived for the periodic balanced sorting network, by applying this technique
to suitable subnetworks. Finally, we demonstrate that the switching potential
captures communication requirements even in computational models different from
BSP, such as the I/O model and the LPRAM
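The switching potential can be made concrete on a toy network (a sketch under my own construction, not code from the paper): view the DAG as stages of 2x2 switches, enumerate every switch setting, and count the distinct input-output permutations realized.

```python
from itertools import product

def apply_stage(perm, pairs, settings):
    """Apply one stage of 2x2 switches: swap each pair whose setting bit is 1."""
    out = list(perm)
    for (i, j), s in zip(pairs, settings):
        if s:
            out[i], out[j] = out[j], out[i]
    return out

def switching_potential(stages, n):
    """Count distinct permutations realizable over all switch settings."""
    perms = set()
    n_switches = sum(len(stage) for stage in stages)
    for bits in product([0, 1], repeat=n_switches):
        perm = list(range(n))
        k = 0
        for stage in stages:
            perm = apply_stage(perm, stage, bits[k:k + len(stage)])
            k += len(stage)
        perms.add(tuple(perm))
    return len(perms)

# A 4-input, 2-stage butterfly-style network: stage 1 switches (0,1) and (2,3),
# stage 2 switches (0,2) and (1,3).
stages = [[(0, 1), (2, 3)], [(0, 2), (1, 3)]]
print(switching_potential(stages, 4))  # 16, out of 4! = 24 possible permutations
```

The gap between the 16 realizable permutations and the 24 possible ones is what a lower-bound argument of this kind exploits: a network with small switching potential cannot route every permutation cheaply.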
Online Permutation Routing in Partitioned Optical Passive Star Networks
This paper establishes the state of the art in both deterministic and
randomized online permutation routing in the POPS network. Indeed, we show that
any permutation can be routed online on a POPS network either with
deterministic slots, or, with high probability, with
randomized slots, where constant
. When , which we claim to be the
"interesting" case, the randomized algorithm is exponentially faster than any
other algorithm in the literature, both deterministic and randomized ones. This
is true in practice as well. Indeed, experiments show that it outperforms its
rivals even starting from as small a network as a POPS(2,2), and the gap grows
exponentially with the size of the network. We can also show that, under proper
hypothesis, no deterministic algorithm can asymptotically match its
performance
NBODY6++GPU: Ready for the gravitational million-body problem
Accurate direct N-body simulations help to obtain detailed information
about the dynamical evolution of star clusters. They also enable comparisons
with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a
well-known direct N-body code for star clusters, and NBODY6++ is the extended
version designed for large particle number simulations by supercomputers. We
present NBODY6++GPU, an optimized version of NBODY6++ with hybrid
parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large
direct N-body simulations, and in particular to solve the million-body
problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as
well as the first results from a simulation of a realistic globular cluster
initially containing a million particles. For million-body simulations,
NBODY6++GPU is times faster than NBODY6 with 320 CPU cores and 32
NVIDIA K20X GPUs. With this computing cluster specification, the simulations of
million-body globular clusters including primordial binaries require
about an hour per half-mass crossing time.
Comment: 13 pages, 9 figures, 3 tables
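The O(N^2) kernel that such codes offload to GPUs is direct force summation. A minimal sketch (plain Python, G = 1, with a softening parameter eps of my choosing; this illustrates the direct-summation idea, not NBODY6++GPU's actual kernel):

```python
def accelerations(pos, mass, eps=1e-4):
    """Direct-summation accelerations: a_i = sum_{j != i} m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^{3/2}.
    G = 1; eps is a softening length that avoids the singularity at zero separation."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc

# Two unit masses one unit apart attract each other with |a| close to 1:
a = accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
print(round(a[0][0], 3))  # 1.0 (softened)
```

Every pair interaction is independent, which is why this inner loop parallelizes so well across GPU threads, MPI ranks, and SIMD lanes (AVX/SSE).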