Developing Efficient Discrete Simulations on Multicore and GPU Architectures
In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real example of an application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using the OpenMP and CUDA frameworks. We have evaluated the performance on two hardware platforms that represent different target market segments: a high-end platform for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores and an NVIDIA Tesla V100 GPU, both running on the Amazon Web Services (AWS) Cloud; and a consumer-oriented platform, using an Intel Core i9 9900k CPU and an NVIDIA GeForce GTX 1050 TI GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained on both platforms, and we identify several issues that degrade performance on them. We also found that current multicore CPUs with large core counts can deliver performance very close to that of GPUs, and even identical in some cases.
Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU) TIN2017-89842
Pseudorandom number generation with self programmable cellular automata
In this paper, we propose a new class of cellular automata, self programming cellular automata (SPCA), with specific application to pseudorandom number generation. By changing a cell's state transition rules in relation to factors such as its neighboring cells' states, behavioral complexity can be increased and utilized. Interplay between the state transition neighborhood and rule selection neighborhood leads to a new composite neighborhood and state transition rule that is the linear combination of two different mappings with different temporal dependencies. It is proved that when the transitional matrices for both the state transition and rule selection neighborhood are non-singular, SPCA will not exhibit non-group behavior. Good performance can be obtained using simple neighborhoods with certain CA lengths, transition rules, etc. Certain configurations of SPCA pass all DIEHARD and ENT tests with an implementation cost lower than currently reported work. Output sampling methods are also suggested to improve output efficiency by sampling the outputs of the new rule selection neighborhoods.
Parallel Implementations of Cellular Automata for Traffic Models
The Biham-Middleton-Levine (BML) traffic model is a simple two-dimensional,
discrete Cellular Automaton (CA) that has been used to study self-organization
and phase transitions arising in traffic flows. From the computational point of
view, the BML model exhibits the usual features of discrete CA, where the state
of the automaton is updated according to simple rules that depend on the state
of each cell and its neighbors. In this paper we study the impact of various
optimizations for speeding up CA computations by using the BML model as a case
study. In particular, we describe and analyze the impact of several parallel
implementations that rely on CPU features, such as multiple cores or SIMD
instructions, and on GPUs. Experimental evaluation provides quantitative
measures of the payoff of each technique in terms of speedup with respect to a
plain serial implementation. Our findings show that the performance gap between
CPU and GPU implementations of the BML traffic model can be reduced by clever
exploitation of all CPU features.
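As a rough illustration of the kind of update rule the abstract refers to, here is a minimal serial sketch of one BML step in Python. It assumes the commonly described form of the model: a toroidal grid, red cars that try to move one cell east, blue cars that try to move one cell south, and the two colours updated in alternating, synchronous phases. The names `EMPTY`, `RED`, `BLUE`, `bml_half_step`, and `bml_step` are hypothetical and are not taken from the paper; this is the plain baseline, not any of the optimized implementations it evaluates.

```python
# Cell states (names are illustrative assumptions, not from the paper).
EMPTY, RED, BLUE = 0, 1, 2


def bml_half_step(grid, color):
    """Move every car of one colour by at most one cell, synchronously.

    Red cars try to move east, blue cars try to move south. The grid wraps
    around at the edges. A car moves only if its target cell was empty at
    the start of the phase, so all decisions read from the old grid.
    """
    rows, cols = len(grid), len(grid[0])
    dr, dc = (0, 1) if color == RED else (1, 0)
    new = [row[:] for row in grid]  # write into a copy, read from the original
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == color:
                nr, nc = (r + dr) % rows, (c + dc) % cols
                if grid[nr][nc] == EMPTY:
                    new[nr][nc] = color
                    new[r][c] = EMPTY
    return new


def bml_step(grid):
    """One full step: the red phase followed by the blue phase."""
    return bml_half_step(bml_half_step(grid, RED), BLUE)


if __name__ == "__main__":
    grid = [
        [RED, EMPTY, EMPTY],
        [BLUE, RED, EMPTY],
        [EMPTY, EMPTY, EMPTY],
    ]
    for row in bml_step(grid):
        print(row)
```

Because each phase reads only the old grid and every car has a unique target cell, the cells within a phase can be updated independently of one another, which is the property that parallel implementations on multiple cores, SIMD units, or GPUs rely on.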
An investigation of the efficient implementation of Cellular Automata on multi-core CPU and GPU hardware
Copyright © 2015 Elsevier. NOTICE: this is the author's version of a work that was accepted for publication in Journal of Parallel and Distributed Computing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Parallel and Distributed Computing Vol. 77 (2015), DOI: 10.1016/j.jpdc.2014.10.011
Cellular automata (CA) have proven to be excellent tools for the simulation of a wide variety of phenomena in the natural world. They are ideal candidates for acceleration with modern general-purpose graphics processing unit (GPU/GPGPU) hardware that consists of large numbers of small, tightly-coupled processors. In this study the potential for speeding up CA execution using multi-core CPUs and GPUs is investigated and the scalability of doing so with respect to standard CA parameters such as lattice and neighbourhood sizes, number of states and generations is determined. Additionally the impact of 'Activity' (the number of 'alive' cells) within a given CA simulation is investigated in terms of both varying the random initial distribution levels of 'alive' cells, and via the use of novel state transition rules; where a change in the dynamics of these rules (i.e. the number of states) allows for the investigation of the variable complexity within.
Engineering and Physical Sciences Research Council (EPSRC
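To make the parameters named in this abstract concrete (lattice size, neighbourhood, generations, and 'Activity'), the following is a small serial Python sketch. It assumes a two-state CA on a toroidal lattice with the 8-cell Moore neighbourhood and uses Conway's Game of Life rule purely as a familiar stand-in; the abstract does not say which rules the study uses. The function names `random_lattice`, `life_step`, and `activity` are hypothetical.

```python
import random


def random_lattice(rows, cols, density, seed=0):
    """Build a lattice where each cell is 'alive' (1) with probability `density`."""
    rng = random.Random(seed)
    return [[1 if rng.random() < density else 0 for _ in range(cols)]
            for _ in range(rows)]


def life_step(grid):
    """Advance one generation using the Game of Life rule (an assumed example rule)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count alive cells in the 8-cell Moore neighbourhood, wrapping at edges.
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            new[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return new


def activity(grid):
    """Activity as described in the abstract: the number of alive cells."""
    return sum(map(sum, grid))


if __name__ == "__main__":
    grid = random_lattice(32, 32, density=0.3, seed=1)
    for generation in range(5):
        print(generation, activity(grid))
        grid = life_step(grid)
```

Varying `density` corresponds to changing the random initial distribution of alive cells, and the per-generation `activity` count is the quantity whose effect on performance the abstract says is investigated.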