    Developing Efficient Discrete Simulations on Multicore and GPU Architectures

    In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real example of an application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using the OpenMP and CUDA frameworks. We evaluated performance on two hardware platforms that represent different target market segments: a high-end platform for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores and an NVIDIA Tesla V100 GPU, both running on the Amazon Web Services (AWS) cloud; and a consumer-oriented platform, using an Intel Core i9-9900K CPU and an NVIDIA GeForce GTX 1050 Ti GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained on both platforms, and we identify some important issues that degrade their performance. We also found that current multicore CPUs with large core counts can deliver performance very close to that of GPUs, and even identical in some cases.

    Funding: Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU) TIN2017-89842

    Parallel Implementations of Cellular Automata for Traffic Models

    The Biham-Middleton-Levine (BML) traffic model is a simple two-dimensional, discrete Cellular Automaton (CA) that has been used to study self-organization and phase transitions arising in traffic flows. From the computational point of view, the BML model exhibits the usual features of discrete CA, where the state of the automaton is updated according to simple rules that depend on the state of each cell and its neighbors. In this paper we study the impact of various optimizations for speeding up CA computations, using the BML model as a case study. In particular, we describe and analyze the impact of several parallel implementations that rely on CPU features, such as multiple cores or SIMD instructions, and on GPUs. Experimental evaluation provides quantitative measures of the payoff of each technique in terms of speedup with respect to a plain serial implementation. Our findings show that the performance gap between CPU and GPU implementations of the BML traffic model can be reduced by clever exploitation of all CPU features.
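    To make the update rule in the abstract above concrete, here is a minimal serial sketch of one BML step in Python. It is not the authors' implementation. It assumes one common convention for the model: a periodic grid, one class of cars that moves east and one that moves south, the two classes moving in alternating half-steps, and a car advancing only if its target cell was empty at the start of the half-step. The names `EMPTY`, `EAST`, `SOUTH`, `bml_half_step`, and `bml_step` are hypothetical.

```python
# Hypothetical cell encoding for this sketch (not taken from the paper).
EMPTY, EAST, SOUTH = 0, 1, 2


def bml_half_step(grid, kind):
    """Advance every car of `kind` by one cell if its target cell is empty.

    The grid is a list of equal-length rows with periodic (wrap-around)
    boundaries. All decisions read the old grid, so the update is synchronous.
    """
    rows, cols = len(grid), len(grid[0])
    dr, dc = (0, 1) if kind == EAST else (1, 0)
    new = [row[:] for row in grid]  # write into a copy, read from the original
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != kind:
                continue
            tr, tc = (r + dr) % rows, (c + dc) % cols
            if grid[tr][tc] == EMPTY:
                new[r][c] = EMPTY
                new[tr][tc] = kind
    return new


def bml_step(grid):
    """One full step: east-moving cars first, then south-moving cars."""
    return bml_half_step(bml_half_step(grid, EAST), SOUTH)
```

    Because every decision reads only the old grid and each target cell has exactly one possible source within a half-step, the cell loop has no write conflicts. That independence is what makes this kind of update a candidate for the multicore, SIMD, and GPU implementations the abstract refers to.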

    An investigation of the efficient implementation of Cellular Automata on multi-core CPU and GPU hardware

    Copyright © 2015 Elsevier. NOTICE: this is the author’s version of a work that was accepted for publication in Journal of Parallel and Distributed Computing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Parallel and Distributed Computing Vol. 77 (2015), DOI: 10.1016/j.jpdc.2014.10.011

    Cellular automata (CA) have proven to be excellent tools for the simulation of a wide variety of phenomena in the natural world. They are ideal candidates for acceleration with modern general-purpose graphics processing unit (GPU/GPGPU) hardware that consists of large numbers of small, tightly-coupled processors. In this study the potential for speeding up CA execution using multi-core CPUs and GPUs is investigated, and the scalability of doing so with respect to standard CA parameters such as lattice and neighbourhood sizes, number of states and generations is determined. Additionally, the impact of ‘Activity’ (the number of ‘alive’ cells) within a given CA simulation is investigated, both by varying the random initial distribution levels of ‘alive’ cells and via the use of novel state transition rules, where a change in the dynamics of these rules (i.e. the number of states) allows for the investigation of the variable complexity within.

    Funding: Engineering and Physical Sciences Research Council (EPSRC
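    As an illustration of the ‘Activity’ measure described in the abstract above, the following Python sketch seeds a square lattice with a chosen random density of ‘alive’ cells and counts them after each generation. The abstract does not say which transition rules the study uses, so Conway's Game of Life on a periodic lattice stands in here purely as an assumption; `random_lattice`, `activity`, and `life_step` are hypothetical names.

```python
import random


def random_lattice(size, density, seed=0):
    """Square lattice where each cell is alive (1) with probability `density`."""
    rng = random.Random(seed)
    return [[1 if rng.random() < density else 0 for _ in range(size)]
            for _ in range(size)]


def activity(lattice):
    """Activity as defined in the abstract: the number of alive cells."""
    return sum(sum(row) for row in lattice)


def life_step(lattice):
    """One generation of Conway's Game of Life with wrap-around edges.

    Stand-in rule only; assumes a lattice of at least 3 x 3 cells.
    """
    n = len(lattice)
    new = [[0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            neighbours = sum(
                lattice[(r + dr) % n][(c + dc) % n]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            alive = lattice[r][c] == 1
            if neighbours == 3 or (alive and neighbours == 2):
                new[r][c] = 1
    return new
```

    Calling `activity(random_lattice(64, d))` for several densities `d`, and again after repeated `life_step` calls, gives a simple way to track how activity evolves from different initial distributions.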