1,716 research outputs found

    Developing Efficient Discrete Simulations on Multicore and GPU Architectures

    In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real example of an application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using the OpenMP and CUDA frameworks. We have evaluated the performance on two different hardware platforms that represent different target market segments: high-end platforms for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores and an NVIDIA Tesla V100 GPU, both running on the Amazon Web Services (AWS) cloud; and a consumer-oriented platform, using an Intel Core i9 9900K CPU and an NVIDIA GeForce GTX 1050 Ti GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained on both platforms, and we identify several issues that degrade performance on them. We also found that current multicore CPUs with large core counts can deliver performance very close to that of GPUs, and even identical in some cases. Funding: Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU), TIN2017-89842

    Parallel Cellular Automata-based Simulation of Laser Dynamics using Dynamic Load Balancing

    We present an analysis of the feasibility of executing a parallel bioinspired model of laser dynamics, based on cellular automata (CA), on the usual target platform for this kind of application: a heterogeneous non-dedicated cluster. As this model employs a synchronous CA, using the single program, multiple data (SPMD) paradigm, it is not clear in advance whether adequate efficiency can be obtained on this kind of platform. We have evaluated its performance including artificial load to simulate other tasks or jobs submitted by other users. A dynamic load balancing strategy with two main differences from most previous implementations of CA-based models has been used. First, it is possible to migrate load to cluster nodes that did not initially belong to the pool. Second, a modular approach is taken in which the model is executed on top of a dynamic load balancing tool, the Dynamite system, which adds flexibility. Very satisfactory results have been obtained, with performance increases from 60% to 80%. Funding: Ministerio de Ciencia e Innovación TIN2007-68083-C02; Junta de Extremadura PRI06A22

    GPU-based cellular automata simulations of laser dynamics

    We present a parallel implementation for Graphics Processing Units (GPUs) of a model based on cellular automata (CA) to simulate laser dynamics. A cellular automaton is an inherently parallel type of algorithm that is well suited to simulating complex systems formed by many individual components which give rise to emergent behaviours. We exploit the parallel character of this kind of algorithm to develop a fine-grained parallel implementation of the CA laser model on GPUs. A speedup of up to 14.5 over a sequential implementation running on a single-core CPU has been obtained, showing the feasibility of using this model to run efficient parallel simulations on GPUs.
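The synchronous update that makes a cellular automaton "inherently parallel" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transition rule below is Conway's Game of Life, used as a placeholder because the abstract does not give the laser model's rules, and the function name `step` and the periodic boundaries are hypothetical choices. The point is that every cell reads only the old lattice and writes to a new one, so all cell updates are independent and could in principle be assigned to one GPU thread each.

```python
def step(grid):
    """One synchronous CA update: read the old lattice, write a new one."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells in the Moore neighbourhood (periodic boundaries).
            n = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Placeholder rule (Game of Life), NOT the laser model's rule.
            new[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return new


# A horizontal "blinker" on a 5x5 lattice.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = step(blinker)
after_two = step(after_one)
```

Because `step` never modifies `grid` in place, the two nested loops carry no dependencies between iterations; that double-buffering is what a fine-grained parallel version would rely on.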

    An investigation of the efficient implementation of Cellular Automata on multi-core CPU and GPU hardware

    Copyright © 2015 Elsevier. NOTICE: this is the author’s version of a work that was accepted for publication in Journal of Parallel and Distributed Computing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Parallel and Distributed Computing Vol. 77 (2015), DOI: 10.1016/j.jpdc.2014.10.011
    Cellular automata (CA) have proven to be excellent tools for the simulation of a wide variety of phenomena in the natural world. They are ideal candidates for acceleration with modern general-purpose graphics processing unit (GPU/GPGPU) hardware that consists of large numbers of small, tightly-coupled processors. In this study the potential for speeding up CA execution using multi-core CPUs and GPUs is investigated, and the scalability of doing so with respect to standard CA parameters such as lattice and neighbourhood sizes, number of states and generations is determined. Additionally, the impact of ‘Activity’ (the number of ‘alive’ cells) within a given CA simulation is investigated in terms of both varying the random initial distribution levels of ‘alive’ cells, and via the use of novel state transition rules, where a change in the dynamics of these rules (i.e. the number of states) allows for the investigation of the variable complexity within. Funding: Engineering and Physical Sciences Research Council (EPSRC)

    Cellular automata and cluster computing: An application to the simulation of laser dynamics

    Firstly, the application of a cellular automata (CA) model to simulate the dynamics of lasers is reviewed. With this kind of model, the macroscopic properties of the laser system emerge as a cooperative phenomenon from elementary components locally interacting under simple rules. Secondly, a parallel implementation of this kind of model for distributed-memory parallel computers is presented. Performance and scalability of this parallel implementation running on a computer cluster are analyzed, giving very satisfactory results. This confirms the feasibility of running large 3D simulations (unaffordable on an individual machine) on computer clusters, in order to simulate specific real laser systems. Funding: Ministerio de Educación y Ciencia TIN2005-08818-C04-0

    The emergence of biofilms: Computational and experimental studies

    The response of biofilms to any external stimulus is a cumulative response aggregated from the individual bacteria residing within the biofilm. The organizational complexity of a biofilm can be studied effectively by understanding bacterial interactions at the cell level. The overall aim of the thesis is to explore the complex evolutionary behaviour of bacterial biofilms. This thesis is divided into three major studies based on the type of perturbation analysed. The first study analyses the physics behind the development of mushroom-shaped structures under the influence of nutrient cues in biofilms. The Glazier-Graner-Hogeweg model is used to simulate the cell characteristics. The study shows that chemotaxis of bacterial cells towards the nutrient source is one of the major precursors for the formation of mushroom-shaped structures. The objective of the second study is to analyse the impact of environmental conditions on inter-biofilm quorum sensing (QS) signalling. Using a hybrid convection-diffusion-reaction model, the simulations predict the diffusivity of QS molecules, the spatiotemporal variations of QS signal concentrations and the competition outcome between QS and quorum quenching mutant bacterial communities. The mechanical effects associated with the fluid-biofilm interaction are addressed in the third study. A novel fluid-structure interaction model based on fluid dynamics and structural energy minimization is developed. Model simulations are used to analyse the detachment and surface effects of the fluid stresses on the biofilm. In addition to the mechanistic models described, a separate study is carried out to estimate the computational efficiency of the biofilm simulation models.

    Manufacturing Systems Line Balancing using Max-Plus Algebra

    In today's dynamic environment, particularly in the manufacturing sector, the need to be agile and flexible is far greater than before. Decision makers should be equipped with effective tools, methods, and information to respond to the market's rapid changes. Modelling a manufacturing system provides unique insight into its behavior and allows simulating all crucial elements that have a role in the system performance. Max-Plus Algebra is a mathematical tool that can model a Discrete Event Dynamic System in the form of linear equations. Although Max-Plus Algebra was introduced after the 1980s, the number of studies regarding this tool and its applications is smaller than for Petri Nets, Automata, Markov processes, Discrete Event Simulation and Queuing models. Consequently, Max-Plus Algebra needs to be applied and tested in many systems in order to explore hidden aspects of its function and capabilities. To work effectively, a production/assembly line should be balanced. Line balancing is one of the manufacturing functions that tries to divide work equally across the production flow. A car headlight manufacturing line is considered as the discrete manufacturing system; it is a combination of manufacturing and assembly lines composed of different stations. Seven system scenarios were modeled and analyzed using Max-Plus to balance the car headlight production line. Key Performance Indicators (KPIs) are used to compare the various scenarios, including Cycle Time, Average Delivery Rate, Total Processing Lead Time, Stations' Utilization Rate, Idle Time, Efficiency, and Financial Analysis. FlexSim simulation software is used to validate the results of the Max-Plus models, and its advantages and drawbacks are compared with those of Max-Plus Algebra. This study is a unique application of Max-Plus Algebra in line balancing of a manufacturing system. Moreover, the problem size of the considered model (12 stations) is at least twice that of previous studies. In terms of complexity, seven different scenarios are developed through the combination of parallel stations and buffers; the last scenario includes four parallel stations plus two buffers. Based on the findings, scenario 7 is superior to the other scenarios, with the shortest time to first output (14 seconds), best average delivery rate (24.5 seconds), shortest cycle time (736 seconds), shortest total processing lead time (11,534 seconds), lowest percentage of idle time (12%), lowest unit cost ($6.9), and highest efficiency (88%). However, Scenario 4 has the best utilization rate at 75%.
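To illustrate what "linear equations" means in Max-Plus Algebra, here is a minimal sketch for a toy two-station serial line. Everything in it is an assumption made for illustration: the processing times (3 and 5 time units), the matrix `A`, the helper name `maxplus_matvec`, and the unlimited buffer between stations are hypothetical and are not taken from the thesis. In max-plus, "addition" is `max` and "multiplication" is `+`, so the start times of part k+1 follow x(k+1) = A ⊗ x(k) with (A ⊗ x)_i = max_j (A_ij + x_j).

```python
NEG_INF = float("-inf")  # max-plus "zero": marks "no dependency"


def maxplus_matvec(A, x):
    """Max-plus product: result[i] = max over j of (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]


# Hypothetical line: station 1 takes 3 time units, station 2 takes 5.
# x[i] is the time at which station i starts its current part.
#   x1(k+1) = x1(k) + 3
#   x2(k+1) = max(x1(k) + 6, x2(k) + 5)
A = [
    [3, NEG_INF],
    [6, 5],
]

x = [0, 3]          # start times of the first part
schedule = [x]
for _ in range(3):  # start times of the next three parts
    x = maxplus_matvec(A, x)
    schedule.append(x)
```

In this toy case the gap between successive start times at station 2 settles at 5, the processing time of the slowest station, which is the kind of cycle-time figure a line-balancing study compares across scenarios.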