4,102 research outputs found

    Developing Efficient Discrete Simulations on Multicore and GPU Architectures

    In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real example of an application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using the OpenMP and CUDA frameworks. We have evaluated the performance on two hardware platforms that represent different target market segments: a high-end platform for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores and an NVIDIA Tesla V100 GPU, both running on the Amazon Web Services (AWS) cloud; and a consumer-oriented platform, using an Intel Core i9 9900K CPU and an NVIDIA GeForce GTX 1050 Ti GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained on both platforms, and we identify several issues that degrade performance on them. We also found that current multicore CPUs with large core counts can deliver performance very close to that of GPUs, and even identical in some cases. Funding: Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU), TIN2017-89842

    Simulation of reaction-diffusion processes in three dimensions using CUDA

    Numerical solution of reaction-diffusion equations in three dimensions is one of the most challenging applied mathematical problems. Since these simulations are very time consuming, any ideas and strategies aimed at reducing CPU time are important topics of research. A general and robust idea is the parallelization of source codes and programs. Recently, the technological development of graphics hardware has made it possible to use desktop video cards to solve numerically intensive problems. We present a powerful parallel computing framework to solve reaction-diffusion equations numerically using Graphics Processing Units (GPUs) with CUDA. Four different reaction-diffusion problems were solved: (i) diffusion of a chemically inert compound, (ii) Turing pattern formation, (iii) phase separation in the wake of a moving diffusion front and (iv) air pollution dispersion. Additionally, both the Shared method and the Moving Tiles method were tested. Our results show that the parallel implementation achieves typical speedups of roughly 5 to 40 times compared to a single-threaded CPU implementation on a 2.8 GHz desktop computer. Comment: 8 figures, 5 tables
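    As an illustration of problem (i), the following is a minimal sketch of one explicit time step for pure diffusion on a 3D grid, written in plain Python rather than CUDA. The forward-Euler scheme, the 7-point stencil, the periodic boundaries, and all names (`diffuse_step`, `make_point_source`, `total_mass`) are assumptions made for this example; the abstract does not state which discretization the authors used. Each cell depends only on the previous values of its six neighbours, which is what makes it natural to assign one GPU thread per cell.

```python
def diffuse_step(c, d_coef, dt, dx):
    """One forward-Euler step of dc/dt = D * laplacian(c) on a periodic cubic grid."""
    n = len(c)
    # Dimensionless diffusion number; the explicit scheme is stable for alpha <= 1/6 in 3D.
    alpha = d_coef * dt / (dx * dx)
    new = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Sum of the six face neighbours (7-point stencil, periodic wrap-around).
                neighbours = (
                    c[(i + 1) % n][j][k] + c[(i - 1) % n][j][k]
                    + c[i][(j + 1) % n][k] + c[i][(j - 1) % n][k]
                    + c[i][j][(k + 1) % n] + c[i][j][(k - 1) % n]
                )
                new[i][j][k] = c[i][j][k] + alpha * (neighbours - 6.0 * c[i][j][k])
    return new


def make_point_source(n, amount=1.0):
    """Hypothetical initial condition: all of the compound in the central cell."""
    c = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    c[n // 2][n // 2][n // 2] = amount
    return c


def total_mass(c):
    """Total amount of compound on the grid (conserved with periodic boundaries)."""
    return sum(sum(sum(row) for row in plane) for plane in c)
```

    The update writes into a separate output grid, so every cell reads only old values. A GPU version would keep the same double-buffering and replace the three nested loops with a thread index per cell.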

    Model Coupling between the Weather Research and Forecasting Model and the DPRI Large Eddy Simulator for Urban Flows on GPU-accelerated Multicore Systems

    In this report we present a novel approach to model coupling for shared-memory multicore systems hosting OpenCL-compliant accelerators, which we call the Glasgow Model Coupling Framework (GMCF). We discuss the implementation of a prototype of GMCF and its application to coupling the Weather Research and Forecasting Model (WRF) and an OpenCL-accelerated version of the Large Eddy Simulator for Urban Flows (LES) developed at DPRI. The first stage of this work concerned the OpenCL port of the LES. The methodology used for the OpenCL port is a combination of automated analysis and code generation and rule-based manual parallelization. For the evaluation, the non-OpenCL LES code was compiled using gfortran, ifort and pgfortran, in each case with auto-parallelization and auto-vectorization. The OpenCL-accelerated version of the LES achieves a 7 times speed-up on an NVIDIA GeForce GTX 480 GPGPU, compared to the fastest possible compilation of the original code running on a 12-core Intel Xeon E5-2640. In the second stage of this work, we built the Glasgow Model Coupling Framework and successfully used it to couple an OpenMP-parallelized WRF instance with an OpenCL LES instance which runs the LES code on the GPGPU. The system requires only minimal changes to the original code. The report discusses the rationale, aims, approach and implementation details of this work. Comment: This work was conducted during a research visit at the Disaster Prevention Research Institute of Kyoto University, supported by an EPSRC Overseas Travel Grant, EP/L026201/

    GPU accelerated Nature Inspired Methods for Modelling Large Scale Bi-Directional Pedestrian Movement

    Pedestrian movement, although ubiquitous and well-studied, is still not well understood because of the complicating nature of the embedded social dynamics. Interest among researchers in simulating pedestrian movement and interactions has grown significantly, in part due to the increased computational and visualization capabilities afforded by high-performance computing. Different approaches have been adopted to simulate pedestrian movement under various circumstances and interactions. In the present work, bi-directional crowd movement is simulated where an equal number of individuals try to reach the opposite sides of an environment. Two movement methods are considered. First, a Least Effort Model (LEM) is investigated, where agents try to take an optimal path with as few changes from their intended path as possible. Following this, a modified form of Ant Colony Optimization (ACO) is proposed, where individuals are guided by the goal of reaching the other side in a least-effort mode as well as by a pheromone trail left by predecessors. The basic idea is to increase agent interaction, thereby more closely reflecting a real-world scenario. The methodology utilizes Graphics Processing Units (GPUs) for general-purpose computing using the CUDA platform. GPUs are well suited because of the inherent parallel properties of pedestrian movement, such as proximate interactions of individuals on a 2D grid. The main feature of the implementation undertaken here is that the parallelism is data driven. The data-driven implementation leads to a speedup of up to 18x compared to its sequential counterpart running on a single-threaded CPU. The number of pedestrians considered in the model ranged from 2K to 100K, representing numbers typical of mass gathering events. A detailed discussion addresses the implementation challenges faced and averted.

    Towards aeraulic simulations at urban scale using the lattice Boltzmann method

    The lattice Boltzmann method (LBM) is an innovative approach in computational fluid dynamics (CFD). Due to the underlying lattice structure, the LBM is inherently parallel and therefore well suited for high performance computing. Its application to outdoor aeraulic studies, e.g. on complex urban configurations, is promising as an alternative to the commonplace Reynolds-averaged Navier-Stokes and large eddy simulation methods based on the Navier-Stokes equations. Emerging many-core devices, such as graphics processing units (GPUs), now make it possible to run very large scale simulations on rather inexpensive hardware. In this paper, we present simulation results obtained using our multi-GPU LBM solver. For validation purposes, we study the flow around a wall-mounted cube and show agreement with previously published experimental results. Furthermore, we discuss larger scale flow simulations involving nine cubes, which demonstrate the practicability of CFD simulations in building external aeraulics.

    Parallel Multi-Hypothesis Algorithm for Criticality Estimation in Traffic and Collision Avoidance

    Due to the current developments towards autonomous driving and vehicle active safety, there is an increasing need for algorithms that can perform complex criticality predictions in real time. Being able to process multi-object traffic scenarios aids the implementation of a variety of automotive applications such as driver assistance systems for collision prevention and mitigation as well as fall-back systems for autonomous vehicles. We present a fully model-based algorithm with a parallelizable architecture. The proposed algorithm can evaluate the criticality of complex, multi-modal (vehicles and pedestrians) traffic scenarios by simulating millions of trajectory combinations and detecting collisions between objects. The algorithm is able to estimate upcoming criticality at very early stages, demonstrating its potential for vehicle safety systems and autonomous driving applications. A prototype implementation on an embedded system in a test vehicle shows that the algorithm is compatible with the hardware available in modern cars. For a complex traffic scenario with 11 dynamic objects, more than 86 million pose combinations are evaluated in 21 ms on the GPU of a Drive PX 2.
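    The idea of enumerating trajectory combinations and counting collisions can be sketched as follows. This is a simplified stand-in and not the authors' algorithm: the constant-acceleration motion model, the circular object footprint, the brute-force enumeration, and every name and parameter (`rollout`, `collides`, `criticality`, `radius`, `hypotheses`) are assumptions made for illustration. Each combination is evaluated independently of the others, which is the property that makes this kind of computation parallelizable.

```python
import math
from itertools import product


def rollout(x, y, vx, vy, ax, ay, steps, dt):
    """Predict positions under constant acceleration (assumed motion model)."""
    poses = []
    for s in range(1, steps + 1):
        t = s * dt
        poses.append((x + vx * t + 0.5 * ax * t * t,
                      y + vy * t + 0.5 * ay * t * t))
    return poses


def collides(traj_a, traj_b, radius):
    """True if two circular objects of the given radius overlap at any time step."""
    return any(math.hypot(pa[0] - pb[0], pa[1] - pb[1]) < 2.0 * radius
               for pa, pb in zip(traj_a, traj_b))


def criticality(objects, hypotheses, steps=20, dt=0.1, radius=1.0):
    """Fraction of hypothesis combinations in which any two objects collide.

    objects:    list of (x, y, vx, vy) initial states
    hypotheses: list of (ax, ay) candidate accelerations, tried for every object
    """
    per_object = [[rollout(*obj, ax, ay, steps, dt) for (ax, ay) in hypotheses]
                  for obj in objects]
    hits = 0
    total = 0
    # One combination = one trajectory hypothesis chosen for each object.
    for combo in product(*per_object):
        total += 1
        if any(collides(combo[i], combo[j], radius)
               for i in range(len(combo))
               for j in range(i + 1, len(combo))):
            hits += 1
    return hits / total
```

    With `h` hypotheses per object and `n` objects the loop visits `h ** n` combinations, so the count grows quickly with the number of objects and the work is a natural candidate for mapping one combination to one GPU thread.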

    Thermodynamic Conditions in Quenching Chamber of Low Voltage Circuit Breaker

    This work studies the processes that accompany the extinction of a high-current electric arc inside the quenching chamber of a circuit breaker. It focuses on computing the fluid dynamics (CFD) and the temperature field around the arc. The effects of the spacing and shape of the metal plates in the chamber are described from an aerodynamic point of view. A further objective is to provide information about how the position of the electric arc influences the thermodynamic conditions inside the chamber; this matters particularly when the arc is drawn into the chamber by other forces, for example electromagnetic ones, and changes its shape and position during the process. To keep the solution as simple yet as efficient as possible, new software was developed specifically for fluid dynamics calculations using the finite volume method (FVM). Compared with the more widespread finite element method (FEM), this method is better suited to CFD, mainly because the overhead of computing a single iteration is lower. Another advantage of the software is its modularity and extensibility: the whole concept is based on plug-ins, so the calculation core can be reused for other numerical analyses, such as structural or electromagnetic ones. The only requirement is to write a plug-in for the analysis in question, for example a finite element (FEM) solver. Because the software is designed as a multi-threaded application, it can make full use of current multi-core processors. This property is even more pronounced when the calculation is moved from the CPU to the GPU (graphics processing unit): current high-end graphics cards have tens to hundreds of cores and work with much faster memory than CPUs, so simulation performance can increase severalfold.

    Rapid-Response Urban CFD Simulations Using a GPU Computing Paradigm on Desktop Supercomputers

    In the event of chemical or biological (CB) agent attacks or accidents, first responders need hazard prediction data to launch effective emergency response action. Accurate and timely knowledge of the wind fields in urban areas is critically important to identify and project the extent of CB agent dispersion and so determine the hazard zone. In its 2008 report (GAO-08-180), the U.S. Government Accountability Office reported that first responders are limited in their ability to detect and model hazardous releases in urban environments. The current set of modeling tools for contaminant dispersion in urban environments relies on empirical assumptions with diagnostic equations (Wang et al. 2003, Williams et al. 2004). The main advantage of these models is their relatively fast turn-around times, although their predictive capabilities can be limited. As part of the Joint Effects Model (JEM), funded by the Department of Defense, urban transport and dispersion models have been evaluated for their rapid-response capabilities. As discussed in Heagy et al. (2007), the majority of the urban transport and dispersion models considered in the evaluation study fell short of satisfying the JEM key performance parameter of a maximum 10-minute run-time on a desktop computer, and the models that were able to satisfy the performance parameter were employed at low resolutions

    Visual Simulation of Flow

    We have adopted a numerical method from computational fluid dynamics, the Lattice Boltzmann Method (LBM), for real-time simulation and visualization of flow and amorphous phenomena, such as clouds, smoke, fire, haze, dust, radioactive plumes, and airborne biological or chemical agents. Unlike other approaches, LBM discretizes the micro-physics of local interactions and can handle very complex boundary conditions, such as deep urban canyons, curved walls, indoor spaces, and dynamic boundaries of moving objects. Due to its discrete nature, LBM lends itself to multi-resolution approaches, and its computational pattern, which is similar to cellular automata, is easily parallelizable. We have accelerated LBM on commodity graphics processing units (GPUs), achieving real-time or even faster-than-real-time performance on a single GPU or on a GPU cluster. We have implemented a 3D urban navigation system and applied it in New York City with real-time live sensor data. In addition to a pivotal application in the simulation of airborne contaminants in urban environments, this approach will enable the development of other superior predictive simulation capabilities, computer graphics and games, and a novel technology for computational science and engineering.
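    The cellular-automaton-like pattern mentioned above can be seen in a minimal sketch of one LBM update. The choice of the two-dimensional D2Q9 lattice, the single-relaxation-time (BGK) collision, the periodic boundaries, and the function names are assumptions for this example; the abstract does not specify the lattice or collision model the authors used, and their work targets 3D flows on GPUs rather than plain Python.

```python
# D2Q9 lattice: nine discrete velocities and their standard weights.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for one cell."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(VELOCITIES, WEIGHTS):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def lbm_step(f, tau):
    """One collide-and-stream update; f[y][x] holds the 9 populations of a cell."""
    ny, nx = len(f), len(f[0])
    out = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            cell = f[y][x]
            # Macroscopic density and velocity are moments of the populations.
            rho = sum(cell)
            ux = sum(c[0] * fi for c, fi in zip(VELOCITIES, cell)) / rho
            uy = sum(c[1] * fi for c, fi in zip(VELOCITIES, cell)) / rho
            feq = equilibrium(rho, ux, uy)
            for q, (cx, cy) in enumerate(VELOCITIES):
                # Collision: relax towards equilibrium with relaxation time tau.
                post = cell[q] - (cell[q] - feq[q]) / tau
                # Streaming: push the population to the neighbour it points at.
                out[(y + cy) % ny][(x + cx) % nx][q] = post
    return out
```

    Collision uses only the data of one cell and streaming only touches nearest neighbours, so every cell can be processed by its own thread with a second buffer for the output, much like a cellular automaton update.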