131 research outputs found

    Highly optimized simulations on single- and multi-GPU systems of 3D Ising spin glass

    Full text link
We present a highly optimized implementation of a Monte Carlo (MC) simulator for the three-dimensional Ising spin-glass model with bimodal disorder, i.e., the 3D Edwards-Anderson model, running on CUDA-enabled GPUs. Multi-GPU systems exchange data by means of the Message Passing Interface (MPI). The chosen MC dynamics is the classic Metropolis algorithm, which is purely dissipative, since the aim was to study the critical off-equilibrium relaxation of the system. We focused on the following issues: i) the implementation of efficient access patterns for nearest neighbours in a cubic stencil and for lagged-Fibonacci-like pseudo-random number generators (PRNGs); ii) a novel implementation of the asynchronous multispin-coding Metropolis MC step that stores one spin per bit; and iii) a multi-GPU version based on a combination of MPI and CUDA streams. We highlight that cubic stencils and PRNGs are of very general interest because of their widespread use in many simulation codes. Our code achieves best performances of ~3 and ~5 ps/flip on a GTX Titan with our implementations of MINSTD and MT19937, respectively. Comment: 39 pages, 13 figures.
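    For readers unfamiliar with the update itself, the following is a minimal CUDA sketch of a plain single-spin Metropolis sweep over one checkerboard sublattice of the 3D ±J model, illustrating the cubic-stencil nearest-neighbour access of point i). It is not the paper's multispin-coded kernel: it stores one spin per byte rather than per bit, and it uses cuRAND's Philox generator instead of the paper's MINSTD/MT19937 implementations; the array names and layout are assumptions.

```cuda
// Minimal sketch: one Metropolis sweep over one checkerboard sublattice of the
// 3D Edwards-Anderson (+/-J) model. Spins are signed char (+1/-1); Jx/Jy/Jz[site]
// hold the couplings to the +x/+y/+z neighbour of `site`. This is NOT the
// paper's multispin-coded kernel; it is a single-spin illustration.
#include <curand_kernel.h>

__device__ __forceinline__ int wrap(int i, int L) { return (i + L) % L; }

__global__ void metropolis_sublattice(signed char* s, const signed char* Jx,
                                      const signed char* Jy, const signed char* Jz,
                                      int L, float beta, int parity,
                                      unsigned long long seed, unsigned long long sweep)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= L || y >= L || z >= L) return;
    if (((x + y + z) & 1) != parity) return;        // checkerboard: update one colour only

    int idx = (z * L + y) * L + x;
    auto at = [&](int xx, int yy, int zz) { return (wrap(zz, L) * L + wrap(yy, L)) * L + wrap(xx, L); };

    // Local field from the six nearest neighbours of the cubic stencil.
    int h = Jx[idx]         * s[at(x + 1, y, z)]
          + Jx[at(x-1,y,z)] * s[at(x - 1, y, z)]
          + Jy[idx]         * s[at(x, y + 1, z)]
          + Jy[at(x,y-1,z)] * s[at(x, y - 1, z)]
          + Jz[idx]         * s[at(x, y, z + 1)]
          + Jz[at(x,y,z-1)] * s[at(x, y, z - 1)];

    int dE = 2 * s[idx] * h;                        // energy change if s[idx] flips

    curandStatePhilox4_32_10_t st;
    curand_init(seed, idx, sweep, &st);             // independent stream per site and sweep
    if (dE <= 0 || curand_uniform(&st) < __expf(-beta * dE))
        s[idx] = -s[idx];
}
```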

    Methodology for the Accelerated Reliability Analysis and Prognosis of Underground Cables based on FPGA

    Get PDF
    Dependable electrical power distribution systems demand high reliability levels, which increase maintenance costs for the utilities. Often, the extra costs are the result of unnecessary maintenance procedures, which can be avoided by monitoring the equipment and predicting the future system evolution by means of statistical methods (prognostics). The present thesis aims at designing accurate methods for predicting the degradation of high- and medium-voltage underground Cross-Linked Polyethylene (XLPE) cables within an electrical power distribution grid, and predicting their remaining useful life, in order to inform maintenance procedures. However, electric power distribution grids are large, their components interact with each other, and they degrade with time and use. Solving the statistics of the predictive models of the power grids currently requires long numerical simulations that demand large computational resources and long simulation times, even when using advanced parallel architectures. Often, approximate models are used in order to reduce the simulation time and the required resources. In this context, Field Programmable Gate Arrays (FPGAs) can be employed to accelerate the simulation of these stochastic processes. However, adapting the physics-based degradation models of underground cables for FPGA simulation can be complex. Accordingly, this thesis proposes an FPGA-based framework for the on-line monitoring and prognosis of underground cables based on an electro-thermal degradation model that is adapted for accelerated simulation in the programmable logic of an FPGA.
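    The abstract does not spell out the degradation model, so the sketch below only illustrates the general idea of simulating many stochastic electro-thermal degradation trajectories in parallel, assuming a generic inverse-power/Arrhenius life law for XLPE insulation and Miner's-rule damage accumulation. It is written as a CUDA kernel for concreteness; the thesis maps an analogous computation onto FPGA programmable logic. All parameter names and the load/temperature fluctuations are hypothetical.

```cuda
// Illustrative sketch only: assumes a generic inverse-power/Arrhenius life law
// L(E,T) = L0 * (E/E0)^(-n) * exp(-B * (1/T0 - 1/T)) and Miner's-rule damage
// accumulation. One GPU thread simulates one stochastic degradation trajectory
// and records the time at which accumulated damage reaches failure.
#include <curand_kernel.h>

__global__ void rul_trajectories(float* rul_hours, int nTraj,
                                 float L0, float E0, float n, float B, float T0,
                                 unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nTraj) return;

    curandStatePhilox4_32_10_t st;
    curand_init(seed, i, 0, &st);

    float damage = 0.0f;    // Miner's-rule accumulated damage; failure at 1.0
    float hours  = 0.0f;
    const float dt = 1.0f;  // one-hour steps

    while (damage < 1.0f && hours < 1.0e6f) {
        // Hypothetical fluctuating operating conditions (field and temperature);
        // in practice these would come from monitoring data.
        float E = E0 * (0.8f + 0.4f * curand_uniform(&st));   // electric field
        float T = T0 + 20.0f * curand_uniform(&st);           // temperature, K
        float life = L0 * __powf(E / E0, -n) * __expf(-B * (1.0f / T0 - 1.0f / T));
        damage += dt / life;                                  // Miner's rule
        hours  += dt;
    }
    rul_hours[i] = hours;
}
```

    The distribution of `rul_hours` over all trajectories then yields the remaining-useful-life statistics that drive maintenance decisions.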

    Low power and high performance heterogeneous computing on FPGAs

    Get PDF
    The abstract is in the attachment.

    GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP

    Full text link
    Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider (LHC). In the early 2010s, the projections were that simulation demands would scale linearly with the luminosity increase, compensated only partially by an increase of computing resources. The extension of fast simulation approaches to more use cases, covering a larger fraction of the simulation budget, is only part of the solution, due to intrinsic precision limitations. The remainder corresponds to speeding up the simulation software by several factors, which is out of reach using simple optimizations on the current code base. In this context, the GeantV R&D project was launched, aiming to redesign the legacy particle transport codes in order to make them benefit from fine-grained parallelism features such as vectorization, but also from increased code and data locality. This paper presents in detail the results and achievements of this R&D, as well as the conclusions and lessons learnt from the beta prototype. Comment: 34 pages, 26 figures, 24 tables.

    Evaluation of pseudo-random number generation on GPU cards

    Get PDF
    Monte Carlo methods rely on sequences of random numbers to obtain solutions to many problems in science and engineering. In this work, we evaluate the performance of different pseudo-random number generators (PRNGs) of the Curand library on a number of modern Nvidia GPU cards. As a numerical test, we generate pseudo-random number (PRN) sequences and obtain non-uniform distributions using the acceptance-rejection method. We consider GPU, CPU, and hybrid CPU/GPU implementations. For the GPU, we additionally consider two different implementations using the host and device application programming interfaces (APIs). We study how the performance depends on implementation parameters, including the number of threads per block and the number of blocks per streaming multiprocessor. To achieve the fastest performance, one has to minimize the time consumed by PRNG seed setup and state update. Seed setup time increases with the number of threads, while PRNG state update time decreases. Hence, the fastest performance is achieved by the optimal balance of these opposing effects.
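    As an illustration of the device-API pattern evaluated here, the sketch below gives each thread its own cuRAND state and draws from a non-uniform distribution by acceptance-rejection. The target density f(x) = 3x² on [0, 1], the XORWOW generator, and the output layout are illustrative choices, not the paper's exact setup.

```cuda
// Device-API sketch: per-thread cuRAND states, then acceptance-rejection
// sampling of the target density f(x) = 3x^2 on [0,1] with a uniform envelope.
#include <curand_kernel.h>

__global__ void setup_states(curandState* states, unsigned long long seed, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) curand_init(seed, id, 0, &states[id]);   // seed-setup cost grows with thread count
}

__global__ void sample_rejection(curandState* states, float* out, int samplesPerThread, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;
    curandState st = states[id];                         // work on a register copy of the state
    for (int k = 0; k < samplesPerThread; ++k) {
        float x, u;
        do {                                             // envelope: uniform on [0,1], M = 3
            x = curand_uniform(&st);
            u = curand_uniform(&st);
        } while (u > x * x);                             // accept when u*M <= f(x) = 3x^2
        out[id * samplesPerThread + k] = x;
    }
    states[id] = st;                                     // write the updated state back
}
```

    Launching these kernels with different numbers of threads per block (and grid sizes) reproduces the kind of parameter study described above; keeping seed setup in a separate kernel also makes its cost directly measurable.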

    A Scalable Framework for Monte Carlo Simulation Using FPGA-based Hardware Accelerators with Application to SPECT Imaging

    Get PDF
    As the number of transistors integrated onto a silicon die continues to increase, compute power is becoming a commodity. This has enabled a whole host of new applications that rely on high-throughput computations. Recently, the need for faster and cost-effective applications in form-factor-constrained environments has driven an interest in on-chip acceleration of algorithms based on Monte Carlo simulation [...] Processor. Furthermore, we have created a framework for further increasing parallelism by scaling our architecture across multiple compute devices; by extending our original design to a multi-FPGA system, a nearly linear increase in acceleration with logic resources was achieved.

    Efficient treatment and quantification of uncertainty in probabilistic seismic hazard and risk analysis

    Get PDF
    The main goals of this thesis are the development of a computationally efficient framework for the stochastic treatment of various important uncertainties in probabilistic seismic hazard and risk assessment, its application to a newly created seismic risk model of Indonesia, and the analysis and quantification of the impact of these uncertainties on the distribution of estimated seismic losses for a large number of synthetic portfolios modeled after real-world counterparts. The treatment and quantification of uncertainty in probabilistic seismic hazard and risk analysis has already been identified as an area that could benefit from increased research attention. Furthermore, it has become evident that the lack of research into the development and application of suitable sampling schemes to increase the computational efficiency of the stochastic simulation represents a bottleneck for applications where model runtime is an important factor. In this research study, the development and state of the art of probabilistic seismic hazard and risk analysis is first reviewed and opportunities for improved treatment of uncertainties are identified. A newly developed framework for the stochastic treatment of portfolio location uncertainty as well as ground motion and damage uncertainty is presented. The framework is then optimized with respect to computational efficiency. Amongst other techniques, a novel variance reduction scheme for portfolio location uncertainty is developed. Furthermore, in this thesis, some well-known variance reduction schemes such as Quasi-Monte Carlo, Latin Hypercube Sampling and MISER (locally adaptive recursive stratified sampling) are applied for the first time to seismic hazard and risk assessment. The effectiveness and applicability of all the schemes used are analyzed. Several chapters of this monograph describe the theory, implementation and some exemplary applications of the framework. To conduct these exemplary applications, a seismic hazard model for Indonesia was developed and used for the analysis and quantification of loss uncertainty for a large collection of synthetic portfolios. As part of this work, the new framework was integrated into a probabilistic seismic hazard and risk assessment software suite developed and used by Munich Reinsurance Group. Furthermore, those parts of the framework that deal with location and damage uncertainties are also used by the flood and storm natural catastrophe model development groups at Munich Reinsurance for their risk models.
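    As a concrete illustration of one of the variance-reduction schemes named above, the CUDA sketch below uses stratified (jittered) sampling of a ground-motion residual, which in one dimension coincides with Latin Hypercube Sampling, to estimate an expected loss. The lognormal ground-motion model and the toy vulnerability curve are assumptions made for the example, not elements of the thesis's risk model.

```cuda
// Stratified (jittered) sampling of a standard-normal ground-motion residual:
// thread i draws from the i-th of N equiprobable strata, maps it to a lognormal
// ground motion, and evaluates an assumed vulnerability curve.
#include <curand_kernel.h>

__global__ void stratified_losses(float* loss, int nStrata,
                                  float lnMedianPGA, float sigma,
                                  unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nStrata) return;

    curandStatePhilox4_32_10_t st;
    curand_init(seed, i, 0, &st);

    // Uniform draw jittered inside stratum [i/N, (i+1)/N), mapped to a standard
    // normal residual via the inverse CDF.
    float u   = (i + 1.0f - curand_uniform(&st)) / nStrata;
    float eps = normcdfinvf(u);
    float pga = __expf(lnMedianPGA + sigma * eps);       // lognormal ground motion

    // Toy vulnerability curve (assumed): mean damage ratio saturating with PGA.
    loss[i] = 1.0f - __expf(-pga / 0.6f);
}
// Averaging `loss` over all strata gives a lower-variance estimate of expected
// loss than the same number of plain Monte Carlo draws.
```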

    The use of primitives in the calculation of radiative view factors

    Get PDF
    Compilations of radiative view factors (often in closed analytical form) are readily available in the open literature for commonly encountered geometries. For more complex three-dimensional (3D) scenarios, however, the effort required to solve the requisite multi-dimensional integrations needed to estimate a required view factor can be daunting to say the least. In such cases, a combination of finite element methods (where the geometry in question is sub-divided into a large number of uniform, often triangular, elements) and Monte Carlo Ray Tracing (MC-RT) has been developed, although frequently the software implementation is suitable only for a limited set of geometrical scenarios. Driven initially by a need to calculate the radiative heat transfer occurring within an operational fibre-drawing furnace, this research set out to examine options whereby MC-RT could be used to cost-effectively calculate any generic 3D radiative view factor using current vectorisation technologies
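    A minimal version of the MC-RT view-factor estimator can be sketched for a geometry with a known analytic answer, namely two directly opposed parallel rectangles: rays leave uniformly distributed points on the emitter with cosine-weighted (Lambertian) directions, and the fraction landing on the receiver estimates F12. The CUDA kernel below is such a sketch; the dimensions, ray counts, and absence of blocking surfaces are simplifying assumptions.

```cuda
// MC-RT view-factor sketch for two parallel, directly opposed rectangles
// separated by a gap h. Each thread traces raysPerThread Lambertian rays from
// the emitter (plane z = 0) and counts those that land on the receiver (z = h).
#include <curand_kernel.h>

__global__ void view_factor_rays(unsigned long long* hits, int raysPerThread,
                                 float ax, float ay,   // emitter half-sizes
                                 float bx, float by,   // receiver half-sizes
                                 float h,              // plane separation
                                 unsigned long long seed)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    curandStatePhilox4_32_10_t st;
    curand_init(seed, id, 0, &st);

    unsigned long long local = 0;
    for (int k = 0; k < raysPerThread; ++k) {
        // Uniform launch point on the emitter.
        float x0 = ax * (2.0f * curand_uniform(&st) - 1.0f);
        float y0 = ay * (2.0f * curand_uniform(&st) - 1.0f);

        // Cosine-weighted direction about the surface normal (+z).
        float r1 = curand_uniform(&st), r2 = curand_uniform(&st);
        float sinT = sqrtf(r1), phi = 2.0f * 3.14159265f * r2;
        float dx = sinT * __cosf(phi), dy = sinT * __sinf(phi), dz = sqrtf(1.0f - r1);

        // Intersection with the receiver plane z = h.
        float t  = h / dz;
        float xh = x0 + t * dx, yh = y0 + t * dy;
        if (fabsf(xh) <= bx && fabsf(yh) <= by) ++local;  // ray lands on the receiver
    }
    atomicAdd(hits, local);   // F_12 ~= total hits / total rays
}
```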

    Automated Translation and Accelerated Solving of Differential Equations on Multiple GPU Platforms

    Full text link
    We demonstrate a high-performance, vendor-agnostic method for massively parallel solving of ensembles of ordinary differential equations (ODEs) and stochastic differential equations (SDEs) on GPUs. The method is integrated with a widely used differential equation solver library in a high-level language (Julia's DifferentialEquations.jl) and enables GPU acceleration without requiring code changes by the user. Our approach achieves state-of-the-art performance compared to hand-optimized CUDA-C++ kernels, while performing 20-100x faster than the vectorized-map (vmap) approach implemented in JAX and PyTorch. Performance evaluation on NVIDIA, AMD, Intel, and Apple GPUs demonstrates performance portability and vendor-agnosticism. We show composability with MPI to enable distributed multi-GPU workflows. The implemented solvers are fully featured, supporting event handling, automatic differentiation, and incorporation of datasets via the GPU's texture memory, allowing scientists to take advantage of GPU acceleration on all major current architectures without changing their model code and without loss of performance. Comment: 11 figures.
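    For context, the hand-written CUDA-C++ baseline that such libraries are compared against typically assigns one GPU thread to one ensemble member. The sketch below shows that pattern with a fixed-step RK4 integration of the Lorenz system, varying one parameter per trajectory; it is not DiffEqGPU.jl's generated code, and the step size and parameter sweep are arbitrary.

```cuda
// One GPU thread integrates one ensemble member of the Lorenz system with a
// classic fixed-step RK4 scheme; rho differs per trajectory.
__device__ void lorenz(const float u[3], float rho, float du[3])
{
    const float sigma = 10.0f, beta = 8.0f / 3.0f;
    du[0] = sigma * (u[1] - u[0]);
    du[1] = u[0] * (rho - u[2]) - u[1];
    du[2] = u[0] * u[1] - beta * u[2];
}

__global__ void ensemble_rk4(float* uFinal, const float* rho, int nTraj,
                             float dt, int nSteps)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nTraj) return;

    float u[3] = {1.0f, 0.0f, 0.0f};
    float k1[3], k2[3], k3[3], k4[3], tmp[3];

    for (int s = 0; s < nSteps; ++s) {                 // classic RK4 step
        lorenz(u, rho[i], k1);
        for (int j = 0; j < 3; ++j) tmp[j] = u[j] + 0.5f * dt * k1[j];
        lorenz(tmp, rho[i], k2);
        for (int j = 0; j < 3; ++j) tmp[j] = u[j] + 0.5f * dt * k2[j];
        lorenz(tmp, rho[i], k3);
        for (int j = 0; j < 3; ++j) tmp[j] = u[j] + dt * k3[j];
        lorenz(tmp, rho[i], k4);
        for (int j = 0; j < 3; ++j)
            u[j] += dt / 6.0f * (k1[j] + 2.0f * k2[j] + 2.0f * k3[j] + k4[j]);
    }
    for (int j = 0; j < 3; ++j) uFinal[3 * i + j] = u[j];
}
```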