Highly optimized simulations on single- and multi-GPU systems of 3D Ising spin glass
We present a highly optimized implementation of a Monte Carlo (MC) simulator
for the three-dimensional Ising spin-glass model with bimodal disorder, i.e.,
the 3D Edwards-Anderson model running on CUDA enabled GPUs. Multi-GPU systems
exchange data by means of the Message Passing Interface (MPI). The chosen MC
dynamics is the classic Metropolis one, which is purely dissipative, since the
aim was the study of the critical off-equilibrium relaxation of the system. We
focused on the following issues: i) the implementation of efficient access
patterns for nearest neighbours in a cubic stencil and for
lagged-Fibonacci-like pseudo-random number generators (PRNGs); ii) a novel
implementation of the asynchronous multispin-coding Metropolis MC step, allowing
us to store one spin per bit; and iii) a multi-GPU version based on a combination
of MPI and CUDA streams. We highlight how cubic stencils and PRNGs are two
subjects of very general interest because of their widespread use in many
simulation codes. Our code achieves best performances of ~3 and ~5 ps/flip on a
GTX Titan with our implementations of the MINSTD and MT19937 PRNGs, respectively.
Comment: 39 pages, 13 figures
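The multispin-coding idea described above can be sketched in plain Python, treating the 64 bits of an integer as 64 independent replicas and doing the Metropolis energy bookkeeping with bitwise adders. This is a simplified 1D illustration of the technique, not the authors' CUDA kernel; the function names and the checkerboard chain are invented for the sketch:

```python
LANES = 64                     # one replica per bit of a 64-bit word
MASK = (1 << LANES) - 1

def vertical_add(words):
    """Per-lane bitwise adder: bit j of every word belongs to replica j.
    Returns counter bit-planes [c0, c1, ...] such that, for each lane j,
    sum_k ((planes[k] >> j) & 1) << k is the number of set bits among the
    input words at position j."""
    planes = []
    for w in words:
        carry = w
        i = 0
        while carry:
            if i == len(planes):
                planes.append(0)
            planes[i], carry = planes[i] ^ carry, planes[i] & carry
            i += 1
    return planes

def metropolis_sweep_1d(spins, accept_uphill):
    """One multispin-coded Metropolis half-sweep on a 1D periodic chain.
    spins[i] packs one spin per bit for 64 independent replicas.
    accept_uphill(site) must return a random 64-bit mask whose bits are set
    with probability exp(-4*beta*J), the only uphill move when z = 2."""
    n = len(spins)
    for i in range(0, n, 2):                       # checkerboard: even sites
        left, right = spins[(i - 1) % n], spins[(i + 1) % n]
        d_l, d_r = spins[i] ^ left, spins[i] ^ right   # 1 = antiparallel bond
        planes = vertical_add([d_l, d_r])              # per-lane count k
        c0 = planes[0] if planes else 0
        c1 = planes[1] if len(planes) > 1 else 0
        down_or_flat = (c0 | c1) & MASK                # k >= 1  =>  dE <= 0
        uphill = (~(c0 | c1)) & MASK                   # k == 0  =>  dE = +4J
        flip = down_or_flat | (uphill & accept_uphill(i))
        spins[i] ^= flip
    return spins
```

The point of the bit-plane adder is that one pass of bitwise XOR/AND operations updates all 64 replicas at once, which is what makes multispin coding pay off on GPU hardware.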
Methodology for the Accelerated Reliability Analysis and Prognosis of Underground Cables based on FPGA
Dependable electrical power distribution systems demand high reliability levels that cause increased maintenance costs to the utilities. Often, the extra costs are the result of unnecessary maintenance procedures, which can be avoided by monitoring the equipment and predicting the future system evolution by means of statistical methods (prognostics). The present thesis aims at designing accurate methods for predicting the degradation of high- and medium-voltage underground Cross-Linked Polyethylene (XLPE) cables within an electrical power distribution grid, and predicting their remaining useful life, in order to inform maintenance procedures. However, electric power distribution grids are large, their components interact with each other, and they degrade with time and use. Solving the statistics of the predictive models of the power grids currently requires long numerical simulations that demand large computational resources and long simulation times, even when using advanced parallel architectures. Often, approximate models are used in order to reduce the simulation time and the required resources. In this context, Field Programmable Gate Arrays (FPGAs) can be employed to accelerate the simulation of these stochastic processes. However, the adaptation of the physics-based degradation models of underground cables for FPGA simulation can be complex. Accordingly, this thesis proposes an FPGA-based framework for the on-line monitoring and prognosis of underground cables based on an electro-thermal degradation model that is adapted for its accelerated simulation in the programmable logic of an FPGA.
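As an illustration of the kind of electro-thermal life model such a framework simulates, here is a minimal Monte Carlo sketch assuming a generic Arrhenius thermal-life law for XLPE insulation. The constants, the 90 °C reference, and the normal temperature distribution are illustrative placeholders, not values from the thesis:

```python
import math
import random

def arrhenius_life_hours(temp_c, l0=1.6e5, b=12430.0, t0_c=90.0):
    """Generic Arrhenius thermal-life law: L(T) = L0 * exp(B * (1/T - 1/T0)),
    with temperatures in kelvin. l0, b and t0_c are placeholder values."""
    t = temp_c + 273.15
    t0 = t0_c + 273.15
    return l0 * math.exp(b * (1.0 / t - 1.0 / t0))

def remaining_life_mc(mean_temp_c, temp_sd, n=10000, rng=random.Random(1)):
    """Monte Carlo over uncertain conductor temperature: propagate a normal
    temperature distribution through the life law and return the mean and
    5th percentile of insulation life in hours."""
    lives = sorted(
        arrhenius_life_hours(rng.gauss(mean_temp_c, temp_sd)) for _ in range(n)
    )
    return sum(lives) / n, lives[int(0.05 * n)]
```

In an FPGA implementation, the inner loop above is the part mapped onto the programmable logic, so that many such temperature trajectories can be evaluated in parallel.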
Low power and high performance heterogeneous computing on FPGAs
The abstract is in the attachment.
GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP
Full detector simulation was among the largest CPU consumers in all CERN
experiment software stacks for the first two runs of the Large Hadron Collider
(LHC). In the early 2010's, the projections were that simulation demands would
scale linearly with luminosity increase, compensated only partially by an
increase of computing resources. The extension of fast simulation approaches to
more use cases, covering a larger fraction of the simulation budget, is only
part of the solution due to intrinsic precision limitations. The remainder
corresponds to speeding-up the simulation software by several factors, which is
out of reach using simple optimizations on the current code base. In this
context, the GeantV R&D project was launched, aiming to redesign the legacy
particle transport codes in order to make them benefit from fine-grained
parallelism features such as vectorization, but also from increased code and
data locality. This paper presents in detail the results and achievements of
this R&D, as well as the conclusions and lessons learnt from the beta
prototype.
Comment: 34 pages, 26 figures, 24 tables
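The track "basketizing" at the heart of GeantV's fine-grained parallelism can be illustrated with a toy sketch: tracks are regrouped by particle type and geometry volume so that one branch-free kernel processes a whole basket. The track fields and the kernel below are invented for illustration, not GeantV's actual data model:

```python
from collections import defaultdict

def basketize(tracks):
    """Group tracks by (particle type, geometry volume) so each basket can be
    handed to one vectorized kernel. A track is a dict; the keys used here
    ('pdg', 'volume', 'x', 'dir') are illustrative."""
    baskets = defaultdict(list)
    for t in tracks:
        baskets[(t["pdg"], t["volume"])].append(t)
    return baskets

def propagate_basket(basket, step):
    """Toy vectorizable kernel: identical straight-line propagation applied to
    every track in a basket. Because there is no per-track branching, a real
    implementation could map these iterations onto SIMD lanes."""
    for t in basket:
        t["x"] = [xi + step * di for xi, di in zip(t["x"], t["dir"])]
```

The design point is that scalar transport codes interleave tracks of different types through deeply branching physics code; regrouping homogeneous work first is what makes vectorization possible.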
Evaluation of pseudo-random number generation on GPU cards
Monte Carlo methods rely on sequences of random numbers to obtain solutions to many problems in science and engineering. In this work, we evaluate the performance of different pseudo-random number generators (PRNGs) of the Curand library on a number of modern Nvidia GPU cards. As a numerical test, we generate pseudo-random number (PRN) sequences and obtain non-uniform distributions using the acceptance-rejection method. We consider GPU, CPU, and hybrid CPU/GPU implementations. For the GPU, we additionally consider two different implementations using the host and device application programming interfaces (APIs). We study how the performance depends on implementation parameters, including the number of threads per block and the number of blocks per streaming multiprocessor. To achieve the fastest performance, one has to minimize the time consumed by PRNG seed setup and state update. Seed setup time increases with the number of threads, while state-update time decreases. Hence, the fastest performance is achieved by the optimal balance of these opposing effects.
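The acceptance-rejection method used as the numerical test above can be sketched in a few lines: draw uniform candidates, accept each with probability proportional to the target density. This is a generic single-threaded sketch, not the Curand-based kernel; each GPU thread would run the same loop with its own PRNG state:

```python
import random

def sample_rejection(pdf, pdf_max, lo, hi, n, rng=random.Random(0)):
    """Acceptance-rejection sampling: draw candidates uniformly on [lo, hi]
    and accept x with probability pdf(x) / pdf_max, where pdf_max bounds the
    target density on the interval. Returns n accepted samples."""
    out = []
    while len(out) < n:
        x = rng.uniform(lo, hi)                  # candidate from the envelope
        if rng.random() * pdf_max <= pdf(x):     # accept with prob pdf(x)/pdf_max
            out.append(x)
    return out

# Example target: linear density f(x) = 2x on [0, 1], whose mean is 2/3.
samples = sample_rejection(lambda x: 2.0 * x, 2.0, 0.0, 1.0, 20000)
mean = sum(samples) / len(samples)
```

The acceptance rate (here 1/2, the inverse of the envelope-to-target area ratio) is what couples sampling cost to PRNG throughput, which is why generator performance dominates this benchmark.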
A Scalable Framework for Monte Carlo Simulation Using FPGA-based Hardware Accelerators with Application to SPECT Imaging
As the number of transistors integrated onto a silicon die continues to increase, compute power is becoming a commodity. This has enabled a whole host of new applications that rely on high-throughput computations. Recently, the need for faster and cost-effective applications in form-factor-constrained environments has driven an interest in on-chip acceleration of algorithms based on Monte Carlo simulations. Furthermore, we have created a framework for further increasing parallelism by scaling our architecture across multiple compute devices; by extending our original design to a multi-FPGA system, a nearly linear increase in acceleration with logic resources was achieved.
Efficient treatment and quantification of uncertainty in probabilistic seismic hazard and risk analysis
The main goals of this thesis are the development of a computationally efficient framework for the stochastic treatment of various important uncertainties in probabilistic seismic hazard and risk assessment; its application to a newly created seismic risk model of Indonesia; and the analysis and quantification of the impact of these uncertainties on the distribution of estimated seismic losses for a large number of synthetic portfolios modeled after real-world counterparts.
The treatment and quantification of uncertainty in probabilistic seismic hazard and risk analysis has already been identified as an area that could benefit from increased research attention.
Furthermore, it has become evident that the lack of research considering the development and application of suitable sampling schemes to increase the computational efficiency of the stochastic simulation represents a bottleneck for applications where model runtime is an important factor.
In this research study, the development and state of the art of probabilistic seismic hazard and risk analysis is first reviewed and opportunities for improved treatment of uncertainties are identified.
A newly developed framework for the stochastic treatment of portfolio location uncertainty as well as ground motion and damage uncertainty is presented.
The framework is then optimized with respect to computational efficiency.
Amongst other techniques, a novel variance reduction scheme for portfolio location uncertainty is developed.
Furthermore, in this thesis, some well-known variance reduction schemes such as Quasi Monte Carlo, Latin Hypercube Sampling and MISER (locally adaptive recursive stratified sampling) are applied for the first time to seismic hazard and risk assessment.
The effectiveness and applicability of all the schemes used are analyzed.
Several chapters of this monograph describe the theory, implementation and some exemplary applications of the framework.
To conduct these exemplary applications, a seismic hazard model for Indonesia was developed and used for the analysis and quantification of loss uncertainty for a large collection of synthetic portfolios.
As part of this work, the new framework was integrated into a probabilistic seismic hazard and risk assessment software suite developed and used by Munich Reinsurance Group.
Furthermore, those parts of the framework that deal with location and damage uncertainties are also used by the flood and storm natural catastrophe model development groups at Munich Reinsurance for their risk models.
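Of the variance reduction schemes named above, Latin Hypercube Sampling is the simplest to sketch: each dimension is split into n equal strata, each stratum is hit exactly once, and the strata are paired randomly across dimensions. A minimal generic implementation (not the thesis code) looks like this:

```python
import random

def latin_hypercube(n, dims, rng=random.Random(42)):
    """Latin hypercube sample of n points in [0, 1)^dims: per dimension,
    each of the n equal strata contains exactly one point, and the pairing
    of strata across dimensions is a random permutation."""
    cols = []
    for _ in range(dims):
        strata = list(range(n))
        rng.shuffle(strata)                         # random pairing across dims
        cols.append([(s + rng.random()) / n for s in strata])
    return [list(p) for p in zip(*cols)]
```

Compared with plain Monte Carlo, the stratification guarantees that every marginal is evenly covered, which typically reduces the variance of loss estimates for the same number of hazard/risk model evaluations.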
The use of primitives in the calculation of radiative view factors
Compilations of radiative view factors (often in closed analytical form) are readily available in the open literature for commonly encountered geometries. For more complex three-dimensional (3D) scenarios, however, the effort required to solve the requisite multi-dimensional integrations needed to estimate a required view factor can be daunting, to say the least. In such cases, a combination of finite element methods (where the geometry in question is sub-divided into a large number of uniform, often triangular, elements) and Monte Carlo Ray Tracing (MC-RT) has been developed, although frequently the software implementation is suitable only for a limited set of geometrical scenarios. Driven initially by a need to calculate the radiative heat transfer occurring within an operational fibre-drawing furnace, this research set out to examine options whereby MC-RT could be used to cost-effectively calculate any generic 3D radiative view factor using current vectorisation technologies.
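The MC-RT approach to view factors can be shown on a case with a known closed form: the view factor from a differential planar element to a coaxial parallel disk of radius r at distance h, which is analytically r²/(r² + h²). The sketch below (a generic illustration, not this project's vectorised code) fires cosine-weighted rays from the element and counts disk hits:

```python
import math
import random

def view_factor_element_to_disk(r, h, n=100000, rng=random.Random(7)):
    """Monte Carlo ray tracing estimate of the view factor from a differential
    element to a coaxial parallel disk (radius r, distance h). Rays leave the
    element with cosine-weighted directions, so the view factor is simply the
    fraction of rays that hit the disk."""
    hits = 0
    for _ in range(n):
        phi = 2.0 * math.pi * rng.random()
        sin_t = math.sqrt(rng.random())          # cosine-weighted polar angle
        cos_t = math.sqrt(1.0 - sin_t * sin_t)
        dx, dy = sin_t * math.cos(phi), sin_t * math.sin(phi)
        t = h / cos_t                            # ray parameter at the disk plane
        if (t * dx) ** 2 + (t * dy) ** 2 <= r * r:
            hits += 1
    return hits / n
```

Because every ray is independent and the per-ray work is branch-light, this inner loop is exactly the kind of computation that maps well onto vectorised hardware.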
Automated Translation and Accelerated Solving of Differential Equations on Multiple GPU Platforms
We demonstrate a high-performance vendor-agnostic method for massively
parallel solving of ensembles of ordinary differential equations (ODEs) and
stochastic differential equations (SDEs) on GPUs. The method is integrated with
a widely used differential equation solver library in a high-level language
(Julia's DifferentialEquations.jl) and enables GPU acceleration without
requiring code changes by the user. Our approach achieves state-of-the-art
performance compared to hand-optimized CUDA-C++ kernels, while performing
faster than the vectorized-map (vmap) approach
implemented in JAX and PyTorch. Performance evaluation on NVIDIA, AMD, Intel,
and Apple GPUs demonstrates performance portability and vendor-agnosticism. We
show composability with MPI to enable distributed multi-GPU workflows. The
implemented solvers are fully featured, supporting event handling, automatic
differentiation, and incorporation of datasets via the GPU's texture memory,
allowing scientists to take advantage of GPU acceleration on all major current
architectures without changing their model code and without loss of
performance.
Comment: 11 figures
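The ensemble formulation that makes this GPU mapping possible can be sketched in plain Python: every trajectory runs the same fixed-step integrator with different parameters and no data-dependent branching, so each trajectory can become one GPU thread (or one vmap lane). This is a generic sketch of the batching idea, not the Julia library's solver:

```python
import math

def rk4_step(f, t, y, dt):
    """One classic fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def solve_ensemble(decay_rates, y0, t_end, dt):
    """Solve dy/dt = -a*y for every rate a in decay_rates. Here the ensemble
    loop is sequential, but the per-trajectory code is identical and
    branch-free, which is what lets a GPU run all trajectories in lockstep."""
    out = []
    for a in decay_rates:
        y, t = y0, 0.0
        while t < t_end - 1e-12:
            y = rk4_step(lambda t_, y_: -a * y_, t, y, dt)
            t += dt
        out.append(y)
    return out
```

Adaptive stepping and event handling complicate this lockstep picture, which is part of what makes fully featured GPU ensemble solvers nontrivial.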