206 research outputs found

    How to Correctly Deal With Pseudorandom Numbers in Manycore Environments - Application to GPU programming with Shoverand

    Get PDF
    Stochastic simulations are often sensitive to the source of randomness that characterizes the statistical quality of their results. Consequently, we need highly reliable Random Number Generators (RNGs) to feed such applications. Recent developments try to shrink the computation time by relying more and more on General Purpose Graphics Processing Units (GP-GPUs) to speed up stochastic simulations. Such devices bring new parallelization possibilities, but they also introduce new programming difficulties. Since RNGs are at the base of any stochastic simulation, they also need to be ported to GP-GPU. There is still a lack of well-designed implementations of quality-proven RNGs on GP-GPU platforms. In this paper, we introduce ShoveRand, a framework defining common rules to generate random numbers uniformly on GP-GPU. Our framework is designed to cope with any GPU-enabled development platform and to expose a straightforward interface to users. We also provide an existing RNG implementation with this framework to demonstrate its efficiency in both development and ease of use.
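
    The abstract does not show ShoveRand's actual interface, so the sketch below only illustrates the requirement it addresses: every GPU thread must draw from its own, statistically independent random stream. NVIDIA's cuRAND device API is used purely as a stand-in; the seed, kernel names and launch configuration are assumptions for the example, not part of ShoveRand.

        #include <curand_kernel.h>

        // Illustrative sketch only (not ShoveRand): each thread initialises its own
        // generator state with a common seed but a distinct sub-sequence, so the
        // threads draw from independent streams rather than sharing one.
        __global__ void init_states(curandState *states, unsigned long long seed, int n)
        {
            int tid = blockIdx.x * blockDim.x + threadIdx.x;
            if (tid < n)
                curand_init(seed, /*subsequence=*/tid, /*offset=*/0, &states[tid]);
        }

        __global__ void draw_uniform(curandState *states, float *out, int n)
        {
            int tid = blockIdx.x * blockDim.x + threadIdx.x;
            if (tid < n)
                out[tid] = curand_uniform(&states[tid]);   // one U(0,1) draw per thread
        }

        int main()
        {
            const int n = 1 << 20;
            curandState *states = nullptr;
            float *out = nullptr;
            cudaMalloc((void **)&states, n * sizeof(curandState));
            cudaMalloc((void **)&out, n * sizeof(float));
            init_states<<<(n + 255) / 256, 256>>>(states, 1234ULL, n);
            draw_uniform<<<(n + 255) / 256, 256>>>(states, out, n);
            cudaDeviceSynchronize();
            cudaFree(states);
            cudaFree(out);
            return 0;
        }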

    QTM: computational package using MPI protocol for quantum trajectories method

    Full text link
    The Quantum Trajectories Method (QTM) is one of the frequently used methods for studying open quantum systems. The main idea of this method is the evolution of wave functions which describe the system as functions of time; so-called quantum jumps are then applied at randomly selected points in time. The resulting system state is called a trajectory. By averaging many single trajectories, we obtain an approximation of the behavior of a quantum system. This structure also allows us to use parallel computation methods. In this article, we discuss the QTM package, which is supported by the MPI technology. Using MPI allows the trajectories to be calculated and averaged in parallel; as the effect of these actions, the computation time is shorter. Although it is written in the C++ programming language, the presented solution is easy to use and does not need any advanced programming techniques. At the same time, it offers higher performance than other packages realizing the QTM. This is especially important in the case of harder computational tasks, and the use of MPI improves the performance on particular problems which can be solved in the field of open quantum systems. Comment: 28 pages, 9 figures
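
    The QTM package itself is not reproduced here; the sketch below only shows the generic MPI pattern the abstract describes, with trajectories split across ranks and their results combined by a reduction. The trajectory body is a placeholder, and the seeding scheme and trajectory count are assumptions made for the example.

        #include <mpi.h>
        #include <cstdio>
        #include <random>

        // Placeholder for one quantum-jump trajectory; it returns a dummy observable
        // so that only the parallel structure (not the physics) is demonstrated.
        double simulate_one_trajectory(std::mt19937_64 &rng)
        {
            std::uniform_real_distribution<double> u(0.0, 1.0);
            return u(rng);
        }

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank = 0, size = 1;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const long total = 100000;                            // total number of trajectories (assumed)
            const long local = total / size + (rank < total % size ? 1 : 0);

            std::mt19937_64 rng(42 + rank);                       // naive per-rank seeding, for illustration only
            double local_sum = 0.0;
            for (long i = 0; i < local; ++i)
                local_sum += simulate_one_trajectory(rng);

            // Average over all trajectories by summing the per-rank contributions.
            double global_sum = 0.0;
            MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                std::printf("averaged observable: %f\n", global_sum / total);

            MPI_Finalize();
            return 0;
        }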

    Pseudo-Random Streams for Distributed and Parallel Stochastic Simulations on GP-GPU

    Get PDF
    Random number generation is a key element of stochastic simulations. It has been widely studied for sequential applications, enabling us to reliably use pseudo-random numbers in that case. Unfortunately, we cannot be so enthusiastic when dealing with parallel stochastic simulations. Many applications still neglect random stream parallelization, leading to potentially biased results. Particular parallel execution platforms, such as Graphics Processing Units (GPUs), add their own constraints to those of the Pseudo-Random Number Generators (PRNGs) used in parallel. This results in a situation where potential biases can be combined with performance drops when the parallelization of random streams has not been carried out rigorously. Here, we propose criteria guiding the design of good GPU-enabled PRNGs. We enhance our comments with a study of the techniques aiming to parallelize random streams correctly, in the context of GPU-enabled stochastic simulations.
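
    One classical partitioning technique covered by work of this kind is sequence splitting (block splitting), where every parallel stream takes a disjoint, contiguous block of one long base sequence. The sketch below illustrates it with cuRAND's offset parameter; the generator choice, per-thread block length and seed are assumptions for the example, not recommendations from the paper.

        #include <curand_kernel.h>

        // Block splitting sketch: thread tid jumps ahead by tid * block draws in a
        // single base sequence, so the streams cannot overlap as long as no thread
        // consumes more than `block` numbers.
        __global__ void init_block_split(curandStatePhilox4_32_10_t *states,
                                         unsigned long long seed,
                                         unsigned long long block, int n)
        {
            int tid = blockIdx.x * blockDim.x + threadIdx.x;
            if (tid < n)
                curand_init(seed, /*subsequence=*/0, /*offset=*/tid * block, &states[tid]);
        }

        __global__ void sample(curandStatePhilox4_32_10_t *states, float *out, int n)
        {
            int tid = blockIdx.x * blockDim.x + threadIdx.x;
            if (tid < n)
                out[tid] = curand_uniform(&states[tid]);
        }

        int main()
        {
            const int n = 1 << 16;
            curandStatePhilox4_32_10_t *states = nullptr;
            float *out = nullptr;
            cudaMalloc((void **)&states, n * sizeof(*states));
            cudaMalloc((void **)&out, n * sizeof(*out));
            init_block_split<<<(n + 255) / 256, 256>>>(states, 2024ULL, 1ULL << 30, n);
            sample<<<(n + 255) / 256, 256>>>(states, out, n);
            cudaDeviceSynchronize();
            cudaFree(states);
            cudaFree(out);
            return 0;
        }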

    Pseudo-Random Number Generation on GP-GPU

    Get PDF
    Random number generation is a key element of stochastic simulations. It has been widely studied for sequential applications, enabling us to reliably use pseudo-random numbers in that case. Unfortunately, we cannot be so enthusiastic when dealing with parallel stochastic simulations. Many applications still neglect random stream parallelization, leading to potentially biased results. Particular parallel execution platforms, such as Graphics Processing Units (GPUs), add their own constraints to those of the Pseudo-Random Number Generators (PRNGs) used in parallel. This results in a situation where potential biases can be combined with performance drops when the parallelization of random streams has not been carried out rigorously. Here, we propose criteria guiding the design of good GPU-enabled PRNGs. We enhance our comments with a study of the techniques aiming to parallelize random streams correctly, in the context of GPU-enabled stochastic simulations.

    Parallel resampling in the particle filter

    Full text link
    Modern parallel computing devices, such as the graphics processing unit (GPU), have gained significant traction in scientific and statistical computing. They are particularly well-suited to data-parallel algorithms such as the particle filter, or more generally Sequential Monte Carlo (SMC), which are increasingly used in statistical inference. SMC methods carry a set of weighted particles through repeated propagation, weighting and resampling steps. The propagation and weighting steps are straightforward to parallelise, as they require only independent operations on each particle. The resampling step is more difficult, as standard schemes require a collective operation, such as a sum, across particle weights. Focusing on this resampling step, we analyse two alternative schemes that do not involve a collective operation (Metropolis and rejection resamplers), and compare them to standard schemes (multinomial, stratified and systematic resamplers). We find that, in certain circumstances, the alternative resamplers can perform significantly faster on a GPU, and to a lesser extent on a CPU, than the standard approaches. Moreover, in single precision, the standard approaches are numerically biased for upwards of hundreds of thousands of particles, while the alternatives are not. This is particularly important given greater single- than double-precision throughput on modern devices, and the consequent temptation to use single precision with a greater number of particles. Finally, we provide auxiliary functions useful for implementation, such as for the permutation of ancestry vectors to enable in-place propagation. Comment: 21 pages, 6 figures
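
    The Metropolis resampler mentioned above needs no collective operation over the weights: each ancestor is chosen by a short Metropolis chain over particle indices, using only pairwise weight ratios. The version below is a simple serial CPU rendering of that idea (the chain length B, data types and sequential outer loop are assumptions for illustration; the paper targets GPU execution, where each per-particle chain runs independently).

        #include <cstddef>
        #include <random>
        #include <vector>

        // Metropolis resampler sketch: ancestor[i] is the end state of a B-step
        // Metropolis chain over particle indices, so no normalisation, sum or
        // prefix sum over the weights is required. Too small a B introduces bias.
        std::vector<std::size_t> metropolis_resample(const std::vector<double> &w,
                                                     std::size_t B,
                                                     std::mt19937_64 &rng)
        {
            const std::size_t N = w.size();
            std::uniform_int_distribution<std::size_t> pick(0, N - 1);
            std::uniform_real_distribution<double> unif(0.0, 1.0);

            std::vector<std::size_t> ancestor(N);
            for (std::size_t i = 0; i < N; ++i)       // independent per particle: parallelises trivially
            {
                std::size_t k = i;
                for (std::size_t b = 0; b < B; ++b)
                {
                    std::size_t j = pick(rng);
                    if (unif(rng) * w[k] <= w[j])     // accept move k -> j with prob min(1, w[j]/w[k])
                        k = j;
                }
                ancestor[i] = k;
            }
            return ancestor;
        }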

    Massively Parallelized Monte Carlo Simulation and Its Applications in Finance

    Get PDF
    In this paper, we propose, develop and implement a tool that increases the computational speed of exotic derivatives pricing at a fraction of the cost of traditional methods. Our paper focuses on investigating the computing efficiencies of GPU systems. We utilize the GPU's natural parallelization capabilities to price financial instruments. We outline our implementation, solutions to practical complications arising during implementation, and how much faster GPU systems are. Each step that we explore has a significant impact on the efficiency and performance of GPU pricing. Rather than speaking in theoretical, abstract terms, we detail each step in an attempt to give the reader a clear sense of what is going on. Efficiency is one of the pillars of financial calculations. With the volume of risk calculations mandated by prudent risk management practices, even moderate improvements in calculation efficiency can translate into material changes in trading limits or savings in regulatory capital. This can make the difference between a growing, successful trading operation and an also-ran. Unfortunately, a decent algorithm written in VBA cannot calculate option prices at the same speed as a farm of computers, particularly if we must price the trade in less than 150 milliseconds using 10 million simulation paths. Fast forward from one trade to a book of several hundred thousand trades, many of which are exotic products. Not only is it necessary to price each trade, but we must do so in each of thousands of different market scenarios in order to calculate even basic risk measures such as Greeks and Value-at-Risk (VaR). At the end of the paper, we discuss how GPUs are currently used in the industry and their various advantages, including cost, time, accuracy and calculation frequency. In addition, we discuss the implementation challenges of GPU systems and the attention to detail that is required for memory allocation.
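
    As a rough illustration of the one-thread-per-path pattern behind GPU Monte Carlo pricing, the sketch below prices a plain European call under geometric Brownian motion; it is a stand-in for the exotic payoffs discussed in the paper, and all market parameters, the path count and the seed are made-up assumptions. The final average is done on the host for brevity (a device-side reduction would normally be used).

        #include <curand_kernel.h>
        #include <cstdio>
        #include <vector>

        // Illustrative sketch only: one thread simulates one risk-neutral terminal
        // price and stores the discounted payoff of a European call.
        __global__ void mc_call(float *payoff, int n, unsigned long long seed,
                                float S0, float K, float r, float sigma, float T)
        {
            int tid = blockIdx.x * blockDim.x + threadIdx.x;
            if (tid >= n) return;
            curandStatePhilox4_32_10_t state;
            curand_init(seed, tid, 0, &state);                     // independent sub-stream per path
            float z  = curand_normal(&state);
            float ST = S0 * expf((r - 0.5f * sigma * sigma) * T + sigma * sqrtf(T) * z);
            payoff[tid] = expf(-r * T) * fmaxf(ST - K, 0.0f);      // discounted payoff
        }

        int main()
        {
            const int n = 1 << 20;                                  // number of simulated paths (assumed)
            float *d_payoff = nullptr;
            cudaMalloc((void **)&d_payoff, n * sizeof(float));
            mc_call<<<(n + 255) / 256, 256>>>(d_payoff, n, 7ULL, 100.f, 105.f, 0.02f, 0.2f, 1.0f);
            std::vector<float> h(n);
            cudaMemcpy(h.data(), d_payoff, n * sizeof(float), cudaMemcpyDeviceToHost);
            double sum = 0.0;
            for (float p : h) sum += p;
            std::printf("estimated price: %f\n", sum / n);
            cudaFree(d_payoff);
            return 0;
        }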

    Distribution of Random Streams for Simulation Practitioners

    Get PDF
    There is an increasing interest in the distribution of parallel random number streams in the high-performance computing community, particularly with the manycore shift. Even if we have at our disposal statistically sound random number generators according to the latest and most thorough testing libraries, their parallelization can still be a delicate problem. Indeed, a set of recent publications shows it still has to be mastered by the scientific community. With the arrival of multi-core and manycore processor architectures on the scientist's desktop, modelers who are not specialists in parallelizing stochastic simulations need help and advice in rigorously distributing their experimental plans and replications according to the state of the art in pseudo-random number parallelization techniques. In this paper, we discuss the different partitioning techniques currently in use to provide independent streams, along with their corresponding software. In addition to the classical approaches used to parallelize stochastic simulations on regular processors, this paper also presents recent advances in pseudo-random number generation for general-purpose graphical processing units. The state of the art given in this paper is written for simulation practitioners.
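
    As a practitioner-level illustration of one of the partitioning techniques discussed here, the sketch below applies sequence splitting on a regular processor: each replication receives a disjoint block of a single base stream by skipping ahead. The engine, block length and replication count are assumptions; note that std::mt19937_64 has no constant-time jump-ahead, which is precisely why the dedicated distribution tools surveyed in papers like this one matter.

        #include <cstdio>
        #include <random>

        // Sequence splitting sketch: replication k uses the half-open block
        // [k * BLOCK, (k + 1) * BLOCK) of one base pseudo-random sequence, so the
        // replications never overlap as long as each draws fewer than BLOCK numbers.
        int main()
        {
            const unsigned long long BLOCK = 1ULL << 20;   // assumed per-replication budget
            const int replications = 8;

            for (int k = 0; k < replications; ++k)
            {
                std::mt19937_64 rng(12345);                // same seed everywhere: one shared base sequence
                rng.discard(k * BLOCK);                    // jump to this replication's block
                                                           // (linear-time for this engine; specialised
                                                           // libraries offer true jump-ahead)
                std::uniform_real_distribution<double> u(0.0, 1.0);
                double sum = 0.0;
                for (int i = 0; i < 1000; ++i)             // toy "simulation" consuming this stream
                    sum += u(rng);
                std::printf("replication %d: mean draw = %f\n", k, sum / 1000.0);
            }
            return 0;
        }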

    GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP

    Full text link
    Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider (LHC). In the early 2010s, the projections were that simulation demands would scale linearly with the luminosity increase, compensated only partially by an increase of computing resources. The extension of fast simulation approaches to more use cases, covering a larger fraction of the simulation budget, is only part of the solution due to intrinsic precision limitations. The remainder corresponds to speeding up the simulation software by several factors, which is out of reach using simple optimizations on the current code base. In this context, the GeantV R&D project was launched, aiming to redesign the legacy particle transport codes so that they benefit from fine-grained parallelism features such as vectorization, but also from increased code and data locality. This paper presents in detail the results and achievements of this R&D, as well as the conclusions and lessons learnt from the beta prototype. Comment: 34 pages, 26 figures, 24 tables

    ShoveRand: a Model-Driven Framework to Easily Generate Random Numbers on GP-GPU

    Get PDF
    Stochastic simulations are often sensitive to the randomness source that characterizes the statistical quality of their results. Consequently, we need highly reliable Random Number Generators (RNGs) to feed such applications. Recent developments try to shrink the computation time by using more and more General Purpose Graphics Processing Units (GP-GPUs) to speed up stochastic simulations. Such devices bring new parallelization possibilities, but they also introduce new programming difficulties. Since RNGs are at the base of any stochastic simulation, they also need to be ported to GP-GPU. In this paper, we introduce ShoveRand, a framework defining common rules to generate random numbers uniformly on GP-GPU. Our framework is designed to cope with any GPU-enabled development platform and to expose a straightforward interface to users. We also provide an existing RNG implementation with this framework to demonstrate its efficiency in both development and ease of use.