
    Advanced Optimization Techniques For Monte Carlo Simulation On Graphics Processing Units

    The objective of this work is to design and implement a self-adaptive, parallel, GPU-optimized Monte Carlo algorithm for the simulation of adsorption in porous materials. We focus specifically on NVIDIA GPUs and the CUDA-programmable Fermi architecture. The resulting package supports several ensemble methods for Monte Carlo simulation, allowing the simulation of multi-component adsorption in porous solids. Such an algorithm will have broad applications to the development of novel porous materials for the sequestration of CO2 and the filtration of toxic industrial chemicals. The primary objective of this work is the release of a massively parallel, open source Monte Carlo simulation engine implemented on GPUs, called GOMC. The code supports the canonical ensemble and the Gibbs ensemble method, which allow for the simulation of multiple phenomena, including liquid-vapor phase coexistence and single- and multi-component adsorption in porous materials. In addition, the grand canonical ensemble and configurational-bias algorithms have been implemented so that polymeric materials and small proteins may be simulated. This simulation engine is the only open source, GPU-optimized Monte Carlo code available for the generalized simulation of adsorption and phase equilibria on a very large scale.

    By applying many optimization techniques and allowing the code to adjust to changes in the simulation state, the original Monte Carlo algorithm has been rewritten, starting from an existing serial algorithm, to suit massively parallel devices, yielding large reductions in computational time. This reduction allows the simulation of significantly larger systems over longer timescales than is currently possible with existing implementations. Applying device-specific optimizations resulted in significant speedups. First, for the NVT method, a fully optimized serial algorithm has been implemented and its performance compared to Towhee; a speedup of about 438 times has been achieved for a relatively small problem of 4,096 particles. In addition, two GPU algorithms, with and without a cell list structure, have been implemented; the parallel code with the cell list was more than 160 times faster than the serial code. Moreover, for the grand canonical ensemble, one serial and two parallel algorithms have been developed. The simulation box in this method can be resized, so the algorithm had to adapt itself to the changing box size. The CUDA code with a cell list runs a factor of 130 times faster than the serial code, which has no cell list structure.

    More Monte Carlo ensembles have been ported to the GPU. The Gibbs ensemble method has two simulation boxes and three types of moves; after careful study, a GPU algorithm has been implemented that ports the computation-intensive functions to the GPU. The GPU code was about 50 times faster than the serial code. Finally, an extension of the Gibbs method has been implemented on the GPU; the particle transfer from one box to the other is the move type affected by this extension. CUDA streams are used to parallelize the K trials of this method, and a threefold speedup of the particle transfer move has been achieved in the best case. However, because the particle transfer move accounts for just 10% of the total moves, this speedup has minimal effect on the overall execution time of the simulation. Furthermore, a run with all move types on a Kepler K20c card achieved a twofold speedup over the same CUDA code on a GeForce GTX 480 card. The main contribution of this work to society is the release of the above implementations as open source through http://gomc.eng.wayne.edu. Other researchers can also take advantage of the lessons learned about advanced optimizations and self-adapting mechanisms specific to the GPU. At the application level, the code can be used by the chemical engineering community to explore accurate and affordable simulations that were not possible before.
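
    The abstract gives no implementation details, but the use of CUDA streams to evaluate K trial positions concurrently might look roughly like the sketch below. The kernel name, the cutoff-free reduced-unit Lennard-Jones energy, and the fixed K of 8 are illustrative assumptions, not GOMC's actual code.

        #include <cuda_runtime.h>

        #define K_TRIALS 8   // trial positions per insertion attempt (illustrative)

        // Hypothetical kernel: accumulate the reduced-unit Lennard-Jones energy of
        // one trial position against all existing particles. A block-level reduction
        // would normally replace the atomicAdd; it is omitted here for brevity, as
        // is the radial cutoff.
        __global__ void trialEnergy(const float3 *pos, int n, float3 trial,
                                    float *energyOut)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float dx = pos[i].x - trial.x;
            float dy = pos[i].y - trial.y;
            float dz = pos[i].z - trial.z;
            float r2 = dx * dx + dy * dy + dz * dz;
            float inv6 = 1.0f / (r2 * r2 * r2);          // sigma = epsilon = 1
            atomicAdd(energyOut, 4.0f * (inv6 * inv6 - inv6));
        }

        // Host side: one stream per trial so the K small kernels can overlap on the
        // device. d_pos and d_energies are device pointers; d_energies holds
        // K_TRIALS pre-zeroed floats, h_trials is a host array of trial positions.
        void evaluateTrials(const float3 *d_pos, int n, const float3 *h_trials,
                            float *d_energies)
        {
            cudaStream_t streams[K_TRIALS];
            int threads = 256, blocks = (n + threads - 1) / threads;
            for (int k = 0; k < K_TRIALS; ++k) {
                cudaStreamCreate(&streams[k]);
                trialEnergy<<<blocks, threads, 0, streams[k]>>>(d_pos, n,
                                                                h_trials[k],
                                                                d_energies + k);
            }
            for (int k = 0; k < K_TRIALS; ++k) {
                cudaStreamSynchronize(streams[k]);
                cudaStreamDestroy(streams[k]);
            }
        }

    The K energies would then be combined on the host into the Rosenbluth weight that selects which trial, if any, is accepted.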

    Efficient Algorithms And Optimizations For Scientific Computing On Many-Core Processors

    Designing efficient algorithms for many-core and multicore architectures requires different strategies to best exploit the hardware resources of those architectures. Researchers have ported many scientific applications to modern many-core and multicore parallel architectures, achieving significant speedups over single CPU cores. Yet some applications still require more effort to accelerate due to their inherently serial behavior; one such class is Monte Carlo simulation. Monte Carlo simulations have been used to study many problems in statistical physics and statistical mechanics that are not possible to simulate using Molecular Dynamics. While there are a fair number of well-known and recognized GPU Molecular Dynamics codes, existing Monte Carlo ensemble simulations have not been ported to the GPU, so they are relatively slow and cannot run large systems in a reasonable amount of time. Because of these shortcomings, and because researchers want a fast Monte Carlo simulation framework that can handle large systems, a new GPU framework called GOMC has been implemented to simulate different particle- and molecular-based force fields and ensembles. GOMC supports several Monte Carlo ensembles, such as the canonical, grand canonical, and Gibbs ensembles. This work describes the many challenges in developing a GPU Monte Carlo code for such ensembles and how I addressed them.

    This work also describes efficient many-core and multicore large-scale energy calculations for the Monte Carlo Gibbs ensemble using cell lists. Designing Monte Carlo molecular simulations is challenging because they offer less computation and parallelism than comparable molecular dynamics applications. The modified cell list yields greater speedups for energy calculations on both many-core and multicore architectures than implementations that do not use it. The work presents results and analysis of the cell list algorithms on each of the parallel architectures, using top-of-the-line GPUs, CPUs, and Intel Xeon Phi coprocessors, and evaluates their performance across different problem sizes and radial cutoffs. In addition, two hybrid cell list approaches, MPI+OpenMP and MPI+CUDA, are evaluated on a small cluster of multicore CPUs, Intel Phi coprocessors, and GPUs, using different combinations of MPI processes, threads, and problem sizes.

    Another application presented in this dissertation involves understanding the properties of crystalline materials and their design and control. Recent developments include new models that simulate system behavior and properties of great experimental and theoretical interest; one such model is the Phase-Field Crystal (PFC) model. The PFC model has enabled researchers to simulate 2D and 3D crystal structures and to study defects such as dislocations and grain boundaries. In this work, GPUs are used to accelerate the computation of various dynamic properties of polycrystals in the 2D PFC model. Some of these properties require very intensive computation involving hundreds of thousands of atoms. The GPU implementation has achieved significant speedups of more than 46 times for some large-system simulations.
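
    The dissertation's modified cell list is not specified in the abstract, so the sketch below shows only the conventional scheme it builds on: the box is divided into cells no smaller than the radial cutoff, so each particle interacts only with its own and the 26 neighboring cells. All names and the reduced Lennard-Jones units are illustrative assumptions.

        // Linked-cell bookkeeping: head[c] points to the first particle in cell c
        // (or -1 if empty), next[i] to the next particle in i's cell (or -1).
        struct CellList {
            const int *head;
            const int *next;
            int nx, ny, nz;    // number of cells along each box edge
            float cellLen;     // cell edge length, chosen >= the radial cutoff
            float boxLen;      // periodic box edge length
        };

        // One thread per particle: sum its Lennard-Jones energy over the 27
        // surrounding cells with the minimum-image convention, then fold the
        // half-counted total into a global accumulator.
        __global__ void cellListEnergy(const float3 *pos, int n, CellList cl,
                                       float rCut2, float *eTotal)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            int cx = (int)(pos[i].x / cl.cellLen);
            int cy = (int)(pos[i].y / cl.cellLen);
            int cz = (int)(pos[i].z / cl.cellLen);
            float e = 0.0f;
            for (int dz = -1; dz <= 1; ++dz)
            for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int c = (((cz + dz + cl.nz) % cl.nz) * cl.ny
                        + (cy + dy + cl.ny) % cl.ny) * cl.nx
                        + (cx + dx + cl.nx) % cl.nx;
                for (int j = cl.head[c]; j != -1; j = cl.next[j]) {
                    if (j == i) continue;
                    float rx = pos[i].x - pos[j].x;
                    float ry = pos[i].y - pos[j].y;
                    float rz = pos[i].z - pos[j].z;
                    rx -= cl.boxLen * rintf(rx / cl.boxLen);   // minimum image
                    ry -= cl.boxLen * rintf(ry / cl.boxLen);
                    rz -= cl.boxLen * rintf(rz / cl.boxLen);
                    float r2 = rx * rx + ry * ry + rz * rz;
                    if (r2 < rCut2) {
                        float inv6 = 1.0f / (r2 * r2 * r2);
                        e += 4.0f * (inv6 * inv6 - inv6);      // reduced LJ units
                    }
                }
            }
            atomicAdd(eTotal, 0.5f * e);   // every pair is visited twice
        }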

    Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond

    In this and a set of companion whitepapers, the USQCD Collaboration lays out a program of science and computing for lattice gauge theory. These whitepapers describe how calculations using lattice QCD (and other gauge theories) can aid the interpretation of ongoing and upcoming experiments in particle and nuclear physics, as well as inspire new ones. (Comment: 44 pages; one of the USQCD whitepapers.)

    Nonequilibrium candidate Monte Carlo: A new tool for efficient equilibrium simulation

    Metropolis Monte Carlo simulation is a powerful tool for studying the equilibrium properties of matter. In complex condensed-phase systems, however, it is difficult to design Monte Carlo moves with high acceptance probabilities that also rapidly sample uncorrelated configurations. Here, we introduce a new class of moves based on nonequilibrium dynamics: candidate configurations are generated through a finite-time process in which a system is actively driven out of equilibrium, and accepted with criteria that preserve the equilibrium distribution. The acceptance rule is similar to the Metropolis acceptance probability, but is related to the nonequilibrium work rather than the instantaneous energy difference. Our method is applicable to sampling from either a single thermodynamic state or a mixture of thermodynamic states, and allows both coordinates and thermodynamic parameters to be driven in nonequilibrium proposals. While generating finite-time switching trajectories incurs an additional cost, driving some degrees of freedom while allowing others to evolve naturally can lead to large enhancements in acceptance probabilities, greatly reducing structural correlation times. Using nonequilibrium driven processes vastly expands the repertoire of useful Monte Carlo proposals in simulations of dense solvated systems.
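
    In its simplest symmetric-protocol form, the acceptance rule described above replaces the Metropolis energy difference with the work accumulated over the driven trajectory. A minimal host-side sketch, with hypothetical names, might be:

        #include <algorithm>
        #include <cmath>
        #include <random>

        // Accept/reject for a symmetric driving protocol: the instantaneous energy
        // difference of ordinary Metropolis is replaced by the total nonequilibrium
        // work accumulated over the finite-time switching trajectory. (Asymmetric
        // protocols require an additional proposal-probability ratio, omitted here.)
        bool acceptNonequilibriumMove(double protocolWork, double beta,
                                      std::mt19937 &rng)
        {
            std::uniform_real_distribution<double> uniform(0.0, 1.0);
            return uniform(rng) < std::min(1.0, std::exp(-beta * protocolWork));
        }

    The protocol work itself is accumulated during the switching trajectory: each time the driven parameter lambda is perturbed at fixed coordinates, the instantaneous potential-energy change U(x, lambda_new) - U(x, lambda_old) is added to the running total.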

    ASCOT: solving the kinetic equation of minority particle species in tokamak plasmas

    A comprehensive description of methods suitable for solving the kinetic equation for fast ions and impurity species in tokamak plasmas using a Monte Carlo approach is presented. The described methods include Hamiltonian orbit-following in particle and guiding-center phase space; test-particle or guiding-center solution of the kinetic equation using stochastic differential equations in the presence of Coulomb collisions; neoclassical tearing modes and Alfvén eigenmodes as electromagnetic perturbations relevant to fast ions; and plasma flow and atomic reactions relevant to impurity studies. Applying these methods, a complete reimplementation of the well-established minority-species code ASCOT has been carried out, in response both to the increase in computing power over the last twenty years and to the weakly structured growth of the code, which had made implementation of additional models impractical. A benchmark between the previous code and the reimplementation shows good agreement between the two. (Comment: 13 pages, 9 figures; submitted to Computer Physics Communications.)
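
    As an illustration of the stochastic-differential-equation approach mentioned above, the sketch below advances independent test particles through one Euler-Maruyama step of the standard Lorentz (pitch-angle scattering) operator. This is a generic textbook form, not ASCOT's actual operator, which also evolves the particle energy with velocity-dependent collision frequencies; the kernel and parameter names are hypothetical.

        #include <curand_kernel.h>

        // Euler-Maruyama step for the pitch-angle scattering SDE
        //   d(xi) = -nu * xi * dt + sqrt((1 - xi^2) * nu) dW,
        // where xi is the pitch, nu the scattering frequency, and dW a Wiener
        // increment. One thread advances one test particle; the test particles
        // are independent, so no synchronization is needed. `states` is assumed
        // initialized beforehand with curand_init.
        __global__ void pitchAngleStep(float *xi, int n, float nu, float dt,
                                       curandState *states)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float x  = xi[i];
            float dW = curand_normal(&states[i]) * sqrtf(dt);  // Wiener increment
            x += -nu * x * dt + sqrtf(fmaxf(0.0f, (1.0f - x * x) * nu)) * dW;
            xi[i] = fminf(1.0f, fmaxf(-1.0f, x));              // keep pitch in [-1, 1]
        }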

    Towards Understanding the Self-assembly of Complicated Particles via Computation.

    We develop advanced Monte Carlo sampling schemes and new methods of calculating thermodynamic partition functions that are used to study the self-assembly of complicated "patchy" particles. Patchy particles are characterized by their strong anisotropic interactions, which can cause critical slowing down in Monte Carlo simulations of their self-assembly. We prove that detailed balance is maintained by our implementation of Monte Carlo cluster moves that ameliorate critical slowing down, and we use these simulations to predict the structures self-assembled by patchy tetrominoes. We compare structures predicted from our simulations with those generated by an alternative learning-augmented Monte Carlo approach and show that the learning-augmented approach fails to sample thermodynamic ensembles. We prove one way to maintain detailed balance when parallelizing Monte Carlo with the checkerboard domain decomposition scheme by enumerating the state-to-state transitions for a simple model of general applicability. Our implementation of checkerboard Monte Carlo on graphics processing units enables accelerated sampling of thermodynamic properties, and we use it to confirm the fluid-hexatic transition observed at high packing fractions of hard disks. We develop a new method, bottom-up building block assembly, which generates partition functions hierarchically. Bottom-up building block assembly provides a means to determine which structures are favored at a given temperature and allows accelerated prediction of potential-energy-minimizing structures, which are difficult to determine with Monte Carlo methods. We show how the sequences of clusters generated by bottom-up building block assembly can be used to inform "assembly pathway engineering", the design of patchy particles whose assembly propensity is optimized for a target structure. The utility of bottom-up building block assembly is demonstrated for systems of CdTe/CdS tetrahedra, DNA-tethered nanospheres, colloidal analogues of patchy tetrominoes, and shape-shifting particles.
    Ph.D. Chemical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91509/1/erjank_1.pd
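
    The checkerboard idea mentioned above is easiest to see on a lattice: color the sites like a checkerboard so that no two same-color sites interact, and all sites of one color can then be updated in parallel without conflicting reads and writes. The sketch below, with hypothetical names, applies this to the 2D Ising model; the dissertation treats the harder off-lattice hard-disk case, where the colored cells are spatial domains rather than single sites.

        #include <curand_kernel.h>

        // Metropolis sweep over one checkerboard color of an L x L Ising lattice
        // with periodic boundaries and coupling J = 1. Same-color sites have no
        // shared neighbors, so every thread reads only opposite-color spins and
        // writes only its own site: no race conditions within one launch.
        __global__ void checkerboardSweep(int *spin, int L, int color, float beta,
                                          curandState *states)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= L || y >= L || (x + y) % 2 != color) return;
            int i = y * L + x;
            int nbrSum = spin[y * L + (x + 1) % L] + spin[y * L + (x + L - 1) % L]
                       + spin[((y + 1) % L) * L + x] + spin[((y + L - 1) % L) * L + x];
            float dE = 2.0f * spin[i] * nbrSum;              // energy change of a flip
            if (dE <= 0.0f || curand_uniform(&states[i]) < expf(-beta * dE))
                spin[i] = -spin[i];                          // Metropolis acceptance
        }

    A full sweep launches the kernel twice, once with color 0 and once with color 1, so that every site gets a chance to flip.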