    High performance computing for a 3-D optical diffraction tomographic application in fluid velocimetry

    Optical Diffraction Tomography has recently been introduced in fluid velocimetry to provide three-dimensional information on seeding-particle locations. In general, image reconstruction methods at visible wavelengths have to account for diffraction. A linear approximation has been used for three-dimensional image reconstruction, but a non-linear, iterative reconstruction method is required when multiple scattering is not negligible. Non-linear methods require the solution of the Helmholtz equation, which is computationally highly demanding due to the size of the problem. The present work shows the results of a non-linear method customized for spherical-particle location using GPU computing and a made-to-measure storage format.
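
    As a reference point, the field that non-linear diffraction tomography must recover satisfies the Helmholtz equation with a spatially varying refractive index (standard notation, not taken from the paper):

    \[
      \nabla^2 u(\mathbf{r}) + k_0^2\, n^2(\mathbf{r})\, u(\mathbf{r}) = 0, \qquad k_0 = \frac{2\pi}{\lambda},
    \]

    where u is the optical field, k_0 the vacuum wavenumber and n(r) the refractive-index distribution; the seeding-particle locations enter through n(r), which is why the discretized problem grows with the imaged volume.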

    Massively parallel split-step Fourier techniques for simulating quantum systems on graphics processing units

    The split-step Fourier method is a powerful technique for solving partial differential equations and simulating ultracold atomic systems of various forms. In this body of work, we focus on several variations of this method to allow for simulations of one-, two-, and three-dimensional quantum systems, along with several notable methods for controlling these systems. In particular, we use quantum optimal control and shortcuts to adiabaticity to study the non-adiabatic generation of superposition states in strongly correlated one-dimensional systems, analyze chaotic vortex trajectories in two dimensions by using rotation and phase-imprinting methods, and create stable, three-dimensional vortex structures in Bose–Einstein condensates through artificial magnetic fields generated by the evanescent field of an optical nanofiber. We also discuss algorithmic optimizations for implementing the split-step Fourier method on graphics processing units. All computational methods present in this work are demonstrated on physical systems and have been incorporated into a state-of-the-art, open-source software suite known as GPUE, which is currently the fastest quantum simulator of its kind. (Okinawa Institute of Science and Technology Graduate University)
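
    As an illustration of the core algorithm, below is a minimal NumPy sketch of one Strang-split time step of the 1-D Gross-Pitaevskii equation (hbar = m = 1). The grid, trap, and interaction strength are illustrative assumptions; GPUE applies the same splitting with CUDA FFTs on much larger grids.

    import numpy as np

    # Grid and operators (illustrative values, not GPUE's defaults)
    N, L = 512, 20.0
    x = np.linspace(-L / 2, L / 2, N, endpoint=False)
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(N, d=dx)   # angular wavenumbers
    V = 0.5 * x**2                            # harmonic trap
    g = 1.0                                   # contact-interaction strength
    dt = 1e-3

    psi = np.exp(-x**2 / 2).astype(complex)
    psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)  # normalize

    def ssfm_step(psi):
        # half step of potential + nonlinearity in position space
        psi = psi * np.exp(-0.5j * dt * (V + g * np.abs(psi)**2))
        # full kinetic step in momentum space
        psi = np.fft.ifft(np.exp(-0.5j * dt * k**2) * np.fft.fft(psi))
        # second half step closes the Strang splitting
        return psi * np.exp(-0.5j * dt * (V + g * np.abs(psi)**2))

    for _ in range(1000):
        psi = ssfm_step(psi)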

    HI Lightcones for LADUMA using Gadget-3: performance profiling and application of an HPC code

    This project concerns the investigation, performance profiling and optimisation of the high-performance cosmological code GADGET-3. This code was used to develop a synthetic field-of-view, or lightcone, for the MeerKAT telescope to replicate what it will observe when it conducts the LADUMA ultra-deep HI survey. This lightcone will assist in the planning process of the survey. The deliverables for this project are summarised as follows:
    * Provide an up-to-date performance evaluation and optimisation report for the cosmological simulation code GADGET-3.
    * Use GADGET-3 to produce a sufficiently high-resolution simulation of a region of the Universe.
    * Develop a Python code to produce a lightcone which represents the MeerKAT telescope's field-of-view, by post-processing simulation output snapshots (a sketch of this shell-stacking step follows the list).
    * Extract relevant metadata from the simulation snapshots to provide additional insight into the simulated observation.
    * Produce an efficiently written and well-documented software package to enable other researchers to produce synthetic lightcones.
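
    A minimal sketch of the shell-stacking step referenced above, assuming snapshots are already loaded and that a cosmology module supplies the comoving-distance edges for each snapshot's redshift interval (the names and field-of-view radius below are illustrative, not the package's API):

    import numpy as np

    def in_lightcone(pos, observer, r_min, r_max, fov_deg):
        """Select particles in the comoving shell [r_min, r_max) lying
        within an opening angle fov_deg of the z-axis from the observer."""
        d = pos - observer
        r = np.linalg.norm(d, axis=1)
        in_shell = (r >= r_min) & (r < r_max)
        cos_theta = d[:, 2] / np.maximum(r, 1e-12)   # avoid divide-by-zero
        return in_shell & (cos_theta >= np.cos(np.radians(fov_deg)))

    # Stack one shell per snapshot, from low to high redshift:
    # lightcone = np.vstack([snap.pos[in_lightcone(snap.pos, obs, r0, r1, 1.0)]
    #                        for snap, (r0, r1) in zip(snapshots, r_edges)])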

    Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

    Efficient parallel implementation of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. It requires exploiting the data-parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of them employ irregular algorithms which exhibit data-dependent control flow and irregular memory accesses. Furthermore, these applications are often iterative with dependencies between steps, making them hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particle beam dynamics is one such application, where the distribution of work and the memory access pattern at each time step are irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control flow during a single step of the application independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step, so it is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine-learning-based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-flow and memory-access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particle beam dynamics as a motivating example throughout the dissertation, though the techniques should be equally applicable to a wide range of irregular applications. The machine learning approach presented here uses predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation, anticipating the future access pattern. Access-pattern forecasts can then be used to formulate optimization decisions during application execution that improve the performance of the application at future time steps based on observations from earlier ones. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver good aggregate performance.
    We used these optimization techniques and the anticipation strategy to design a cache-aware, memory-efficient parallel algorithm that addresses the irregularities in the parallel implementation of charged particle beam dynamics simulations on different HPC architectures. Experimental results using a diverse mix of HPC architectures show that our anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and improving resource utilization.
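
    A minimal sketch of the anticipation idea, under our own simplifying assumptions (the dissertation uses supervised learning; the linear extrapolation here is a stand-in forecaster): predict each particle's memory bucket for the next step from the two previous steps, then reorder particles by the forecast so threads that execute together touch contiguous memory.

    import numpy as np

    def forecast_buckets(prev, curr):
        # Linear extrapolation: next ~= curr + (curr - prev)
        return curr + (curr - prev)

    def reorder_for_next_step(particles, prev_bucket, curr_bucket):
        predicted = forecast_buckets(prev_bucket, curr_bucket)
        order = np.argsort(predicted, kind="stable")  # group likely-adjacent accesses
        return particles[order], order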

    Hard and Soft Error Resilience for One-sided Dense Linear Algebra Algorithms

    Dense matrix factorizations, such as LU, Cholesky and QR, are widely used by scientific applications that require solving systems of linear equations, eigenvalue and linear least-squares problems. Such computations are normally carried out on supercomputers, whose ever-growing scale induces a fast decline of the Mean Time To Failure (MTTF). This dissertation develops fault-tolerance algorithms for one-sided dense matrix factorizations that handle both hard and soft errors. For hard errors, we propose methods based on diskless checkpointing and Algorithm-Based Fault Tolerance (ABFT) to provide full matrix protection, including the left and right factors normally produced by dense matrix factorizations. A horizontal parallel diskless checkpointing scheme is devised to maintain the checkpoint data with scalable performance and low space overhead, while the ABFT checksum generated before the factorization is continually updated by the factorization operations to protect the right factor. In addition, in the absence of a fault-tolerant MPI environment, we have also integrated the Checkpoint-on-Failure (CoF) mechanism into one-sided dense linear algebra operations such as QR factorization to recover the running stack of a failed MPI process. Soft errors are more challenging because of silent data corruption, which can propagate into a large region of erroneous data. Full matrix protection is developed where the left factor is protected by column-wise local diskless checkpointing, and the right factor is protected by a combination of a floating-point weighted checksum scheme and a soft-error modeling technique. To allow practical use on large-scale systems, we have also developed a complexity-reduction scheme such that correct results can be recovered with low performance overhead. Experimental results on a large-scale cluster system and a multicore+GPGPU hybrid system confirm that our hard- and soft-error fault-tolerance algorithms exhibit the expected error-correcting capability, low space and performance overhead, and compatibility with double-precision floating-point operations.
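
    A minimal sketch of the column-checksum invariant that ABFT schemes of this kind maintain (the classic Huang-Abraham construction; the matrix size and injected fault are illustrative):

    import numpy as np

    n = 6
    A = np.random.rand(n, n)
    Ac = np.vstack([A, A.sum(axis=0)])      # append checksum row e^T A

    # ... factorization updates would keep the checksum row consistent ...
    Ac[2, 4] += 0.5                         # inject a silent soft error

    residual = Ac[:n].sum(axis=0) - Ac[n]   # nonzero only in the faulty column
    col = int(np.argmax(np.abs(residual)))
    print(f"corrupted column {col}, error magnitude {residual[col]:.3f}")

    Locating the faulty row as well requires a second, independent checksum, which is the role of the floating-point weighted checksum scheme mentioned above.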

    The Role of Pressure in Inverse Design for Assembly

    Isotropic pairwise interactions that promote the self-assembly of complex particle morphologies have been discovered by inverse design strategies derived from the molecular coarse-graining literature. While such approaches provide an avenue to reproduce structural correlations, thermodynamic quantities such as the pressure have typically not been considered in self-assembly applications. In this work, we demonstrate that relative entropy optimization can be used to discover potentials that self-assemble into targeted cluster morphologies with a prescribed pressure when the iterative simulations are performed in the isothermal-isobaric ensemble. By tuning the pressure in the optimization, we generate a family of simple pair potentials that all self-assemble the same structure. Selecting an appropriate simulation ensemble to control the thermodynamic properties of interest is a general design strategy that could also be used to discover interaction potentials that self-assemble structures having, for example, a specified chemical potential. (Comment: 29 pages, 8 figures)
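
    For context, relative entropy optimization iterates on potential parameters λ by matching ensemble averages between the target and model systems; in the canonical ensemble the gradient takes the standard form below (the paper performs the corresponding averages in the isothermal-isobaric ensemble):

    \[
      \frac{\partial S_{\mathrm{rel}}}{\partial \lambda}
      = \beta \left\langle \frac{\partial U_\lambda}{\partial \lambda} \right\rangle_{\mathrm{tgt}}
      - \beta \left\langle \frac{\partial U_\lambda}{\partial \lambda} \right\rangle_{\lambda},
      \qquad
      \lambda \leftarrow \lambda - \alpha\, \frac{\partial S_{\mathrm{rel}}}{\partial \lambda},
    \]

    where U_λ is the pair potential, the first average is taken over target-structure configurations, the second over the model ensemble, and α is a learning rate.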

    Purdue Contribution of Fusion Simulation Program

    The overall science goal of the FSP is to develop predictive simulation capability for magnetically confined fusion plasmas at an unprecedented level of integration and fidelity. This will directly support and enable effective U.S. participation in research related to the International Thermonuclear Experimental Reactor (ITER) and the overall mission of delivering practical fusion energy. The FSP will address a rich set of scientific issues together with experimental programs, producing validated integrated physics results. This is well aligned with the mission of the ITER Organization to coordinate with its members the integrated modeling and control of fusion plasmas, including benchmarking and validation activities [1]. Initial FSP research will focus on two critical areas: 1) the plasma edge and 2) whole device modeling, including disruption avoidance. The first of these problems involves the narrow plasma boundary layer and its complex interactions with the plasma core and the surrounding material wall. The second requires development of a computationally tractable but comprehensive model that describes all equilibrium and dynamic processes at a sufficient level of detail to provide useful prediction of the temporal evolution of fusion plasma experiments. The initial driver for the whole device model (WDM) will be prediction and avoidance of discharge-terminating disruptions, especially at high performance, which are a critical impediment to successful operation of machines like ITER. If disruptions prove unavoidable, their associated dynamics and effects will be addressed in the next phase of the FSP. The FSP plan targets the needed modeling capabilities by developing Integrated Science Applications (ISAs) specific to each problem. The Pedestal-Boundary model will include boundary magnetic topology, cross-field transport of multi-species plasmas, parallel plasma transport, neutral transport, atomic physics and interactions with the plasma wall. It will address the origins and structure of the plasma electric field, rotation, the L-H transition, and the wide variety of pedestal relaxation mechanisms. The Whole Device Model will predict the entire discharge evolution given external actuators (i.e., magnets, power supplies, heating, current drive and fueling systems) and control strategies. Based on components operating over a range of physics fidelity, the WDM will model the plasma equilibrium, plasma sources, profile evolution, linear stability and nonlinear evolution toward a disruption (but not the full disruption dynamics). The plan assumes that, as the FSP matures and demonstrates success, the program will evolve and grow, enabling additional science problems to be addressed. The next set of integration opportunities could include: 1) simulation of disruption dynamics and their effects; 2) prediction of core profiles, including 3D effects, mesoscale dynamics and integration with the edge plasma; and 3) computation of non-thermal particle distributions, self-consistent with fusion, radio-frequency (RF) and neutral beam injection (NBI) sources, magnetohydrodynamics (MHD) and short-wavelength turbulence.