
    Quantum Monte Carlo for large chemical systems: Implementing efficient strategies for petascale platforms and beyond

    Various strategies for efficiently implementing QMC simulations of large chemical systems are presented. These include: (i) an efficient algorithm for computing the computationally expensive Slater matrices, a novel scheme based on the highly localized character of the atomic Gaussian basis functions (rather than of the molecular orbitals, as is usually done); (ii) a minimal memory footprint; (iii) a substantial enhancement of single-core performance when efficient optimization tools are employed; and (iv) a universal, dynamic, fault-tolerant, and load-balanced computational framework suited to all kinds of platforms (massively parallel machines, clusters, or distributed grids). These strategies have been implemented in the QMC=Chem code developed at Toulouse and are illustrated with numerical applications to small peptides of increasing size (158, 434, 1056, and 1731 electrons). Using 10k-80k computing cores of the Curie machine (GENCI-TGCC-CEA, France), QMC=Chem has been shown to run at the petascale level, demonstrating that a large fraction of this machine's peak performance can be achieved. Implementing large-scale QMC simulations on future exascale platforms with a comparable level of efficiency is expected to be feasible.
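    The key locality idea is easy to sketch: because atomic Gaussian basis functions decay exponentially, only the few AOs centered near a given electron contribute to its row of the Slater matrix, so each row costs O(#nearby AOs × #MOs) rather than O(#AOs × #MOs). The Python sketch below illustrates this under stated assumptions; the function name, the s-type-only primitives, and the distance cutoff are illustrative, not QMC=Chem's actual API.

```python
# Minimal sketch of building the Slater matrix by exploiting the
# locality of atomic Gaussian basis functions: for each electron,
# only AOs whose centers are within a cutoff are evaluated, and the
# molecular-orbital row comes from that sparse AO vector times the
# AO->MO coefficient matrix. Illustrative only, not QMC=Chem's code.
import numpy as np

def slater_matrix(r_elec, centers, alphas, mo_coeffs, cutoff=8.0):
    """r_elec: (N,3) electron positions; centers: (M,3) AO centers;
    alphas: (M,) Gaussian exponents; mo_coeffs: (M,N) AO->MO matrix."""
    n_elec = r_elec.shape[0]
    S = np.zeros((n_elec, n_elec))      # row i = MO values at electron i
    for i, r in enumerate(r_elec):
        d2 = np.sum((centers - r) ** 2, axis=1)
        near = d2 < cutoff ** 2          # only nearby AO centers survive
        ao = np.zeros(len(alphas))
        ao[near] = np.exp(-alphas[near] * d2[near])  # s-type primitives only
        S[i, :] = ao @ mo_coeffs         # sparse row => cheap matrix row
    return S

# tiny usage: 4 electrons, 6 AOs, 4 occupied MOs (random toy data)
rng = np.random.default_rng(0)
S = slater_matrix(rng.normal(size=(4, 3)), 3.0 * rng.normal(size=(6, 3)),
                  np.abs(rng.normal(size=6)) + 0.5, rng.normal(size=(6, 4)))
```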

    Monte Carlo and Depletion Reactor Analysis for High-Performance Computing Applications

    This dissertation discusses the research and development of a coupled neutron transport/isotopic depletion capability for use in high-performance computing applications. Accurate neutronics modeling and simulation of "real" reactor problems has long been a sought-after goal in the computational community. A complementary "stretch" goal is the ability to perform full-core depletion analysis and spent-fuel isotopic characterization. This dissertation therefore presents the research and development of a coupled Monte Carlo transport/isotopic depletion implementation within the Exnihilo framework, geared toward high-performance computing architectures, to enable neutronics analysis of full-core reactor problems. An in-depth case study of the current state of Monte Carlo neutron transport with respect to source sampling, source convergence, uncertainty underprediction, and biases associated with localized tallies in Monte Carlo eigenvalue calculations was performed using MCNP and KENO. This analysis informed the design and development of the statistical algorithms for Exnihilo's Monte Carlo framework, Shift. To this end, a methodology was developed for computing tally statistics in domain-decomposed environments. This methodology has been shown to produce accurate tally uncertainty estimates in domain-decomposed environments without a significant increase in memory requirements, processor-to-processor communication, or computational bias. With the addition of parallel, domain-decomposed tally uncertainty estimation, a depletion package was developed for the Exnihilo code suite that uses the depletion capabilities of the Oak Ridge Isotope GENeration (ORIGEN) code. This interface was designed to be transport-agnostic, meaning it can be used by any of the reactor analysis packages within Exnihilo, such as Denovo or Shift. Extensive validation and testing of the ORIGEN interface and its coupling with the Shift Monte Carlo transport code are presented in this dissertation, with results for the calculated eigenvalues, material powers, and nuclide concentrations of the depleted materials. These results are compared with ORIGEN and TRITON depletion calculations, and the analysis shows that the Exnihilo transport-depletion capability is in good agreement with these codes.
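    Domain-decomposed tally statistics can be sketched at the level of batch means: each rank scores only what happens in its own spatial domain, and a single scalar reduction per batch recovers the global batch tally, from which a standard-error estimate follows. The mpi4py sketch below mirrors that idea with a stand-in random score; it is not Shift's implementation, and the normalization is a toy assumption.

```python
# Sketch of batch-statistics tallying in a domain-decomposed Monte
# Carlo run: one allreduce per batch combines the per-domain scores,
# and batch-to-batch variance gives the uncertainty estimate, with no
# per-history communication. Illustrative only.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rng = np.random.default_rng(comm.rank)

n_batches, histories_per_batch = 50, 10_000
batch_means = np.empty(n_batches)
for b in range(n_batches):
    # stand-in for the score accumulated in this rank's domain in batch b
    local_score = rng.normal(loc=1.0, scale=0.3, size=histories_per_batch).sum()
    # one scalar reduction merges all domains into a global batch tally
    global_score = comm.allreduce(local_score, op=MPI.SUM)
    batch_means[b] = global_score / (histories_per_batch * comm.size)

mean = batch_means.mean()
std_err = batch_means.std(ddof=1) / np.sqrt(n_batches)
if comm.rank == 0:
    print(f"tally = {mean:.4f} +/- {std_err:.4f}")
```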

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

    Towards Lattice Quantum Chromodynamics on FPGA devices

    In this paper we describe a single-node, double-precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal, we numerically invert the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: the Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator, and the largest device available on the market, the VU13P. In our implementation we separate the software and hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, while the rest of the algorithm runs on the host. We find that the FPGA implementation can offer performance comparable with that obtained using current CPUs or Intel's many-core Xeon Phi accelerators. A possible multi-node FPGA-based system is discussed, and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only.
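    The software/hardware split is easiest to see in the structure of the Conjugate Gradient loop itself: the operator application is the single hot spot, so it can sit behind a callback that, on the real system, dispatches to the FPGA while everything else stays on the host. The sketch below uses a stand-in symmetric positive-definite matrix for the operator; note that in lattice QCD, CG is typically applied to the normal operator D†D, since the Dirac-Wilson operator itself is not positive definite.

```python
# Standard Conjugate Gradient with the operator application factored
# out as a callback -- the natural offload point for an accelerator.
import numpy as np

def cg(apply_op, b, tol=1e-10, max_iter=500):
    x = np.zeros_like(b)
    r = b - apply_op(x)          # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = apply_op(p)         # the only operator application per step
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# stand-in "operator": any symmetric positive-definite matrix
n = 256
A = np.random.rand(n, n); A = A @ A.T + n * np.eye(n)
x = cg(lambda v: A @ v, np.ones(n))
```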

    Doctor of Philosophy

    Radiation is the dominant mode of heat transfer in high-temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bidirectional coupling of radiation-turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal Monte Carlo radiation is coupled with a turbulent large-eddy simulation combustion model. A technique in which domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain, while the radiation model runs in parallel on a recomposed domain; the recomposed domain is stored on each processor after the decomposed domain's data are shared via the Message Passing Interface (MPI). Verification and validation testing of the new radiation model was favorable. Strong-scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU and GPU radiation models, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.
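    The decompose/recompose pattern can be sketched in a few lines: the flow field lives decomposed, one patch per rank, but Monte Carlo ray tracing needs the whole domain, so each rank gathers all patches into a full local copy before tracing. The mpi4py sketch below is illustrative only; the 1-D patch layout, field names, and the crude absorption march are assumptions, not the model's actual data structures.

```python
# Sketch of recomposing a decomposed field so every rank can trace
# rays through the entire domain. Illustrative 1-D toy, not the
# dissertation's implementation.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
nx_local = 32                                     # cells per rank
local_T = np.full(nx_local, 300.0 + comm.rank)    # this rank's temperature patch

# recompose: every rank receives all patches, stitched in rank order
full_T = np.empty(nx_local * comm.size)
comm.Allgather(local_T, full_T)

# each rank can now march a ray across the *entire* domain, e.g. a
# crude emission/absorption integral along one direction
kappa, dx, intensity = 0.1, 1.0, 0.0
for T in full_T:
    emission = kappa * 5.67e-8 * T ** 4           # Stefan-Boltzmann source
    intensity = intensity * np.exp(-kappa * dx) + emission * dx
```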

    Spectral Efficiency Maximization of a Single Cell Massive MU-MIMO Down-Link TDD System by Appropriate Resource Allocation

    This paper deals with the problem of maximizing the spectral efficiency of a massive multi-user MIMO downlink system in which a base station equipped with a very large number of antennas serves single-antenna users simultaneously in the same frequency band, and the beamforming training scheme is employed in time-division duplex (TDD) mode. An optimal resource allocation that jointly selects the uplink training duration, the downlink training signal power, the uplink training signal power, and the downlink data signal power is proposed so as to maximize the spectral efficiency for a given total energy budget. Since the spectral efficiency is the main concern of this work, and its calculation from the lower bound on the achievable rate is computationally intensive, we also derive approximate expressions for the lower bound of the achievable downlink rate for the maximum ratio transmission (MRT) and zero-forcing (ZF) precoders. The computational simplicity and accuracy of these approximate expressions are validated through simulations. Employing them, experiments are conducted to obtain the spectral efficiency of the massive MIMO downlink TDD system with the optimal resource allocation and that of the beamforming training scheme. The spectral efficiency of the former, using the optimal resource allocation, is shown to be superior to that of the latter for both the MRT and ZF precoders.
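    The flavor of the joint optimization can be conveyed with a toy grid search: split a fixed energy budget between uplink training and downlink data, sweep the training duration, and keep the split that maximizes spectral efficiency. The SINR model below is a generic channel-estimation-error form chosen purely for illustration; it is not the paper's approximate lower-bound expressions for MRT and ZF.

```python
# Toy resource-allocation sweep for a TDD massive MIMO downlink:
# maximize (1 - tau/T) * K * log2(1 + SINR) over training duration and
# the training/data energy split, under a total energy budget.
# The SINR form is an assumed ZF-style expression, for illustration.
import numpy as np

M, K, T = 100, 10, 200           # antennas, users, coherence interval (symbols)
E_total = 200.0                  # total energy budget per coherence interval

best = (0.0, None)
for tau in range(K, T - 1):                      # at least K pilot symbols
    for frac in np.linspace(0.05, 0.95, 19):     # energy fraction on training
        p_tr = frac * E_total / tau              # training power per symbol
        p_d = (1 - frac) * E_total / (T - tau)   # data power per symbol
        gamma = p_tr * tau / (1 + p_tr * tau)    # estimate quality in [0, 1)
        sinr = p_d * (M - K) * gamma / (p_d * (1 - gamma) + 1)
        se = (1 - tau / T) * K * np.log2(1 + sinr)
        if se > best[0]:
            best = (se, (tau, frac))

print(f"best SE = {best[0]:.2f} bit/s/Hz at (tau, training fraction) = {best[1]}")
```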

    Concurrent Probabilistic Simulation of High Temperature Composite Structural Response

    A computational structural/material analysis and design tool intended to meet industry's future demand for expedience and reduced cost is presented. This software, GENOA, is dedicated to parallel, high-speed analysis for the probabilistic evaluation of the high-temperature composite response of aerospace systems. The development is based on the detailed integration and modification of diverse specialized analysis techniques and mathematical models, combining their latest capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors for computationally intense probabilistic analysis, assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives achieved in this development were: (1) harnessing parallel processing and static/dynamic load-balancing optimization to make the complex simulation of the structure, material, and processing of high-temperature composites affordable; (2) computational integration and synchronization of probabilistic mathematics, structural/material mechanics, and parallel computing; (3) implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism and to increase convergence rates through high- and low-level processor assignment; (4) creation of a framework for a portable parallel architecture independent of machine type: Multiple Instruction Multiple Data (MIMD), Single Instruction Multiple Data (SIMD), hybrid, and distributed workstation-class computers; and (5) market evaluation. The results of the Phase 2 effort provide a good basis for continuation and warrant a Phase 3 government and industry partnership.
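    Objective (1), balancing subdomain work across processors, can be illustrated with the classic longest-processing-time heuristic: assign subdomains largest-cost first, each onto the currently least-loaded processor. This sketch shows only the flavor of such a static balancing step; it is an assumption for illustration, not GENOA's actual scheme.

```python
# LPT (longest-processing-time) static load balancing: a simple
# greedy heuristic for spreading unevenly sized subdomains across
# processors. Illustrative only.
import heapq

def lpt_assign(costs, n_procs):
    """costs: estimated work per subdomain; returns proc -> subdomain ids."""
    heap = [(0.0, p) for p in range(n_procs)]    # (current load, proc id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    for sub in sorted(range(len(costs)), key=lambda s: -costs[s]):
        load, p = heapq.heappop(heap)            # least-loaded processor
        assignment[p].append(sub)
        heapq.heappush(heap, (load + costs[sub], p))
    return assignment

# e.g. 12 subdomains of uneven cost spread over 4 processors
print(lpt_assign([9, 7, 7, 6, 5, 5, 4, 3, 3, 2, 2, 1], 4))
```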

    Multi-core performance studies of a Monte Carlo neutron transport code

    Performance results are presented for a multi-threaded version of the OpenMC Monte Carlo neutronics code using OpenMP, in the context of nuclear reactor criticality calculations. Our main interest is production computing, so we limit our approach to threading strategies that require reasonable development effort and preserve the code features necessary for robust application to real-world reactor problems. Several approaches are developed, and the results are compared on several multi-core platforms using a popular reactor physics benchmark. A broad range of performance studies is distilled into a simple, consistent picture of the empirical performance characteristics of reactor Monte Carlo algorithms on current multi-core architectures. (United States Department of Energy, Office of Advanced Scientific Computing Research, Contract DE-AC02-06CH11357)
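    The threading strategy such studies rest on exploits the independence of particle histories: each worker simulates its share with a private tally, and the tallies are reduced once at the end, avoiding contention on shared accumulators. The Python sketch below shows that shape with a stand-in random-walk score (in C, the same structure is an OpenMP parallel for with a reduction clause); it is not OpenMC's transport kernel.

```python
# Private-tally parallelism over independent Monte Carlo histories:
# each worker gets its own RNG stream and tally; the caller merges.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def run_histories(args):
    seed, n_histories = args
    rng = np.random.default_rng(seed)    # independent stream per worker
    tally = 0.0
    for _ in range(n_histories):
        tally += rng.exponential(scale=1.0)   # toy "history" score
    return tally                         # private tally, merged by caller

if __name__ == "__main__":
    n_workers, per_worker = 4, 250_000
    with ProcessPoolExecutor(n_workers) as pool:
        tallies = list(pool.map(run_histories,
                                [(s, per_worker) for s in range(n_workers)]))
    mean = sum(tallies) / (n_workers * per_worker)
    print(f"mean score = {mean:.4f}")    # expect ~1.0
```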