    On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective

    We implement and benchmark parallel I/O methods for the fully-manycore driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threaded data-transformations for the I/O library ADIOS as a feasible way to trade underutilized host-side compute potential on heterogeneous systems for reduced I/O latency. Comment: 15 pages, 5 figures, accepted for DRBSD-1 in conjunction with ISC'1

    Dynamical heterogeneities as fingerprints of a backbone structure in Potts models

    We investigate slow non-equilibrium dynamical processes in the two-dimensional $q$-state Potts model with both ferromagnetic and $\pm J$ couplings. Dynamical properties are characterized by means of the mean-flipping time distribution. This quantity is known for clearly unveiling dynamical heterogeneities. Using a two-times protocol we characterize the different time scales observed and relate them to growth processes occurring in the system. In particular we target the possible relation between the different time scales and the spatial heterogeneities originating in the ground state topology, which are associated with the presence of a backbone structure. We perform numerical simulations using an approach based on graphics processing units (GPUs), which allows large system sizes to be reached. We present evidence supporting both the idea of a growing process in the preasymptotic regime of the glassy phases and the existence of a backbone structure behind these processes. Comment: 9 pages, 7 figures, Accepted for publication in PR

    Topology-aware GPU scheduling for learning workloads in cloud environments

    Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems. This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper. Peer Reviewed. Postprint (published version

    Acceleration of Coarse Grain Molecular Dynamics on GPU Architectures

    Coarse grain (CG) molecular models have been proposed to simulate complex systems with lower computational overheads and longer timescales with respect to atomistic level models. However, their acceleration on parallel architectures such as Graphics Processing Units (GPUs) presents original challenges that must be carefully evaluated. The objective of this work is to characterize the impact of CG model features on parallel simulation performance. To achieve this, we implemented a GPU-accelerated version of a CG molecular dynamics simulator, to which we applied specific optimizations for CG models, such as dedicated data structures to handle different bead type interactions, obtaining a maximum speed-up of 14 on the NVIDIA GTX480 GPU with Fermi architecture. We provide a complete characterization and evaluation of algorithmic and simulated system features of CG models impacting the achievable speed-up and accuracy of results, using three different GPU architectures as case studies.

    Monte Carlo algorithm based on internal bridging moves for the atomistic simulation of thiophene oligomers and polymers

    We introduce a powerful Monte Carlo (MC) algorithm for the atomistic simulation of bulk models of oligo- and poly-thiophenes by redesigning MC moves originally developed for considerably simpler polymer structures and architectures, such as linear and branched polyethylene, to account for the ring structure of the thiophene monomer. Elementary MC moves implemented include bias reptation of an end thiophene ring, flip of an internal thiophene ring, rotation of an end thiophene ring, concerted rotation of three thiophene rings, rigid translation of an entire molecule, rotation of an entire molecule and volume fluctuation. In the implementation of all moves we assume that thiophene ring atoms remain rigid and strictly co-planar; on the other hand, inter-ring torsion and bond bending angles remain fully flexible subject to suitable potential energy functions. Test simulations with the new algorithm of an important thiophene oligomer, $\alpha$-sexithiophene ($\alpha$-6T), at a high enough temperature (above its isotropic-to-nematic phase transition) using a new united atom model specifically developed for the purpose of this work provide predictions for the volumetric, conformational and structural properties that are remarkably close to those obtained from detailed atomistic Molecular Dynamics (MD) simulations using an all-atom model. The new algorithm is particularly promising for exploring the rich (and largely unexplored) phase behavior and nanoscale ordering of very long (also more complex) thiophene-based polymers which cannot be addressed by conventional MD methods due to the extremely long relaxation times characterizing chain dynamics in these systems.