On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective
We implement and benchmark parallel I/O methods for the fully-manycore driven
particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as
a major challenge for applications on today's and future HPC systems, we
present a scaling law characterizing performance bottlenecks in
state-of-the-art approaches for data reduction. Consequently, we propose,
implement and verify multi-threaded data-transformations for the I/O library
ADIOS as a feasible way to trade underutilized host-side compute potential on
heterogeneous systems for reduced I/O latency.
Comment: 15 pages, 5 figures, accepted for DRBSD-1 in conjunction with ISC'1
Dynamical heterogeneities as fingerprints of a backbone structure in Potts models
We investigate slow non-equilibrium dynamical processes in the two-dimensional
q-state Potts model with both ferromagnetic and ±J couplings. Dynamical
properties are characterized by means of the mean-flipping time distribution.
This quantity is known for clearly unveiling dynamical heterogeneities. Using a
two-times protocol we characterize the different time scales observed and
relate them to growth processes occurring in the system. In particular we
target the possible relation between the different time scales and the spatial
heterogeneities originating in the ground state topology, which are associated
with the presence of a backbone structure. We perform numerical simulations using
an approach based on graphics processing units (GPUs), which allows us to reach
large system sizes. We present evidence supporting both the idea of a growing
process in the preasymptotic regime of the glassy phases and the existence of a
backbone structure behind these processes.
Comment: 9 pages, 7 figures, Accepted for publication in PR
Topology-aware GPU scheduling for learning workloads in cloud environments
Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments.
This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.
This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper.
Peer Reviewed
Postprint (published version
Acceleration of Coarse Grain Molecular Dynamics on GPU Architectures
Coarse grain (CG) molecular models have been proposed to simulate complex systems with lower computational overheads and longer timescales with respect to atomistic level models. However, their acceleration on parallel architectures such as Graphics Processing Units (GPUs) presents original challenges that must be carefully evaluated. The objective of this work is to characterize the impact of CG model features on parallel simulation performance. To achieve this, we implemented a GPU-accelerated version of a CG molecular dynamics simulator, to which we applied specific optimizations for CG models, such as dedicated data structures to handle different bead type interactions, obtaining a maximum speed-up of 14 on the NVIDIA GTX480 GPU with Fermi architecture. We provide a complete characterization and evaluation of algorithmic and simulated system features of CG models impacting the achievable speed-up and accuracy of results, using three different GPU architectures as case studies.
Monte Carlo algorithm based on internal bridging moves for the atomistic simulation of thiophene oligomers and polymers
We introduce a powerful Monte Carlo (MC) algorithm for the atomistic
simulation of bulk models of oligo- and poly-thiophenes by redesigning MC moves
originally developed for considerably simpler polymer structures and
architectures, such as linear and branched polyethylene, to account for the
ring structure of the thiophene monomer. Elementary MC moves implemented
include bias reptation of an end thiophene ring, flip of an internal thiophene
ring, rotation of an end thiophene ring, concerted rotation of three thiophene
rings, rigid translation of an entire molecule, rotation of an entire molecule
and volume fluctuation. In the implementation of all moves we assume that
thiophene ring atoms remain rigid and strictly co-planar; on the other hand,
inter-ring torsion and bond bending angles remain fully flexible subject to
suitable potential energy functions. Test simulations with the new algorithm of
an important thiophene oligomer, α-sexithiophene (α-6T), at a
high enough temperature (above its isotropic-to-nematic phase transition) using
a new united atom model specifically developed for the purpose of this work
provide predictions for the volumetric, conformational and structural
properties that are remarkably close to those obtained from detailed atomistic
Molecular Dynamics (MD) simulations using an all-atom model. The new algorithm
is particularly promising for exploring the rich (and largely unexplored) phase
behavior and nanoscale ordering of very long (also more complex)
thiophene-based polymers which cannot be addressed by conventional MD methods
due to the extremely long relaxation times characterizing chain dynamics in
these systems.
- …