702 research outputs found
Simulation of vector boson plus many jet final states at the high luminosity LHC
We present a novel event generation framework for the efficient simulation of
vector boson plus multi-jet backgrounds at the high-luminosity LHC and at
possible future hadron colliders. MPI parallelization of parton-level and
particle-level event generation and storage of parton-level event information
using the HDF5 data format allow us to obtain leading-order merged Monte-Carlo
predictions with up to nine jets in the final state. The parton-level event
samples generated in this manner correspond to an integrated luminosity of
3ab-1 and are made publicly available for future phenomenological studies.Comment: 10 pages, 8 figures, 5 table
ArrayBridge: Interweaving declarative array processing with high-performance computing
Scientists are increasingly turning to datacenter-scale computers to produce
and analyze massive arrays. Despite decades of database research that extols
the virtues of declarative query processing, scientists still write, debug and
parallelize imperative HPC kernels even for the most mundane queries. This
impedance mismatch has been partly attributed to the cumbersome data loading
process; in response, the database community has proposed in situ mechanisms to
access data in scientific file formats. Scientists, however, desire more than a
passive access method that reads arrays from files.
This paper describes ArrayBridge, a bi-directional array view mechanism for
scientific file formats, that aims to make declarative array manipulations
interoperable with imperative file-centric analyses. Our prototype
implementation of ArrayBridge uses HDF5 as the underlying array storage library
and seamlessly integrates into the SciDB open-source array database system. In
addition to fast querying over external array objects, ArrayBridge produces
arrays in the HDF5 file format just as easily as it can read from it.
ArrayBridge also supports time travel queries from imperative kernels through
the unmodified HDF5 API, and automatically deduplicates between array versions
for space efficiency. Our extensive performance evaluation in NERSC, a
large-scale scientific computing facility, shows that ArrayBridge exhibits
statistically indistinguishable performance and I/O scalability to the native
SciDB storage engine.Comment: 12 pages, 13 figure
Evaluating the benefits of key-value databases for scientific applications
The convergence of Big Data applications with High-Performance Computing requires new methodologies to store, manage and process large amounts of information. Traditional storage solutions are unable to scale and that results in complex coding strategies. For example, the brain atlas of the Human Brain Project has the challenge to process large amounts of high-resolution brain images. Given the computing needs, we study the effects of replacing a traditional storage system with a distributed Key-Value database on a cell segmentation application. The original code uses HDF5 files on GPFS through an intricate interface, imposing synchronizations. On the other hand, by using Apache Cassandra or ScyllaDB through Hecuba, the application code is greatly simplified. Thanks to the Key-Value data model, the number of synchronizations is reduced and the time dedicated to I/O scales when increasing the number of nodes.This project/research has received funding from the European Unions Horizon
2020 Framework Programme for Research and Innovation under the Speci c
Grant Agreement No. 720270 (Human Brain Project SGA1) and the Speci c
Grant Agreement No. 785907 (Human Brain Project SGA2). This work has also
been supported by the Spanish Government (SEV2015-0493), by the Spanish
Ministry of Science and Innovation (contract TIN2015-65316-P), and by Generalitat
de Catalunya (contract 2017-SGR-1414).Postprint (author's final draft
GPU Accelerated Particle Visualization with Splotch
Splotch is a rendering algorithm for exploration and visual discovery in
particle-based datasets coming from astronomical observations or numerical
simulations. The strengths of the approach are production of high quality
imagery and support for very large-scale datasets through an effective mix of
the OpenMP and MPI parallel programming paradigms. This article reports our
experiences in re-designing Splotch for exploiting emerging HPC architectures
nowadays increasingly populated with GPUs. A performance model is introduced
for data transfers, computations and memory access, to guide our re-factoring
of Splotch. A number of parallelization issues are discussed, in particular
relating to race conditions and workload balancing, towards achieving optimal
performances. Our implementation was accomplished by using the CUDA programming
paradigm. Our strategy is founded on novel schemes achieving optimized data
organisation and classification of particles. We deploy a reference simulation
to present performance results on acceleration gains and scalability. We
finally outline our vision for future work developments including possibilities
for further optimisations and exploitation of emerging technologies.Comment: 25 pages, 9 figures. Astronomy and Computing (2014
Python bindings for the open source electromagnetic simulator Meep
Meep is a broadly used open source package for finite-difference time-domain electromagnetic simulations. Python bindings for Meep make it easier to use for researchers and open promising opportunities for integration with other packages in the Python ecosystem. As this project shows, implementing Python-Meep offers benefits for specific disciplines and for the wider research community
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
Exascale Deep Learning for Climate Analytics
We extract pixel-level masks of extreme weather patterns using variants of
Tiramisu and DeepLabv3+ neural networks. We describe improvements to the
software frameworks, input pipeline, and the network training algorithms
necessary to efficiently scale deep learning on the Piz Daint and Summit
systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained
throughput of 21.0 PF/s and parallel efficiency of 79.0%. DeepLabv3+ scales up
to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel
efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor
Cores, a half-precision version of the DeepLabv3+ network achieves a peak and
sustained throughput of 1.13 EF/s and 999.0 PF/s respectively.Comment: 12 pages, 5 tables, 4, figures, Super Computing Conference November
11-16, 2018, Dallas, TX, US
Dataflow methods in HPC, visualisation and analysis
The processing power available to scientists and engineers using supercomputers over the last few decades has grown exponentially, permitting significantly more sophisticated simulations, and as a consequence, generating proportionally larger output datasets. This change has taken place in tandem with a gradual shift in the design and implementation of simulation and post-processing software, with a shift from simulation as a first step and visualisation/analysis as a second, towards in-situ on the fly methods that provide immediate visual feedback, place less strain on file-systems and reduce overall data-movement and copying. Concurrently, processor speed increases have dramatically slowed and multi and many-core architectures have instead become the norm for virtually all High Performance computing (HPC) machines. This in turn has led to a shift away from the traditional distributed one rank per node model, to one rank per process, using multiple processes per multicore node, and then back towards one rank per node again, using distributed and multi-threaded frameworks combined.
This thesis consists of a series of publications that demonstrate how software design for analysis and visualisation has tracked these architectural changes and pushed the boundaries of HPC visualisation using dataflow techniques in distributed environments. The first publication shows how support for the time dimension in parallel pipelines can be implemented, demonstrating how information flow within an application can be leveraged to optimise performance and add features such as analysis of time-dependent flows and comparison of datasets at different timesteps. A method of integrating dataflow pipelines with in-situ visualisation is subsequently presented, using asynchronous coupling of user driven GUI controls and a live simulation running on a supercomputer. The loose coupling of analysis and simulation allows for reduced IO, immediate feedback and the ability to change simulation parameters on the fly.
A significant drawback of parallel pipelines is the inefficiency caused by improper load-balancing, particularly during interactive analysis where the user may select between different features of interest, this problem is addressed in the fourth publication by integrating a high performance partitioning library into the visualization pipeline and extending the information flow up and down the pipeline to support it. This extension is demonstrated in the third publication (published earlier) on massive meshes with extremely high complexity and shows that general purpose visualization tools such as ParaView can be made to compete with bespoke software written for a dedicated task.
The future of software running on many-core architectures will involve task-based runtimes, with dynamic load-balancing, asynchronous execution based on dataflow graphs, work stealing and concurrent data sharing between simulation and analysis. The final paper of this thesis presents an optimisation for one such runtime, in support of these future HPC applications
- …