    GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems

    While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring "standard" as well as "accelerated" resources. Today, such resources are available as multicore processors, graphics processing units (GPUs), and other accelerators such as the Intel Xeon Phi. Any software infrastructure that claims usefulness for such environments must be able to meet their inherent challenges: massive multi-level parallelism, topology, asynchronicity, and abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a collection of building blocks that targets algorithms dealing with sparse matrix representations on current and future large-scale systems. It implements the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel numerical kernels, intelligent resource management, and truly heterogeneous parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We describe the details of its design with respect to the challenges posed by modern heterogeneous supercomputers and recent algorithmic developments. Implementation details which are indispensable for achieving high efficiency are pointed out, and their necessity is justified by performance measurements or predictions based on performance models. The library code and several applications are available as open source. We also provide instructions on how to make use of GHOST in existing software packages, together with a case study which demonstrates the applicability and performance of GHOST as a component within a larger software stack. Comment: 32 pages, 11 figures
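
    To make the "MPI+X" paradigm concrete, the sketch below distributes row blocks of a sparse matrix across MPI processes and threads the local kernel with OpenMP. It is our own minimal illustration in plain C; GHOST's actual API and data structures are documented in the paper, and the names here are invented for the example.

        /* Minimal MPI+OpenMP sparse matrix-vector multiply (SpMV) sketch.
         * Illustrative only, not GHOST's API: MPI ranks own row blocks of
         * the matrix ("MPI"), OpenMP threads the local CSR kernel ("X"). */
        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        static void spmv_csr(int nrows, const int *rowptr, const int *col,
                             const double *val, const double *x, double *y)
        {
            #pragma omp parallel for schedule(static) /* node-level "X" */
            for (int i = 0; i < nrows; ++i) {
                double sum = 0.0;
                for (int j = rowptr[i]; j < rowptr[i + 1]; ++j)
                    sum += val[j] * x[col[j]];
                y[i] = sum;
            }
        }

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);       /* one rank per node or device */
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            /* Each rank applies its local 2x2 identity block; a real code
             * would first exchange halo elements of x between ranks. */
            int rowptr[] = {0, 1, 2}, col[] = {0, 1};
            double val[] = {1.0, 1.0}, x[] = {3.0, 4.0}, y[2];
            spmv_csr(2, rowptr, col, val, x, y);
            printf("rank %d: y = (%g, %g)\n", rank, y[0], y[1]);

            MPI_Finalize();
            return 0;
        }

    Heterogeneous execution in this model amounts to choosing a different "X" (e.g. CUDA for Nvidia GPUs, offload for the Xeon Phi) per rank while the MPI layer stays unchanged.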

    Beam-Material Interaction

    This paper is motivated by the growing importance of better understanding the phenomena and consequences of high-intensity energetic particle beam interactions with accelerator, generic target, and detector components. It reviews the principal physical processes of fast-particle interactions with matter, effects in materials under irradiation, materials response related to component lifetime and performance, simulation techniques, and methods of mitigating the impact of radiation on components and the environment in challenging current and future applications. Comment: 28 pages, contribution to the 2014 Joint International Accelerator School: Beam Loss and Accelerator Protection, Newport Beach, CA, USA, 5-14 Nov 2014
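
    For orientation, the central quantity in fast-particle interactions with matter is the mean ionization energy loss per unit path length, conventionally written in the Bethe form. The expression below is the standard textbook/PDG formula, reproduced here for context rather than quoted from the paper.

        % Mean ionization energy loss (Bethe formula, PDG conventions)
        -\left\langle \frac{dE}{dx} \right\rangle
          = K z^{2} \frac{Z}{A} \frac{1}{\beta^{2}}
            \left[ \frac{1}{2}\ln\frac{2 m_e c^{2}\beta^{2}\gamma^{2}T_{\max}}{I^{2}}
                   - \beta^{2} - \frac{\delta(\beta\gamma)}{2} \right]

    Here z is the projectile charge, Z and A are the atomic number and mass of the target, β and γ are the usual kinematic factors, T_max is the maximum energy transfer in a single collision, I is the mean excitation energy of the material, δ is the density-effect correction, and K ≈ 0.307 MeV mol⁻¹ cm².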

    Perspectives of Nuclear Physics in Europe: NuPECC Long Range Plan 2010

    The goal of this European Science Foundation Forward Look into the future of Nuclear Physics is to bring together the entire Nuclear Physics community in Europe to formulate a coherent plan of the best way to develop the field in the coming decade and beyond.

    The primary aim of Nuclear Physics is to understand the origin, evolution, structure and phases of strongly interacting matter, which constitutes nearly 100% of the visible matter in the universe. This is an immensely important and challenging task that requires the concerted effort of scientists working in both theory and experiment, funding agencies, politicians and the public.

    Nuclear Physics projects are often “big science”, which implies large investments and long lead times. They need careful forward planning and strong support from policy makers. This Forward Look provides an excellent tool to achieve this. It represents the outcome of detailed scrutiny by Europe’s leading experts and will help focus the views of the scientific community on the most promising directions in the field and create the basis for funding agencies to provide adequate support.

    The current NuPECC Long Range Plan 2010 “Perspectives of Nuclear Physics in Europe” resulted from consultation with close to 6,000 scientists and engineers over a period of approximately one year. Its detailed recommendations are presented on the following pages. For the interested public, a short summary brochure has been produced to accompany the Forward Look.

    FPGA-Based Hardware Accelerators for Deep Learning in Mobile Robotics

    The increasing demand for real-time, low-power hardware processing systems capable of running compute-intensive applications has accentuated the inadequacy of the conventional architecture of multicore general-purpose processors. In an effort to meet this demand, edge-computing hardware accelerators have come to the forefront, notably with regard to deep learning and robotic systems. This thesis explores preeminent hardware accelerators and examines the performance, accuracy, and power consumption of a GPU-based and an FPGA-based platform, both specifically designed for edge-computing applications. The experiments were conducted using three deep neural network models, namely AlexNet, GoogLeNet, and ResNet-18, trained to perform binary image classification in a known environment. Our results demonstrate that the FPGA-based platform, a Kria KV260 Vision AI starter kit, achieved inference speeds up to 9.5 times faster than the GPU-based Jetson Nano developer kit. The KV260 was also up to five times more efficient than the Jetson Nano in terms of inference speed per watt, at the cost of a mere 5.4% drop in accuracy caused by the quantization process required by the FPGA. However, the Jetson Nano showed a 1.6 times faster inference rate with the AlexNet model over the KV260, and its deployment process proved to be less challenging.
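
    The accuracy loss mentioned above stems from converting 32-bit floating-point weights to 8-bit integers before FPGA deployment. The sketch below shows symmetric per-tensor int8 quantization in C as a minimal illustration of that step; it is our own example, not the thesis's toolchain (deployment to the Kria KV260 goes through vendor tools rather than hand-written code like this).

        /* Symmetric per-tensor int8 quantization: the kind of transformation
         * behind the ~5.4% accuracy drop discussed above. Illustrative only. */
        #include <math.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Map floats in [-max_abs, max_abs] onto int8 values in [-127, 127];
         * returns the scale needed to dequantize (w is roughly q * scale). */
        static float quantize_int8(const float *w, int8_t *q, int n)
        {
            float max_abs = 0.0f;
            for (int i = 0; i < n; ++i)
                if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);

            float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
            for (int i = 0; i < n; ++i)
                q[i] = (int8_t)lrintf(w[i] / scale);
            return scale;
        }

        int main(void)
        {
            float w[] = {0.30f, -1.25f, 0.07f, 0.98f};
            int8_t q[4];
            float scale = quantize_int8(w, q, 4);
            for (int i = 0; i < 4; ++i)   /* rounding error = accuracy loss */
                printf("%+.2f -> %4d -> %+.4f\n", w[i], q[i], q[i] * scale);
            return 0;
        }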

    Design and optimization of a portable LQCD Monte Carlo code using OpenACC

    The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, which support a wide class of applications but deliver moderate computing performance, to many-core GPUs, which exploit aggressive data-parallelism and deliver higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing, where code changes are frequent, making it tedious and error-prone to keep different code versions aligned. In this work we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations, and showing that a good level of performance portability can be reached. Comment: 26 pages, 2 PNG figures, preprint of an article submitted for consideration in International Journal of Modern Physics
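
    To give a flavour of the descriptive style, the sketch below parallelizes a simple streaming kernel with a single OpenACC directive. It is a generic example in the spirit of the paper, not code from the LQCD application itself.

        /* Generic OpenACC example: the same source compiles for multicore
         * CPUs or GPUs; the compiler decides the mapping. Not LQCD code. */
        #include <stdio.h>

        #define N 1000000

        int main(void)
        {
            static double x[N], y[N];
            for (int i = 0; i < N; ++i) { x[i] = 1.0; y[i] = 2.0; }

            /* Descriptive parallelism: state *what* is parallel, not *how*
             * to map it onto threads, warps, or vector lanes. */
            #pragma acc parallel loop copyin(x) copy(y)
            for (int i = 0; i < N; ++i)
                y[i] += 3.0 * x[i];               /* axpy-style streaming */

            printf("y[0] = %g\n", y[0]);          /* expect 5.0 */
            return 0;
        }

    Without an OpenACC compiler the directive is simply ignored and the loop runs sequentially, which illustrates the portability property the paper relies on.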