Exploring the Performance Benefit of Hybrid Memory System on HPC Environments
Hardware accelerators have become a de facto standard to achieve high
performance on current supercomputers and there are indications that this trend
will increase in the future. Modern accelerators feature high-bandwidth memory
next to the computing cores. For example, the Intel Knights Landing (KNL)
processor is equipped with 16 GB of high-bandwidth memory (HBM) that works
together with conventional DRAM memory. Theoretically, HBM can provide 5x
higher bandwidth than conventional DRAM. However, many factors impact the
effective performance achieved by applications, including the application
memory access pattern, the problem size, the threading level and the actual
memory configuration. In this paper, we analyze the Intel KNL system and
quantify the impact of the most important factors on the application
performance by using a set of applications that are representative of
scientific and data-analytics workloads. Our results show that applications
with regular memory access benefit from MCDRAM, achieving up to 3x performance
when compared to the performance obtained using only DRAM. On the contrary,
applications with random memory access pattern are latency-bound and may suffer
from performance degradation when using only MCDRAM. For those applications,
the use of additional hardware threads may help hide latency and achieve higher
aggregated bandwidth when using HBM.
AUTOMATING DATA-LAYOUT DECISIONS IN DOMAIN-SPECIFIC LANGUAGES
A long-standing challenge in High-Performance Computing (HPC) is the simultaneous achievement of programmer productivity and hardware computational efficiency. The challenge has been exacerbated by the onset of multi- and many-core CPUs and accelerators. Only a few expert programmers have been able to hand-code domain-specific data transformations and vectorization schemes needed to extract the best possible performance on such architectures. In this research, we examined the possibility of automating these methods by developing a Domain-Specific Language (DSL) framework. Our DSL approach extends C++14 by embedding into it a high-level data-parallel array language, and by using a domain-specific compiler to compile to hybrid-parallel code. We also implemented an array index-space transformation algebra within this high-level array language to manipulate array data-layouts and data-distributions. The compiler introduces a novel method for SIMD auto-vectorization based on array data-layouts. Our new auto-vectorization technique is shown to outperform the default auto-vectorization strategy by up to 40% for stencil computations. The compiler also automates distributed data movement with overlapping of local compute with remote data movement using polyhedral integer set analysis. Along with these main innovations, we developed a new technique using C++ template metaprogramming for developing embedded DSLs using C++. We also proposed a domain-specific compiler intermediate representation that simplifies data flow analysis of abstract DSL constructs. We evaluated our framework by constructing a DSL for the HPC grand-challenge domain of lattice quantum chromodynamics. Our DSL yielded performance gains of up to twice the flop rate over existing production C code for selected kernels. This gain in performance was obtained while using less than one-tenth the lines of code. 
The performance of this DSL was also competitive with the best hand-optimized and hand-vectorized code, and was an order of magnitude better than existing production DSLs.
Doctor of Philosophy
Lattice Quantum Chromodynamics on Intel Xeon Phi based supercomputers
Preface
The aim of this master's thesis project was to expand the QPhiX library for twisted-mass fermions with and without clover term. To this end, I continued work initiated by Mario Schröck et al. [63]. In writing this thesis, I was following two main goals. Firstly, I wanted to stress the intricate interplay of the four pillars of High Performance Computing: Algorithms, Hardware, Software and Performance Evaluation. Surely, algorithmic development is utterly important in Scientific Computing, in particular in LQCD, where it even outweighed the improvements made in Hardware architecture in the last decade (cf. the section about computational costs of LQCD). It is strongly influenced by the available hardware (think of the advent of parallel algorithms), but in turn also influenced the design of hardware itself. The IBM BlueGene series is only one of many examples in LQCD. Furthermore, there will be no benefit from the best algorithms when one cannot implement the ideas into correct, performant, user-friendly, read- and maintainable (sometimes over several decades) software code. But again, truly outstanding HPC software cannot be written without a profound knowledge of its target hardware. Lastly, an HPC software architect and computational scientist has to be able to evaluate and benchmark the performance of a software program in the often very heterogeneous environment of supercomputers with multiple software and hardware
layers. My second goal in writing this thesis was to produce a self-contained introduction to the computational aspects of LQCD and, in particular, to the features of QPhiX, so the reader would be able to compile, read and understand the code of one truly amazing pearl of HPC [40]. It is a pleasure to thank S. Cozzini, R. Frezzotti, E. Gregory, B. Joó, B. Kostrzewa, S. Krieg,
T. Luu, G. Martinelli, R. Percacci, S. Simula, M. Ueding, C. Urbach, M. Werner, the Intel company for providing me with a copy of [55], and the Jülich Supercomputing Center for granting me access to their KNL test cluster DEE
Gravitational form factors of the proton from lattice QCD
The gravitational form factors (GFFs) of a hadron encode fundamental aspects
of its structure, including its shape and size as defined from e.g., its energy
density. This work presents a determination of the flavor decomposition of the
GFFs of the proton from lattice QCD, in the kinematic region . The decomposition into up-, down-, strange-quark, and gluon
contributions provides first-principles constraints on the role of each
constituent in generating key proton structure observables, such as its
mechanical radius, mass radius, and D-term.
Comment: Additional comparisons added to Figures 2 and 4. 8 pages, 4 figures,
1 table in the main text plus 11 pages, 8 figures, 2 tables in the
supplementary material.
Gravitational form factors of the pion from lattice QCD
The two gravitational form factors of the pion, and
, are computed as functions of the momentum transfer squared in
the kinematic region on a lattice QCD ensemble with
quark masses corresponding to a close-to-physical pion mass and quark flavors. The flavor decomposition of these
form factors into gluon, up/down light-quark, and strange quark contributions
is presented in the scheme at energy scale
, with renormalization factors computed non-perturbatively
via the RI-MOM scheme. Using monopole and (modified) z-expansion fits to the
gravitational form factors, we obtain estimates for the pion momentum fraction
and D-term that are consistent with the momentum fraction sum rule and the
next-to-leading order chiral perturbation theory prediction for .
Comment: 28 pages, 17 figures, 7 tables
The Continuum and Leading Twist Limits of Parton Distribution Functions in Lattice QCD
In this study, we present continuum limit results for the unpolarized parton
distribution function of the nucleon computed in lattice QCD. This study is the
first continuum limit using the pseudo-PDF approach with Short Distance
Factorization for factorizing lattice QCD calculable matrix elements. Our
findings are also compared with the pertinent phenomenological determinations.
Inter alia, we are employing the summation Generalized Eigenvalue Problem
(sGEVP) technique in order to optimize our control over the excited state
contamination, which can be one of the most serious systematic errors in this
type of calculation. A crucial novel ingredient of our analysis is the
parameterization of systematic errors using Jacobi polynomials to characterize
and remove both lattice spacing and higher twist contaminations, as well as the
leading twist distribution. This method can be expanded in further studies to
remove all other systematic errors.
Comment: 56 pages, 29 figures
Proceedings of the ECCOMAS Thematic Conference on Multibody Dynamics 2015
This volume contains the full papers accepted for presentation at the ECCOMAS Thematic Conference on Multibody Dynamics 2015 held in the Barcelona School of Industrial Engineering, Universitat Politècnica de Catalunya, on June 29 - July 2, 2015. The ECCOMAS Thematic Conference on Multibody Dynamics is an international meeting held once every two years in a European country. Continuing the very successful series of past conferences that have been organized in Lisbon (2003), Madrid (2005), Milan (2007), Warsaw (2009), Brussels (2011) and Zagreb (2013), this edition will once again serve as a meeting point for international researchers, scientists and experts from academia, research laboratories and industry working in the area of multibody dynamics. Applications are related to many fields of contemporary engineering, such as vehicle and railway systems, aeronautical and space vehicles, robotic manipulators, mechatronic and autonomous systems, smart structures, biomechanical systems and nanotechnologies. The topics of the conference include, but are not restricted to: ● Formulations and Numerical Methods ● Efficient Methods and Real-Time Applications ● Flexible Multibody Dynamics ● Contact Dynamics and Constraints ● Multiphysics and Coupled Problems ● Control and Optimization ● Software Development and Computer Technology ● Aerospace and Maritime Applications ● Biomechanics ● Railroad Vehicle Dynamics ● Road Vehicle Dynamics ● Robotics ● Benchmark Problems
Postprint (published version)
Multibody dynamics 2015
This volume contains the full papers accepted for presentation at the ECCOMAS Thematic Conference on Multibody Dynamics 2015 held in the Barcelona School of Industrial Engineering, Universitat Politècnica de Catalunya, on June 29 - July 2, 2015. The ECCOMAS Thematic Conference on Multibody Dynamics is an international meeting held once every two years in a European country. Continuing the very successful series of past conferences that have been organized in Lisbon (2003), Madrid (2005), Milan (2007), Warsaw (2009), Brussels (2011) and Zagreb (2013), this edition will once again serve as a meeting point for international researchers, scientists and experts from academia, research laboratories and industry working in the area of multibody dynamics. Applications are related to many fields of contemporary engineering, such as vehicle and railway systems, aeronautical and space vehicles, robotic manipulators, mechatronic and autonomous systems, smart structures, biomechanical systems and nanotechnologies. The topics of the conference include, but are not restricted to: Formulations and Numerical Methods, Efficient Methods and Real-Time Applications, Flexible Multibody Dynamics, Contact Dynamics and Constraints, Multiphysics and Coupled Problems, Control and Optimization, Software Development and Computer Technology, Aerospace and Maritime Applications, Biomechanics, Railroad Vehicle Dynamics, Road Vehicle Dynamics, Robotics, Benchmark Problems. The conference is organized by the Department of Mechanical Engineering of the Universitat Politècnica de Catalunya (UPC) in Barcelona. 
The organizers would like to thank the authors for submitting their contributions, the keynote lecturers for accepting the invitation and for the quality of their talks, the awards and scientific committees for their support of the organization of the conference, and finally the topic organizers for reviewing all extended abstracts and selecting the award nominees.
Postprint (published version)