10 research outputs found

    Exploring the Performance Benefit of Hybrid Memory System on HPC Environments

    Hardware accelerators have become a de facto standard for achieving high performance on current supercomputers, and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM, implemented as MCDRAM) that works together with conventional DRAM. Theoretically, HBM can provide 5x higher bandwidth than conventional DRAM. However, many factors impact the effective performance achieved by applications, including the application memory access pattern, the problem size, the threading level and the actual memory configuration. In this paper, we analyze the Intel KNL system and quantify the impact of the most important factors on application performance by using a set of applications that are representative of scientific and data-analytics workloads. Our results show that applications with regular memory access benefit from MCDRAM, achieving up to 3x the performance obtained using only DRAM. In contrast, applications with a random memory access pattern are latency-bound and may suffer from performance degradation when using only MCDRAM. For those applications, the use of additional hardware threads may help hide latency and achieve higher aggregated bandwidth when using HBM.

    AUTOMATING DATA-LAYOUT DECISIONS IN DOMAIN-SPECIFIC LANGUAGES

    A long-standing challenge in High-Performance Computing (HPC) is the simultaneous achievement of programmer productivity and hardware computational efficiency. The challenge has been exacerbated by the onset of multi- and many-core CPUs and accelerators. Only a few expert programmers have been able to hand-code domain-specific data transformations and vectorization schemes needed to extract the best possible performance on such architectures. In this research, we examined the possibility of automating these methods by developing a Domain-Specific Language (DSL) framework. Our DSL approach extends C++14 by embedding into it a high-level data-parallel array language, and by using a domain-specific compiler to compile to hybrid-parallel code. We also implemented an array index-space transformation algebra within this high-level array language to manipulate array data-layouts and data-distributions. The compiler introduces a novel method for SIMD auto-vectorization based on array data-layouts. Our new auto-vectorization technique is shown to outperform the default auto-vectorization strategy by up to 40% for stencil computations. The compiler also automates distributed data movement with overlapping of local compute with remote data movement using polyhedral integer set analysis. Along with these main innovations, we developed a new technique using C++ template metaprogramming for developing embedded DSLs using C++. We also proposed a domain-specific compiler intermediate representation that simplifies data flow analysis of abstract DSL constructs. We evaluated our framework by constructing a DSL for the HPC grand-challenge domain of lattice quantum chromodynamics. Our DSL yielded performance gains of up to twice the flop rate over existing production C code for selected kernels. This gain in performance was obtained while using less than one-tenth the lines of code. 
The performance of this DSL was also competitive with the best hand-optimized and hand-vectorized code, and was an order of magnitude better than existing production DSLs. Doctor of Philosophy

    Lattice Quantum Chromodynamics on Intel Xeon Phi based supercomputers

    Preface. The aim of this master's thesis project was to expand the QPhiX library for twisted-mass fermions with and without clover term. To this end, I continued work initiated by Mario Schröck et al. [63]. In writing this thesis, I was following two main goals. Firstly, I wanted to stress the intricate interplay of the four pillars of High Performance Computing: Algorithms, Hardware, Software and Performance Evaluation. Surely, algorithmic development is utterly important in Scientific Computing, in particular in LQCD, where it even outweighed the improvements made in Hardware architecture in the last decade (cf. the section about computational costs of LQCD). It is strongly influenced by the available hardware (think of the advent of parallel algorithms) but in turn also influenced the design of hardware itself. The IBM BlueGene series is only one of many examples in LQCD. Furthermore, there will be no benefit from the best algorithms when one cannot implement the ideas into correct, performant, user-friendly, read- and maintainable (sometimes over several decades) software code. But again, truly outstanding HPC software cannot be written without a profound knowledge of its target hardware. Lastly, an HPC software architect and computational scientist has to be able to evaluate and benchmark the performance of a software program, in the often very heterogeneous environment of supercomputers with multiple software and hardware layers. My second goal in writing this thesis was to produce a self-contained introduction into the computational aspects of LQCD and in particular, to the features of QPhiX, so the reader would be able to compile, read and understand the code of one truly amazing pearl of HPC [40]. It is a pleasure to thank S. Cozzini, R. Frezzotti, E. Gregory, B. Joó, B. Kostrzewa, S. Krieg, T. Luu, G. Martinelli, R. Percacci, S. Simula, M. Ueding, C. Urbach, M. Werner, the Intel company for providing me with a copy of [55], and the Jülich Supercomputing Center for granting me access to their KNL test cluster DEE

    Gravitational form factors of the proton from lattice QCD

    The gravitational form factors (GFFs) of a hadron encode fundamental aspects of its structure, including its shape and size as defined from, e.g., its energy density. This work presents a determination of the flavor decomposition of the GFFs of the proton from lattice QCD, in the kinematic region $0\leq -t\leq 2~\text{GeV}^2$. The decomposition into up-, down-, strange-quark, and gluon contributions provides first-principles constraints on the role of each constituent in generating key proton structure observables, such as its mechanical radius, mass radius, and $D$-term. Comment: Additional comparisons added to Figures 2 and 4. 8 pages, 4 figures, 1 table in the main text plus 11 pages, 8 figures, 2 tables in the supplementary material.

    Gravitational form factors of the pion from lattice QCD

    The two gravitational form factors of the pion, $A^{\pi}(t)$ and $D^{\pi}(t)$, are computed as functions of the momentum transfer squared $t$ in the kinematic region $0\leq -t< 2~\text{GeV}^2$ on a lattice QCD ensemble with quark masses corresponding to a close-to-physical pion mass $m_{\pi}\approx 170~\text{MeV}$ and $N_f=2+1$ quark flavors. The flavor decomposition of these form factors into gluon, up/down light-quark, and strange quark contributions is presented in the $\overline{\text{MS}}$ scheme at energy scale $\mu=2~\text{GeV}$, with renormalization factors computed non-perturbatively via the RI-MOM scheme. Using monopole and (modified) $z$-expansion fits to the gravitational form factors, we obtain estimates for the pion momentum fraction and $D$-term that are consistent with the momentum fraction sum rule and the next-to-leading order chiral perturbation theory prediction for $D^{\pi}(0)$. Comment: 28 pages, 17 figures, 7 tables.

    The Continuum and Leading Twist Limits of Parton Distribution Functions in Lattice QCD

    In this study, we present continuum limit results for the unpolarized parton distribution function of the nucleon computed in lattice QCD. This study is the first continuum limit using the pseudo-PDF approach with Short Distance Factorization for factorizing lattice QCD calculable matrix elements. Our findings are also compared with the pertinent phenomenological determinations. Inter alia, we employ the summation Generalized Eigenvalue Problem (sGEVP) technique in order to optimize our control over excited state contamination, which can be one of the most serious systematic errors in this type of calculation. A crucial novel ingredient of our analysis is the parameterization of systematic errors using Jacobi polynomials to characterize and remove both lattice spacing and higher twist contaminations, as well as the leading twist distribution. This method can be expanded in further studies to remove all other systematic errors. Comment: 56 pages, 29 figures.

    Proceedings of the ECCOMAS Thematic Conference on Multibody Dynamics 2015

    This volume contains the full papers accepted for presentation at the ECCOMAS Thematic Conference on Multibody Dynamics 2015 held in the Barcelona School of Industrial Engineering, Universitat Politècnica de Catalunya, on June 29 - July 2, 2015. The ECCOMAS Thematic Conference on Multibody Dynamics is an international meeting held once every two years in a European country. Continuing the very successful series of past conferences that have been organized in Lisbon (2003), Madrid (2005), Milan (2007), Warsaw (2009), Brussels (2011) and Zagreb (2013), this edition will once again serve as a meeting point for international researchers, scientists and experts from academia, research laboratories and industry working in the area of multibody dynamics. Applications are related to many fields of contemporary engineering, such as vehicle and railway systems, aeronautical and space vehicles, robotic manipulators, mechatronic and autonomous systems, smart structures, biomechanical systems and nanotechnologies. The topics of the conference include, but are not restricted to: Formulations and Numerical Methods, Efficient Methods and Real-Time Applications, Flexible Multibody Dynamics, Contact Dynamics and Constraints, Multiphysics and Coupled Problems, Control and Optimization, Software Development and Computer Technology, Aerospace and Maritime Applications, Biomechanics, Railroad Vehicle Dynamics, Road Vehicle Dynamics, Robotics, Benchmark Problems. Postprint (published version)

    Multibody dynamics 2015

    This volume contains the full papers accepted for presentation at the ECCOMAS Thematic Conference on Multibody Dynamics 2015 held in the Barcelona School of Industrial Engineering, Universitat Politècnica de Catalunya, on June 29 - July 2, 2015. The ECCOMAS Thematic Conference on Multibody Dynamics is an international meeting held once every two years in a European country. Continuing the very successful series of past conferences that have been organized in Lisbon (2003), Madrid (2005), Milan (2007), Warsaw (2009), Brussels (2011) and Zagreb (2013), this edition will once again serve as a meeting point for international researchers, scientists and experts from academia, research laboratories and industry working in the area of multibody dynamics. Applications are related to many fields of contemporary engineering, such as vehicle and railway systems, aeronautical and space vehicles, robotic manipulators, mechatronic and autonomous systems, smart structures, biomechanical systems and nanotechnologies. The topics of the conference include, but are not restricted to: Formulations and Numerical Methods, Efficient Methods and Real-Time Applications, Flexible Multibody Dynamics, Contact Dynamics and Constraints, Multiphysics and Coupled Problems, Control and Optimization, Software Development and Computer Technology, Aerospace and Maritime Applications, Biomechanics, Railroad Vehicle Dynamics, Road Vehicle Dynamics, Robotics, Benchmark Problems. The conference is organized by the Department of Mechanical Engineering of the Universitat Politècnica de Catalunya (UPC) in Barcelona. The organizers would like to thank the authors for submitting their contributions, the keynote lecturers for accepting the invitation and for the quality of their talks, the awards and scientific committees for their support to the organization of the conference, and finally the topic organizers for reviewing all extended abstracts and selecting the awards nominees. Postprint (published version)