4 research outputs found

    Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor

    The Roofline Performance Model is a visually intuitive method for bounding the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (operations per byte of data). In this study we determine the Roofline for the Intel Knights Landing (KNL) processor, measuring the sustained peak memory bandwidth and floating-point performance for all levels of the memory hierarchy and in all of the different KNL cluster modes. We then determine arithmetic intensity and performance for a suite of application kernels being targeted for the KNL-based supercomputer Cori, and make comparisons to current Intel Xeon processors. Cori is the National Energy Research Scientific Computing Center's (NERSC) next-generation supercomputer. Scheduled for deployment in mid-2016, it will be one of the earliest and largest KNL deployments in the world.
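
    The Roofline bound itself is the minimum of two ceilings: peak floating-point throughput and arithmetic intensity times sustained memory bandwidth. The sketch below illustrates that formula only; the peak and bandwidth numbers are hypothetical placeholders, not the measured KNL values from the paper.

```cpp
#include <algorithm>
#include <cstdio>

// Roofline bound: attainable GFLOP/s = min(peak GFLOP/s, AI * bandwidth).
double roofline_gflops(double arithmetic_intensity,  // FLOPs per byte
                       double peak_gflops,           // compute ceiling (GFLOP/s)
                       double bandwidth_gbs)         // memory ceiling (GB/s)
{
    return std::min(peak_gflops, arithmetic_intensity * bandwidth_gbs);
}

int main() {
    // Placeholder values for illustration only (not the paper's measurements).
    const double peak      = 2000.0;  // hypothetical compute peak (GFLOP/s)
    const double bw_mcdram = 400.0;   // hypothetical MCDRAM bandwidth (GB/s)
    const double bw_ddr    = 90.0;    // hypothetical DDR bandwidth (GB/s)

    for (double ai : {0.1, 1.0, 10.0, 100.0}) {
        std::printf("AI=%6.1f  MCDRAM bound=%8.1f  DDR bound=%8.1f GFLOP/s\n",
                    ai,
                    roofline_gflops(ai, peak, bw_mcdram),
                    roofline_gflops(ai, peak, bw_ddr));
    }
    return 0;
}
```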

    Vectorization of Small-Sized Special-Type Matrix Multiplication Using AVX-512 Instructions

    Modern software packages for supercomputer calculations require large amounts of computing resources. At the same time, new hardware architectures are emerging that open up new opportunities for optimizing program code. The AVX-512 instruction set is a unique tool with many useful features for creating high-performance parallel code for supercomputer calculations. Its most striking features are special mask registers that select which vector elements are processed, fused arithmetic instructions, vector transcendental instructions, gather/scatter operations that access memory at multiple arbitrary offsets, and many others. Some numerical methods rely on special objects whose processing speed critically affects the speed of the entire calculation package. For the numerical RANS/ILES method used to calculate nonstationary turbulent flows, such critical objects are 5-by-5 matrices stored as submatrices of 8-by-8 matrices, and the main operation on them is multiplication. In this paper we present an effective approach to vectorizing the multiplication of 8-by-8 matrices and then estimate how reducing the matrix dimension affects the efficiency of vectorization. The approach is implemented with intrinsic functions for the AVX-512 instruction set and evaluated on a supercomputer at the JSCC RAS.
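
    As an illustration of the kind of kernel discussed, here is a minimal AVX-512 sketch of an 8-by-8 double-precision matrix multiply, with a masked variant for the embedded 5-by-5 case. This is a simplified example written for this summary, not the authors' implementation; the row-major layout and the mask value 0x1F are assumptions.

```cpp
#include <immintrin.h>  // AVX-512F intrinsics; compile with -mavx512f

// Illustrative sketch: C = A * B for row-major 8x8 double matrices,
// holding one 8-double row of B and C per 512-bit register.
void matmul8x8(const double* A, const double* B, double* C) {
    for (int i = 0; i < 8; ++i) {
        __m512d acc = _mm512_setzero_pd();
        for (int k = 0; k < 8; ++k) {
            // Broadcast A[i][k] and accumulate A[i][k] * B[k][:] via FMA.
            __m512d a = _mm512_set1_pd(A[i * 8 + k]);
            __m512d b = _mm512_loadu_pd(B + k * 8);
            acc = _mm512_fmadd_pd(a, b, acc);
        }
        _mm512_storeu_pd(C + i * 8, acc);
    }
}

// Same idea for a 5x5 matrix stored inside an 8x8 block: a mask register
// selects only the first 5 lanes of each row (0x1F = 0b00011111).
void matmul5x5_in8x8(const double* A, const double* B, double* C) {
    const __mmask8 m = 0x1F;
    for (int i = 0; i < 5; ++i) {
        __m512d acc = _mm512_setzero_pd();
        for (int k = 0; k < 5; ++k) {
            __m512d a = _mm512_set1_pd(A[i * 8 + k]);
            __m512d b = _mm512_maskz_loadu_pd(m, B + k * 8);  // unused lanes zeroed
            acc = _mm512_fmadd_pd(a, b, acc);
        }
        _mm512_mask_storeu_pd(C + i * 8, m, acc);  // write only the 5 active lanes
    }
}
```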

    Uncertainties in ab initio nuclear structure calculations with chiral interactions

    We present theoretical ground-state energies and their uncertainties for p-shell nuclei obtained from chiral effective field theory internucleon interactions as a function of chiral order, fitted to two- and three-body data only. We apply a Similarity Renormalization Group transformation to improve the numerical convergence of the many-body calculations, and discuss both the numerical uncertainties arising from basis truncations and those from omitted induced many-body forces, as well as chiral truncation uncertainties. With complete next-to-next-to-leading order (N²LO) two- and three-body interactions, we find significant overbinding for the ground states in the upper p-shell, but with higher-order two-body potentials combined with N²LO three-body forces, our predictions agree with experiment throughout the p-shell to within our combined estimated uncertainties. The uncertainties due to chiral-order truncation are noticeably larger than the numerical uncertainties, but they are expected to become comparable to the numerical uncertainties at complete N³LO.
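
    For context, a commonly used prescription for estimating the chiral-truncation uncertainty of an observable X at order k is sketched below; whether this particular prescription is the one adopted in the paper is an assumption made here for illustration.

```latex
% Assumed order-by-order truncation estimate (illustrative, not necessarily the paper's):
% X^{(k)} is the prediction at chiral order k, \Delta X^{(j)} = X^{(j)} - X^{(j-1)}
% (with \Delta X^{(2)} = X^{(2)} - X^{(0)}), and Q is the chiral expansion parameter,
% e.g. Q = \max(p, m_\pi)/\Lambda_b.
\delta X^{(k)} = \max_{2 \le j \le k}\Bigl(
    Q^{k+1}\,\bigl|X^{(0)}\bigr|,\;
    Q^{k+1-j}\,\bigl|\Delta X^{(j)}\bigr|
\Bigr)
```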

    High Performance Computing: ISC High Performance 2016 International Workshops, ExaComm, E-MuCoCoS, HPC-IODC, IXPUG, IWOPH, P^3MA, VHPC, WOPSSS, Frankfurt, Germany, June 19–23, 2016, Revised Selected Papers

    This book constitutes revised selected papers from 7 workshops that were held in conjunction with the ISC High Performance 2016 conference in Frankfurt, Germany, in June 2016. The 45 papers presented in this volume were carefully reviewed and selected for inclusion in this book. They stem from the following workshops: Workshop on Exascale Multi/Many Core Computing Systems, E-MuCoCoS; Second International Workshop on Communication Architectures at Extreme Scale, ExaComm; HPC I/O in the Data Center Workshop, HPC-IODC; International Workshop on OpenPOWER for HPC, IWOPH; Workshop on the Application Performance on Intel Xeon Phi – Being Prepared for KNL and Beyond, IXPUG; Workshop on Performance and Scalability of Storage Systems, WOPSSS; and International Workshop on Performance Portable Programming Models for Accelerators, P3MA.

    Contents:

    E-MuCoCoS -- 2016 Workshop on Exascale Multi/Many Core Computing Systems:
    - Behavioral Emulation for Scalable Design-Space Exploration of Algorithms and Architectures
    - Closing the Performance Gap with Modern C++
    - Energy Efficient Runtime Framework for Exascale Systems
    - Extreme-Scale In-Situ Visualization of Turbulent Flows on IBM Blue Gene/Q JUQUEEN
    - The EPiGRAM Project: Preparing Parallel Programming Models for Exascale
    - Work Distribution of Data-parallel Applications on Heterogeneous Systems

    ExaComm -- Second International Workshop on Communication Architectures at Extreme Scale:
    - Reducing manipulation overhead of remote data-structure by controlling remote memory access order
    - SONAR: Automated Communication Characterization for HPC Applications

    HPC-IODC -- HPC I/O in the Data Center Workshop:
    - An Overview of the Sirocco Parallel Storage System
    - Analyzing Data Properties using Statistical Sampling Techniques – Illustrated on Scientific File Formats and Compression Features
    - Delta: Data Reduction for Integrated Application Workflows and Data Storage
    - Investigating Read Performance of Python and NetCDF4 when using HPC Parallel Filesystems

    IWOPH -- International Workshop on OpenPOWER for HPC:
    - Early Application Performance at the Hartree Centre with the OpenPOWER Architecture
    - Early Experiences Porting the NAMD and VMD Molecular Simulation and Analysis Software to GPU-Accelerated OpenPOWER Platforms
    - Exploring Energy Efficiency for GPU-Accelerated POWER Servers
    - First Experiences with ab initio Molecular Dynamics on OpenPOWER: The Case of CPMD
    - High Performance Computing on the IBM Power8 platform
    - Measuring and Managing Energy in OpenPOWER
    - Performance Analysis of Spark/GraphX on POWER8 Cluster
    - Performance of the 3D Combustion Simulation code RECOM-AIOLOS on IBM POWER8 architecture
    - Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond

    IXPUG -- Application Performance on Intel Xeon Phi – Being Prepared for KNL and Beyond:
    - A Comparative Study of Application Performance and Scalability on the Intel Knights Landing Processor
    - Application Suitability Assessment for Many-Core Targets
    - Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor
    - Dynamic SIMD Vector Lane Scheduling
    - High performance optimizations for nuclear physics code MFDn on KNL
    - Optimization of the sparse matrix-vector products of an IDR Krylov iterative solver in EMGeo for the Intel KNL manycore processor
    - Optimizing a Multiple Right-hand Side Dslash Kernel for Intel Knights Corner
    - Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: a Case Study on the BerkeleyGW Software
    - Optimizing Wilson-Dirac operator and linear solvers for Intel KNL

    P^3MA -- First International Workshop on Performance Portable Programming Models for Accelerators:
    - A C++ Programming Model for Heterogeneous System Architecture
    - Battling Memory Requirements of Array Programming through Streaming
    - From Describing to Prescribing Parallelism: Translating the SPEC ACCEL OpenACC Suite to OpenMP Target Directives
    - GPU-STREAM v2.0: Benchmarking the achievable memory bandwidth of many-core processors across diverse parallel programming models
    - Porting the MPI Parallelized LES Model PALM to Multi-GPU Systems – an Experience Report
    - Software Cost Analysis of GPU-Accelerated Aeroacoustics Simulations in C++ with OpenACC
    - Task-Based Cholesky Decomposition on Knights Corner using OpenMP
    - Using C++ AMP to accelerate HPC applications on Multiple Platforms

    WOPSSS -- Workshop on Performance and Scalability of Storage Systems:
    - Analysis of Memory Performance: Mixed Rank Performance across Microarchitectures
    - Considering I/O Processing in CloudSim for Performance and Energy Evaluation
    - Early Evaluation of the "Infinite Memory Engine" Burst Buffer Solution
    - Motivation and Implementation of a Dynamic Remote Storage System for I/O demanding HPC Applications
    - Parallel I/O Architecture Modelling Based on File System Counters
    - User-space I/O for µs-level storage devices
    - Scaling Spark on Lustre