
    Scalability and Performance Analysis of OpenMP Codes Using the Periscope Toolkit

    In this paper, we present two new approaches, along with the necessary extensions to Periscope, for scalability and performance analysis of OpenMP codes. Periscope is an online performance analysis toolkit consisting of a user-defined number of analysis agents that automatically search for performance properties while the application is running. To detect scalability and performance bottlenecks in OpenMP codes with Periscope, we formalize a few newly defined performance properties and meta-properties. We demonstrate our implementation by evaluating the NAS OpenMP benchmarks. Our results show that the approach identifies code regions that do not scale well, as well as other performance problems such as load imbalance, in the NAS Parallel Benchmarks.

    Iso-energy-efficiency: An approach to power-constrained parallel computation

    Future large-scale high-performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate and predict the energy-performance of data-intensive parallel applications with various execution patterns running on large-scale power-aware clusters. Our analytical model can help users explore the effects of machine- and application-dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g. processor count, CPU power/frequency, workload size and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making.

    An Application-Based Performance Characterization of the Columbia Supercluster

    Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and currently ranked as the second-fastest computer in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message-passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks and some of the NAS Parallel Benchmarks, including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA: one from molecular dynamics and two from computational fluid dynamics. Our results show that both NUMAlink4 and InfiniBand hold promise for application scaling to a large number of processors.

    Verification of Resilient Communication Models for the Simulation of a Highly Adaptive Energy-Efficient Computer

    Delivering high performance in an energy-efficient manner is of great importance in conducting research in computational sciences and in daily use of technology. From a computing perspective, a novel concept (the HAEC Box) has been proposed that utilizes innovative ideas of optical and wireless chip-to-chip communication to allow a new level of runtime adaptivity for future computers, which is required to achieve high performance and energy efficiency. HAEC-SIM is an integrated simulation environment designed for the study of the performance and energy costs of the HAEC Box running communication-intensive applications. In this work, we conduct a verification of the implementation of three resilient communication models in HAEC-SIM. The verification involves two NAS Parallel Benchmarks and their simulated execution on a 3D torus system with 16x16x16 nodes with InfiniBand links. The simulation results are consistent with those of an independent implementation. Thus, the HAEC-SIM based simulations are accurate in this regard.