12 research outputs found

    On the Effects of Memory Latency and Bandwidth on Supercomputer Application Performance

    Since the first vector supercomputers in the mid-1970's, the largest scale applications have traditionally been floating point oriented numerical codes, which can be broadly characterized as the simulation of physics on a computer. Supercomputer architectures have evolved to meet the needs of those applications. Specifically, the computational work of the application tends to be floating point oriented, and the decomposition of the problem two or three dimensional. Today, an emerging class of critical applications may change those assumptions: they are combinatorial in nature, integer oriented, and irregular. The performance of both classes of applications is dominated by the performance of the memory system. This paper compares the memory performance sensitivity of both traditional and emerging HPC applications, and shows that the new codes are significantly more sensitive to memory latency and bandwidth than their traditional counterparts. Additionally, these codes exhibit lower baseline performance, which only exacerbates the problem. As a result, the construction of future supercomputer architectures to support these applications will most likely be different from those used to support traditional codes. Quantitatively understanding the difference between the two workloads will form the basis for future design choices.
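    The regular-versus-irregular distinction the abstract draws can be illustrated with a minimal sketch (not from the paper; all names are hypothetical): a unit-stride traversal that prefetchers handle well, versus a pointer chase through a random single-cycle permutation, where each load address depends on the previous load and memory latency is fully exposed.

    ```python
    import random

    def sequential_sum(data):
        # Regular, unit-stride access: hardware prefetchers can hide memory
        # latency, so performance is limited mainly by bandwidth.
        total = 0
        for x in data:
            total += x
        return total

    def pointer_chase_sum(data, order):
        # Irregular access: the next index comes from the current element's
        # successor, so each load depends on the previous one, defeating
        # prefetching and exposing the full memory latency per access.
        total = 0
        i = 0
        for _ in range(len(data)):
            total += data[i]
            i = order[i]
        return total

    # Build a random permutation that is a single n-cycle, so the chase
    # starting at index 0 visits every element exactly once.
    n = 1 << 10
    data = list(range(n))
    idx = list(range(n))
    random.shuffle(idx)
    order = [0] * n
    for k in range(n):
        order[idx[k]] = idx[(k + 1) % n]
    ```

    Both traversals compute the same sum; on real hardware the pointer chase runs far slower because it serializes cache misses, which is the sensitivity the paper measures.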

    A detailed comparison of two transaction processing workloads


    High speed simulation of microprocessor systems using LTU dynamic binary translation

    This thesis presents new simulation techniques designed to speed up the simulation of microprocessor systems. The advanced simulation techniques may be applied to the simulator class which employs dynamic binary translation as its underlying technology. This research supports the hypothesis that faster simulation speeds can be realized by translating larger sections of the target program at runtime. The primary motivation for this research was to help facilitate comprehensive design-space exploration and hardware/software co-design of novel processor architectures by reducing the time required to run simulations. Instruction set simulators are used to design and to verify new system architectures, and to develop software in parallel with hardware. However, compromises must often be made when performing these tasks due to time constraints. This is particularly true in the embedded systems domain where there is a short time-to-market. The processing demands placed on simulation platforms are exacerbated further by the need to simulate the increasingly complex, multi-core processors of tomorrow. High speed simulators are therefore essential to reducing the time required to design and test advanced microprocessors, enabling new systems to be released ahead of the competition. Dynamic binary translation based simulators typically translate small sections of the target program at runtime. This research considers the translation of larger units of code in order to increase simulation speed. The new simulation techniques identify large sections of program code suitable for translation after analyzing a profile of the target program's execution path built up during simulation. The average instruction level simulation speed for the EEMBC benchmark suite is shown to be at least 63% faster for the new simulation techniques than for basic block dynamic binary translation based simulation and 14.8 times faster than interpretive simulation. The average cycle-approximate simulation speed is shown to be at least 32% faster for the new simulation techniques than for basic block dynamic binary translation based simulation and 8.37 times faster than cycle-accurate interpretive simulation.
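    The core idea of selecting large translation units from an execution profile can be sketched as follows. This is a simplified illustration of the general technique, not the thesis's actual algorithm; the function name, profile format, and threshold are all assumptions.

    ```python
    def select_translation_units(profile, threshold):
        """Group runs of consecutive hot basic blocks into large translation
        units, rather than translating each basic block in isolation.

        profile: list of (block_id, execution_count) pairs in program order
                 (assumed format); threshold: minimum count to call a block hot.
        """
        units, current = [], []
        for block, count in profile:
            if count >= threshold:
                # Hot block: extend the current large translation unit.
                current.append(block)
            else:
                # Cold block breaks the run; emit the unit collected so far.
                if current:
                    units.append(current)
                    current = []
        if current:
            units.append(current)
        return units

    # Hypothetical profile: two hot regions separated by a rarely executed block.
    profile = [("b0", 1000), ("b1", 900), ("b2", 5), ("b3", 800), ("b4", 750)]
    units = select_translation_units(profile, threshold=100)
    ```

    Translating each multi-block unit as a whole lets the translator optimize across basic-block boundaries, which is the source of the speedups the thesis reports.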

    Empirical Performance Analysis of HPC Applications with Portable Hardware Counter Metrics

    In this dissertation, we demonstrate that it is possible to develop methods of empirical hardware-counter-based performance analysis for scientific applications running on diverse CPUs. Although counters have been used in performance analysis for over 30 years, the methods remain limited to particular vendors or generations of CPUs. Our hypothesis is that counter-based measurements could be developed to provide consistent performance information on diverse CPUs. We prove the hypothesis correct by demonstrating one such set of metrics. We begin with an introduction and background discussing empirical performance analysis on CPUs. The background includes the Roofline Performance Model, which is widely used to visualize the performance of scientific applications relative to the potential system performance. This model uses metrics that are portable to different CPU architectures, making it a useful starting point for efforts to develop portable hardware counter metrics. We contribute to existing roofline literature by presenting a method using counters to measure the required application data on two CPUs and by presenting benchmarks to produce the Roofline Model of the CPU. These contributions are complementary since the benchmarks can be used to validate the hardware counters used to measure the application data. We present a set of performance metrics derived from Hardware Performance Monitors that we have been able to replicate on CPUs from two vendors. We developed these metrics to focus on information that can inform developers about the performance of algorithms and data structures in applications. This method contrasts with other methods which are aimed at microarchitectural features, and allows users to understand application performance from the same perspective on multiple CPUs. We use a series of case studies to explore the usefulness of our metrics and to validate that the measured values provide the expected information. The first set of studies examines benchmarks and mini-applications with a variety of performance characteristics. Finally, we study the performance of several versions of a scientific application using the Roofline Model and the new metrics. These case studies show that our performance metrics can provide performance information on two CPUs, proving our hypothesis by example. This dissertation includes previously published co-authored material.
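    The Roofline Model the abstract relies on reduces to a simple formula: attainable performance is the minimum of the compute peak and the memory roof (bandwidth times arithmetic intensity). A minimal sketch, with all function names and the example peak numbers assumed for illustration:

    ```python
    def arithmetic_intensity(flops, bytes_moved):
        # Arithmetic intensity (FLOP/byte): work performed per byte of data
        # moved, typically measured with hardware counters.
        return flops / bytes_moved

    def attainable_gflops(ai, peak_gflops, peak_gbps):
        # Roofline: performance is capped either by the compute peak or by
        # the memory roof, bandwidth (GB/s) x arithmetic intensity (FLOP/byte).
        return min(peak_gflops, ai * peak_gbps)

    # Hypothetical machine: 100 GFLOP/s compute peak, 200 GB/s memory bandwidth.
    # A streaming kernel at AI = 0.25 is memory-bound; at AI = 10 it is
    # compute-bound.
    memory_bound = attainable_gflops(0.25, peak_gflops=100, peak_gbps=200)
    compute_bound = attainable_gflops(10, peak_gflops=100, peak_gbps=200)
    ```

    Plotting attainable GFLOP/s against arithmetic intensity on log-log axes produces the characteristic slanted-then-flat roofline; a kernel's measured (AI, GFLOP/s) point sits under whichever roof limits it.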

    Fifth NASA Goddard Conference on Mass Storage Systems and Technologies

    This document contains copies of those technical papers received in time for publication prior to the Fifth Goddard Conference on Mass Storage Systems and Technologies held September 17-19, 1996, at the University of Maryland, University Conference Center in College Park, Maryland. As one of an ongoing series, this conference continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include storage architecture, database management, data distribution, file system performance and modeling, and optical recording technology. There is also a paper on Application Programming Interfaces (API) for a Physical Volume Repository (PVR) defined in Version 5 of the Institute of Electrical and Electronics Engineers (IEEE) Reference Model (RM). In addition, there are papers on specific archives and storage products.

    Characterization of alpha AXP performance using TP and SPEC workloads


    Third International Symposium on Space Mission Operations and Ground Data Systems, part 1

    Under the theme of 'Opportunities in Ground Data Systems for High Efficiency Operations of Space Missions,' the SpaceOps '94 symposium included presentations of more than 150 technical papers spanning five topic areas: Mission Management, Operations, Data Management, System Development, and Systems Engineering. The papers focus on improvements in the efficiency, effectiveness, productivity, and quality of data acquisition, ground systems, and mission operations. New technology, techniques, methods, and human systems are discussed. Accomplishments are also reported in the application of information systems to improve data retrieval, reporting, and archiving; the management of human factors; the use of telescience and teleoperations; and the design and implementation of logistics support for mission operations.

    Third International Symposium on Space Mission Operations and Ground Data Systems, part 2

    Under the theme of 'Opportunities in Ground Data Systems for High Efficiency Operations of Space Missions,' the SpaceOps '94 symposium included presentations of more than 150 technical papers spanning five topic areas: Mission Management, Operations, Data Management, System Development, and Systems Engineering. The symposium papers focus on improvements in the efficiency, effectiveness, and quality of data acquisition, ground systems, and mission operations. New technology, methods, and human systems are discussed. Accomplishments are also reported in the application of information systems to improve data retrieval, reporting, and archiving; the management of human factors; the use of telescience and teleoperations; and the design and implementation of logistics support for mission operations. This volume covers expert systems, systems development tools and approaches, and systems engineering issues.