6 research outputs found

    Performance Evaluation of the SGI Origin2000: A Memory-Centric Characterization of LANL ASCI Applications

    In this paper we compare the single-processor performance of the SGI Origin and PowerChallenge and use a previously reported performance model for hierarchical memory systems to explain the results. Both the Origin and PowerChallenge use the same microprocessor (MIPS R10000) but have significant differences in their memory subsystems. Our memory model includes the effect of overlap between CPU and memory operations and allows us to infer the individual contributions of all three improvements in the Origin's memory architecture and relate the effectiveness of each improvement to application characteristics. 1 Introduction: The biggest challenge in the design and use of high-performance computer systems involves managing the disparity between central processing unit (CPU) speed and memory subsystem speed. The need to address this issue is likely to become more acute in the future, because processor speed may double every eighteen months but DRAM memory access speed is expected to inc..

    Development and validation of a hierarchical memory model incorporating CPU- and memory-operation overlap model

    The Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. ABSTRACT: Distributed shared memory architectures (DSMs) such as the Origin 2000 are being implemented which extend the concept of single-processor cache hierarchies across an entire physically distributed multi-processor machine. The scalability of a DSM machine is inherently tied to memory hierarchy performance, including such issues as latency hiding techniques in the architecture, global cache-coherence protocols, memory consistency models and, of course, the inherent locality of reference in algorithms of interest. In this paper, we characterize application performance with a "memory-centric" view. Using a simple mean value analysis (MVA) strategy and empirical performance data, we infer the contribution of each level in the memory system to the application's overall cycles per instruction (cpi). We account for the overlap of processor execution with memory accesses, a key parameter which is not directly measurable on the Origin systems. We infer the separate contributions of three major architecture features in the memory subsystem of the Origin 2000: cache size, outstanding loads-under-miss, and memory latency.
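As a rough illustration of the kind of per-level cpi decomposition the abstract above describes, the sketch below splits a total cpi into a compute term and one stall term per memory level. The formula, the function name `cpi_breakdown`, the single `overlap` fraction, and all numeric values are assumptions made for illustration; they are not taken from the paper, whose actual model may differ.

```python
def cpi_breakdown(cpi0, refs_per_instr, latency_cycles, overlap):
    """Split total cpi into per-level memory contributions.

    Hypothetical model (an assumption, not the paper's formula):
        cpi = cpi0 + (1 - overlap) * sum(refs_i * latency_i)
    cpi0            -- cpi with an ideal (zero-latency) memory system
    refs_per_instr  -- references per instruction serviced at each level
    latency_cycles  -- access latency of each level, in CPU cycles
    overlap         -- fraction of memory time hidden behind CPU execution
    """
    exposed = 1.0 - overlap
    contributions = {
        level: exposed * refs_per_instr[level] * latency_cycles[level]
        for level in refs_per_instr
    }
    total = cpi0 + sum(contributions.values())
    return total, contributions


# Made-up inputs, chosen only to show the arithmetic.
total, parts = cpi_breakdown(
    cpi0=1.0,
    refs_per_instr={"L2": 0.02, "memory": 0.005},
    latency_cycles={"L2": 10, "memory": 80},
    overlap=0.5,
)
for level, value in parts.items():
    print(f"{level}: {value:.3f} cycles per instruction")
print(f"total cpi: {total:.3f}")
```

In this sketch the overlap fraction scales every level equally, which is the simplest possible choice; a model fitted to measured data could treat each level separately.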

    The Performance Realities Of Massively Parallel Processors: A Case Study

    We present the results of an architectural comparison of SIMD massive parallelism, as implemented in the Thinking Machines Corp. CM-2, and vector or concurrent-vector processing, as implemented in the Cray Research Inc. Y-MP/8. The comparison is based primarily upon three application codes taken from the LANL CM-2 workload. Tests were run by porting CM Fortran codes to the Y-MP, so that nearly the same level of optimization was obtained on both machines. The results for fully configured systems, using measured data rather than scaled data from smaller configurations, show that the Y-MP/8 is faster than the 64k CM-2 for all three codes. A simple model that accounts for the relative characteristic computational speeds of the two machines, and for the reduction in overall CM-2 performance due to communication or SIMD conditional execution, accurately predicts the performance of two of the three codes. Other factors, such as memory bandwidth and compiler effects, are also discussed. Finally, the p..