
6 research outputs found


Development and Validation of a Hierarchical Memory Model Incorporating CPU- and Memory-Operation Overlap

Author: Bassetti Federico
Lubeck Olaf M.
Luo Yong
Wasserman Harvey J.
Publication venue: Los Alamos National Laboratory
Publication date: 31/12/1997

Distributed shared memory architectures (DSMs) such as the Origin 2000 are being implemented which extend the concept of single-processor cache hierarchies across an entire physically-distributed multiprocessor machine. The scalability of a DSM machine is inherently tied to memory hierarchy performance, including such issues as latency hiding techniques in the architecture, global cache-coherence protocols, memory consistency models and, of course, the inherent locality of reference in algorithms of interest. In this paper, we characterize application performance with a "memory-centric" view. Using a simple mean value analysis (MVA) strategy and empirical performance data, we infer the contribution of each level in the memory system to the application's overall cycles per instruction (cpi). We account for the overlap of processor execution with memory accesses, a key parameter which is not directly measurable on the Origin systems. We infer the separate contributions of three major architecture features in the memory subsystem of the Origin 2000: cache size, outstanding loads-under-miss, and memory latency.
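As a rough illustration of the kind of cpi decomposition the abstract describes, the sketch below splits cpi into a base term plus one term per memory level, scaled by the fraction of memory time that is not overlapped with processor execution. The formula, the `MemoryLevel` type, the `cpi_breakdown` function, and all numbers are assumptions made for illustration; the abstract does not give the authors' actual equations or measured values.

```python
from dataclasses import dataclass


@dataclass
class MemoryLevel:
    """One level of the memory hierarchy (hypothetical type for this sketch)."""
    name: str
    hits_per_instruction: float  # references satisfied at this level, per instruction
    latency_cycles: float        # access time of this level, in CPU cycles


def cpi_breakdown(cpi0, levels, overlap):
    """Return (total cpi, per-level contributions).

    Assumed model: cpi = cpi0 + (1 - overlap) * sum(h_i * t_i),
    where `overlap` is the fraction of memory time hidden behind CPU work.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must lie in [0, 1]")
    exposed = 1.0 - overlap  # fraction of memory time the CPU actually stalls for
    contributions = {
        lvl.name: exposed * lvl.hits_per_instruction * lvl.latency_cycles
        for lvl in levels
    }
    return cpi0 + sum(contributions.values()), contributions


# Made-up example values, not measurements from the paper.
levels = [
    MemoryLevel("L2 cache", hits_per_instruction=0.02, latency_cycles=10.0),
    MemoryLevel("main memory", hits_per_instruction=0.005, latency_cycles=100.0),
]
total, parts = cpi_breakdown(cpi0=1.0, levels=levels, overlap=0.5)
```

In this form the overlap factor is the one quantity that cannot be read from hardware counters, so it would have to be fitted from measured cpi, which is consistent with the abstract's remark that overlap is not directly measurable.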

UNT Digital Library

Performance Evaluation of the SGI Origin2000: A Memory-Centric Characterization of LANL ASCI Applications

Author: Federico Bassetti
Harvey Wasserman
Olaf M. Lubeck
Yong Luo
Publication date: 01/01/1997

In this paper we compare single-processor performance of the SGI Origin and PowerChallenge and utilize a previously-reported performance model for hierarchical memory systems to explain the results. Both the Origin and PowerChallenge use the same microprocessor (MIPS R10000) but have significant differences in their memory subsystems. Our memory model includes the effect of overlap between CPU and memory operations and allows us to infer the individual contributions of all three improvements in the Origin's memory architecture and relate the effectiveness of each improvement to application characteristics.

1 Introduction

The biggest challenge in the design and use of high-performance computer systems involves managing the disparity between central processing unit (CPU) speed and memory subsystem speed. The need to address this issue is likely to become more acute in the future, because processor speed may double every eighteen months but DRAM memory access speed is expected to inc..

CiteSeerX

DSpace at NTUA

UNT Digital Library

Development and validation of a hierarchical memory model incorporating CPU- and memory-operation overlap model

Author: Federico Bassetti
Harvey J Wasserman
Olaf M Lubeck
Yong Luo
Publication date: 01/01/1998

By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive royalty-free license to publish or reproduce the published form of this contribution or to allow others to do so, for U.S. Government purposes. The Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy.

DISCLAIMER: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Development and Validation of a Hierarchical Memory Model Incorporating

ABSTRACT: Distributed shared memory architectures (DSMs) such as the Origin 2000 are being implemented which extend the concept of single-processor cache hierarchies across an entire physically-distributed multi-processor machine. The scalability of a DSM machine is inherently tied to memory hierarchy performance, including such issues as latency hiding techniques in the architecture, global cache-coherence protocols, memory consistency models and, of course, the inherent locality of reference in algorithms of interest. In this paper, we characterize application performance with a "memory-centric" view. Using a simple mean value analysis (MVA) strategy and empirical performance data, we infer the contribution of each level in the memory system to the application's overall cycles per instruction (cpi). We account for the overlap of processor execution with memory accesses, a key parameter which is not directly measurable on the Origin systems. We infer the separate contributions of three major architecture features in the memory subsystem of the Origin 2000: cache size, outstanding loads-under-miss, and memory latency.

CiteSeerX

The Performance Realities Of Massively Parallel Processors: A Case Study

Author: Harvey J. Wasserman
Margaret L. Simmons
Olaf M. Lubeck

We present the results of an architectural comparison of SIMD massive parallelism, as implemented in the Thinking Machines Corp. CM-2, and vector or concurrent-vector processing, as implemented in the Cray Research Inc. Y-MP/8. The comparison is based primarily upon three application codes taken from the LANL CM-2 workload. Tests were run by porting CM Fortran codes to the Y-MP, so that nearly the same level of optimization was obtained on both machines. The results for fully-configured systems, using measured data rather than scaled data from smaller configurations, show that the Y-MP/8 is faster than the 64k CM-2 for all three codes. A simple model that accounts for the relative characteristic computational speeds of the two machines, and reduction in overall CM-2 performance due to communication or SIMD conditional execution, accurately predicts the performance of two of the three codes. Other factors, such as memory bandwidth and compiler effects, are also discussed. Finally, the p..

CiteSeerX

A performance comparison of four supercomputers

Author: Brooks J.
Christopher Eoyang
Dent D.
Eoyang C
Harvey J. Wasserman
Hiroo Harada
Margaret L. Simmons
Misako Ishiguro
Olaf M. Lubeck
Raul Mendez
Uchida N.
Watanabe T.
Publication venue: Association for Computing Machinery (ACM)

Crossref