Development and validation of a hierarchical memory model incorporating CPU- and memory-operation overlap model

Abstract

The Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy.

Distributed shared memory (DSM) architectures such as the Origin 2000 extend the concept of single-processor cache hierarchies across an entire physically distributed multiprocessor machine. The scalability of a DSM machine is inherently tied to memory hierarchy performance, including latency-hiding techniques in the architecture, global cache-coherence protocols, memory consistency models and, of course, the inherent locality of reference in the algorithms of interest. In this paper, we characterize application performance with a "memory-centric" view. Using a simple mean value analysis (MVA) strategy and empirical performance data, we infer the contribution of each level in the memory system to the application's overall cycles per instruction (CPI). We account for the overlap of processor execution with memory accesses, a key parameter that is not directly measurable on the Origin systems. We infer the separate contributions of three major architectural features in the memory subsystem of the Origin 2000: cache size, outstanding loads-under-miss, and memory latency.
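The abstract does not give the model's equations, so the following is only a minimal sketch of the kind of CPI decomposition it describes, assuming a simple additive form: total CPI equals a compute-only CPI plus, for each memory level, references per instruction times latency, scaled by the fraction of memory time that is not overlapped with execution. The names `MemoryLevel`, `cpi_breakdown`, and `infer_overlap`, and all numeric values, are hypothetical illustrations and are not taken from the paper or from Origin 2000 measurements.

```python
from dataclasses import dataclass


@dataclass
class MemoryLevel:
    name: str
    refs_per_instr: float   # references satisfied at this level, per instruction
    latency_cycles: float   # access latency of this level, in cycles


def cpi_breakdown(cpi_compute, levels, overlap):
    """Return (total CPI, per-level stall CPI) under the assumed additive model.

    overlap is the fraction of memory time hidden behind processor execution
    (0 = nothing hidden, 1 = everything hidden).
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must lie in [0, 1]")
    stalls = {
        lvl.name: (1.0 - overlap) * lvl.refs_per_instr * lvl.latency_cycles
        for lvl in levels
    }
    return cpi_compute + sum(stalls.values()), stalls


def infer_overlap(cpi_measured, cpi_compute, levels):
    """Solve the same assumed model for overlap, given a measured CPI."""
    raw = sum(lvl.refs_per_instr * lvl.latency_cycles for lvl in levels)
    if raw == 0.0:
        raise ValueError("no memory time to overlap")
    return 1.0 - (cpi_measured - cpi_compute) / raw


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    levels = [
        MemoryLevel("L2", refs_per_instr=0.02, latency_cycles=10.0),
        MemoryLevel("memory", refs_per_instr=0.002, latency_cycles=100.0),
    ]
    total, stalls = cpi_breakdown(cpi_compute=1.0, levels=levels, overlap=0.25)
    print(f"total CPI: {total:.3f}")
    for name, stall in stalls.items():
        print(f"  {name}: {stall:.3f} stall cycles per instruction")
    print(f"recovered overlap: {infer_overlap(total, 1.0, levels):.3f}")
```

Running the model in the inverse direction, as `infer_overlap` does, mirrors the abstract's point that overlap is inferred from measured data rather than read directly from hardware counters; the paper's actual inference procedure may differ from this sketch.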
