Search CORE

105 research outputs found

Wire management for coherence traffic in chip multiprocessors

Author: Balasubramonian Rajeev
Cheng Liqun
Publication venue: Workshop on Complexity-Effective Design
Publication date: 01/01/2005
Field of study

Journal ArticleImprovements in semiconductor technology have made it possible to include multiple processor cores on a single die. Chip Multi-Processors (CMP) are an attractive choice for future billion transistor architectures due to their low design complexity, high clock frequency, and high throughput. In a typical CMP architecture, the L2 cache is shared by multiple cores and data coherence is maintained among private L1s. Coherence operations entail frequent communication over global on-chip wires. In future technologies, communication between different L1s will have a significant impact on overall processor performance and power consumption. On-chip wires can be designed to have different latency, bandwidth, and energy properties. Likewise, coherence protocol messages have different latency and bandwidth needs. We propose an interconnect comprised of wires with varying latency, bandwidth, and energy characteristics, and advocate intelligently mapping coherence operations to the appropriate wires. In this paper, we present a comprehensive list of techniques that allow coherence protocols to exploit a heterogeneous interconnect and present preliminary data that indicates the potential of these techniques to significantly improve performance and reduce power consumption. We further demonstrate that most of these techniques can be implemented at a minimum complexity overhead

The University of Utah: J. Willard Marriott Digital Library

SIMPLE: A Methodology for Programming High Performance Algorithms on Clusters of Symmetric Multiprocessors (SMPs) (Preliminary Version)

Author: Bader D.A.
Publication venue: UNM Digital Repository
Publication date: 01/11/1998
Field of study

Interconnect-Aware Coherence Protocols for Chip Multiprocessors

Author: Culler D. E.
Galles M.
Ho R.
John B. Carter
Karthik Ramani
Kongetira P.
Krewell K.
Liqun Cheng
Naveen Muralimanohar
Nelson N.
Rajeev Balasubramonian
Tendler J.
{13} Corporate Institute of Electrical and Electronics Engineers
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Adding Token Counting to Directory-Based Cache Coherence

Author: Blundell Colin
Martin Milo M.K.
Raghavan Arun
Publication venue: ScholarlyCommons
Publication date: 04/06/2008
Field of study

The coherence protocol is a first-order design concern in multicore designs. Directory protocols are naturally scalable, as they place no restrictions on the interconnect and have minimal bandwidth requirements; however, this scalability comes at the cost of increased sharing latency due to indirection. In contrast, broadcast-based systems such as snooping protocols and token coherence reduce latency of sharing misses by sending requests directly to other processors. Unfortunately, their reliance on totally ordered interconnects and/or broadcast limits their scalability. This work introduces PATCH (Predictive/Adaptive Token Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically using available bandwidth to reduce sharing latency. PATCH extends a standard directory protocol to track tokens and use token counting rules for enforcing coherence permissions. Token counting allows PATCH to support direct requests on an unordered interconnect, while a novel mechanism called token tenure uses local processor timeouts and the directory’s per-block point of ordering at the home node to guarantee forward progress without relying on broadcast. PATCH makes three main contributions. First, PATCH uses direct request prioritization to match the performance of broadcast-based protocols without restricting scalability. Second, PATCH introduces token tenure, which provides broadcast-free forward progress for token counting protocols. Finally, PATCH provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, PATCH is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in betwee

CiteSeerX

ScholarlyCommons@Penn

Overlapping of Communication and Computation and Early Binding: Fundamental Mechanisms for Improving Parallel Performance on Clusters of Workstations

Author: Dimitrov Rossen Petkov
Publication venue: Scholars Junction
Publication date: 12/05/2001
Field of study

This study considers software techniques for improving performance on clusters of workstations and approaches for designing message-passing middleware that facilitate scalable, parallel processing. Early binding and overlapping of communication and computation are identified as fundamental approaches for improving parallel performance and scalability on clusters. Currently, cluster computers using the Message-Passing Interface for interprocess communication are the predominant choice for building high-performance computing facilities, which makes the findings of this work relevant to a wide audience from the areas of high-performance computing and parallel processing. The performance-enhancing techniques studied in this work are presently underutilized in practice because of the lack of adequate support by existing message-passing libraries and are also rarely considered by parallel algorithm designers. Furthermore, commonly accepted methods for performance analysis and evaluation of parallel systems omit these techniques and focus primarily on more obvious communication characteristics such as latency and bandwidth. This study provides a theoretical framework for describing early binding and overlapping of communication and computation in models for parallel programming. This framework defines four new performance metrics that facilitate new approaches for performance analysis of parallel systems and algorithms. This dissertation provides experimental data that validate the correctness and accuracy of the performance analysis based on the new framework. The theoretical results of this performance analysis can be used by designers of parallel system and application software for assessing the quality of their implementations and for predicting the effective performance benefits of early binding and overlapping. This work presents MPI/Pro, a new MPI implementation that is specifically optimized for clusters of workstations interconnected with high-speed networks. This MPI implementation emphasizes features such as persistent communication, asynchronous processing, low processor overhead, and independent message progress. These features are identified as critical for delivering maximum performance to applications. The experimental section of this dissertation demonstrates the capability of MPI/Pro to facilitate software techniques that result in significant application performance improvements. Specific demonstrations with Virtual Interface Architecture and TCP/IP over Ethernet are offered

Scholars Junction - Mississippi State University Institutional Repository

HPCCP/CAS Workshop Proceedings 1998

Author: Mata Ellen
Schulbach Catherine
Schulbach Catherine
Publication venue
Publication date
Field of study

This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey

NASA Technical Reports Server

SwitchWare: Accelerating Network Evolution (White Paper)

Author: Farber David J
Feldmeier D. C
Gunter Carl A
Nettles Scott M
Sincoskie W. David
Smith Jonathan M
Publication venue: ScholarlyCommons
Publication date: 01/01/1996
Field of study

We propose the development of a set of software technologies ( SwitchWare ) which will enable rapid development and deployment of new network services. The key insight is that by making the basic network service selectable on a per user (or even per packet) basis, the need for formal standardization is eliminated. Additionally, by making the basic network service programmable, the deployment times, today constrained by capital funding limitations, are tremendously reduced (to the order of software distribution times). Finally, by constructing an advanced, robust programming environment, even the service development time can be reduced. A SwitchWare switch consists of input and output ports controlled by a software-programmable element; programs are contained in sequences of messages sent to the SwitchWare switch\u27s input ports, which interpret the messages as programs. We call these Switchlets . This accelerates the pace of network evolution, as evolving user needs can be immediately reflected in the network infrastructure. Immediate reconfigurability enhances the adaptability of the network infrastructure in the face of unexpected situations. We call a network built from SwitchWare switches an active network

ScholarlyCommons@Penn

Recommended from our members

Scalability of preconditioners as a strategy for parallel computation of compressible fluid flow

Author: Hansen Glen A.
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 01/05/1996
Field of study

Parallel implementations of a Newton-Krylov-Schwarz algorithm are used to solve a model problem representing low Mach number compressible fluid flow over a backward-facing step. The Mach number is specifically selected to result in a numerically {open_quote}stiff{close_quotes} matrix problem, based on an implicit finite volume discretization of the compressible 2D Navier-Stokes/energy equations using primitive variables. Newton`s method is used to linearize the discrete system, and a preconditioned Krylov projection technique is used to solve the resulting linear system. Domain decomposition enables the development of a global preconditioner via the parallel construction of contributions derived from subdomains. Formation of the global preconditioner is based upon additive and multiplicative Schwarz algorithms, with and without subdomain overlap. The degree of parallelism of this technique is further enhanced with the use of a matrix-free approximation for the Jacobian used in the Krylov technique (in this case, GMRES(k)). Of paramount interest to this study is the implementation and optimization of these techniques on parallel shared-memory hardware, namely the Cray C90 and SGI Challenge architectures. These architectures were chosen as representative and commonly available to researchers interested in the solution of problems of this type. The Newton-Krylov-Schwarz solution technique is increasingly being investigated for computational fluid dynamics (CFD) applications due to the advantages of full coupling of all variables and equations, rapid non-linear convergence, and moderate memory requirements. A parallel version of this method that scales effectively on the above architectures would be extremely attractive to practitioners, resulting in efficient, cost-effective, parallel solutions exhibiting the benefits of the solution technique

UNT Digital Library

Scalability of preconditioners as a strategy for parallel computation of compressible fluid flow

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Crossref