Search CORE

99 research outputs found

Data recovery in wormhole routing networks in hypercubes and meshes

Author: Alowayed Mohammad S.
Publication venue
Publication date: 01/12/1997
Field of study

Coherent network interfaces for fine-grain communication

Author: Falsafi Babak
Hill Mark D.
Mukherjee Shubhendu S.
Wood David A.
Publication venue
Publication date: 06/04/2009
Field of study

Using coherence can improve performance by facilitating burst transfers of whole cache blocks and reducing control overheads. This paper describes an attempt to explore network interfaces that use coherence, i.e., coherent network interfaces (CNIs), to improve communication performance. First, it reports on the development and optimization of two mechanisms that CNIs use to communicate with processors. A taxonomy and comparison of four CNIs with a more conventional NI are then presented

Infoscience - École polytechnique fédérale de Lausanne

Doctor of Philosophy

Author: Parker Michael Allen
Publication venue: University of Utah
Publication date: 01/08/2013
Field of study

dissertationHigh-performance supercomputers on the Top500 list are commonly designed around commodity CPUs. Most of the codes executed on these machines are message-passing codes using the message-passing toolkit (MPI). Thus it makes sense to look at these machines from a holistic systems architecture perspective and consider optimizations to commodity processors that make them more efficient in message-passing architectures. Described herein is a new User-Level Notification (ULN) architecture that significantly improves message-passing performance. The architecture integrates a simultaneous multithreaded (SMT) processor with a user-level network interface (NI) that can directly control the execution scheduling of threads on the processor. By allowing the network interface to control the execution of message handling code at the user level, the operating system (OS) related overhead for handling interrupts and user code dispatch related to notifications is eliminated. By using an SMT processor, message handling can be performed in one thread concurrent to user computation in other threads, thus most of the overhead of executing message handlers can be hidden. This dissertation presents measurements showing the OS overheads related to message-passing are significant in modern architectures and describes a new architecture that significantly reduces these overheads. On a communication-intensive real-world application, the ULN architecture provides a 50.9% performance improvement over a more traditional OS-based NIC and a 5.29-31.9% improvement over a best-of-class user-level NIC due to the user-level notifications

The University of Utah: J. Willard Marriott Digital Library

Mechanisms for efficient, protected messaging

Author: Lee Whay Sing, 1967-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1999
Field of study

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.Includes bibliographical references (p. 143-149).by Whay Sing Lee.Ph.D

CiteSeerX

DSpace@MIT

Cluster Computing Review

Author: Baker Mark
Fox Geoffrey C.
Yau Hon W.
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1995
Field of study

In the past decade there has been a dramatic shift from mainframe or ‘host−centric’ computing to a distributed ‘client−server’ approach. In the next few years this trend is likely to continue with further shifts towards ‘network−centric’ computing becoming apparent. All these trends were set in motion by the invention of the mass−reproducible microprocessor by Ted Hoff of Intel some twenty−odd years ago. The present generation of RISC microprocessors are now more than a match for mainframes in terms of cost and performance. The long−foreseen day when collections of RISC microprocessors assembled together as a parallel computer could out perform the vector supercomputers has finally arrived. Such high−performance parallel computers incorporate proprietary interconnection networks allowing low−latency, high bandwidth inter−processor communications. However, for certain types of applications such interconnect optimization is unnecessary and conventional LAN technology is sufficient. This has led to the realization that clusters of high−performance workstations can be realistically used for a variety of applications either to replace mainframes, vector supercomputers and parallel computers or to better manage already installed collections of workstations. Whilst it is clear that ‘cluster computers’ have limitations, many institutions and companies are exploring this option. Software to manage such clusters is at an early stage of development and this report reviews the current state−of−the−art. Cluster computing is a rapidly maturing technology that seems certain to play an important part in the ‘network−centric’ computing future

CiteSeerX

Syracuse University Research Facility and Collaborative Environment

Hierarchy-Aware Message-Passing in the Upcoming Many-Core Era

Author: Carsten Clauss
Simon Pickartz
Stefan Lankes
Thomas Bemmerl
Publication venue: 'IntechOpen'
Publication date: 01/01/2012
Field of study

IntechOpen

Crossref

Publikationsserver der RWTH Aachen University

Architecture independent environment for developing engineering software on MIMD computers

Author: Lopez L. A.
Valimohamed Karim A.
Publication venue
Publication date
Field of study

Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management

NASA Technical Reports Server

Scalable directoryless shared memory coherence using execution migration

Author: Cho Myong Hyon
Devadas Srinivas
Khan Omer
Lis Mieszko
Shim Keun Sup
Publication venue
Publication date: 22/11/2010
Field of study

We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family of architectures. Migration-based architectures move threads among cores to guarantee sequential semantics in large multicores. Using a execution migration (EM) architecture, we achieve performance comparable to directory-based architectures without using directories: avoiding automatic data replication significantly reduces cache miss rates, while a fast network-level thread migration scheme takes advantage of shared data locality to reduce remote cache accesses that limit traditional NUCA performance. EM area and energy consumption are very competitive, and, on the average, it outperforms a directory-based MOESI baseline by 6.8% and a traditional S-NUCA design by 9.2%. We argue that with EM scaling performance has much lower cost and design complexity than in directory-based coherence and traditional NUCA architectures: by merely scaling network bandwidth from 128 to 256 (512) bit flits, the performance of our architecture improves by an additional 8% (12%), while the baselines show negligible improvement

DSpace@MIT

Investigation of hybrid message-passing and shared-memory architectures for parallel computer : a case study : turbonet

Author: Li Xi
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/1995
Field of study

Several DSP (Digital Signal Processing) algorithms are developed for the MIT TurboNet parallel computer. In contrast to other parallel computers that implement exclusively in hardware either the message-passing or the shared-memory communication paradigm, or employ distributed shared-memory architectures characterized by inefficient implementation of the shared-memory paradigm, the hybrid architecture of TurboNet supports direct, efficient implementation of both paradigms. Three versions of each algorithm are developed, if possible, corresponding to message-passing, shared-memory, and hybrid communications, respectively. Theoretical and experimental comparisons of algorithms are employed in the analysis of performance. The results prove that the hybrid versions generally achieve better performance than the other two versions. The main conclusion of this research is that small-scale and medium-scale parallel computers should implement directly in hardware both communication paradigms, for high performance, robustness in relation to the application space, and ease of algorithm development. To facilitate theoretical comparisons, a methodology is developed for highly accurate prediction of algorithm performance. The success of this methodology proves that such prediction is possible for complex parallel computers, such as TurboNet, if enough information is provided by the data dependence graphs

Digital Commons @ New Jersey Institute of Technology (NJIT)