Search CORE

15 research outputs found

Binary-swap volumetric rendering on the T3D

Author: de Verdiere G. Colin
Hansen Charles D.
Publication venue: Cray User Group (CUG)
Publication date: 01/01/1995
Field of study

Journal ArticleThis paper presents a data distributed parallel raytraced volume rendering algorithm and its implementation on the CRI T3D. This algorithm distributes the data and the computational load to individual processing units to achieve fast and high-quality rendering of high-resolution data. The volume data, once distributed, is left intact. The processing nodes perform local raytracing of their subvolume concurrently. No communication between processing units is needed during this local ray-tracing process. A subimages is generated by each processing unit and the final image is obtained by compositing subimages in the proper order by the Binary-Swap algorithm. Performance of this algorithm on the T3D is presented and compared to an implementation on the CM-5

The University of Utah: J. Willard Marriott Digital Library

Efficient layering for high speed communication: the MPI over Fast Messages (FM) experience

Author: Andrew Chien
Mario Lauria
Scott Pakin
Publication venue
Publication date: 01/01/1999
Field of study

We describe our experience of designing, implementing, and evaluating two generations of high performance communication libraries, Fast Messages (FM) for Myrinet. In FM 1, we designed a simple interface and provided guarantees of reliable and in-order delivery, and flow control. While this was a significant improvement over previous systems, it was not enough. Layering MPI atop FM 1 showed that only about 35 % of the FM 1 bandwidth could be delivered to higher level communication APIs. Our second generation communication layer, FM 2, addresses the identified problems, providing gather-scatter, interlayer scheduling, receiver flow control, as well as some convenient API features which simplify programming. FM 2 can deliver 55–95 % to higher level APIs such as MPI. This is especially impressive as the absolute bandwidths delivered have increased over fourfold to 90 MB/s. We describe general issues encountered in matching two communication layers, and our solutions as embodied in FM 2

CiteSeerX

Coherent network interfaces for fine-grain communication

Author: Falsafi Babak
Hill Mark D.
Mukherjee Shubhendu S.
Wood David A.
Publication venue
Publication date: 06/04/2009
Field of study

Using coherence can improve performance by facilitating burst transfers of whole cache blocks and reducing control overheads. This paper describes an attempt to explore network interfaces that use coherence, i.e., coherent network interfaces (CNIs), to improve communication performance. First, it reports on the development and optimization of two mechanisms that CNIs use to communicate with processors. A taxonomy and comparison of four CNIs with a more conventional NI are then presented

Infoscience - École polytechnique fédérale de Lausanne

Data recovery in wormhole routing networks in hypercubes and meshes

Author: Alowayed Mohammad S.
Publication venue
Publication date: 01/12/1997
Field of study

SHAREOK repository

Mechanisms for efficient, protected messaging

Author: Lee Whay Sing, 1967-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1999
Field of study

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.Includes bibliographical references (p. 143-149).by Whay Sing Lee.Ph.D

CiteSeerX

DSpace@MIT

Human factors in the design of parallel program performance tuning tools

Author: Hondroudakis Anna
Publication venue: The University of Edinburgh
Publication date: 01/01/1997
Field of study

Edinburgh Research Archive

Doctor of Philosophy

Author: Parker Michael Allen
Publication venue: University of Utah
Publication date: 01/08/2013
Field of study

dissertationHigh-performance supercomputers on the Top500 list are commonly designed around commodity CPUs. Most of the codes executed on these machines are message-passing codes using the message-passing toolkit (MPI). Thus it makes sense to look at these machines from a holistic systems architecture perspective and consider optimizations to commodity processors that make them more efficient in message-passing architectures. Described herein is a new User-Level Notification (ULN) architecture that significantly improves message-passing performance. The architecture integrates a simultaneous multithreaded (SMT) processor with a user-level network interface (NI) that can directly control the execution scheduling of threads on the processor. By allowing the network interface to control the execution of message handling code at the user level, the operating system (OS) related overhead for handling interrupts and user code dispatch related to notifications is eliminated. By using an SMT processor, message handling can be performed in one thread concurrent to user computation in other threads, thus most of the overhead of executing message handlers can be hidden. This dissertation presents measurements showing the OS overheads related to message-passing are significant in modern architectures and describes a new architecture that significantly reduces these overheads. On a communication-intensive real-world application, the ULN architecture provides a 50.9% performance improvement over a more traditional OS-based NIC and a 5.29-31.9% improvement over a best-of-class user-level NIC due to the user-level notifications

The University of Utah: J. Willard Marriott Digital Library

High performance computing and communications: FY 1995 implementation plan

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Crossref

Overlapping of Communication and Computation and Early Binding: Fundamental Mechanisms for Improving Parallel Performance on Clusters of Workstations

Author: Dimitrov Rossen Petkov
Publication venue: Scholars Junction
Publication date: 12/05/2001
Field of study

This study considers software techniques for improving performance on clusters of workstations and approaches for designing message-passing middleware that facilitate scalable, parallel processing. Early binding and overlapping of communication and computation are identified as fundamental approaches for improving parallel performance and scalability on clusters. Currently, cluster computers using the Message-Passing Interface for interprocess communication are the predominant choice for building high-performance computing facilities, which makes the findings of this work relevant to a wide audience from the areas of high-performance computing and parallel processing. The performance-enhancing techniques studied in this work are presently underutilized in practice because of the lack of adequate support by existing message-passing libraries and are also rarely considered by parallel algorithm designers. Furthermore, commonly accepted methods for performance analysis and evaluation of parallel systems omit these techniques and focus primarily on more obvious communication characteristics such as latency and bandwidth. This study provides a theoretical framework for describing early binding and overlapping of communication and computation in models for parallel programming. This framework defines four new performance metrics that facilitate new approaches for performance analysis of parallel systems and algorithms. This dissertation provides experimental data that validate the correctness and accuracy of the performance analysis based on the new framework. The theoretical results of this performance analysis can be used by designers of parallel system and application software for assessing the quality of their implementations and for predicting the effective performance benefits of early binding and overlapping. This work presents MPI/Pro, a new MPI implementation that is specifically optimized for clusters of workstations interconnected with high-speed networks. This MPI implementation emphasizes features such as persistent communication, asynchronous processing, low processor overhead, and independent message progress. These features are identified as critical for delivering maximum performance to applications. The experimental section of this dissertation demonstrates the capability of MPI/Pro to facilitate software techniques that result in significant application performance improvements. Specific demonstrations with Virtual Interface Architecture and TCP/IP over Ethernet are offered

Scholars Junction - Mississippi State University Institutional Repository