412 research outputs found

Software Tool Evaluation Methodology

Author: Hariri Salim
Park Sung Yong
Reddy Rajashekar
Subramanyan Mahesh
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1995

The recent development of parallel and distributed computing software has introduced a variety of software tools that support several programming paradigms and languages. This variety makes selecting the best tool to run a given class of applications on a parallel or distributed system a non-trivial task that requires some investigation. We expect tool evaluation to receive more attention as the deployment and usage of distributed systems increase. In this paper, we present a multi-level evaluation methodology for parallel/distributed tools in which tools are evaluated from different perspectives. We apply our evaluation methodology to three message-passing tools, namely Express, p4, and PVM. The approach covers several important distributed systems platforms consisting of different computers (e.g., IBM-SP1, Alpha cluster, SUN workstations) interconnected by different types of networks (e.g., Ethernet, FDDI, ATM).

Syracuse University Research Facility and Collaborative Environment

A Problem Solving Environment for Network Computing

Author: Furmanski Wojtek
Hariri Salim
Kim Dongmin
Kim Yoonhee
Topcuoglu Haluk
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1998

The current advances in high-speed networks and WWW technologies have made network computing a cost-effective high performance computing environment. New software development models and problem solving environments must be developed to utilize the network computing environment efficiently. In this paper we present the Virtual Distributed Computing Environment (VDCE), which provides a problem solving environment for high-performance distributed computing over wide-area networks. VDCE enables scientists to develop distributed applications without knowing the detailed architecture of the underlying resources. VDCE provides well-defined library functions that relieve end users from tedious task implementations, and it supports software reusability. The VDCE software architecture consists of two modules: the Application Editor and the VDCE Runtime System. The Application Editor is a Web-based graphical user interface that helps users develop network applications and specify the computing and communication properties of each task within the applications. The VDCE Runtime System schedules the individual tasks of the application on the best available resources, then runs and manages the application execution on the assigned resources. We also present how VDCE can be used as a problem solving environment and how users can experiment with and evaluate the performance of their applications for different VDCE hardware and/or software configurations.

Syracuse University Research Facility and Collaborative Environment

SIMPLE: A Methodology for Programming High Performance Algorithms on Clusters of Symmetric Multiprocessors (SMPs) (Preliminary Version)

Author: Bader D.A.
Publication venue: UNM Digital Repository
Publication date: 01/11/1998

Overlapping of Communication and Computation and Early Binding: Fundamental Mechanisms for Improving Parallel Performance on Clusters of Workstations

Author: Dimitrov Rossen Petkov
Publication venue: Scholars Junction
Publication date: 12/05/2001

This study considers software techniques for improving performance on clusters of workstations and approaches for designing message-passing middleware that facilitate scalable, parallel processing. Early binding and overlapping of communication and computation are identified as fundamental approaches for improving parallel performance and scalability on clusters. Currently, cluster computers using the Message-Passing Interface for interprocess communication are the predominant choice for building high-performance computing facilities, which makes the findings of this work relevant to a wide audience from the areas of high-performance computing and parallel processing. The performance-enhancing techniques studied in this work are presently underutilized in practice because of the lack of adequate support by existing message-passing libraries and are also rarely considered by parallel algorithm designers. Furthermore, commonly accepted methods for performance analysis and evaluation of parallel systems omit these techniques and focus primarily on more obvious communication characteristics such as latency and bandwidth. This study provides a theoretical framework for describing early binding and overlapping of communication and computation in models for parallel programming. This framework defines four new performance metrics that facilitate new approaches for performance analysis of parallel systems and algorithms. This dissertation provides experimental data that validate the correctness and accuracy of the performance analysis based on the new framework. The theoretical results of this performance analysis can be used by designers of parallel system and application software for assessing the quality of their implementations and for predicting the effective performance benefits of early binding and overlapping. This work presents MPI/Pro, a new MPI implementation that is specifically optimized for clusters of workstations interconnected with high-speed networks. 
This MPI implementation emphasizes features such as persistent communication, asynchronous processing, low processor overhead, and independent message progress. These features are identified as critical for delivering maximum performance to applications. The experimental section of this dissertation demonstrates the capability of MPI/Pro to facilitate software techniques that result in significant application performance improvements. Specific demonstrations with Virtual Interface Architecture and TCP/IP over Ethernet are offered.

Scholars Junction - Mississippi State University Institutional Repository
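
The benefit of overlapping communication and computation, as described in the abstract above, can be illustrated with a deliberately idealized timing model. This is a sketch under stated assumptions, not the dissertation's framework or its four metrics: the function names are hypothetical, each step is assumed to have one compute phase and one message transfer, and perfect overlap is assumed to cost the larger of the two.

```python
# Idealized model (an assumption for illustration, not the dissertation's
# framework): compare a schedule that waits for each message before
# computing with one that transfers the message while computing.

def serial_time(compute, comm):
    """Total time when every step pays compute time plus transfer time."""
    return sum(c + m for c, m in zip(compute, comm))


def overlapped_time(compute, comm):
    """Total time when each step's transfer fully overlaps its computation,
    so the step costs whichever of the two takes longer."""
    return sum(max(c, m) for c, m in zip(compute, comm))


# Hypothetical per-step costs in arbitrary time units.
compute_costs = [4.0, 4.0, 4.0]
comm_costs = [1.0, 3.0, 6.0]

print("no overlap:", serial_time(compute_costs, comm_costs))
print("ideal overlap:", overlapped_time(compute_costs, comm_costs))
```

In this model the saving per step is the smaller of the two costs, so overlap helps most when computation and communication times are comparable.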

Quantifying the Performance Differences Between PVM and TreadMarks

Author: Cox A.L.
Dwarkadas S.
Lu H.
Zwaenepoel W.
Publication venue: Elsevier BV
Publication date: 17/10/2005

We compare two systems for parallel programming on networks of workstations: Parallel Virtual Machine (PVM), a message passing system, and TreadMarks, a software distributed shared memory (DSM) system. We present results for eight applications that were implemented using both systems. The programs are Water and Barnes-Hut from the SPLASH benchmark suite; 3-D FFT, Integer Sort (IS) and Embarrassingly Parallel (EP) from the NAS benchmarks; ILINK, a widely used genetic linkage analysis program; and Successive Over-Relaxation (SOR) and Traveling Salesman (TSP). Two different input data sets were used for five of the applications. We use two execution environments. The first is a 155 Mbps ATM network with eight Sparc-20 model 61 workstations; the second is an eight processor IBM SP/2. The differences in speedup between TreadMarks and PVM are dependent on the application, and, only to a much lesser extent, on the platform and the data set used. In particular, the TreadMarks speedup for six of the eight applications is within 15% of that achieved with PVM. For one application, the difference in speedup is between 15% and 30%, and for one application, the difference is around 50%. More important than the actual differences in speedups, we investigate the causes behind these differences. The cost of sending and receiving messages on current networks of workstations is very high, and previous work has identified communication costs as the primary source of overhead in software DSM implementations. The observed performance differences between PVM and TreadMarks are therefore primarily a result of differences in the amount of communication between the two systems.
We identified four factors that contribute to the larger amount of communication in TreadMarks: 1) extra messages due to the separation of synchronization and data transfer, 2) extra messages to handle access misses caused by the use of an invalidate protocol, 3) false sharing, and 4) diff accumulation for migratory data. We have quantified the effect of the last three factors by measuring the performance gain when each is eliminated. Because the separation of synchronization and data transfer is a fundamental characteristic of the shared memory model, there is no way to measure its contribution to performance without completely deviating from the shared memory model. Of the three remaining factors, TreadMarks’ inability to send data belonging to different pages in a single message is the most important. The effect of false sharing is quite limited. Reducing diff accumulation benefits migratory data only when the diffs completely overlap. When these performance impediments are removed, all of the TreadMarks programs perform within 25% of PVM, and for six out of eight experiments, TreadMarks is less than 5% slower than PVM.

Infoscience - École polytechnique fédérale de Lausanne

tCHARM : distributed concurrent processing using multithreading

Author: Ng Avis
Publication venue: Oregon State University

CHARM is a parallel programming language that was originally implemented for a network of workstations, each of which has only one processor. In this project, we ported CHARM to a network of workstations each of which has more than one processor (multi-computer), using multithreading to exploit the multiple processors. Network performance is another issue addressed in the project. Since the messaging protocol called Fast Messages over Myrinet offers low latency and high bandwidth, we ported the messaging protocol from the original TCP/IP to Fast Messages over Myrinet.

ScholarsArchive@OSU

The distributed ASCI supercomputer project

The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project.

VU Research Portal

Pure OAI Repository

International Migration, Integration and Social Cohesion online publications

Scalable Parallel Computers for Real-Time Signal Processing

Author: Hwang K
Xu Z
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1996

We assess the state-of-the-art technology in massively parallel processors (MPPs) and their variations in different architectural platforms. Architectural and programming issues are identified in using MPPs for time-critical applications such as adaptive radar signal processing. We review the enabling technologies. These include high-performance CPU chips and system interconnects, distributed memory architectures, and various latency hiding mechanisms. We characterize the concept of scalability in three areas: resources, applications, and technology. Scalable performance attributes are analytically defined. Then we compare MPPs with symmetric multiprocessors (SMPs) and clusters of workstations (COWs). The purpose is to reveal their capabilities, limits, and effectiveness in signal processing. We evaluate the IBM SP2 at MHPCC, the Intel Paragon at SDSC, the Cray T3D at Cray Eagan Center, and the Cray T3E and ASCI TeraFLOP system proposed by Intel. On the software and programming side, we evaluate existing parallel programming environments, including the models, languages, compilers, software tools, and operating systems. Some guidelines for program parallelization are provided. We examine data-parallel, shared-variable, message-passing, and implicit programming models. Communication functions and their performance overhead are discussed. Available software tools and communication libraries are also introduced.

HKU Scholars Hub

Automatic scheduling and dynamic load sharing of parallel computations on heterogeneous workstation clusters

Author: Jacob Joseph, 1971-
Publication venue: Oregon State University

Parallel computing on heterogeneous workstation clusters has proved to be a very efficient use of available resources, increasing their overall utilization. However, for it to be a viable alternative to expensive, dedicated parallel machines, a number of key issues need to be resolved. One of the major challenges of heterogeneous computing is coping with the inherent heterogeneity of the system, with the availability of workstations from different vendors of varying processing speeds and capabilities. The existence of multiple jobs and users further complicates the task. The time taken for a parallel job is constrained by the time taken by the slowest or the most heavily loaded workstation. Therefore, load sharing of parallel computations is imperative in ensuring good overall utilization of the system. Since load sharing is essentially independent of the particular parallel job being run, the development of program-independent, automatic scheduling and load sharing strategies has become vital to the efficient use of the heterogeneous cluster. This thesis discusses various prior approaches to load sharing, examines a new strategy developed for heterogeneous workstations, and evaluates its performance.

ScholarsArchive@OSU
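
The observation in the abstract above, that a parallel job finishes only when its slowest workstation does, can be made concrete with a small worked example. This is a minimal sketch, not the strategy developed in the thesis: the function names, the relative speeds, and the static speed-proportional split are all assumptions chosen for illustration.

```python
# Illustrative sketch (not the thesis's strategy): the completion time of a
# parallel job is the maximum over nodes of assigned work divided by speed.

def makespan(work_per_node, speeds):
    """Job completion time: the slowest node determines when the job ends."""
    return max(w / s for w, s in zip(work_per_node, speeds))


def equal_split(total_work, speeds):
    """Give every node the same share, ignoring speed differences."""
    n = len(speeds)
    return [total_work / n] * n


def proportional_split(total_work, speeds):
    """Give each node a share proportional to its relative speed."""
    total_speed = sum(speeds)
    return [total_work * s / total_speed for s in speeds]


# Hypothetical cluster: three workstations with relative speeds 1, 2 and 4.
speeds = [1.0, 2.0, 4.0]
total_work = 700.0

print("equal split:", makespan(equal_split(total_work, speeds), speeds))
print("proportional split:",
      makespan(proportional_split(total_work, speeds), speeds))
```

With an equal split the slowest workstation dominates the completion time; with a speed-proportional split all three finish together. A dynamic scheme would also need to track load from other jobs and users, which this static sketch ignores.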

An Evaluation of Software Release-Consistent Protocols

Author: Cox A.L.
Dwarkadas S.
Keleher P.
Zwaenepoel W.
Publication venue: Elsevier BV
Publication date: 17/10/2005

This paper presents an evaluation of three software implementations of release consistency. Release consistent protocols allow data communication to be aggregated, and multiple writers to simultaneously modify a single page. We evaluated an eager invalidate protocol that enforces consistency when synchronization variables are released, a lazy invalidate protocol that enforces consistency when synchronization variables are acquired, and a lazy hybrid protocol that selectively uses update to reduce access misses. Our evaluation is based on implementations running on DECstation-5000/240s connected by an ATM LAN, and an execution-driven simulator that allows us to vary network parameters. Our results show that the lazy protocols consistently outperform the eager protocol for all but one application, and that the lazy hybrid performs the best overall. However, the relative performance of the implementations is highly dependent on the relative speeds of the network, processor, and communication software. Lower bandwidths and high per-byte software communication costs favor the lazy invalidate protocol, while high bandwidths and low per-byte costs favor the hybrid. Performance of the eager protocol approaches that of the lazy protocols only when communication becomes essentially free.

Infoscience - École polytechnique fédérale de Lausanne