402 research outputs found

    Guaranteed bandwidth implementation of message passing interface on workstation clusters

    Due to their wide availability, networks of workstations (NOW) are an attractive platform for parallel processing. Parallel programming environments such as Parallel Virtual Machine (PVM) and Message Passing Interface (MPI) offer the user a convenient way to express parallel computing and communication for a network of workstations. Currently, a number of MPI implementations are available that offer low (average) latency and high bandwidth environments to users by utilizing an efficient MPI library specification and high speed networks. In addition to high bandwidth and low average latency, mission critical distributed applications and audio/video communications require a completely different type of service: guaranteed bandwidth and worst case delays (worst case latency) guaranteed by the underlying protocol. The hypothesis presented in this paper is that it is possible to provide an application with a low level reliable transport protocol whose performance and guaranteed bandwidth are as close as possible to those of the hardware on which it is executing. The hypothesis is proven by designing and implementing a reliable high performance message passing protocol interface which also provides guaranteed bandwidth to MPI and to mission critical distributed MPI applications. This protocol interface works with the Fiber Distributed Data Interface (FDDI) driver which has been designed and implemented for Performance Technology Inc.'s commercial high performance FDDI product, the Station Management Software 7.3, and the ADI / MPICH (Argonne National Laboratory and Mississippi State University's free MPI implementation).

    OASIS: a coupling software for next generation earth system modelling

    In this article we present a new version of the Ocean Atmosphere Sea Ice Soil coupling software (OASIS4). With this new fully parallel OASIS4 coupler we target the needs of Earth system modelling in its full complexity. The primary focus of this article is to describe the design of the OASIS4 software and how the coupling software drives the whole coupled model system, ensuring the synchronization of the different component models. The application programmer interface (API) manages the coupling exchanges between arbitrary climate component models, as well as the input and output from and to files of each individual component. The OASIS4 Transformer instance performs the parallel interpolation and transfer of the coupling data between source and target model components. As a new core technology for the software, the fully parallel search algorithm of OASIS4 is described in detail. First benchmark results are discussed with simple test configurations to demonstrate the efficiency and scalability of the software when applied to Earth system model components. Typically the compute time needed to perform the search is on the order of a few seconds and is only weakly dependent on the grid size.

    A slotted-CDMA based wireless-ATM link layer : guaranteeing QoS over a wireless link.

    Thesis (M.Sc.)-University of Natal, Durban, 2000. Future wireless networks will have to handle varying combinations of multimedia traffic that present the network with numerous quality of service (QoS) requirements. The continuously growing demand for mobile phones has resulted in radio spectrum becoming a precious resource that cannot be wasted. The current second-generation mobile networks are designed for voice communication and, even with the enhancements being implemented to accommodate data, they cannot efficiently handle the multimedia traffic demands that will be introduced in the near future. This thesis begins with a survey of existing wireless ATM (WATM) protocols, followed by an examination of some medium access control (MAC) protocols that support multimedia traffic and are based on code division multiple access (CDMA) physical layers. A WATM link layer protocol based on a CDMA physical layer, and incorporating techniques from some of the surveyed protocols, is then proposed. The MAC protocol supports a wide range of service requirements by utilising a flexible scheduling algorithm that takes advantage of the graceful degradation of CDMA with increasing user interference to schedule cells for transmission according to their maximum bit error rate (BER) requirements. The data link control (DLC) accommodates the various traffic types by allowing virtual channels (VCs) to make use of forward error correction (FEC) or retransmission techniques. The proposed link layer protocol has been implemented on a Blue Wave Systems DSP board that forms part of Alcatel Altech Telecoms' software radio platform. The details and practicality of the implementation are presented. A simulation model for the protocol has been developed using MIL3's Opnet Modeler. Hence, both simulated and measured performance results are presented before the thesis concludes with suggestions for improvements and future work.

    Optimizing Communication for Massively Parallel Processing

    The current trends in high performance computing show that large machines with tens of thousands of processors will soon be readily available. The IBM Bluegene-L machine with 128k processors (which is currently being deployed) is an important step in this direction. In this scenario, it is going to be a significant burden for the programmer to manually scale his applications. This task of scaling involves addressing issues like load imbalance and communication overhead. In this thesis, we explore several communication optimizations to help parallel applications scale easily on a large number of processors. We also present automatic runtime techniques to relieve the programmer from the burden of optimizing communication in his applications. This thesis explores processor virtualization to improve communication performance in applications. With processor virtualization, the computation is mapped to virtual processors (VPs). After one VP has finished computation and is waiting for responses to its messages, another VP can compute, thus overlapping communication with computation. This overlap is only effective if the processor overhead of the communication operation is a small fraction of the total communication time. Fortunately, with network interfaces having co-processors, this happens to be true, and processor virtualization has a natural advantage on such interconnects. The communication optimizations we present in this thesis are motivated by applications such as NAMD (a classical molecular dynamics application) and CPAIMD (a quantum chemistry application). Applications like NAMD and CPAIMD consume a fair share of the time available on supercomputers, so improving their performance would be of great value. We have successfully scaled NAMD to 1TF of peak performance on 3000 processors of PSC Lemieux, using the techniques presented in this thesis.
We study both point-to-point communication and collective communication (specifically all-to-all communication). On a large number of processors, all-to-all communication can take several milliseconds to finish. With synchronous collectives defined in MPI, the processor idles while the collective messages are in flight. Therefore, we demonstrate an asynchronous collective communication framework to let the CPU compute while the all-to-all messages are in flight. We also show that the best strategy for all-to-all communication depends on the message size, number of processors and other dynamic parameters. This suggests that these parameters can be observed at runtime and used to choose the optimal strategy for all-to-all communication. In this thesis, we demonstrate adaptive strategy switching for all-to-all communication. The communication optimization framework presented in this thesis has been designed to optimize communication in the context of processor virtualization and dynamic migrating objects. We present the streaming strategy to optimize fine grained object-to-object communication. In this thesis, we motivate the need for hardware collectives, as processor based collectives can be delayed by intermediate processors that are busy with computation. We explore a next generation interconnect that supports collectives in the switching hardware. We show the performance gains of hardware collectives through synthetic benchmarks.
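
The dependence of the best all-to-all strategy on message size and processor count can be illustrated with a minimal sketch. It is not the thesis's framework: it assumes the textbook latency/bandwidth (alpha-beta) cost model and compares only two strategies, direct sends and a two-phase 2D-mesh combining scheme. The function names and the default `alpha` and `beta` values are hypothetical and chosen only for illustration.

```python
import math


def direct_cost(p, msg_bytes, alpha, beta):
    """Estimated time when each processor sends p - 1 separate messages."""
    return (p - 1) * (alpha + msg_bytes * beta)


def mesh_cost(p, msg_bytes, alpha, beta):
    """Estimated time for two-phase combining on a virtual 2D mesh.

    Each phase sends (side - 1) bundled messages of side * msg_bytes,
    trading fewer message start-ups for more bytes moved in total.
    """
    side = math.ceil(math.sqrt(p))
    return 2 * (side - 1) * (alpha + side * msg_bytes * beta)


def choose_strategy(p, msg_bytes, alpha=5e-6, beta=1e-9):
    """Pick the cheaper strategy under the assumed cost model.

    alpha: per-message start-up cost in seconds (assumed value).
    beta:  per-byte transfer cost in seconds (assumed value).
    """
    costs = {
        "direct": direct_cost(p, msg_bytes, alpha, beta),
        "mesh": mesh_cost(p, msg_bytes, alpha, beta),
    }
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for size in (8, 1_000_000):
        print(size, choose_strategy(1024, size))
```

Under these assumed parameters, small messages on 1024 processors favor the combining strategy because start-up cost dominates, while large messages favor direct sends because combining moves each byte twice. A runtime system could re-evaluate such a model as the observed message sizes change.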

    The distributed ASCI supercomputer project

    The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project