130 research outputs found

    Towards dynamic threading support for OpenMP


    Remora : implementing adaptive parallelism on a heterogeneous cluster of networked workstations

    Computers connected to a local area network are often fully utilized only for short periods of time. In fact, most workstations are not used at all for a significant portion of the day. The combined "idle time" of the workstations on a network constitutes a significant computing resource, which is generally wasted. If harnessed properly, such a resource could constitute a cheap alternative to expensive high-performance computers. Adaptive parallelism refers to the parallel execution of a computation on a dynamically changing set of processors. This thesis investigates the viability of this approach as a vehicle to harness the "idle cycles" available on a heterogeneous cluster of networked computers. A system called Remora, which implements adaptive parallelism via the Linda programming paradigm, is presented. Experiments performed using Remora show that adaptive parallelism provides an efficient vehicle for using idle processor cycles, without having an adverse effect on the tasks that constitute the normal workload of the computers being used.

    Transparent Adaptive Parallelism on NOWs using OpenMP

    We present a system that allows OpenMP programs to execute on a network of workstations with a variable number of nodes. The ability to adapt to a variable number of nodes allows a program to take advantage of additional nodes that become available after it starts execution, or to gracefully scale down when the number of available nodes is reduced. We demonstrate that the cost of adaptation is modest; the system allows a program to adapt at a moderate rate without much performance loss. Two ideas underlie the efficiency of our design. First, we recognize that OpenMP programs exhibit convenient adaptation points during their execution, points at which the cost of adaptation can be much reduced. Second, by allowing a process a certain grace period before it must leave a node, we ensure that most adaptations can occur at these adaptation points, and thus at low cost. Migration of a process, a much more expensive method for providing adaptivity, is used only as a back-up solution, when the process cannot reach an adaptation point within the grace period. Our implementation consists of an OpenMP pre-processor that generates TreadMarks distributed shared memory (DSM) programs, and a version of TreadMarks modified to adapt to a variable number of nodes. Using a DSM as the underlying substrate facilitates the data (re-)distribution necessary after an adaptation.

    Cooperating runtime systems in LiPS

    Performing computation using networks of workstations is increasingly becoming an alternative to using a supercomputer. This approach is motivated by the vast quantities of unused idle time available in workstation networks. Unlike computing on a tightly coupled parallel computer, where a fixed number of processor nodes is used within a computation, the number of usable nodes in a workstation network is constantly changing over time. Additionally, workstations are more frequently subject to outages, e.g. due to reboots. The question arises how applications that adapt smoothly to this environment should be realized. LiPS is a system for distributed computing using idle cycles in networks of workstations. This system, in its version 2.3, is currently used at the Universität des Saarlandes in Saarbrücken, Germany, to perform computationally intensive applications in the field of cryptography on a net of approximately 250 workstations, and should be enhanced to work within an environment of more than 1000 machines all over the world within the next years. In this paper we present the runtime systems of LiPS along with performance measurements taken with the current LiPS development version 2.4.

    An Evaluation of Adaptive Execution of OpenMP Task Parallel Programs

    We present a system that allows task parallel OpenMP programs to execute on a network of workstations (NOW) with a variable number of nodes. Such adaptivity, generally called adaptive parallelism, is important in a multi-user NOW environment, enabling the system to expand the computation onto idle nodes or withdraw from otherwise occupied nodes. We focus on task parallel applications in this paper, but the system also lets data parallel applications run adaptively. When an adaptation is requested, we let all processes complete their current tasks, then the system executes an extra OpenMP join-fork sequence not present in the application code. Here, the system can change the number of nodes without involving the application, as processes do not have a compute-relevant private process state. We show that the cost of adaptation is low, and we explain why the cost is lower for task parallel applications than for data parallel applications.

    Best of both latency and throughput


    Graphical User Interface to Monitor and Manage the DDAS System Performance


    Data Parallel Programming in an Adaptive Environment

    For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at runtime. In this paper, we discuss runtime support for data parallel programming in such an adaptive environment. Executing data parallel programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a runtime library to provide this support. We discuss how the runtime library can be used by compilers to generate code for an adaptive environment. We also present performance results for a multiblock Navier-Stokes solver run on a network of workstations using PVM for message passing. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computations. (Also cross-referenced as UMIACS-TR-94-109)