262 research outputs found

    Formal Semantics of a Subset of the Paderborn's BSPlib

    PUB (Paderborn University BSPLib) is a C library supporting the development of Bulk-Synchronous Parallel (BSP) algorithms. The BSP model allows estimation of execution time and avoids deadlocks and nondeterminism. This paper presents a formal operational semantics for a C+PUB subset language using the Coq proof assistant, together with a certified N-body computation as an example of using this formal semantics.
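
    The execution-time estimates that BSP enables rest on the standard superstep cost formula. The sketch below is a minimal illustration of that formula, not code from the paper; the parameter values are invented for the example.

```python
# Sketch of the standard BSP cost model: a superstep with local work w and
# h-relation h costs w + h*g + L, where g is the per-word communication cost
# and L the barrier synchronization latency. All numbers are illustrative.

def bsp_cost(supersteps, g, L):
    """Total predicted time for a list of (w, h) supersteps."""
    return sum(w + h * g + L for w, h in supersteps)

# Example: a toy computation with two supersteps.
print(bsp_cost([(1000, 50), (400, 10)], g=4.0, L=100.0))  # → 1840.0
```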

    Composition of Efficient Nested BSP Algorithms: Minimum Spanning Tree Computation as an Instructive Example

    We report on the results of an automatic configuration approach for implementing complex parallel BSP algorithms. In this approach, a parallel algorithm is described by a sequence of instructions and of subproblems that are solved by other parallel algorithms called as subroutines, together with a mathematical description of its own running time. There may also be free algorithmic parameters, such as the degree of the trees in the data structures used, that affect the running time. Because the running time of an algorithm depends on several machine parameters, on the choice of the free algorithmic parameters, and on the choice of the parallel subroutines (to which the same considerations apply in turn), composing the parallel program for a concrete parallel machine from all these ingredients is a difficult task. We have implemented such a configuration system using the Paderborn University BSP library, and we present the theoretical and experimental results of implementations of sophisticated minimum spanning tree algorithms as an instructive example.
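
    The core idea, selecting the implementation and parameter values that minimize a predicted running time, can be sketched as follows. This is a hedged illustration of the general autotuning-by-cost-model technique, not the paper's actual system; the candidate cost functions and machine parameters are invented.

```python
# Hypothetical configurator: each candidate exposes a predicted running time
# as a function of machine parameters and one free algorithmic parameter;
# the configurator picks the cheapest (implementation, parameter) pair.

def configure(candidates, machine):
    """candidates: list of (name, param_range, predict), where
    predict(machine, k) returns an estimated running time."""
    return min(
        ((pred(machine, k), name, k)
         for name, params, pred in candidates
         for k in params),
        key=lambda t: t[0],
    )  # returns (predicted_time, algorithm_name, chosen_parameter)

# Illustrative candidates: a tree-based algorithm with free degree k,
# versus a flat all-to-all variant with no free parameter.
machine = {"p": 64, "g": 4.0, "L": 200.0}
candidates = [
    ("tree", range(2, 9),
     lambda m, k: k * m["g"] * 100 + (m["L"] * 10) / k),
    ("flat", [0],
     lambda m, k: m["p"] * m["g"] * 50),
]
print(configure(candidates, machine))  # → (1800.0, 'tree', 2)
```

    In a real system the prediction formulas would themselves be parameterized by recursively configured subroutines, as the abstract describes.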

    Analysis and tools for performance prediction

    We present an analytical model that extends BSP to cover both oblivious synchronization and group partitioning. A few oversimplifications in BSP make it difficult to obtain accurate predictions. Even if the numbers of individual communication or computation operations in two stages are the same, the actual times of the two stages may differ. These differences are due to the separate nature of the operations or to the particular pattern followed by the messages. Even worse, the assumption that a constant number of machine instructions takes constant time is far from the truth: with current memory hierarchies, memory access times vary from a few cycles to several thousand. A natural proposal is to associate a different proportionality constant with each basic block and, analogously, different latencies and bandwidths with each “communication block”. Unfortunately, this approach implies that the evaluation parameters depend not only on the given architecture but also on algorithm characteristics, so the parameter evaluation must be repeated for every algorithm. This is a heavy task, involving experiment design, timing, statistics, pattern recognition, and multi-parameter fitting algorithms, so software support is required. We have developed a compiler that takes as source a C program annotated with complexity formulas and produces as output an instrumented code. The trace files obtained from executing the resulting code are analyzed with an interactive interpreter, which gives us, among other information, the values of those parameters.
    Track: Concurrent Programming. Red de Universidades con Carreras en Informática (RedUNCI).
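
    The fitting step behind "a different proportionality constant with each basic block" can be sketched as a one-dimensional least-squares fit per block. This is an illustrative reconstruction under the assumption that each block's time is proportional to its execution count; the measurements below are synthetic, not data from the paper.

```python
# Hypothetical per-block fit: block i is assumed to cost c_i * n_i, where
# n_i is its execution count; c_i is recovered from timed runs by
# least squares, minimizing sum_j (t_j - c * n_j)^2.

def fit_block_constant(counts, times):
    """Least-squares fit of t ≈ c * n over paired observations."""
    num = sum(n * t for n, t in zip(counts, times))
    den = sum(n * n for n in counts)
    return num / den

# Synthetic trace data for one basic block (count, measured time).
counts = [100, 200, 400]
times = [205.0, 398.0, 810.0]
print(round(fit_block_constant(counts, times), 3))  # → 2.02
```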

    Optimal Reconfiguration of Formation Flying Spacecraft--a Decentralized Approach

    This paper introduces a hierarchical, decentralized, and parallelizable method for dealing with optimization problems involving many agents. It is theoretically based on a hierarchical optimization theorem that establishes the equivalence of two forms of the problem, and this idea is implemented using DMOC (Discrete Mechanics and Optimal Control). The result is a method that is scalable to certain optimization problems for large numbers of agents, whereas the usual “monolithic” approach can only deal with systems with a rather small number of degrees of freedom. The method is illustrated with the example of deployment of spacecraft, motivated by the Darwin (ESA) and Terrestrial Planet Finder (NASA) missions.

    Optimal broadcast on parallel locality models

    Abstract: In this paper, matching upper and lower bounds for broadcast on general-purpose parallel computation models that exploit network locality are proven. These models try to capture the general-purpose properties of models like the PRAM or BSP on the one hand, and to exploit the network locality of special-purpose models like meshes, hypercubes, etc., on the other. They do so by charging a cost l(|i − j|) for a communication between processors i and j, where l is a suitably chosen latency function. An upper bound T(p) = ∑_{i=0}^{log log p} 2^i · l(p^{1/2^i}) on the runtime of a broadcast on a p-processor H-PRAM is given, for an arbitrary latency function l(k). The main contribution of the paper is a matching lower bound, holding for all latency functions in the range from l(k) = Ω(log k / log log k) to l(k) = O(log² k). This is not a severe restriction, since for latency functions l(k) = O(log k / log^{1+ε} log k) with arbitrary ε > 0, the runtime of the algorithm matches the trivial lower bound Ω(log p), and for l(k) = Θ(log^{1+ε} k) or l(k) = Θ(k^ε), the runtime matches the other trivial lower bound Ω(l(p)). Both upper and lower bounds also apply to other parallel locality models like Y-PRAM, D-BSP, and E-BSP.
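
    The upper bound T(p) = ∑_{i=0}^{log log p} 2^i · l(p^{1/2^i}) is straightforward to evaluate numerically for a concrete latency function. The sketch below simply tabulates the stated formula; the choice of l and of p is illustrative, not from the paper.

```python
# Evaluate the stated broadcast upper bound
#   T(p) = sum_{i=0}^{log log p} 2^i * l(p^(1/2^i))
# for a user-supplied latency function l.
import math

def broadcast_bound(p, latency):
    total = 0.0
    top = int(math.log2(math.log2(p)))  # summation runs up to log log p
    for i in range(top + 1):
        total += (2 ** i) * latency(p ** (1.0 / 2 ** i))
    return total

# With unit latency l(k) = 1 and p = 2^16, the sum is 1+2+4+8+16.
print(broadcast_bound(2 ** 16, lambda k: 1.0))  # → 31.0
```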

    Implementation of a Parallel Digital Digest

    The growth in the amount of information made available on the Internet through the Web poses the challenge of serving, in the shortest possible time, the clients who search over that information, while also making more efficient use of resources. Parallel computation models make it possible to approach this goal. This work presents an efficient, low-cost solution, based on the Bulk Synchronous Parallel computation model, for implementing a Digital Digest based on a parallel search engine that uses relational databases in a Web-access environment.
    Track: Distributed and Parallel Processing. Red de Universidades con Carreras en Informática (RedUNCI).

    Flexible Management on BSP Process Rescheduling: Offering Migration at Middleware and Application Levels

    This article describes the rationale for developing jMigBSP, a Java programming library that offers object rescheduling. It was designed to work on grid computing environments and offers an interface that follows the BSP (Bulk Synchronous Parallel) style. jMigBSP's main contribution is its rescheduling facility, offered in two different ways: (i) through migration directives used directly in the application code, and (ii) through automatic load balancing at the middleware level. This second idea is feasible thanks to Java's inheritance feature, which transforms a simple jMigBSP application into a migratable one by changing a single line of code. In addition, the presented library makes object interaction easier by providing one-sided message-passing directives, and it hides network latency through asynchronous communication. Finally, we developed three BSP applications: (i) Prefix Sum; (ii) Fractal Image Compression (FIC); and (iii) Fast Fourier Transform (FFT). They show that our library is a viable solution for offering load balancing in BSP applications. In particular, the FIC results show gains of up to 37% when migration directives are applied inside the code, and the FFT tests emphasize the strength of jMigBSP: in this situation, it outperforms a native library, BSPlib, when migration facilities take place.
    Keywords: Bulk Synchronous Parallel, rescheduling, Java, adaptation, object migration, grid computing

    On Dynamic Graph Partitioning and Graph Clustering using Diffusion


    A calculus of functional BSP programs with projection
