16 research outputs found

    Solving parallel problems by OTMP model

    Since the early stages of parallel computing, one of the most common ways to introduce parallelism has been to extend a sequential language with some parallel version of the for construct, commonly denoted the forall construct. Although similar in syntax, these forall loops differ in their semantics and implementations. The High Performance Fortran (HPF) and OpenMP versions are likely the most popular. This paper presents yet another forall loop extension for the C language. In this work, we introduce a parallel computation model: the One Thread Multiple Processor Model (OTMP). This model comprises an abstract machine, a programming model, and a cost model. The programming model defines another forall loop construct, the theoretical machine targets both homogeneous shared- and distributed-memory computers, and the cost model allows the performance of a program to be predicted. OTMP not only integrates and extends sequential programming, but also includes and expands the message passing programming model. The model allows and exploits any number of nested levels of parallelism, taking advantage of situations where there are several small nested loops.
    Track: Distributed and Parallel Processing (PDP). Red de Universidades con Carreras en Informática (RedUNCI).
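
    The abstract does not show OTMP's concrete forall syntax. As a point of reference only, below is a minimal C sketch of the OpenMP forall variant the abstract names, with one nested level of parallelism of the kind the model is said to exploit; it is not the OTMP construct itself.

        #include <omp.h>
        #include <stdio.h>

        int main(void) {
            double a[8][8];
            omp_set_max_active_levels(2);    /* permit one nested parallel level */
            #pragma omp parallel for         /* outer forall over rows */
            for (int i = 0; i < 8; i++) {
                #pragma omp parallel for     /* nested forall over columns */
                for (int j = 0; j < 8; j++)
                    a[i][j] = (double)(i * 8 + j);
            }
            printf("a[7][7] = %g\n", a[7][7]);
            return 0;
        }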

    Direct Linear Solvers for Vector and Parallel Computers

    We consider direct methods for the numerical solution of linear systems with unsymmetric sparse matrices. Different strategies for the determination of the pivots are studied. To solve several linear systems with the same pattern structure, we generate a pseudo code that can be interpreted repeatedly to compute the solutions of these systems. The pseudo code can be advantageously adapted to vector and parallel computers. To this end, we have to find the instructions of the pseudo code that are independent of each other. Based on this information, one can determine vector instructions for the pseudo code operations (vectorization) or spread the operations among different processors (parallelization). The methods are successfully used on vector and parallel computers for the circuit simulation of VLSI circuits as well as for the dynamic process simulation of complex chemical production plants.
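
    The paper's pseudo code is not reproduced in the abstract. The C sketch below, with a hypothetical two-instruction op set, only illustrates the record-once, interpret-repeatedly idea for systems that share one sparsity pattern; the independent instructions in such a list are the ones that can be vectorized or spread among processors.

        #include <stddef.h>
        #include <stdio.h>

        typedef enum { OP_DIV, OP_MULSUB } opcode;   /* hypothetical op set */
        typedef struct { opcode op; size_t dst, src; double coef; } instr;

        /* Replay a recorded elimination sequence on a new right-hand side b.
           The instruction list depends only on the sparsity pattern, so the
           same list solves every system with that pattern. */
        static void replay(const instr *code, size_t n, double *b) {
            for (size_t k = 0; k < n; k++) {
                const instr *c = &code[k];
                if (c->op == OP_DIV)
                    b[c->dst] /= c->coef;                /* divide by pivot */
                else
                    b[c->dst] -= c->coef * b[c->src];    /* eliminate an entry */
            }
        }

        int main(void) {
            /* Recorded sequence for one fixed 2x2 lower-triangular pattern. */
            instr code[] = { {OP_DIV, 0, 0, 2.0},
                             {OP_MULSUB, 1, 0, 3.0},
                             {OP_DIV, 1, 0, 5.0} };
            double b[] = {4.0, 22.0};
            replay(code, 3, b);                      /* reusable for any rhs */
            printf("x = (%g, %g)\n", b[0], b[1]);    /* x = (2, 3.2) */
            return 0;
        }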

    Generic communication in parallel computation

    The design of parallel programs requires solutions that have no counterpart in sequential programming. Thus, a designer of parallel applications is concerned with the problem of ensuring the correct behavior of all the processes that the program comprises. There are different solutions to each problem, but the challenge is to find one that is general. One possibility is allowing the use of asynchronous groups of processors. We present a general methodology to derive efficient parallel divide and conquer algorithms. Algorithms belonging to this class allow the arbitrary division of the processor subsets, making it easier for the underlying software to divide the network into independent subnetworks and minimizing the impact of traffic in the rest of the network on the predicted cost. This methodology is defined by the OTMP model, and its expressiveness is exemplified through three divide and conquer programs.
    Track: IV Workshop on Distributed and Parallel Processing. Red de Universidades con Carreras en Informática (RedUNCI).
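
    The OTMP primitives themselves are not given in the abstract. As a hedged illustration of dividing a processor set into independent subsets, the C sketch below uses standard MPI (MPI_Comm_split), not the OTMP API; each branch of the divide and conquer tree then runs on its own communicator, isolated from traffic in the other subsets.

        #include <mpi.h>

        /* Recursively split the processor set in two and recurse on each half. */
        static void dandc(MPI_Comm comm, int depth) {
            int rank, size;
            MPI_Comm_rank(comm, &rank);
            MPI_Comm_size(comm, &size);
            if (depth == 0 || size == 1)
                return;                               /* base case: solve locally */
            int color = (rank < size / 2) ? 0 : 1;    /* subsets need not be equal */
            MPI_Comm sub;
            MPI_Comm_split(comm, color, rank, &sub);  /* independent sub-network */
            dandc(sub, depth - 1);
            MPI_Comm_free(&sub);
            /* combine step: exchange partial results within comm here */
        }

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);
            dandc(MPI_COMM_WORLD, 3);
            MPI_Finalize();
            return 0;
        }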

    Co-arrays in the Next Fortran Standard

    SuperLU users' guide

    A class of parallel multiple-front algorithms on subdomains

    A class of parallel multiple-front solution algorithms is developed for solving linear systems arising from the discretization of boundary value problems and evolution problems. The basic substructuring approach and frontal algorithm on each subdomain are first modified to ensure stable factorization in situations where ill-conditioning may occur due to differing material properties or the use of high-degree finite elements (p methods). Next, the method is implemented on distributed-memory multiprocessor systems, with the final reduced (small) Schur complement problem solved on a single processor. A novel algorithm that implements a recursive partitioning approach on the subdomain interfaces is then developed. Both algorithms are implemented and compared in a least-squares finite-element scheme for viscous incompressible flow computation using h- and p-finite element schemes. Copyright © 2003 John Wiley & Sons, Ltd. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/34536/1/627_ftp.pd
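
    For context, the reduced interface problem mentioned in the abstract is the standard Schur complement system (the abstract itself does not spell it out). With interior (I) and interface (B) unknowns ordered in blocks,

        \begin{pmatrix} A_{II} & A_{IB} \\ A_{BI} & A_{BB} \end{pmatrix}
        \begin{pmatrix} x_I \\ x_B \end{pmatrix} =
        \begin{pmatrix} f_I \\ f_B \end{pmatrix},
        \qquad
        S = A_{BB} - A_{BI} A_{II}^{-1} A_{IB},

    and the small system S x_B = f_B - A_{BI} A_{II}^{-1} f_I is the one gathered onto a single processor, while the interior blocks are factored independently per subdomain.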

    The Collective Computation Model: an efficient methodology for extending the message passing library model with nested data parallelism

    The Collective Computation Model is proposed for the efficient translation of algorithms with nested data parallelism onto real parallel architectures. The model is characterized by a triple (M, Div, Col), where M represents the parallel platform, Div is the set of division functions, and Col is the set of collective functions. A function is said to be collective when it is performed by all the processors of the current set. Processor sets can be divided using the functions in Div. An efficient implementation of the division process is proposed, with the underlying idea that each processor in one of the subsets produced by the split maintains a relation with one (or more) of the processors in the other subsets. This relation determines the communication of the results produced by the task performed by the set to which the processor belongs. This division structure gives rise to communication patterns resembling those of a hypercube: the dimension is determined by the number of divisions requested, while the arity in each dimension equals the number of subsets requested. As in a conventional k-ary hypercube, a dimension divides the set into k subsets that communicate across that dimension; however, subsets that are opposite along a dimension need not have the same cardinality. The resulting structures have been named Dynamic Hypercubes. A classification of parallel problems is presented based on the characteristics of their input and output data with respect to the view the machine's processors have of them, and the nomenclature introduced is used to characterize the problems presented in the report. Examples are given of algorithms of both the kind called Collective Computation and the kind called Common Collective Computation, the latter solving a specific class of problems according to the classification introduced. For both kinds of algorithms, different ways of introducing load balancing are studied, together with the results each of them produces. A tool, La Laguna C, is also presented as a concrete implementation of the ideas underlying the Collective Computation Model, and computational results obtained for several algorithms on different architectures are reported.
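
    As an illustration of the partner relation described above, the C sketch below computes, for one division step, the processors a rank communicates with in the opposite subsets. It assumes a same-offset pairing and equal-sized subsets, although the model itself allows unequal cardinalities.

        #include <stdio.h>

        int main(void) {
            int p = 12, k = 3;                 /* 12 processors split into 3 subsets */
            int sub = p / k;                   /* processors per subset (assumed equal) */
            for (int rank = 0; rank < p; rank++) {
                int my_set = rank / sub;       /* subset this rank falls in */
                int offset = rank % sub;       /* position inside the subset */
                printf("rank %2d (set %d) partners:", rank, my_set);
                for (int s = 0; s < k; s++)
                    if (s != my_set)           /* one partner in each opposite subset */
                        printf(" %2d", s * sub + offset);
                printf("\n");
            }
            return 0;
        }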