3 research outputs found

    Analysis and tools for performance prediction

    We present an analytical model that extends BSP to cover both oblivious synchronization and group partitioning. Several oversimplifications in BSP make accurate predictions difficult. Even if the numbers of individual communication or computation operations in two stages are the same, the actual times for these two stages may differ. These differences are due to the distinct nature of the operations or to the particular pattern followed by the messages. Even worse, the assumption that a constant number of machine instructions takes constant time is far from the truth: with current memory hierarchies, the cost of a memory access varies from a few cycles to several thousand. A natural proposal is to associate a different proportionality constant with each basic block and, analogously, to associate different latencies and bandwidths with each "communication block". Unfortunately, this approach implies that the evaluation parameters depend not only on the given architecture but also on the characteristics of the algorithm, so the parameters must be evaluated for every algorithm. This is a heavy task, involving experiment design, timing, statistics, pattern recognition and multi-parameter fitting algorithms, and it requires software support. We have developed a compiler that takes as source a C program annotated with complexity formulas and produces instrumented code as output. The trace files obtained from executing the resulting code are analyzed with an interactive interpreter, which gives us, among other information, the values of those parameters.
    Track: Concurrent programming. Red de Universidades con Carreras en Informática (RedUNCI)
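The per-block idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' compiler or interpreter: the names `ComputeBlock`, `CommBlock`, `fit_constant` and `predict_time` are hypothetical, and the cost form (operations times a fitted constant, plus latency plus words times a per-word gap) is an assumed BSP-style simplification.

```python
from dataclasses import dataclass


@dataclass
class ComputeBlock:
    ops: int       # operation count given by the block's complexity formula
    const: float   # fitted seconds per operation for this block (assumed)


@dataclass
class CommBlock:
    words: int      # words sent in this communication block
    latency: float  # fitted start-up cost in seconds (assumed)
    gap: float      # fitted seconds per word, i.e. inverse bandwidth (assumed)


def fit_constant(samples):
    """Least-squares fit of time = const * ops through the origin.

    samples is a list of (ops, measured_seconds) pairs taken from traces.
    """
    num = sum(ops * t for ops, t in samples)
    den = sum(ops * ops for ops, _ in samples)
    return num / den


def predict_time(compute_blocks, comm_blocks):
    """Predicted time of one stage: per-block compute cost plus comm cost."""
    compute = sum(b.ops * b.const for b in compute_blocks)
    comm = sum(c.latency + c.words * c.gap for c in comm_blocks)
    return compute + comm


if __name__ == "__main__":
    # Made-up timings for a single basic block, for illustration only.
    c = fit_constant([(100, 0.2), (200, 0.4)])
    stage = predict_time([ComputeBlock(1000, c)], [CommBlock(50, 0.1, 0.01)])
    print(f"fitted constant = {c:.4f} s/op, predicted stage time = {stage:.2f} s")
```

Fitting one constant per block, rather than one per machine, is what ties the parameters to the algorithm as well as the architecture, which is the cost the abstract points out.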

    Factores de rendimiento asociados a SPMD (Performance factors associated with SPMD)

    Many parallel/distributed applications today use SPMD as their dominant paradigm. Obtaining good performance from a parallel application of this type is a major challenge, and an important one given how many such applications exist. The goal is hard to reach because hardware configurations vary widely, as do the nature of the problems and the way they are implemented. Consequently, if the software/hardware combination is not considered properly, problems can appear that are inherent to an iterative application with no control hierarchy defined under this paradigm. In SPMD, all processes execute the same code, but each computes on a different section of the input data. One remedy for a potential performance problem is a load-balancing strategy that evens out the computation across processes. In this work we analyze the CG benchmark with heterogeneous loads in order to detect possible performance problems in a real application. One factor that determines performance in this application is the number of nonzero elements in the section of the matrix assigned to each process. We determine that a load-balancing strategy can be defined and implemented dynamically, and we demonstrate experimentally that it can significantly improve the application's performance.
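Since the abstract above identifies the nonzero count per process as a performance factor, one way to picture a balancing step is the Python sketch below. It is an assumed static partitioning by cumulative nonzeros, not the dynamic strategy the authors evaluated; `partition_rows_by_nnz` and `loads` are hypothetical helper names.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_rows_by_nnz(row_nnz, num_procs):
    """Split rows into contiguous blocks with roughly equal nonzero counts.

    row_nnz[i] is the number of nonzeros in row i. Returns num_procs + 1
    boundaries; process p owns rows [bounds[p], bounds[p + 1]).
    """
    prefix = list(accumulate(row_nnz))  # prefix[i] = nonzeros in rows 0..i
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for p in range(1, num_procs):
        target = total * p / num_procs
        # Cut just after the first row whose cumulative count reaches target.
        cut = bisect_left(prefix, target) + 1
        cut = min(max(cut, bounds[-1]), len(row_nnz))  # keep bounds monotone
        bounds.append(cut)
    bounds.append(len(row_nnz))
    return bounds


def loads(row_nnz, bounds):
    """Nonzeros assigned to each process under the given boundaries."""
    return [sum(row_nnz[a:b]) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    nnz = [8, 1, 1, 1, 1, 4]                # made-up row counts
    even = [0, 3, 6]                        # equal rows per process
    balanced = partition_rows_by_nnz(nnz, 2)
    print("even rows:", loads(nnz, even))
    print("by nonzeros:", balanced, loads(nnz, balanced))
```

In this made-up example an equal split by rows gives the two processes 10 and 6 nonzeros, while cutting on the cumulative nonzero count gives each process 8.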