
    Exploiting Locality and Parallelism with Hierarchically Tiled Arrays

    The importance of tiles or blocks in mathematics, and thus in computer science, cannot be overstated. From a high-level point of view, they are the natural way to express many algorithms, in both iterative and recursive forms: tiles or sub-tiles serve as the basic units in the algorithm description. From a low-level point of view, tiling, either as the unit maintained by the algorithm or as a class of data layouts, is one of the most effective ways to exploit locality, which is essential for good performance on current computers given the growing gap between memory and processor speed. Finally, tiles and operations on them are also fundamental for expressing data distribution and parallelism. Despite the importance of this concept, which makes its widespread usage inevitable, most languages do not support it directly. Programmers have to understand and manage the low-level details that come with the introduction of tiling. This gives rise to bloated, potentially error-prone programs in which opportunities for performance are lost, and it widens the gap between the algorithm and its actual implementation. This thesis illustrates the power of Hierarchically Tiled Arrays (HTAs), a data type that enables the easy manipulation of tiles in object-oriented languages. The objective is to evolve this data type so that all classes of algorithms with a high degree of parallelism and/or locality can be represented as naturally as possible. We present a set of tile operations that leads to natural and concise implementations of different algorithms, both parallel and sequential, with greater clarity and smaller code size. In particular, two new language constructs, dynamic partitioning and overlapped tiling, are discussed in detail. They are extensions of the HTA data type that improve its ability to express algorithms at a high level of abstraction and free programmers from tedious low-level tasks. To support these claims, two popular languages, C++ and MATLAB, are extended with our HTA data type. In addition, several important dense linear algebra kernels, stencil computation kernels, and some benchmarks from the NAS benchmark suite were implemented. We show that the HTA codes need less programming effort with a negligible effect on performance.
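
    To make the tiled-view idea concrete, the following is a minimal, hypothetical C++ sketch (not the actual HTA library API; all class and method names are illustrative assumptions). It shows a small 2-D matrix whose elements can be addressed either through a flat, global view or through a tile-then-element view, the two perspectives an HTA lets the programmer switch between.

```cpp
// Hypothetical illustration (not the HTA library API): a 2-D tiled matrix
// addressable either by global (row, col) coordinates or by tile first and
// then by element within the tile.
#include <cstddef>
#include <iostream>
#include <vector>

class TiledMatrix {
public:
    TiledMatrix(std::size_t tiles, std::size_t tileSize)
        : tiles_(tiles), tileSize_(tileSize),
          data_(tiles * tileSize * tiles * tileSize, 0.0) {}

    // Element access through the global (flat) view.
    double& at(std::size_t row, std::size_t col) {
        return data_[row * tiles_ * tileSize_ + col];
    }

    // Element access through the tiled view: select a tile, then an element
    // inside it. The tile is the unit the programmer reasons about for
    // locality or distribution.
    double& at(std::size_t tileRow, std::size_t tileCol,
               std::size_t row, std::size_t col) {
        return at(tileRow * tileSize_ + row, tileCol * tileSize_ + col);
    }

private:
    std::size_t tiles_, tileSize_;
    std::vector<double> data_;
};

int main() {
    TiledMatrix m(2, 3);              // 2x2 tiles, each of 3x3 elements
    m.at(1, 0, 2, 2) = 42.0;          // tile (1,0), element (2,2) inside it
    std::cout << m.at(5, 2) << '\n';  // same element through the flat view: 42
}
```

    In the real HTA data type the tiling can be hierarchical (tiles of tiles) and tiles also serve as the unit of data distribution across processors; the sketch only shows the dual addressing scheme.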

    Library-based solutions for algorithms with complex patterns of parallelism

    With the arrival of multi-core processors and the reduction in the growth rate of per-core processing power with each new generation, parallelization is becoming increasingly critical to improving the performance of every kind of application. Also, while simple patterns of parallelism are well understood and supported, this is not the case for complex and irregular patterns, whose parallelization requires either low-level tools that hurt programmers' productivity or transaction-based approaches that need specific hardware or imply potentially large overheads. This is an increasingly important problem, as the number of applications that exhibit these latter patterns is steadily growing. This thesis tries to better understand and support three kinds of complex patterns through the identification of abstractions and clear semantics that help bring structure to them, and through the development of libraries, based on our observations, that facilitate their parallelization in shared-memory environments. The library approach was chosen for its advantages in code reuse, reduced compiler requirements, and a relatively short learning curve. C++ was selected as the implementation language because of its good performance and its capability to express the necessary abstractions. The examples and evaluations in this thesis show that our proposals make it possible to express applications that present these patterns elegantly, improving their programmability while providing performance similar to or better than that of existing approaches.

    The STAPL Parallel Container Framework

    The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. STAPL provides a run-time system, a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms), and a generic methodology for extending them to provide customized functionality. Parallel containers are data structures that address issues related to data partitioning, distribution, communication, synchronization, load balancing, and thread safety. This dissertation presents the STAPL Parallel Container Framework (PCF), which is designed to facilitate the development of generic parallel containers. We introduce a set of concepts and a methodology for assembling a pContainer from existing sequential or parallel containers without requiring the programmer to deal with concurrency or data distribution issues. The STAPL PCF provides a large number of basic parallel data structures (e.g., pArray, pList, pVector, pMatrix, pGraph, pMap, pSet). The STAPL PCF is distinguished from existing work by offering a class hierarchy and a composition mechanism that allow users to extend and customize the current container base for improved application expressivity and performance. We evaluate the performance of the STAPL pContainers on various parallel machines, including a massively parallel Cray XT4 system and an IBM P5-575 cluster. We show that the pContainer methods, generic pAlgorithms, and different applications all provide good scalability on more than 10^4 processors.
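
    The toy sketch below (purely conceptual, not the real STAPL API; all names are assumptions) illustrates the kind of service a pContainer combined with a pAlgorithm provides: the container owns a partitioning of its elements, and an operation is applied to each partition concurrently, so user code never touches threads or data placement directly.

```cpp
// Conceptual sketch only (not STAPL): a partitioned array that splits its
// elements into per-thread blocks and applies a function to each block
// concurrently, hiding thread management from the caller.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

class PartitionedArray {
public:
    PartitionedArray(std::size_t n, std::size_t parts)
        : data_(n, 0.0), parts_(parts) {}

    // Apply f to every element; each partition is processed by its own thread.
    void parallel_apply(const std::function<void(double&)>& f) {
        std::vector<std::thread> workers;
        std::size_t block = (data_.size() + parts_ - 1) / parts_;
        for (std::size_t p = 0; p < parts_; ++p) {
            std::size_t begin = p * block;
            std::size_t end = std::min(begin + block, data_.size());
            if (begin >= end) break;
            workers.emplace_back([this, begin, end, &f] {
                for (std::size_t i = begin; i < end; ++i) f(data_[i]);
            });
        }
        for (auto& t : workers) t.join();
    }

    double sum() const {
        return std::accumulate(data_.begin(), data_.end(), 0.0);
    }

private:
    std::vector<double> data_;
    std::size_t parts_;
};

int main() {
    PartitionedArray a(1'000'000, 4);              // 4 partitions
    a.parallel_apply([](double& x) { x = 1.0; });  // data-parallel update
    std::cout << a.sum() << '\n';                  // prints 1e+06
}
```

    STAPL itself provides this across distributed memory, with its run-time system handling communication, synchronization, and load balancing; the sketch uses plain std::thread on shared memory only to show the shape of the interface.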