7 research outputs found

    XARK: an extensible framework for automatic recognition of computational kernels

    This is a post-peer-review, pre-copyedit version of an article published in ACM Transactions on Programming Languages and Systems. The final authenticated version is available online at: http://dx.doi.org/10.1145/1391956.1391959
    [Abstract] The recognition of program constructs that are frequently used by software developers is a powerful mechanism that lets optimizing and parallelizing compilers improve the performance of the object code. The development of techniques for the automatic recognition of computational kernels such as inductions, reductions and array recurrences was an intensive research area in compiler technology during the 1990s. This article presents a new compiler framework that, unlike previous techniques that focus on specific and isolated kernels, recognizes a comprehensive collection of computational kernels that appear frequently in full-scale real applications. The XARK compiler operates on top of the Gated Single Assignment (GSA) form of a high-level intermediate representation (IR) of the source code. Recognition is carried out through a demand-driven analysis of this high-level IR at two different levels. First, the dependences between the statements that compose the strongly connected components (SCCs) of the data-dependence graph of the GSA form are analyzed. As a result of this intra-SCC analysis, the computational kernels corresponding to the execution of the statements of the SCCs are recognized. Second, the dependences between statements of different SCCs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code. Overall, the XARK compiler builds a hierarchical representation of the source code as kernels and dependence relationships between those kernels. This article describes in detail the collection of computational kernels recognized by the XARK compiler. The internals of the recognition algorithms are also presented. The design of the algorithms makes it possible to extend the recognition capabilities of XARK to cope with new kernels, and provides an advanced symbolic analysis framework to run other compiler techniques on demand. Finally, extensive experiments showing the effectiveness of XARK for a collection of benchmarks from different application domains are presented. In particular, the SparsKit-II library for the manipulation of sparse matrices, the Perfect benchmarks, the SPEC CPU2000 collection and the PLTMG package for solving elliptic partial differential equations are analyzed in detail.
    Ministerio de Educación y Ciencia; TIN2004-07797-C02. Ministerio de Educación y Ciencia; TIN2007-67537-C03. Xunta de Galicia; PGIDIT05PXIC10504PN. Xunta de Galicia; PGIDIT06PXIB105228P
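    The two-level analysis described in the abstract (first within each SCC, then between SCCs) can be illustrated with a minimal sketch. It is not XARK's implementation: it works on a plain statement-level dependence graph rather than the GSA form, uses Tarjan's algorithm as one standard way to find SCCs, and the statement labels S1 to S5, the `DEPENDENCES` graph and the helper names are hypothetical. It only shows how the statements are grouped and which inter-SCC edges remain for the second level.

```python
# Hypothetical dependence graph for a loop body such as:
#   S1: s = s + a[i]      (depends on itself across iterations)
#   S2: k = k + 2         (depends on itself across iterations)
#   S3: b[k] = s          (uses the values produced by S1 and S2)
#   S4: t = x + 1         (S4 and S5 depend on each other)
#   S5: x = t * 2
# An edge u -> v means "v depends on a value produced by u".
DEPENDENCES = {
    "S1": ["S1", "S3"],
    "S2": ["S2", "S3"],
    "S3": [],
    "S4": ["S5"],
    "S5": ["S4"],
}


def strongly_connected_components(graph):
    """Return the SCCs of `graph` (node -> successors) using Tarjan's algorithm."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, []):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # `node` is the root of an SCC: pop its members off the stack.
            members = set()
            while True:
                top = stack.pop()
                on_stack.discard(top)
                members.add(top)
                if top == node:
                    break
            components.append(frozenset(members))

    for node in graph:
        if node not in index:
            visit(node)
    return components


def is_cyclic(graph, component):
    """True if the SCC contains a dependence cycle (several nodes or a self-loop)."""
    if len(component) > 1:
        return True
    (node,) = component
    return node in graph.get(node, [])


def inter_scc_edges(graph, components):
    """Return the dependences that cross from one SCC to another."""
    owner = {node: comp for comp in components for node in comp}
    return {
        (owner[u], owner[v])
        for u, succs in graph.items()
        for v in succs
        if owner[u] != owner[v]
    }


components = strongly_connected_components(DEPENDENCES)
cyclic = [c for c in components if is_cyclic(DEPENDENCES, c)]
edges = inter_scc_edges(DEPENDENCES, components)

for comp in components:
    kind = "cyclic" if is_cyclic(DEPENDENCES, comp) else "trivial"
    print(sorted(comp), kind)
```

    In this sketch the cyclic components are the candidates an intra-SCC analysis would inspect, and the edges returned by `inter_scc_edges` are the relationships an inter-SCC analysis would examine to combine simpler kernels into more complex ones.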

    Abstraction Raising in General-Purpose Compilers

    Compilation techniques for automatic extraction of parallelism and locality in heterogeneous architectures

    [Abstract] High-performance computing has become a key enabler of innovation in science and industry. This has driven a continuous demand for more computing power, which the silicon industry has met with parallel and heterogeneous architectures and complex memory hierarchies. As a consequence, software developers have been challenged to write new codes and rewrite old ones to run efficiently on these new systems. Unfortunately, success cases are scarce and require huge investments in human effort. Current compilers generate peak-performance binary code for single-core architectures. Building on this success, this thesis explores new ideas in compiler design to overcome this challenge through the automatic extraction of parallelism and locality. First, we present a new compiler intermediate representation based on diKernels, named KIR, which is insensitive to syntactic variations in the source code and exposes multiple levels of parallelism. On top of the KIR, we build a source-to-source approach that generates parallel code annotated with compiler directives: OpenMP for multicores and OpenHMPP for GPUs. Finally, we model program behavior from the point of view of memory accesses through the reconstruction of affine loops for sequential and parallel codes. The experimental evaluations throughout the thesis corroborate the effectiveness and efficiency of the proposed solutions.