XARK: an extensible framework for automatic recognition of computational kernels
This is a post-peer-review, pre-copyedit version of an article published in ACM Transactions on Programming Languages and Systems. The final authenticated version is available online at: http://dx.doi.org/10.1145/1391956.1391959
[Abstract] The recognition of program constructs that are frequently used by software developers is a powerful mechanism for optimizing and parallelizing compilers to improve the performance of the object code. The development of techniques for automatic recognition of computational kernels such as inductions, reductions and array recurrences was an intensive research area within compiler technology during the 1990s. This article presents a new compiler framework that, unlike previous techniques that focus on specific and isolated kernels, recognizes a comprehensive collection of computational kernels that appear frequently in full-scale real applications. The XARK compiler operates on top of the Gated Single Assignment (GSA) form of a high-level intermediate representation (IR) of the source code. Recognition is carried out through a demand-driven analysis of this high-level IR at two different levels. First, the dependences between the statements that compose the strongly connected components (SCCs) of the data-dependence graph of the GSA form are analyzed. As a result of this intra-SCC analysis, the computational kernels corresponding to the execution of the statements of the SCCs are recognized. Second, the dependences between statements of different SCCs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code. Overall, the XARK compiler builds a hierarchical representation of the source code as kernels and dependence relationships between those kernels. This article describes in detail the collection of computational kernels recognized by the XARK compiler. In addition, the internals of the recognition algorithms are presented.
The design of the algorithms makes it possible to extend the recognition capabilities of XARK to cope with new kernels, and provides an advanced symbolic analysis framework for running other compiler techniques on demand. Finally, extensive experiments showing the effectiveness of XARK on a collection of benchmarks from different application domains are presented. In particular, the SparsKit-II library for the manipulation of sparse matrices, the Perfect benchmarks, the SPEC CPU2000 collection and the PLTMG package for solving elliptic partial differential equations are analyzed in detail.
Ministerio de Educación y Ciencia; TIN2004-07797-C02
Ministerio de Educación y Ciencia; TIN2007-67537-C03
Xunta de Galicia; PGIDIT05PXIC10504PN
Xunta de Galicia; PGIDIT06PXIB105228P
Compilation techniques for automatic extraction of parallelism and locality in heterogeneous architectures
[Abstract]
High performance computing has become a key enabler for innovation in science
and industry. This fact has unleashed a continuous demand for more computing
power, which the silicon industry has satisfied with parallel and heterogeneous
architectures and complex memory hierarchies. As a consequence, software
developers have been challenged to write new codes and rewrite old
ones to run efficiently on these new systems. Unfortunately, success stories are scarce
and require huge investments in human effort. Current compilers generate
peak-performance binary code for single-core architectures. Building on this success,
this thesis explores new ideas in compiler design to meet this challenge through
the automatic extraction of parallelism and locality. First, we present a new compiler
intermediate representation based on diKernels named KIR, which is insensitive
to syntactic variations in the source code and exposes multiple levels of
parallelism. On top of the KIR, we build a source-to-source approach that generates
parallel code annotated with compiler directives: OpenMP for multicores
and OpenHMPP for GPUs. Finally, we model program behavior from the point
of view of memory accesses through the reconstruction of affine loops for both sequential
and parallel codes. The experimental evaluations throughout the thesis
corroborate the effectiveness and efficiency of the proposed solutions.