XARK: an extensible framework for automatic recognition of computational kernels
This is a post-peer-review, pre-copyedit version of an article published in ACM Transactions on Programming Languages and Systems. The final authenticated version is available online at: http://dx.doi.org/10.1145/1391956.1391959
[Abstract] The recognition of program constructs that are frequently used by software developers is a powerful mechanism for optimizing and parallelizing compilers to improve the performance of the object code. The development of techniques for automatic recognition of computational kernels such as inductions, reductions and array recurrences was an intensive research area within compiler technology during the 1990s. This article presents a new compiler framework that, unlike previous techniques that focus on specific and isolated kernels, recognizes a comprehensive collection of computational kernels that appear frequently in full-scale real applications. The XARK compiler operates on top of the Gated Single Assignment (GSA) form of a high-level intermediate representation (IR) of the source code. Recognition is carried out through a demand-driven analysis of this high-level IR at two different levels. First, the dependences between the statements that compose the strongly connected components (SCCs) of the data-dependence graph of the GSA form are analyzed. As a result of this intra-SCC analysis, the computational kernels corresponding to the execution of the statements of the SCCs are recognized. Second, the dependences between statements of different SCCs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code. Overall, the XARK compiler builds a hierarchical representation of the source code as kernels and dependence relationships between those kernels. This article describes in detail the collection of computational kernels recognized by the XARK compiler. In addition, the internals of the recognition algorithms are presented.
The design of the algorithms makes it possible to extend the recognition capabilities of XARK to cope with new kernels, and provides an advanced symbolic analysis framework for running other compiler techniques on demand. Finally, extensive experiments showing the effectiveness of XARK on a collection of benchmarks from different application domains are presented. In particular, the SparsKit-II library for the manipulation of sparse matrices, the Perfect benchmarks, the SPEC CPU2000 collection and the PLTMG package for solving elliptic partial differential equations are analyzed in detail.
Ministerio de Educación y Ciencia; TIN2004-07797-C02
Ministerio de Educación y Ciencia; TIN2007-67537-C03
Xunta de Galicia; PGIDIT05PXIC10504PN
Xunta de Galicia; PGIDIT06PXIB105228P
Compilation techniques for automatic extraction of parallelism and locality in heterogeneous architectures
[Abstract]
High performance computing has become a key enabler for innovation in science
and industry. This fact has unleashed a continuous demand for more computing
power, which the silicon industry has satisfied with parallel and heterogeneous
architectures and complex memory hierarchies. As a consequence, software
developers have been challenged to write new codes and rewrite old
ones to run efficiently on these new systems. Unfortunately, success stories are scarce
and require huge investments in human effort. Current compilers generate
peak-performance binary code for single-core architectures. Building on this success,
this thesis explores new ideas in compiler design to meet this challenge through
the automatic extraction of parallelism and locality. First, we present a new compiler
intermediate representation based on diKernels named KIR, which is insensitive
to syntactic variations in the source code and exposes multiple levels of
parallelism. On top of the KIR, we build a source-to-source approach that generates
parallel code annotated with compiler directives: OpenMP for multicores
and OpenHMPP for GPUs. Finally, we model program behavior from the point
of view of memory accesses through the reconstruction of affine loops for both sequential
and parallel codes. The experimental evaluations throughout the thesis
corroborate the effectiveness and efficiency of the proposed solutions.