927 research outputs found
The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization
Research in automatic parallelization of loop-centric programs started with
static analysis, then broadened its arsenal to include dynamic
inspection-execution and speculative execution, the best results involving
hybrid static-dynamic schemes. Beyond the detection of parallelism in a
sequential program, scalable parallelization on many-core processors involves
hard and interesting parallelism adaptation and mapping challenges. These
challenges include tailoring data locality to the memory hierarchy, structuring
independent tasks hierarchically to exploit multiple levels of parallelism,
tuning the synchronization grain, balancing the execution load, decoupling the
execution into thread-level pipelines, and leveraging heterogeneous hardware
with specialized accelerators. The polyhedral framework allows to model,
construct and apply very complex loop nest transformations addressing most of
the parallelism adaptation and mapping challenges. But apart from
hardware-specific, back-end oriented transformations (if-conversion, trace
scheduling, value prediction), loop nest optimization has essentially ignored
dynamic and speculative techniques. Research in polyhedral compilation recently
reached a significant milestone towards the support of dynamic, data-dependent
control flow. This opens a large avenue for blending dynamic analyses and
speculative techniques with advanced loop nest optimizations. Selecting
real-world examples from SPEC benchmarks and numerical kernels, we make a case
for the design of synergistic static, dynamic and speculative loop
transformation techniques. We also sketch the embedding of dynamic information,
including speculative assumptions, in the heart of affine transformation search
spaces
AutoParallel: A Python module for automatic parallelization and distributed execution of affine loop nests
The last improvements in programming languages, programming models, and
frameworks have focused on abstracting the users from many programming issues.
Among others, recent programming frameworks include simpler syntax, automatic
memory management and garbage collection, which simplifies code re-usage
through library packages, and easily configurable tools for deployment. For
instance, Python has risen to the top of the list of the programming languages
due to the simplicity of its syntax, while still achieving a good performance
even being an interpreted language. Moreover, the community has helped to
develop a large number of libraries and modules, tuning them to obtain great
performance.
However, there is still room for improvement when preventing users from
dealing directly with distributed and parallel computing issues. This paper
proposes and evaluates AutoParallel, a Python module to automatically find an
appropriate task-based parallelization of affine loop nests to execute them in
parallel in a distributed computing infrastructure. This parallelization can
also include the building of data blocks to increase task granularity in order
to achieve a good execution performance. Moreover, AutoParallel is based on
sequential programming and only contains a small annotation in the form of a
Python decorator so that anyone with little programming skills can scale up an
application to hundreds of cores.Comment: Accepted to the 8th Workshop on Python for High-Performance and
Scientific Computing (PyHPC 2018
BFCA+: Automatic Synthesis of Parallel Code with TLS Capabilities
Producción CientíficaParallelization of sequential applications requires extracting information about the loops and how their variables are accessed, and afterwards, augmenting the source code with extra code depending on such information. In this paper we propose a framework that avoids such an error-prone, time-consuming task. Our solution leverages the compile-time information extracted from the source code to classify all variables used inside each loop according to their accesses. Then, our system, called BFCA+, automatically instruments the source code with the necessary OpenMP directives and clauses to allow its parallel execution, using the standard shared and private clauses for variable classification. The framework is also capable of instrumenting loops for speculative parallelization, with the help of the ATLaS runtime system, that defines a new speculative clause to point out those variables that may lead to a dependency violation. As a result, the target loop is guaranteed to correctly run in parallel, ensuring that its execution follows sequential semantics even in the presence of dependency violations. Our experimental evaluation shows that the framework not only saves development time, but also leads to a faster code than the one manually parallelized.MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAPH5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS)
Compile-time support for thread-level speculation
Una de las principales preocupaciones de las ciencias de la computación es el estudio de las capacidades paralelas tanto de programas como de los procesadores que los ejecutan. Existen varias razones que hacen muy deseable el desarrollo de técnicas que paralelicen automáticamente el código. Entre ellas se encuentran el inmenso número de programas secuenciales existentes ya escritos, la complejidad de los lenguajes de programación paralelos, y los conocimientos que se requieren para paralelizar un código. Sin embargo, los actuales mecanismos de paralelización automática implementados en los compiladores comerciales no son capaces de paralelizar la mayoría de los bucles en un código [1], debido a la dependencias de datos que existen entre ellos [2]. Por lo tanto, se hace necesaria la búsqueda de nuevas técnicas, como la paralelización especulativa [3-5], que saquen beneficio de las potenciales capacidades paralelas del hardware y arquitecturas multiprocesador actuales. Sin embargo, ésta y otras técnicas requieren la intervención manual de programadores experimentados.
Antes de ofrecer soluciones alternativas, se han evaluado las capacidades de paralelización de los compiladores comerciales, exponiendo las limitaciones de los mecanismos de paralelización automática que implementan. El estudio revela que estos mecanismos de paralelización automática sólo alcanzan un 19% de speedup en promedio para los benchmarks del SPEC CPU2006 [6], siendo este un resultado significativamente inferior al obtenido por técnicas de paralelización especulativa [7]. Sin embargo, la paralelización especulativa requiere una extensa modificación manual del código por parte de programadores.
Esta Tesis aborda este problema definiendo una nueva cláusula OpenMP [8], llamada ¿speculative¿, que permite señalar qué variables pueden llevar a una violación de dependencia. Además, esta Tesis también propone un sistema en tiempo de compilación que, usando la información sobre los accesos a las variables que proporcionan las cláusulas OpenMP, añade automáticamente todo el código necesario para gestionar la ejecución especulativa de un programa. Esto libera al programador de modificar el código manualmente, evitando posibles errores y una tediosa tarea. El código generado por nuestro sistema enlaza con la librería de ejecución especulativamente paralela desarrollada por Estebanez, García-Yagüez, Llanos y Gonzalez-Escribano [9,10].Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos
Recommended from our members
Parallelization by Simulated Tunneling
As highly parallel heterogeneous computers become commonplace, automatic parallelization of software is an increasingly critical unsolved problem. Continued progress on this problem will require large quantities of information about the runtime structure of sequential programs to be stored and reasoned about. Manually formalizing all this information through traditional approaches, which rely on semantic analysis at the language or instruction level, has historically proved challenging. We take a lower level approach, eschewing semantic analysis and instead modeling von Neumann computation as a dynamical system, i.e., a state space and an evolution rule, which gives a natural way to use probabilistic inference to automatically learn powerful representations of this information. This model enables a promising new approach to automatic parallelization, in which probability distributions empirically learned over the state space are used to guide speculative solvers. We describe a prototype virtual machine that uses this model of computation to automatically achieve linear speedups for an important class of deterministic, sequential Intel binary programs through statistical machine learning and a speculative, generalized form of memoization.Engineering and Applied Science
- …