Exploiting Monotone Convergence Functions in Parallel Programs
Scientific codes which use iterative methods are often difficult to
parallelize well. Such codes usually contain \code{while} loops which
iterate until they converge upon the solution. Problems arise since
the number of iterations cannot be determined at compile time, and
tests for termination usually require a global reduction and an
associated barrier. We present a method that allows us to avoid
performing global barriers and exploit pipelined parallelism when
processors can detect non-convergence from local information.
(Also cross-referenced as UMIACS-TR-96-31.1)
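The idea of detecting non-convergence from local information can be sketched as follows. This is a hypothetical sequential illustration (a Newton iteration for element-wise square roots, with array chunks standing in for processors), not the paper's actual code: a chunk that observes a local residual above the tolerance already knows the global termination test must fail, so no global reduction or barrier is needed on that iteration.

```c
#include <math.h>

#define N 8
#define TOL 1e-10

/* Iterate x[i] <- (x[i] + a[i]/x[i]) / 2, Newton's method for sqrt(a[i]).
 * Each "processor" keeps a local convergence flag; only when it converges
 * locally would a global reduction need to be consulted at all.
 * Returns the number of iterations performed. */
int solve(double a[N], double x[N]) {
    int iters = 0;
    for (;;) {
        int local_converged = 1;              /* per-processor flag */
        for (int i = 0; i < N; i++) {
            double next = 0.5 * (x[i] + a[i] / x[i]);
            if (fabs(next - x[i]) > TOL)
                local_converged = 0;          /* non-convergence seen locally */
            x[i] = next;
        }
        iters++;
        if (local_converged)                  /* global test deferred to here */
            break;
    }
    return iters;
}
```

In a parallel setting, the point is that the expensive reduction-plus-barrier runs only on the (rare) iterations where every local flag is set, rather than on every iteration.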
A polyhedral compilation framework for loops with dynamic data-dependent bounds
We study the parallelizing compilation and loop nest optimization of an important class of programs in which counted loops have a dynamic, data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with inductively defined termination conditions: for example, the substitution of closed forms for induction variables remains applicable, removing the loop-carried data dependences induced by termination conditions. We propose an automatic compilation approach to parallelize and optimize dynamic counted loops. Our approach relies on affine relations only, as implemented in state-of-the-art polyhedral libraries. Revisiting a state-of-the-art framework for parallelizing arbitrary while loops, we introduce additional control dependences on data-dependent predicates. Our method goes beyond the state of the art in fully automating the process, specializing the code generation algorithm to the case of dynamic counted loops and avoiding the introduction of spurious loop-carried dependences. We conduct experiments on representative irregular computations, from dynamic programming, computer vision and finite element methods to sparse matrix linear algebra. We validate that the method is applicable to general affine transformations for locality optimization, vectorization and parallelization.
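The distinction between a general while loop and a dynamic counted loop can be illustrated on a CSR-style sparse traversal (a hypothetical example in the spirit of the sparse matrix workloads mentioned above, not code from the paper). Both functions below compute the same sum; in the while-loop form the induction variable `i` is inductively defined and its termination test carries a dependence, while in the counted form the data-dependent bound `rowptr[j+1]` is fixed on loop entry, so `i` has a closed form and the loop is amenable to affine transformation.

```c
/* General while-loop form: i is updated inductively and the
 * termination test i < rowptr[j+1] induces a loop-carried dependence. */
int sum_while(const int *vals, const int *rowptr, int nrows) {
    int s = 0;
    for (int j = 0; j < nrows; j++) {
        int i = rowptr[j];
        while (i < rowptr[j + 1]) {
            s += vals[i];
            i++;
        }
    }
    return s;
}

/* Dynamic counted form: the upper bound is data-dependent but known
 * at loop entry, so the trip count rowptr[j+1] - rowptr[j] is fixed
 * and the induction variable has a closed form. */
int sum_counted(const int *vals, const int *rowptr, int nrows) {
    int s = 0;
    for (int j = 0; j < nrows; j++)
        for (int i = rowptr[j]; i < rowptr[j + 1]; i++)
            s += vals[i];
    return s;
}
```

The counted form is the shape the paper's framework targets: wider than affine loops with static bounds, but narrower than arbitrary while loops.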