658 research outputs found
Polly's Polyhedral Scheduling in the Presence of Reductions
The polyhedral model provides a powerful mathematical abstraction to enable
effective optimization of loop nests with respect to a given optimization goal,
e.g., exploiting parallelism. Unexploited reduction properties are a frequent
reason for polyhedral optimizers to assume parallelism prohibiting dependences.
To our knowledge, no polyhedral loop optimizer available in any production
compiler provides support for reductions. In this paper, we show that
leveraging the parallelism of reductions can lead to a significant performance
increase. We give a precise, dependence based, definition of reductions and
discuss ways to extend polyhedral optimization to exploit the associativity and
commutativity of reduction computations. We have implemented a
reduction-enabled scheduling approach in the Polly polyhedral optimizer and
evaluate it on the standard Polybench 3.2 benchmark suite. We were able to
detect and model all 52 arithmetic reductions and achieve speedups up to
2.21 on a quad core machine by exploiting the multidimensional
reduction in the BiCG benchmark.Comment: Presented at the IMPACT15 worksho
Loo.py: From Fortran to performance via transformation and substitution rules
A large amount of numerically-oriented code is written and is being written
in legacy languages. Much of this code could, in principle, make good use of
data-parallel throughput-oriented computer architectures. Loo.py, a
transformation-based programming system targeted at GPUs and general
data-parallel architectures, provides a mechanism for user-controlled
transformation of array programs. This transformation capability is designed to
not just apply to programs written specifically for Loo.py, but also those
imported from other languages such as Fortran. It eases the trade-off between
achieving high performance, portability, and programmability by allowing the
user to apply a large and growing family of transformations to an input
program. These transformations are expressed in and used from Python and may be
applied from a variety of settings, including a pragma-like manner from other
languages.Comment: ARRAY 2015 - 2nd ACM SIGPLAN International Workshop on Libraries,
Languages and Compilers for Array Programming (ARRAY 2015
Introducing Molly: Distributed Memory Parallelization with LLVM
Programming for distributed memory machines has always been a tedious task,
but necessary because compilers have not been sufficiently able to optimize for
such machines themselves. Molly is an extension to the LLVM compiler toolchain
that is able to distribute and reorganize workload and data if the program is
organized in statically determined loop control-flows. These are represented as
polyhedral integer-point sets that allow program transformations applied on
them. Memory distribution and layout can be declared by the programmer as
needed and the necessary asynchronous MPI communication is generated
automatically. The primary motivation is to run Lattice QCD simulations on IBM
Blue Gene/Q supercomputers, but since the implementation is not yet completed,
this paper shows the capabilities on Conway's Game of Life
Loo.py: transformation-based code generation for GPUs and CPUs
Today's highly heterogeneous computing landscape places a burden on
programmers wanting to achieve high performance on a reasonably broad
cross-section of machines. To do so, computations need to be expressed in many
different but mathematically equivalent ways, with, in the worst case, one
variant per target machine.
Loo.py, a programming system embedded in Python, meets this challenge by
defining a data model for array-style computations and a library of
transformations that operate on this model. Offering transformations such as
loop tiling, vectorization, storage management, unrolling, instruction-level
parallelism, change of data layout, and many more, it provides a convenient way
to capture, parametrize, and re-unify the growth among code variants. Optional,
deep integration with numpy and PyOpenCL provides a convenient computing
environment where the transition from prototype to high-performance
implementation can occur in a gradual, machine-assisted form
Advances in Engineering Software for Multicore Systems
The vast amounts of data to be processed by today’s applications demand higher computational power. To meet application requirements and achieve reasonable application performance, it becomes increasingly profitable, or even necessary, to exploit any available hardware parallelism. For both new and legacy applications, successful parallelization is often subject to high cost and price. This chapter proposes a set of methods that employ an optimistic semi-automatic approach, which enables programmers to exploit parallelism on modern hardware architectures. It provides a set of methods, including an LLVM-based tool, to help programmers identify the most promising parallelization targets and understand the key types of parallelism. The approach reduces the manual effort needed for parallelization. A contribution of this work is an efficient profiling method to determine the control and data dependences for performing parallelism discovery or other types of code analysis. Another contribution is a method for detecting code sections where parallel design patterns might be applicable and suggesting relevant code transformations. Our approach efficiently reports detailed runtime data dependences. It accurately identifies opportunities for parallelism and the appropriate type of parallelism to use as task-based or loop-based
05101 Abstracts Collection -- Scheduling for Parallel Architectures: Theory, Applications, Challenges
From 06.03.05 to 11.03.05, the Dagstuhl Seminar 05101 ``Scheduling for Parallel Architectures: Theory, Applications, Challenges\u27\u27 was held
in the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general
- …