Can We Run in Parallel? Automating Loop Parallelization for TornadoVM
With the advent of multi-core systems, GPUs, and FPGAs, loop parallelization
has become a promising way to speed up program execution. To keep pace,
various performance-oriented programming languages provide a multitude of
constructs that allow programmers to write parallelizable loops.
Correspondingly, researchers have developed techniques to automatically
parallelize loops that do not carry dependences across iterations and/or call
only pure functions. However, in managed languages with platform-independent
runtimes such as Java, it is practically infeasible to perform complex
dependence analysis during JIT compilation. In this paper, we propose
AutoTornado, a first-of-its-kind static+JIT loop parallelizer for Java programs
that parallelizes loops for heterogeneous architectures using TornadoVM (a
Graal-based VM that supports insertion of @Parallel constructs for loop
parallelization).
AutoTornado performs sophisticated dependence and purity analysis of Java
programs statically, in the Soot framework, to generate constraints encoding
conditions under which a given loop can be parallelized. The generated
constraints are then fed to the Z3 theorem prover (which we have integrated
with Soot) to annotate canonical for loops that can be parallelized using the
@Parallel construct. We have also added runtime support in TornadoVM to use
static analysis results for loop parallelization. Our evaluation over several
standard parallelization kernels shows that AutoTornado correctly parallelizes
61.3% of manually parallelizable loops, with an efficient static analysis and a
near-zero runtime overhead. To the best of our knowledge, AutoTornado is not
only the first tool that performs program-analysis based parallelization for a
real-world JVM, but also the first to integrate Z3 with Soot for loop
parallelization.
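The distinction AutoTornado's dependence analysis draws can be illustrated with a minimal plain-Java sketch. The class and method names here are ours, and TornadoVM's @Parallel annotation is shown only in a comment; this is not AutoTornado's own code:

```java
// Sketch of the two kinds of loops AutoTornado's dependence analysis
// distinguishes. In TornadoVM, the parallelizable loop would be annotated
// with @Parallel (uk.ac.manchester.tornado.api.annotations.Parallel).
public class DependenceDemo {
    // Independent iterations: a[i] is computed from b[i] only, so the loop
    // carries no dependence across iterations and is a candidate for
    // annotation, e.g. for (@Parallel int i = 0; ...).
    static int[] scale(int[] b, int k) {
        int[] a = new int[b.length];
        for (int i = 0; i < b.length; i++) {
            a[i] = k * b[i];
        }
        return a;
    }

    // Loop-carried dependence: iteration i reads a[i - 1], written by
    // iteration i - 1, so this loop cannot be marked @Parallel.
    static int[] prefixSum(int[] b) {
        int[] a = new int[b.length];
        a[0] = b[0];
        for (int i = 1; i < b.length; i++) {
            a[i] = a[i - 1] + b[i];
        }
        return a;
    }
}
```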
Logical Inference Techniques for Loop Parallelization
This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop’s memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S = ∅, where S is a set expression representing array indexes. Using a language instead of an array-abstraction representation for S results in a smaller number of conservative approximations but exhibits a potentially high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates (F(S) ⇒ S = ∅). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates (F(S)) into a sequence of sufficient-independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexities. We evaluate our automated solution on 26 benchmarks from the PERFECT-CLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full-program speedups than the Intel and IBM Fortran compilers.
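A hypothetical, much-simplified sketch of the run-time side of such a hybrid test: for a loop that writes a[idx[i]] through an indirect index array, independence reduces to the overlap set S being empty, which for write-write conflicts means all write targets are distinct. The loop model and names are ours, not the paper's USR machinery:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative run-time independence check for a loop of the form
//   for i: a[idx[i]] = b[i]
// where idx is only known at run time. The iterations are independent
// exactly when no two of them write the same element of a, i.e. when the
// conflict set S of repeated write targets is empty (S = ∅).
public class IndependenceTest {
    static boolean independent(int[] idx) {
        Set<Integer> seen = new HashSet<>();
        for (int target : idx) {
            if (!seen.add(target)) {
                return false; // duplicate write target: S != ∅, keep serial
            }
        }
        return true; // S = ∅: safe to run the loop in parallel
    }
}
```

A compiler that cannot prove S = ∅ statically can emit this O(n) check as a cheap dynamic guard before dispatching the parallel version of the loop.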
Parallelizing non-vectorizable loops for MIMD machines
Parallelizing a loop for MIMD machines can be described as a process of partitioning it into a number of relatively independent subloops. Previous approaches to partitioning non-vectorizable loops were mainly based on iteration pipelining, which partitioned a loop based on iteration number and exploited parallelism by overlapping the execution of iterations. However, the amount of parallelism exploited this way is limited because the parallelism inside iterations has been ignored. In this paper, we present a new loop partitioning technique which can exploit both forms of parallelism: inside and across iterations. While inspired by the VLIW approach, our method is designed for more general, asynchronous, MIMD machines. In particular, our schedule takes the cost of communication into account, and attempts to balance it with respect to parallelism. We show our method is correct, efficient, and produces better schedules than previous iteration-level approaches.
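The iteration-number-based partitioning the paper contrasts itself with can be sketched as a simple block distribution of n iterations over p processors. This helper is illustrative only, not the paper's scheduling algorithm:

```java
// Iteration-level partitioning: split an n-iteration loop into p contiguous
// subloops, one per processor, balancing the leftover n % p iterations over
// the first processors. Returns half-open ranges [start, end).
public class LoopPartition {
    static int[][] partition(int n, int p) {
        int[][] ranges = new int[p][2];
        int base = n / p, rem = n % p, start = 0;
        for (int k = 0; k < p; k++) {
            int len = base + (k < rem ? 1 : 0); // first rem blocks get one extra
            ranges[k][0] = start;
            ranges[k][1] = start + len;
            start += len;
        }
        return ranges;
    }
}
```

Such a split exposes only the parallelism across iterations; the paper's point is that any parallelism among the statements within one iteration is invisible at this granularity.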
Fine-grain loop scheduling for MIMD machines
Previous algorithms for parallelizing loops on MIMD machines have been based on assigning one or more loop iterations to each processor, introducing synchronization as required. These methods exploit only iteration-level parallelism, and ignore the parallelism that may exist at a lower level. In order to exploit parallelism both within and across iterations, our algorithm analyzes and schedules the loop at the statement level. The loop schedule reflects the expected communication and synchronization costs of the target machine. We provide test results that show that this algorithm can produce good speedup of loops on an MIMD machine.
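A minimal sketch of what statement-level scheduling means, assuming a loop body with two statements that touch disjoint arrays: each statement stream can run on its own processor, here modelled with plain Java threads. The structure and names are illustrative, not the paper's scheduler:

```java
// Fine-grain (statement-level) view of a loop body with two independent
// statements, S1 and S2. Because S1 writes only x and S2 writes only y,
// the two statement streams can execute concurrently even though each is
// itself a sequential iteration sequence.
public class FineGrain {
    static int[] x, y;

    static void run(int n) {
        x = new int[n];
        y = new int[n];
        Thread t1 = new Thread(() -> { // S1: x[i] = i * i
            for (int i = 0; i < n; i++) x[i] = i * i;
        });
        Thread t2 = new Thread(() -> { // S2: y[i] = i + 1
            for (int i = 0; i < n; i++) y[i] = i + 1;
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

On a real asynchronous MIMD machine the scheduler would also weigh the communication cost of placing S1 and S2 on different processors, which is the trade-off the abstract describes.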