Partitioning loops with variable dependence distances
A new technique to parallelize loops with variable distance vectors is presented. The method extends previous methods in two ways. First, it allows array subscripts to be any linear combination of all loop indices; the solutions to the linear dependence equations established from such array subscripts are characterized by a pseudo distance matrix (PDM). Second, it allows loop parallelism to be exploited from the PDM by applying unimodular and partitioning transformations that preserve the lexicographical order of the dependent iterations. Algorithms to derive the PDM, to find a suitable loop transformation, and to generate parallel code are described, showing that a wider range of loops can be parallelized automatically.
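As a hedged illustration (the loop and array names below are invented, not taken from the paper), the following C nest has subscripts that are linear combinations of both loop indices, so the dependence distances vary over the iteration space rather than being constant vectors:

```c
/* Illustrative loop nest (invented): subscripts are linear combinations
 * of both loop indices, so the dependence distances are not constant.
 * The write A[2*i + j] and the read A[i + 2*j + 1] are dependent for
 * iteration pairs (i1, j1) -> (i2, j2) that satisfy the linear
 * dependence equation 2*i1 + j1 == i2 + 2*j2 + 1. */
#define N 100
static double A[4 * N];

void variable_distance_example(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[2 * i + j] = A[i + 2 * j + 1] + 1.0;
}
```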
High performance computing with FPGAs
Field-programmable gate arrays (FPGAs) consist of an array of logic units that can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this medium creates new challenges in finding the right processing paradigm, one that takes into account the natural constraints of FPGAs: clock frequency, memory footprint and communication bandwidth. In this paper, the use of FPGAs as a multiprocessor on a chip is first compared with their use as a highly functional coprocessor, and the programming tools for hardware/software codesign are discussed. Next, a number of techniques are presented to maximize parallelism and optimize data locality in nested loops, including unimodular transformations, data-locality-improving loop transformations and the use of smart buffers. Finally, the use of these techniques is demonstrated on a number of examples.
The results in the paper and in the literature show that, with the proper programming tool set, FPGAs can speed up computation kernels significantly with respect to traditional processors.
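As a hedged sketch of one of the locality-improving loop transformations mentioned above (loop interchange, a unimodular transformation), the generic kernel below is not taken from the paper; it shows how interchanging the loops turns a large-stride access pattern into stride-1 accesses, shrinking the on-chip buffer or cache footprint:

```c
/* Generic example, not from the paper.  B is stored row-major in C, so
 * iterating j in the inner loop gives stride-1 accesses. */
#define N 512
static float B[N][N];

/* before: inner loop strides through memory with step N */
void column_walk(void)
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            B[i][j] += 1.0f;
}

/* after interchange (a unimodular transformation): stride-1 inner loop */
void row_walk(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            B[i][j] += 1.0f;
}
```

The interchange is legal here because the loop body carries no cross-iteration dependences.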
Non-uniform dependences partitioned by recurrence chains
Non-uniform distance loop dependences are a known obstacle to finding parallel iterations. To find the outermost loop parallelism in these "irregular" loops, a novel method based on recurrence chains is presented. The scheme organizes non-uniformly dependent iterations into lexicographically ordered monotonic chains. While the initial and final iterations of the monotonic chains form two parallel sets, the remaining iterations form an intermediate set that can be partitioned further. When there is only one pair of coupled array references, the non-uniform dependences are represented by a single recurrence equation; in that case the chains in the intermediate set do not bifurcate, and each can be executed as a WHILE loop. The independent iterations and the initial iterations of the monotonic dependence chains constitute the outermost parallelism. The proposed approach compares favorably with other treatments of non-uniform dependences in the literature. When there are multiple recurrence equations, a dataflow parallel execution can be scheduled, using the technique extensively to find maximum loop parallelism.
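The following C sketch is only an illustration of the chain idea under an invented, assumed dependence pattern (a single recurrence linking iteration i to iteration 2i), not the paper's algorithm: the chain heads are mutually independent, and each chain can then be followed sequentially as a WHILE loop.

```c
/* Invented example, not the paper's algorithm.  Iteration i writes
 * A[2*i] and iteration 2*i reads that element, so the (non-uniform)
 * flow dependences link iterations into chains i -> 2i -> 4i -> ...
 * Chain heads (i == 0 and every odd i) are mutually independent and
 * provide the outermost parallelism. */
#define N 1024
static int A[2 * N];

/* original sequential loop with non-uniform dependence distances */
void original(void)
{
    for (int i = 0; i < N; i++)
        A[2 * i] = A[i] + 1;
}

/* chains executed independently; iterations inside a chain stay ordered */
void chained(void)
{
    for (int head = 0; head < N; head = (head == 0) ? 1 : head + 2) {
        int i = head;             /* the head iterations could run in parallel */
        while (i < N) {           /* follow the chain i -> 2i sequentially */
            A[2 * i] = A[i] + 1;
            if (i == 0) break;    /* 2*0 == 0: the chain of iteration 0 has length 1 */
            i = 2 * i;
        }
    }
}
```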
Compiler Optimization Techniques for Scheduling and Reducing Overhead
Exploiting parallelism in loops is an important factor in realizing the potential performance of today's processors. This dissertation develops and evaluates several compiler optimizations aimed at improving the performance of loops. An important feature of a class of scientific computing problems is the regularity exhibited by their access patterns. Chapter 2 presents an approach to optimizing the address generation of these problems that results in: (i) elimination of redundant arithmetic computation by recognizing and exploiting common sub-expressions across different iterations in stencil codes; and (ii) conversion of as many array references as possible to scalar accesses, which leads to reduced execution time, lower address-arithmetic overhead, and access to data in registers rather than caches. With the advent of VLIW processors, the exploitation of fine-grain instruction-level parallelism has become a major challenge for optimizing compilers. Fine-grain scheduling of inner loops has received a lot of attention, but little work has been done on applying it to nested loops. Chapter 3 presents an approach to fine-grain scheduling of nested loops that formulates the problem of finding the minimum iteration initiation interval as one of finding a rational affine schedule for each statement in the body of a perfectly nested loop, which is then solved using linear programming. Frequent synchronization on multiprocessors is expensive. Chapter 4 presents a method for eliminating redundant synchronization in nested loops, where a dependence may be redundant in only a portion of the iteration space; the non-uniformity of this redundancy is characterized in terms of the relation between the dependences and the shape and size of the iteration space. Exploiting locality is critical for achieving a high level of performance on a parallel machine. Chapter 5 presents an approach that uses the concept of affinity regions to find transformations such that a suitable iteration-to-processor mapping can be found for a sequence of loop nests accessing shared arrays; this not only improves data locality but also significantly reduces communication overhead.
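As a hedged sketch of the cross-iteration reuse targeted in Chapter 2 (the kernel is a generic 3-point stencil, not code from the dissertation), scalar replacement carries reused values across iterations in scalars and removes redundant loads and address arithmetic:

```c
/* Generic 3-point stencil: the values loaded as a[i] and a[i+1] in one
 * iteration are reloaded as a[i-1] and a[i] in the next. */
#define N 1000

void stencil_naive(const double *a, double *b)
{
    for (int i = 1; i < N - 1; i++)
        b[i] = a[i - 1] + a[i] + a[i + 1];   /* 3 loads per iteration */
}

void stencil_scalarized(const double *a, double *b)
{
    double left = a[0], mid = a[1];          /* values carried across iterations */
    for (int i = 1; i < N - 1; i++) {
        double right = a[i + 1];             /* only 1 new load per iteration */
        b[i] = left + mid + right;
        left = mid;                          /* rotate scalars for the next iteration */
        mid = right;
    }
}
```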
On Characterizing the Data Access Complexity of Programs
Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Therefore, computational complexity will no longer be a sufficient metric for comparing algorithms, and a fundamental characterization of data access complexity will be increasingly important. The problem of developing lower bounds for data access complexity has been modeled using the formalism of Hong & Kung's red/blue pebble game for computational directed acyclic graphs (CDAGs). However, previously developed approaches to lower bounds analysis for the red/blue pebble game are very limited in effectiveness when applied to CDAGs of real programs, with computations comprised of multiple sub-computations with differing DAG structure. We address this problem by developing an approach for effectively composing lower bounds based on graph decomposition. We also develop a static analysis algorithm to derive the asymptotic data-access lower bounds of programs, as a function of the problem size and cache size.
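As a concrete and well-known instance of such a bound (illustrative here, not a result specific to this paper), Hong & Kung's analysis of standard n x n matrix multiplication with a fast memory of size S gives the data-movement lower bound

```latex
Q(n, S) \;=\; \Omega\!\left(\frac{n^{3}}{\sqrt{S}}\right)
```

and a decomposition-based analysis aims to obtain comparable bounds when a program's CDAG is composed of several sub-computations with differing structure.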
Software refactoring guided by multiple soft-goals
Software refactoring is intended to enhance the quality of software by improving its understandability, performance, and other quality attributes. We adopt the modelling framework of [14] in order to analyze software qualities and to determine which refactoring transformations are most appropriate. In addition, we use software metrics to evaluate software quality quantitatively. Our framework adopts and extends work reported in [15].
A theoretical foundation for program transformations to reduce cache thrashing due to true data sharing
Cache thrashing due to true data sharing can degrade the performance of parallel programs significantly. Our previous work showed that parallel task alignment via program transformations can be quite effective in reducing such cache thrashing. In this paper, we present a theoretical foundation for these program transformations. Based on linear algebra and the theory of numbers, our work analyzes the data dependences among the tasks created by a fork-join parallel program and determines at compile time how these tasks should be assigned to processors in order to reduce cache thrashing due to true data sharing. Our analysis and program transformations can be easily performed by compilers for parallel computers.
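The C sketch below is only a hedged illustration of the alignment idea (the partitioning, array names and processor count are assumptions, not the paper's construction): when two fork-join phases partition their iterations consistently, the processor that writes an element of the shared array also reads it, so the corresponding cache line does not migrate between processors.

```c
/* Invented illustration of task alignment (not the paper's construction).
 * Two fork-join phases share array a; worker thread p executes phase1(p)
 * followed by one of the phase2 variants. */
#define N 4096
#define P 4                           /* assumed number of processors */
static double a[N], b[N];

/* phase 1: processor p writes the block of a that it owns */
void phase1(int p)
{
    for (int i = p * (N / P); i < (p + 1) * (N / P); i++)
        a[i] = 0.5 * i;
}

/* aligned phase 2: processor p reads exactly the block it wrote, so no
 * cache line holding a[] has to move between processors */
void phase2_aligned(int p)
{
    for (int i = p * (N / P); i < (p + 1) * (N / P); i++)
        b[i] = 2.0 * a[i];
}

/* misaligned phase 2: a cyclic partition means almost every element of a
 * is read on a different processor than the one that wrote it, and the
 * corresponding cache lines migrate (true-sharing thrashing) */
void phase2_misaligned(int p)
{
    for (int i = p; i < N; i += P)
        b[i] = 2.0 * a[i];
}
```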
- …