Search CORE

36 research outputs found

Supporting Irregular Distributions in FORTRAN 90D/HPF Compilers

Author: Das Raja
Hwang Yuan-Shin
Ponnusamy Ravi
Saltz Joel
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1994
Field of study

This paper presents methods that make it possible to efficiently support irregular problems using data parallel languages. The approach involves the use of a portable, compiler-independent, runtime support library called CHAOS. The CHAOS runtime support library contains procedures that (1) support static and dynamic distributed array partitioning, (2) partition loop iterations and indirection arrays, (3) remap arrays from one distribution to another, and (4) carry out index translation, buffer allocation and communication schedule generation. The CHAOS runtime procedures are used by a prototype Fortran 90D compiler as runtime support for irregular problems. This paper also presents performance results of compiler-generated and hand-parallelized versions of two stripped down applications codes. The first code is derived from an unstructured mesh computational fluid dynamics flow solver and the second is derived from the molecular dynamics code CHARMM. A method is described that makes it possible to emulate irregular distributions in HPF by reordering elements of data arrays and renumbering indirection arrays. The results suggest that HPF compiler could use reordering and renumbering extrinsic functions to obtain performance comparable to that achieved by a compiler for a language (such as Fortran 90D) that directly supports irregular distributions

Syracuse University Research Facility and Collaborative Environment

Supporting Irregular Distributions Using Data-Parallel Languages

Author: Choudhary Alok
Das Raja
Fox Geoffrey
Hwang Yuan-Shin
Ponnusamy Ravi
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1995
Field of study

Languages such as Fortran D provide irregular distribution schemes that can efficiently support irregular problems. Irregular distributions can also be emulated in HPF. Compilers can incorporate runtime procedures to automatically support these distributions

Syracuse University Research Facility and Collaborative Environment

Code-Optimierung im Polyedermodell - Effizienzsteigerung von parallelen Schleifensätzen

Author: Faber Peter
Publication venue
Publication date: 21/10/2008
Field of study

A safe basis for automatic loop parallelization is the polyhedron model which represents the iteration domain of a loop nest as a polyhedron in

\mathbb{Z}^n

. However, turning the parallel loop program in the model to efficient code meets with several obstacles, due to which performance may deteriorate seriously -- especially on distributed memory architectures. We introduce a fine-grained model of the computation performed and show how this model can be applied to create efficient code

Emerging trends proceedings of the 17th International Conference on Theorem Proving in Higher Order Logics: TPHOLs 2004

Author: Slind Konrad Lee
Publication venue: University of Utah
Publication date: 01/08/2004
Field of study

technical reportThis volume constitutes the proceedings of the Emerging Trends track of the 17th International Conference on Theorem Proving in Higher Order Logics (TPHOLs 2004) held September 14-17, 2004 in Park City, Utah, USA. The TPHOLs conference covers all aspects of theorem proving in higher order logics as well as related topics in theorem proving and verification. There were 42 papers submitted to TPHOLs 2004 in the full research cate- gory, each of which was refereed by at least 3 reviewers selected by the program committee. Of these submissions, 21 were accepted for presentation at the con- ference and publication in volume 3223 of Springer?s Lecture Notes in Computer Science series. In keeping with longstanding tradition, TPHOLs 2004 also offered a venue for the presentation of work in progress, where researchers invite discussion by means of a brief introductory talk and then discuss their work at a poster session. The work-in-progress papers are held in this volume, which is published as a 2004 technical report of the School of Computing at the University of Utah

The University of Utah: J. Willard Marriott Digital Library

Understanding retargeting compilation techniques for network processors

Author: Li Jun
Publication venue
Publication date: 01/01/2003
Field of study

Mémoire numérisé par la Direction des bibliothèques de l'Université de Montréal

Dépôt Institutionnel Numérique

On expressing different concurrency paradigms on virtual execution environment

Author: DITTAMO CRISTIAN
Publication venue: 'Pisa University Press'
Publication date: 29/11/2010
Field of study

Virtual execution environments (VEE) such as the Java Virtual Machine (JVM) and the Microsoft Common Language Runtime (CLR) have been designed when the dominant computer architecture featured a Von-Neumann interface to programs: a single processor hiding all the complexity of parallel computations inside its design. Programs are expressed in an intermediate form that is executed by the VEE that defines an abstract computational model in which the concurrency model has been influenced by these design choices and it basically exposes the multi-threading model of the underlying operating system. Recently computer systems have introduced computational units in which concurrency is explicit and under program control. Relevant examples are the Graphical Processing Units (GPU such as Nvidia or AMD) and the Cell BE architecture which allow for explicit control of single processing unit, local memories and communication channels. Unfortunately programs designed for Virtual Machines cannot access to these resources since are not available through the abstractions provided by the VEE. A major redesign of VEEs seems to be necessary in order to bridge this gap. In this thesis we study the problem of exposing non-von Neumann computing resources within the Virtual Machine without need for a redesign of the whole execution infrastructure. In this work we express parallel computations relying on extensible meta-data and reflection to encode information. Meta-programming techniques are then used to rewrite the program into an equivalent one using the special purpose underlying architecture. We provide a case study in which this approach is applied to compiling Common Intermediate Language (CIL) methods to multi-core GPUs; we show that it is possible to access these non-standard computing resources without any change to the virtual machine design

Electronic Thesis and Dissertation Archive - Università di Pisa

Data layout types : a type-based approach to automatic data layout transformations for improved SIMD vectorisation

Author: Šinkarovs Artjoms
Publication venue: Mathematical and Computer Sciences
Publication date: 01/08/2015
Field of study

The increasing complexity of modern hardware requires sophisticated programming techniques for programs to run efficiently. At the same time, increased power of modern hardware enables more advanced analyses to be included in compilers. This thesis focuses on one particular optimisation technique that improves utilisation of vector units. The foundation of this technique is the ability to chose memory mappings for data structures of a given program. Usually programming languages use a fixed layout for logical data structures in physical memory. Such a static mapping often has a negative effect on usability of vector units. In this thesis we consider a compiler for a programming language that allows every data structure in a program to have its own data layout. We make sure that data layouts across the program are sound, and most importantly we solve a problem of automatic data layout reconstruction. To consistently do this, we formulate this as a type inference problem, where type encodes a data layout for a given structure as well as implied program transformations. We prove that type-implied transformations preserve semantics of the original programs and we demonstrate significant performance improvements when targeting SIMD-capable architectures

ROS: The Research Output Service. Heriot-Watt University Edinburgh

Doctor of Philosophy

Author: Earl Christopher
Publication venue: University of Utah
Publication date: 01/05/2014
Field of study

dissertationIn the static analysis of functional programs, control- ow analysis (k-CFA) is a classic method of approximating program behavior as a infinite state automata. CFA2 and abstract garbage collection are two recent, yet orthogonal improvements, on k-CFA. CFA2 approximates program behavior as a pushdown system, using summarization for the stack. CFA2 can accurately approximate arbitrarily-deep recursive function calls, whereas k-CFA cannot. Abstract garbage collection removes unreachable values from the store/heap. If unreachable values are not removed from a static analysis, they can become reachable again, which pollutes the final analysis and makes it less precise. Unfortunately, as these two techniques were originally formulated, they are incompatible. CFA2's summarization technique for managing the stack obscures the stack such that abstract garbage collection is unable to examine the stack for reachable values. This dissertation presents introspective pushdown control-flow analysis, which manages the stack explicitly through stack changes (pushes and pops). Because this analysis is able to examine the stack by how it has changed, abstract garbage collection is able to examine the stack for reachable values. Thus, introspective pushdown control-flow analysis merges successfully the benefits of CFA2 and abstract garbage collection to create a more precise static analysis. Additionally, the high-performance computing community has viewed functional programming techniques and tools as lacking the efficiency necessary for their applications. Nebo is a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena. For efficient execution, Nebo exploits a version of expression templates, based on the C++ template system, which is a type-less, completely-pure, Turing-complete functional language with burdensome syntax. Nebo's declarative syntax supports functional tools, such as point-wise lifting of complex expressions and functional composition of stencil operators. Nebo's primary abstraction is mathematical assignment, which separates what a calculation does from how that calculation is executed. Currently Nebo supports single-core execution, multicore (thread-based) parallel execution, and GPU execution. With single-core execution, Nebo performs on par with the loops and code that it replaces in Wasatch, a pre-existing high-performance simulation project. With multicore (thread-based) execution, Nebo can linearly scale (with roughly 90% efficiency) up to 6 processors, compared to its single-core execution. Moreover, Nebo's GPU execution can be up to 37x faster than its single-core execution. Finally, Wasatch (the pre-existing high-performance simulation project which uses Nebo) can scale up to 262K cores

The University of Utah: J. Willard Marriott Digital Library

Indexed dependence metadata and its applications in software performance optimisation

Author: Howes Lee William
Howes Lee William
Publication venue: Computing, Imperial College London
Publication date: 01/04/2010
Field of study

To achieve continued performance improvements, modern microprocessor design is tending to concentrate an increasing proportion of hardware on computation units with less automatic management of data movement and extraction of parallelism. As a result, architectures increasingly include multiple computation cores and complicated, software-managed memory hierarchies. Compilers have difficulty characterizing the behaviour of a kernel in a general enough manner to enable automatic generation of efficient code in any but the most straightforward of cases. We propose the concept of indexed dependence metadata to improve application development and mapping onto such architectures. The metadata represent both the iteration space of a kernel and the mapping of that iteration space from a given index to the set of data elements that iteration might use: thus the dependence metadata is indexed by the kernel’s iteration space. This explicit mapping allows the compiler or runtime to optimise the program more efficiently, and improves the program structure for the developer. We argue that this form of explicit interface specification reduces the need for premature, architecture-specific optimisation. It improves program portability, supports intercomponent optimisation and enables generation of efficient data movement code. We offer the following contributions: an introduction to the concept of indexed dependence metadata as a generalisation of stream programming, a demonstration of its advantages in a component programming system, the decoupled access/execute model for C++ programs, and how indexed dependence metadata might be used to improve the programming model for GPU-based designs. Our experimental results with prototype implementations show that indexed dependence metadata supports automatic synthesis of double-buffered data movement for the Cell processor and enables aggressive loop fusion optimisations in image processing, linear algebra and multigrid application case studies

Spiral - Imperial College Digital Repository