Search CORE

172 research outputs found

Coordinated parallelizing compiler optimizations and high-level synthesis

Author: Aho A.
Alexandru Nicolau
Bergamaschi R.
Chaiyakul V.
Ebcioglu K.
Fisher J.
Gupta S.
Gupta S.
Gupta S.
Gupta S.
Gupta S.
Gupta S.
Iqbal Z.
Janssen M.
Kountouris A.
Ku D.
Li J.
Lobo D.
Nicolau A.
Nikil D. Dutt
Novack S.
Orailoglu A.
Peymandoust A.
Potkonjak M.
Rajesh Kumar Gupta
Sreedhar V.
Sumit Gupta
Wakabayashi K.
Wakabayashi K.
Walker R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2004
Field of study

We present a high-level synthesis methodology that applies a coordinated set of coarse-grain and fine-grain parallelizing transformations. The transformations are applied both during a presynthesis phase and during scheduling, with the objective of optimizing the results of synthesis and reducing the impact of control flow constructs on the quality of results. We first apply a set of source level presynthesis transformations that include common sub-expression elimination (CSE), copy propagation, dead code elimination and loop-invariant code motion, along with more coarse-level code restructuring transformations such as loop unrolling. We then explore scheduling techniques that use a set of aggressive speculative code motions to maximally parallelize the design by re-ordering, speculating and sometimes even duplicating operations in the design. In particular, we present a new technique called "Dynamic CSE" that dynamically coordinates CSE and code motions such as speculation and conditional speculation during scheduling. We implemented our parallelizing high-level synthesis in the SPARK framework. This framework takes a behavioral description in ANSI-C as input and generates synthesizable register-transfer level VHDL. Our results from computationally expensive portions of three moderately complex design targets, namely, MPEG-1, MPEG-2 and the GIMP image processing too], validate the utility of our approach to the behavioral synthesis of designs with complex control flows

Crossref

eScholarship - University of California

Transformations of High-Level Synthesis Codes for High-Performance Computing

Author: Besta Maciej
Hoefler Torsten
Licht Johannes de Fine
Meierhans Simon
Publication venue
Publication date: 29/10/2019
Field of study

Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increases data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers, to tap into the performance potential offered by specialized hardware architectures using HLS

arXiv.org e-Print Archive

Repository for Publications and Research Data

High-Level Synthesis for Embedded Systems

Author: Michael Dossis
Publication venue: 'IntechOpen'
Publication date: 02/03/2012
Field of study

IntechOpen

Implementation of optimizing transformations in a high leel synthesis compiler

Author: Τσακυρίδης Απόστολος
Χατζηαναστασίου Γεώργιος
Publication venue
Publication date: 01/01/2016
Field of study

University of Thessaly Institutional Repository

On Extracting Course-Grained Function Parallelism from C Programs

Author: Li Chien-Wei
Publication venue
Publication date: 01/05/2006
Field of study

To efficiently utilize the emerging heterogeneous multi-core architecture, it is essential to exploit the inherent coarse-grained parallelism in applications. In addition to data parallelism, applications like telecommunication, multimedia, and gaming can also benefit from the exploitation of coarse-grained function parallelism. To exploit coarse-grained function parallelism, the common wisdom is to rely on programmers to explicitly express the coarse-grained data-flow between coarse-grained functions using data-flow or streaming languages. This research is set to explore another approach to exploiting coarse-grained function parallelism, that is to rely on compiler to extract coarse-grained data-flow from imperative programs. We believe imperative languages and the von Neumann programming model will still be the dominating programming languages programming model in the future. This dissertation discusses the design and implementation of a memory data-flow analysis system which extracts coarse-grained data-flow from C programs. The memory data-flow analysis system partitions a C program into a hierarchy of program regions. It then traverses the program region hierarchy from bottom up, summarizing the exposed memory access patterns for each program region, meanwhile deriving a conservative producer-consumer relations between program regions. An ensuing top-down traversal of the program region hierarchy will refine the producer-consumer relations by pruning spurious relations. We built an in-lining based prototype of the memory data-flow analysis system on top of the IMPACT compiler infrastructure. We applied the prototype to analyze the memory data-flow of several MediaBench programs. The experiment results showed that while the prototype performed reasonably well for the tested programs, the in-lining based implementation may not efficient for larger programs. Also, there is still room in improving the effectiveness of the memory data-flow analysis system. We did root cause analysis for the inaccuracy in the memory data-flow analysis results, which provided us insights on how to improve the memory data-flow analysis system in the future

Illinois Digital Environment for Access to Learning and Scholarship Repository

Recommended from our members

Galois : a system for parallel execution of irregular algorithms

Author: Nguyen Donald Do
Publication venue
Publication date: 04/09/2015
Field of study

textA programming model which allows users to program with high productivity and which produces high performance executions has been a goal for decades. This dissertation makes progress towards this elusive goal by describing the design and implementation of the Galois system, a parallel programming model for shared-memory, multicore machines. Central to the design is the idea that scheduling of a program can be decoupled from the core computational operator and data structures. However, efficient programs often require application-specific scheduling to achieve best performance. To bridge this gap, an extensible and abstract scheduling policy language is proposed, which allows programmers to focus on selecting high-level scheduling policies while delegating the tedious task of implementing the policy to a scheduler synthesizer and runtime system. Implementations of deterministic and prioritized scheduling also are described. An evaluation of a well-studied benchmark suite reveals that factoring programs into operators, schedulers and data structures can produce significant performance improvements over unfactored approaches. Comparison of the Galois system with existing programming models for graph analytics shows significant performance improvements, often orders of magnitude more, due to (1) better support for the restrictive programming models of existing systems and (2) better support for more sophisticated algorithms and scheduling, which cannot be expressed in other systems.Computer Science

Texas ScholarWorks

Abstraction Raising in General-Purpose Compilers

Author: Chelini Lorenzo
Publication venue: Eindhoven University of Technology
Publication date: 31/08/2021
Field of study

Pure OAI Repository

Programming Language Tools and Techniques for 3D Printing

Author: Caspi Anat
Grossman Dan
Nandi Chandrakana
Tatlock Zachary
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 2nd Summit on Advances in Programming Languages (SNAPL 2017)
Publication date: 01/01/2017
Field of study

We propose a research agenda to investigate programming language techniques for improving affordable, end-user desktop manufacturing processes such as 3D printing. Our goal is to adapt programming languages tools and extend the decades of research in industrial, high-end CAD/CAM in order to help make affordable desktop manufacturing processes more accurate, fast, reliable, and accessible to end-users. We focus on three major areas where 3D printing can benefit from programming language tools: design synthesis, optimizing compilation, and runtime monitoring. We present preliminary results on synthesizing editable CAD models from difficult-to-edit surface meshes, discuss potential new compilation strategies, and propose runtime monitoring techniques. We conclude by discussing additional near-future directions we intend to pursue

Dagstuhl Research Online Publication Server