25 research outputs found

Compiling a High-Level Directive-Based Programming Model for GPGPUs

Author: Barbara Chapman
Rengan Xu
Sunita Chandrasekaran
Xiaonan Tian
Yonghong Yan
Zhifeng Yun
Publication date: 01/01/2013

Abstract. OpenACC is an emerging directive-based programming model for accelerators that typically enables non-expert programmers to achieve portable and productive performance for their applications. In this paper, we present the research and development challenges, and our solutions, in creating an open-source OpenACC compiler in a mainstream compiler framework (OpenUH, a branch of Open64). We discuss in detail our loop mapping techniques, i.e. how to distribute loop iterations over the GPGPU's threading architecture, as well as their impact on performance. The runtime support for this programming model is also presented. The compiler was evaluated with several commonly used benchmarks and delivered performance similar to that obtained using a commercial compiler. We hope this implementation will serve as compiler infrastructure for researchers to explore advanced compiler techniques, to extend OpenACC to other programming languages, or to build performance tools used with OpenACC programs.
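
The abstract's notion of loop mapping (distributing loop iterations over the GPGPU's thread hierarchy) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the mapping implemented in the OpenUH compiler: the function name `map_iterations`, its parameters, and the stride-based assignment are hypothetical choices made for illustration, with "gang" and "lane" standing in loosely for OpenACC gangs and vector lanes.

```python
def map_iterations(n, num_gangs, vector_length):
    """Assign loop iterations 0..n-1 to (gang, lane) pairs.

    Assumption (not taken from the paper): each thread starts at its
    global id and advances by the total thread count, so consecutive
    lanes handle consecutive iterations.
    """
    total_threads = num_gangs * vector_length
    assignment = {}
    for gang in range(num_gangs):
        for lane in range(vector_length):
            tid = gang * vector_length + lane  # global thread id
            assignment[(gang, lane)] = list(range(tid, n, total_threads))
    return assignment


# Hypothetical sizes: 10 iterations over 2 gangs of 2 lanes each.
mapping = map_iterations(n=10, num_gangs=2, vector_length=2)
for (gang, lane), iters in sorted(mapping.items()):
    print(f"gang {gang}, lane {lane}: iterations {iters}")
```

With this assignment every iteration is executed by exactly one thread. Other mappings, such as giving each gang a contiguous block of iterations, are equally possible, and the abstract indicates that the choice affects performance.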

CiteSeerX

Tools and algorithms for high-level algorithm mapping to FPGA

Author: Sunita Chandrasekaran
Publication date: 01/01/2012

The Field Programmable Gate Array (FPGA) provides the ability to use, and re-use, hardware with minimal re-development time. This has made the FPGA market increasingly competitive. With the appropriate coding style, optimization techniques and design flow, embedded designers can get efficient results much faster than with other hardware technologies. The popularity of FPGAs has grown such that these devices are even entering the high performance computing (HPC) domain. However, as the capabilities of FPGA hardware improve, the design complexity of programming these devices also increases. The motivation of this thesis is to bridge the gap between the emerging FPGA hardware capability and the supporting software infrastructure. The current programming methodologies for FPGAs are explored, including the various techniques to raise the level of abstraction above register transfer level (RTL). However, it is a challenge to reduce the amount of manual programming necessary and still achieve the required performance and speedup, let alone provide some improvement. When going from high level language (HLL) code to hardware, it is important to remember that HLL code is basically written for a processor, which is meant to execute instructions sequentially. An FPGA, in contrast, usually runs at a lower frequency than a processor and achieves its advantage through the parallelism inherent in the hardware. Moreover, the sequential programming paradigm has no concept of clocks, yet the clock is one of the most important features of a hardware device such as an FPGA. In order to address these issues, a design methodology (C2FPGA) has been designed and developed. This methodology is used to generate hardware code without the need for in-depth knowledge of a hardware description language (HDL). An application’s original source code is taken as the input to our framework. The code is profiled to identify the bottlenecks in the program.
The framework is integrated with an open source compiler in order to exploit the compiler-based loop level transformation techniques suitable for FPGAs. A data dependency check is performed on the input code in order to develop a parallelizing pattern for the application under consideration. To address the clocking issue, a scheduling algorithm is employed that executes a number of processing elements (PEs) in parallel to exploit the parallel processing capability of FPGAs. The data dependencies between the PEs are displayed with the scheduled time steps in a graphical format, thus providing an appropriate pattern to map to the FPGA. Finally, a complete wrapper is built with all the necessary hardware files, and a netlist is then generated for the target device, i.e. the FPGA. This approach has been validated by performing several experiments on applications from different domains. Some promising results have been obtained, and the proposed tool and its methodology have been shown to address some of the programmability and productivity issues effectively and efficiently.

Doctor of Philosophy
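
The idea of assigning processing elements to time steps according to their data dependencies can be sketched as follows. The abstract does not name the scheduling algorithm used in C2FPGA, so this example substitutes a generic as-soon-as-possible (ASAP) schedule; the function names and the sample dependency graph are hypothetical.

```python
def asap_schedule(deps):
    """Return {task: time_step} for an as-soon-as-possible schedule.

    `deps` maps each task to the list of tasks it depends on.
    Assumption: ASAP is a stand-in here, not the thesis's algorithm.
    """
    step = {}

    def visit(task, active=()):
        if task in step:
            return step[task]
        if task in active:
            raise ValueError(f"cyclic dependency at {task}")
        preds = deps.get(task, [])
        # A task runs one step after its latest predecessor; tasks
        # without predecessors run at step 0.
        step[task] = 1 + max(
            (visit(p, active + (task,)) for p in preds), default=-1
        )
        return step[task]

    for task in deps:
        visit(task)
    return step


def group_by_step(step):
    """Group tasks by time step; tasks sharing a step can run in parallel."""
    groups = {}
    for task, s in sorted(step.items()):
        groups.setdefault(s, []).append(task)
    return groups


# Hypothetical dependency graph between five processing elements.
deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["a"]}
schedule = asap_schedule(deps)
groups = group_by_step(schedule)
for s in sorted(groups):
    print(f"time step {s}: {groups[s]}")
```

Tasks that share a time step have no dependencies on one another and could, in principle, be executed by parallel PEs.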

DR-NTU (Digital Repository of NTU)