15 research outputs found

    Compiler Generated Systolic Arrays For Wavefront Algorithm Acceleration on FPGAs

    No full text
    Wavefront algorithms, such as the Smith-Waterman algorithm, are commonly used in bioinformatics for exact local and global sequence alignment. These algorithms are highly computationally intensive and are therefore excellent candidates for FPGA-based code acceleration. However, there is no standard form of these algorithms, they are used in a wide variety of situations with various constraints. It is therefore not practical to have a standard kernel that can be mapped to an FPGA, hence the importance of being able to compile such codes from a high level language. ROCCC is a C to VHDL compiler, which optimizes and parallelizes the most frequently executed kernel loops in applications such as in multimedia, scientific and high-performance computing. In this paper we describe the transformations performed by ROCCC, which transformed the kernel of the Smith-Waterman algorithm into a hardware systolic array that is mapped onto the FPGA on the SGI Altix RASC blade. We report a throughput increase by over 3,000X over a 2.8 GHz Xeon. 1

    Input data reuse in compiling window operations onto reconfigurable hardware

    No full text
    Balancing computation with I/O has been considered as a critical factor of the overall performance for embedded systems in general and reconfigurable computing systems in particular. Data I/O often dominates the overall computation performance for window operation, which are frequently used in image processing, image compression, pattern recognition and digital signal processing. This problem is more acute in reconfigurable systems since the compiler must generate the data path and the sequence of operations. The challenge is to intelligently exploit data reuse on the reconfigurable fabric (FPGA) to minimize the required memory or I/O bandwidth while maximizing parallelism. In this paper, we present a compile-time approach to reuse data in window-based codes. The compiler, called ROCCC, first analyzes and optimizes the window operation in C. It then computes the size of the hardware buffer and defines three sets of data values for each window: the window set, the managed set and the killed set. This compile-time analysis simplifies the HDL code generation and improves the resulting hardware performance. We also discuss in-place window operations

    To appear in ACM Transactions on Architecture and Compiler Optimizations. Efficient Hardware Code Generation for FPGAs

    No full text
    The wider acceptance of FPGAs as a computing device requires a higher level of programming abstraction. ROCCC is an optimizing C to HDL compiler. We describe the code generation approach in ROCCC. The smart buffer is a component that reuses input data between adjacent iterations. It significantly improves the performance of the circuit and simplifies loop control. The ROCCC-generated data-path can execute one loop iteration per clock-cycle when there is no loop-dependency or there is only scalar recurrence variable dependency. ROCCC's approach to supporting while-loops operating on scalars makes the compiler able to move scalar iterative computation into hardware

    Efficient Hardware Code Generation for

    No full text
    The wider acceptance of FPGAs as a computing device requires a higher level of programming abstraction. ROCCC is an optimizing C to HDL compiler. We describe the code generation approach in ROCCC. The smart buffer is a component that reuses input data between adjacent iterations. It significantly improves the performance of the circuit and simplifies loop control. The ROCCCgenerated datapath can execute one loop iteration per clock cycle when there is no loop dependency or there is only scalar recurrence variable dependency. ROCCC’s approach to supporting while-loops operating on scalars makes the compiler able to move scalar iterative computation into hardware

    Optimized Generation of Data-path from C Codes for FPGAs

    No full text
    FPGAs, as computing devices, offer significant speedup over microprocessors. Furthermore, their configurability offers an advantage over traditional ASICs. However, they do not yet enjoy high-level language programmability, as microprocessors do. This has become the main obstacle for their wider acceptance by application designers. ROCCC is a compiler designed to generate circuits from C source code to execute on FPGAs, more specifically on CSoCs. It generates RTL level HDLs from frequently executing kernels in an application. In this paper, we describe ROCCC’s system overview and focus on its data path generation. We compare the performance of ROCCCgenerated VHDL code with that of Xilinx IPs. The synthesis result shows that ROCCC-generated circuit takes around 2x ~ 3x area and runs at comparable clock rate. 1

    Impact of Loop Unrolling on Area, Throughput and Clock Frequency in ROCCC: C to VHDL Compiler for FPGAs

    No full text
    Abstract. Loop unrolling is the main compiler technique that allows reconfigurable architectures achieve large degrees of parallelism. However, loop unrolling increases the area and can potentially have a negative impact on clock cycle time. In most embedded applications, the critical parameter is the throughput. Loop unrolling can therefore have contradictory effects on the throughput. As a consequence there exists, in general, a degree of unrolling that maximizes the throughput per unit area. This paper studies the effect of loop unrolling on the area, clock speed and throughput within the ROCCC, C to VHDL compilation framework. Our results indicate that due to the unique design of the ROCCC compilation framework, FPGA area either shrinks or increases at a very low rate for the first few times the loops are unrolled. This reduced area causes the clock cycle time to decrease and thus a great gain in throughput. Our results also show that there are different optimal unrolling factors for different programs.

    Optimized Generation of Data-Path from C Codes for FPGAs

    No full text
    Submitted on behalf of EDAA (http://www.edaa.com/)International audienceFPGAs, as computing devices, offer significant speedup over microprocessors. Furthermore, their configurability offers an advantage over traditional ASICs. However, they do not yet enjoy high-level language programmability, as microprocessors do. This has become the main obstacle for their wider acceptance by application designers. ROCCC is a compiler designed to generate circuits from C source code to execute on FPGAs, more specifically on CSoCs. It generates RTL level HDLs from frequently executing kernels in an application. In this paper, we describe ROCCC's system overview and focus on its data path generation. We compare the performance of ROCCC-generated VHDL code with that of Xilinx IPs. The synthesis result shows that ROCCC-generated circuit takes around 2x ~ 3x area and runs at comparable clock rate

    17 Impact of High-Level Transformations within the ROCCC Framework

    No full text
    Reconfigurable computers, where one or more FPGAs are attached to a conventional microprocessor, are promising platforms for code acceleration. Despite their advantages, programmability concerns and the lack of efficient design tools/compilers for FPGAs are preventing the technology’s widespread adoption. The traditional compiler technology is microprocessor-based-systemsspecific and needs to be customized and augmented to address the needs in reconfigurable computing. The challenges are several due to the resources and performance constraints for FP-GAs being drastically different than those of microprocessors, and also that compiling for FP-GAs requires laying the computation in space by a circuit rather than in time by a sequence of instructions. ROCCC is an optimizing C-to-VHDL compiler specifically targeting the reconfigurable computer platforms. ROCCC includes several high-level optimizations that parallelize and optimize the source code for minimized area and critical path length and maximized throughput. This article presents the effect of ROCCC’s high-level transformations on the performance of the generated VHDL output. ROCCC utilizes: (1) several array access optimizations to eliminate redundant memory accesses, (2) procedure-level optimizations to achieve circuit area reductions of up to 88% compared to circuit areas generated from unoptimized codes, (3) loop-level optimizations to increase the throughput, and (4) transformations unique to certain classes of applications. The preceding listed features help ROCCC generate circuits with very large degrees of parallelism capable of very high computation rates
    corecore