4,300 research outputs found
Recommended from our members
Pipelining of register transfer netlists
This paper describes a method for pipelining of register-to-register netlists. We define algorithms for inserting latches in a data path, both inside each unit and between the units as well as between control logic and the data path and for readjusting the state transition table. Experimental results on several benchmarks show 30%-40% improvement in performance
High-level synthesis optimization for blocked floating-point matrix multiplication
In the last decade floating-point matrix multiplication on FPGAs has been studied extensively and efficient architectures as well as detailed performance models have been developed. By design these IP cores take a fixed footprint which not necessarily optimizes the use of all available resources. Moreover, the low-level architectures are not easily amenable to a parameterized synthesis. In this paper high-level synthesis is used to fine-tune the configuration parameters in order to achieve the highest performance with maximal resource utilization. An\ exploration strategy is presented to optimize the use of critical resources (DSPs, memory) for any given FPGA. To account for the limited memory size on the FPGA, a block-oriented matrix multiplication is organized such that the block summation is done on the CPU while the block multiplication occurs on the logic fabric simultaneously. The communication overhead between the CPU and the FPGA is minimized by streaming the blocks in a Gray code ordering scheme which maximizes the data reuse for consecutive block matrix product calculations. Using high-level synthesis optimization, the programmable logic operates at 93% of the theoretical peak performance and the combined CPU-FPGA design achieves 76% of the available hardware processing speed for the floating-point multiplication of 2K by 2K matrices
Empowering parallel computing with field programmable gate arrays
After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements
Using ACL2 to Verify Loop Pipelining in Behavioral Synthesis
Behavioral synthesis involves compiling an Electronic System-Level (ESL)
design into its Register-Transfer Level (RTL) implementation. Loop pipelining
is one of the most critical and complex transformations employed in behavioral
synthesis. Certifying the loop pipelining algorithm is challenging because
there is a huge semantic gap between the input sequential design and the output
pipelined implementation making it infeasible to verify their equivalence with
automated sequential equivalence checking techniques. We discuss our ongoing
effort using ACL2 to certify loop pipelining transformation. The completion of
the proof is work in progress. However, some of the insights developed so far
may already be of value to the ACL2 community. In particular, we discuss the
key invariant we formalized, which is very different from that used in most
pipeline proofs. We discuss the needs for this invariant, its formalization in
ACL2, and our envisioned proof using the invariant. We also discuss some
trade-offs, challenges, and insights developed in course of the project.Comment: In Proceedings ACL2 2014, arXiv:1406.123
- …