
    Energy-Efficient Matrix Multiplication on FPGAs

    We develop new algorithms and architectures for matrix multiplication on configurable devices. These designs significantly reduce energy dissipation and latency compared with state-of-the-art FPGA-based designs. We derive functions that represent the impact of algorithm-level design choices on system-wide energy dissipation, latency, and area by capturing algorithm and architecture details, including features of the target FPGA. These functions are used to optimize energy performance under latency and area constraints for a family of candidate algorithms and architectures. As a result, our designs improve the energy performance of the optimized design from the recent Xilinx library by 32% to 88% without any increase in the area-latency product. In terms of comprehensive metrics such as EAT (Energy-Area-Time) and E/AT (Energy/Area-Time), our designs outperform the Xilinx design by 50% to 79% and 13% to 44%, respectively. We also address how to exploit further increases in the density of future FPGA devices to obtain asymptotic improvements in latency and energy dissipation when multiplying larger matrices.
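
    A minimal sketch of the constrained selection step described above: keep only the candidates that meet latency and area bounds, pick the lowest-energy one, and compare EAT products. The `Design` class, the candidate names, and every number are hypothetical placeholders. The abstract's actual energy, latency, and area functions are derived from algorithm and FPGA details not given here, so this only illustrates the selection logic, assuming a lower EAT is better.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Design:
    """One candidate design point (hypothetical numbers, not from the paper)."""
    name: str
    energy_nj: float   # energy dissipated per matrix product (nJ)
    latency_us: float  # latency (microseconds)
    area_slices: int   # FPGA area (slices)

    @property
    def eat(self) -> float:
        # Energy x Area x Time product; assumed lower is better.
        return self.energy_nj * self.area_slices * self.latency_us


def pick_min_energy(candidates: Iterable[Design], max_latency_us: float,
                    max_area_slices: int) -> Optional[Design]:
    """Return the lowest-energy design meeting both constraints, or None."""
    feasible = [d for d in candidates
                if d.latency_us <= max_latency_us
                and d.area_slices <= max_area_slices]
    return min(feasible, key=lambda d: d.energy_nj, default=None)


def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` versus `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


candidates = [
    Design("A", energy_nj=900.0, latency_us=4.0, area_slices=2000),
    Design("B", energy_nj=600.0, latency_us=6.0, area_slices=1500),
    Design("C", energy_nj=450.0, latency_us=9.0, area_slices=3000),  # too slow
]
best = pick_min_energy(candidates, max_latency_us=8.0, max_area_slices=2500)
print(best.name)  # B
print(percent_reduction(candidates[0].eat, best.eat))  # 25.0
```

    Design C has the lowest raw energy but violates the latency bound, which is why constrained optimization can pick a different point than a plain energy ranking would.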

    General Terms

    In this paper, we present techniques for energy-efficient design at the algorithm level using FPGAs. We then use these techniques to create energy-efficient designs for two signal processing kernels: the fast Fourier transform (FFT) and matrix multiplication. We evaluate how FPGAs perform these tasks in terms of both latency and energy efficiency. Using a Xilinx Virtex-II as the target FPGA, we compare our designs with those from the Xilinx library, as well as with conventional algorithms run on the PowerPC core embedded in the Virtex-II Pro and on the Texas Instruments TMS320C6415. Our evaluations are done both through estimation based on energy and latency equations and through low-level simulation. For FFT, our designs dissipated on average 60% less energy than the design from the Xilinx library and 56% less than the DSP. Our designs showed a factor of 10 improvement over the embedded processor. These results provide concrete evidence that FPGAs can outperform DSPs and embedded processors in signal processing. Further, they show that FPGAs can achieve this performance while dissipating less energy than the other two types of devices.
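
    As a rough illustration of estimation from energy and latency equations, the sketch below assumes a simple model: latency is the cycle count divided by the clock frequency, and energy is the sum of each component's power times its active time. The function names, the 100 MHz clock, and the per-component power and cycle figures are hypothetical; the paper's own equations are not given in this abstract.

```python
from typing import Iterable, Tuple


def latency_s(cycles: int, clock_hz: float) -> float:
    """Latency of a design that finishes in `cycles` clock cycles."""
    return cycles / clock_hz


def energy_j(components: Iterable[Tuple[float, int]], clock_hz: float) -> float:
    """Sum of power x active time over all components.

    Each entry is (power_watts, active_cycles); a component active for
    k cycles at clock f is on for k / f seconds.
    """
    return sum(power_w * cycles / clock_hz for power_w, cycles in components)


clock_hz = 100e6  # hypothetical 100 MHz clock
components = [(0.050, 1000)] * 4 + [(0.020, 1000)]  # 4 PEs + memory, invented
print(f"latency = {latency_s(1000, clock_hz) * 1e6:.2f} us")  # latency = 10.00 us
print(f"energy  = {energy_j(components, clock_hz) * 1e6:.2f} uJ")  # energy  = 2.20 uJ
```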

    Performance Modeling of Reconfigurable SoC Architectures and Energy-Efficient Mapping of a Class of Applications

    Reconfigurable System-on-Chip (RSoC) devices are being used to implement many battery-operated systems, where energy efficiency is a major concern. RSoCs incorporate many different components, such as a processor core, reconfigurable logic, and memory, and various power management techniques can be applied to these components. Tasks within an application can be mapped onto different components for execution, and the communication and reconfiguration costs incurred under different mappings significantly affect overall system energy dissipation. To achieve energy-efficient designs on RSoCs, we develop (a) a performance model that abstracts a general class of RSoC architectures for application development, (b) a mathematical formulation of the energy-efficient mapping problem for a class of applications, and (c) a dynamic programming algorithm that minimizes system energy dissipation. We illustrate our approach by mapping two beamforming applications onto a Xilinx Virtex-II Pro. For these two applications, our approach yields an average energy reduction of 52% over a greedy algorithm.
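
    To make the dynamic programming idea concrete, here is a sketch under a simplified, assumed model: the application is a linear chain of tasks, each task runs on one component, and the only inter-task cost is a transition energy (communication plus reconfiguration) between the components of consecutive tasks. This is a standard trellis-style shortest-path DP, not necessarily the paper's formulation. The component names `cpu` and `fpga`, the function names, and all energy values are hypothetical. A greedy baseline that picks the cheapest component per task is included for comparison.

```python
from typing import Dict, List, Sequence, Tuple

Energy = Dict[str, float]                   # component name -> energy
Transitions = Dict[Tuple[str, str], float]  # (from, to) -> energy


def min_energy_mapping(exec_energy: Sequence[Energy],
                       transition: Transitions) -> Tuple[float, List[str]]:
    """Minimum-energy mapping of a linear task chain onto components.

    exec_energy[i][c] is the energy of running task i on component c.
    transition[(a, b)] is the energy of switching from component a to b
    between consecutive tasks (communication plus reconfiguration);
    missing keys count as zero, e.g. staying on the same component.
    """
    components = list(exec_energy[0])
    # best[c]: lowest energy for tasks 0..i with task i mapped to c.
    best = {c: exec_energy[0][c] for c in components}
    back: List[Dict[str, str]] = []  # back[i-1][c]: best predecessor of c
    for task in exec_energy[1:]:
        new_best, choice = {}, {}
        for c in components:
            prev = min(components,
                       key=lambda p: best[p] + transition.get((p, c), 0.0))
            new_best[c] = best[prev] + transition.get((prev, c), 0.0) + task[c]
            choice[c] = prev
        best = new_best
        back.append(choice)
    # Pick the cheapest final component and walk the choices backwards.
    last = min(components, key=best.get)
    mapping = [last]
    for choice in reversed(back):
        mapping.append(choice[mapping[-1]])
    mapping.reverse()
    return best[last], mapping


def greedy_mapping(exec_energy: Sequence[Energy],
                   transition: Transitions) -> Tuple[float, List[str]]:
    """Baseline: cheapest component per task, ignoring transition costs."""
    mapping = [min(e, key=e.get) for e in exec_energy]
    total = sum(e[c] for e, c in zip(exec_energy, mapping))
    total += sum(transition.get((a, b), 0.0) for a, b in zip(mapping, mapping[1:]))
    return total, mapping


tasks = [{"cpu": 5.0, "fpga": 3.0},
         {"cpu": 4.0, "fpga": 6.0},
         {"cpu": 5.0, "fpga": 3.0}]
switch = {("cpu", "fpga"): 4.0, ("fpga", "cpu"): 4.0}
print(min_energy_mapping(tasks, switch))  # (12.0, ['fpga', 'fpga', 'fpga'])
print(greedy_mapping(tasks, switch))      # (18.0, ['fpga', 'cpu', 'fpga'])
```

    Because each step keeps only the best energy per component, the DP runs in O(n k^2) time for n tasks and k components. In this toy example the greedy choice ignores transition costs and ends up paying for two component switches, which the DP avoids.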