
    Power-Optimal Mapping of CNN Applications to Cloud-Based Multi-FPGA Platforms

    Multi-FPGA platforms like Amazon Web Services F1 are well suited to accelerating multi-kernel pipelined applications, such as Convolutional Neural Networks (CNNs). To reduce energy consumption, we propose to upload at runtime the best power-optimized CNN implementation for a given throughput constraint. Our design method gives the best number of parallel instances of each kernel, their allocation to the FPGAs, the number of powered-on FPGAs and their clock frequency. This is obtained by solving a mixed-integer, non-linear optimization problem that models the power and performance of each component, as well as the duration of the computation phases: data transfer between a host CPU and the FPGA memory (typically DDR), data transfer between DDR and FPGA, and FPGA computation. The results show that the power saved compared to simply clock gating the fastest implementation is, as expected, very high, but the savings are also much larger than those obtained by simply scaling the frequency of the fastest implementation or replicating the slowest implementation on multiple FPGAs.
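
To make the shape of such a problem concrete, here is a minimal sketch in Python. It is not the authors' formulation. It assumes a toy model in which every kernel runs at one shared clock frequency, dynamic power grows linearly with frequency and instance count, each powered-on FPGA adds a fixed static power, and the data-transfer phases and the per-FPGA allocation of instances are ignored. The function name `min_power_config` and all parameters (`cycles`, `area_pct`, `w_per_mhz`, `static_w`) are hypothetical. Because power in this toy model never decreases when an instance is added, the smallest instance count that meets the throughput target is optimal for each frequency, so a plain search over candidate frequencies is enough.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def min_power_config(kernels, target_fps, freqs_mhz, max_fpgas,
                     static_w=10.0, capacity_pct=100):
    """Pick the clock frequency and instance counts with the lowest power.

    kernels: list of dicts with hypothetical keys
        "cycles"    - clock cycles one instance needs per frame (int)
        "area_pct"  - share of one FPGA used by one instance, in percent (int)
        "w_per_mhz" - dynamic power of one instance per MHz, in watts
    Returns a dict describing the best configuration, or None if no
    candidate frequency fits within max_fpgas.
    """
    best = None
    for f in freqs_mhz:
        hz = f * 1_000_000
        # Fewest parallel instances per kernel that sustain target_fps at f.
        counts = [ceil_div(target_fps * k["cycles"], hz) for k in kernels]
        # Assumption: instances pack perfectly, so only total area matters.
        area = sum(n * k["area_pct"] for n, k in zip(counts, kernels))
        fpgas = ceil_div(area, capacity_pct)
        if fpgas > max_fpgas:
            continue  # throughput target not reachable at this frequency
        dynamic = sum(n * k["w_per_mhz"] * f for n, k in zip(counts, kernels))
        power = fpgas * static_w + dynamic
        if best is None or power < best["power_w"]:
            best = {"freq_mhz": f, "instances": counts,
                    "fpgas": fpgas, "power_w": power}
    return best
```

A call such as `min_power_config(kernels, target_fps=300, freqs_mhz=[100, 200, 250], max_fpgas=4)` returns the cheapest feasible configuration under these assumptions. A realistic model would also need the transfer phases and allocation constraints described in the abstract, which is what makes the original problem mixed-integer and non-linear.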

    Energy-Aware Real-time Tasks Processing for FPGA Based Heterogeneous Cloud

    Cloud computing is becoming a popular model of computing. Due to the increasing complexity of cloud service requests, it often exploits heterogeneous architectures. Moreover, some service requests (SRs)/tasks exhibit real-time features and must be handled within a specified duration. Along with this temporal management, the strategy should also be energy efficient, as energy consumption in cloud computing is a major challenge. In this paper, we propose a strategy called "Efficient Resource Allocation of Service Request" (ERASER) for energy-efficient allocation and scheduling of periodic real-time SRs on a cloud platform. Our target cloud platform consists of Field Programmable Gate Arrays (FPGAs) as Processing Elements (PEs) along with General Purpose Processors (GPPs). We further propose an SR migration technique to service the maximum number of SRs. Simulation-based experimental results demonstrate that the proposed methodology is capable of achieving up to 90% resource utilization with an SR rejection rate of only 26% over different experimental scenarios. Comparison with other state-of-the-art techniques reveals that the proposed strategy outperforms the existing technique, with a 17% reduction in SR rejection rate and 21% lower energy consumption. Further, the simulation outcomes have been validated on a real test-bed based on a Xilinx Zynq SoC with benchmark tasks.