Search CORE

6 research outputs found

Power-Optimal Mapping of CNN Applications to Cloud-Based Multi-FPGA Platforms

Author: Casu Mario R.
Cortadella Jordi
Lavagno Luciano
Lazarescu Mihai T.
Shan Junnan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

Multi-FPGA platforms like Amazon Web Services F1 are perfect to accelerate multi-kernel pipelined applications, like Convolutional Neural Networks (CNNs). To reduce energy consumption, we propose to upload at runtime the best power-optimized CNN implementation for a given throughput constraint. Our design method gives the best number of parallel instances of each kernel, their allocation to the FPGAs, the number of powered-on FPGAs and their clock frequency. This is obtained by solving a mixed-integer, non-linear optimization problem that models power and performance of each component, as well as the duration of the computation phases—data transfer between a host CPU and the FPGA memory (typically DDR), data transfer between DDR and FPGA, and FPGA computation. The results show that the power saved compared to simply clock gating the fastest implementation is obviously very high, but it is also much more significant than simply scaling the frequency of the fastest implementation or replicating the slowest implementation on multiple FPGAs

UPCommons. Portal del coneixement obert de la UPC

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Multi-Tenant Cloud FPGA: A Survey on Security

Author: Ahmed Muhammed Kawser
Bobda Christophe
Mandebi Joel
Saha Sujan Kumar
Publication venue
Publication date: 22/09/2022
Field of study

With the exponentially increasing demand for performance and scalability in cloud applications and systems, data center architectures evolved to integrate heterogeneous computing fabrics that leverage CPUs, GPUs, and FPGAs. FPGAs differ from traditional processing platforms such as CPUs and GPUs in that they are reconfigurable at run-time, providing increased and customized performance, flexibility, and acceleration. FPGAs can perform large-scale search optimization, acceleration, and signal processing tasks compared with power, latency, and processing speed. Many public cloud provider giants, including Amazon, Huawei, Microsoft, Alibaba, etc., have already started integrating FPGA-based cloud acceleration services. While FPGAs in cloud applications enable customized acceleration with low power consumption, it also incurs new security challenges that still need to be reviewed. Allowing cloud users to reconfigure the hardware design after deployment could open the backdoors for malicious attackers, potentially putting the cloud platform at risk. Considering security risks, public cloud providers still don't offer multi-tenant FPGA services. This paper analyzes the security concerns of multi-tenant cloud FPGAs, gives a thorough description of the security problems associated with them, and discusses upcoming future challenges in this field of study

arXiv.org e-Print Archive

Interconnection Networks for High-Performance Stream Computing with FPGA Clusters

Author: Mondigo Antoniette Pangilinan
Publication venue
Publication date: 25/03/2020
Field of study

Tohoku University佐野健太郎課

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

Institutional Repositories DataBase (IRDB)

Dynamic Optical Networks for Data Centres and Media Production

Author: Funnell Adam Christopher
Publication venue: UCL (University College London)
Publication date: 28/07/2019
Field of study

This thesis explores all-optical networks for data centres, with a particular focus on network designs for live media production. A design for an all-optical data centre network is presented, with experimental verification of the feasibility of the network data plane. The design uses fast tunable (< 200 ns) lasers and coherent receivers across a passive optical star coupler core, forming a network capable of reaching over 1000 nodes. Experimental transmission of 25 Gb/s data across the network core, with combined wavelength switching and time division multiplexing (WS-TDM), is demonstrated. Enhancements to laser tuning time via current pre-emphasis are discussed, including experimental demonstration of fast wavelength switching (< 35 ns) of a single laser between all combinations of 96 wavelengths spaced at 50 GHz over a range wider than the optical C-band. Methods of increasing the overall network throughput by using a higher complexity modulation format are also described, along with designs for line codes to enable pulse amplitude modulation across the WS-TDM network core. The construction of an optical star coupler network core is investigated, by evaluating methods of constructing large star couplers from smaller optical coupler components. By using optical circuit switches to rearrange star coupler connectivity, the network can be partitioned, creating independent reserves of bandwidth and resulting in increased overall network throughput. Several topologies for constructing a star from optical couplers are compared, and algorithms for optimum construction methods are presented. All of the designs target strict criteria for the flexible and dynamic creation of multicast groups, which will enable future live media production workflows in data centres. The data throughput performance of the network designs is simulated under synthetic and practical media production traffic scenarios, showing improved throughput when reconfigurable star couplers are used compared to a single large star. An energy consumption evaluation shows reduced network power consumption compared to incumbent and other proposed data centre network technologies

UCL Discovery

Towards hardware as a reconfigurable, elastic, and specialized service

Author: Sanaullah Ahmed
Publication venue
Publication date: 29/09/2019
Field of study

As modern Data Center workloads become increasingly complex, constrained, and critical, mainstream CPU-centric computing has had ever more difficulty in keeping pace. Future data centers are moving towards a more fluid and heterogeneous model, with computation and communication no longer localized to commodity CPUs and routers. Next generation data-centric Data Centers will compute everywhere, whether data is stationary (e.g. in memory) or on the move (e.g. in network). While deploying FPGAs in NICS, as co-processors, in the router, and in Bump-in-the-Wire configurations is a step towards implementing the data-centric model, it is only part of the overall solution. The other part is actually leveraging this reconfigurable hardware. For this to happen, two problems must be addressed: code generation and deployment generation. By code generation we mean transforming abstract representations of an algorithm into equivalent hardware. Deployment generation refers to the runtime support needed to facilitate the execution of this hardware on an FPGA. Efforts at creating supporting tools in these two areas have thus far provided limited benefits. This is because the efforts are limited in one or more of the following ways: They i) do not provide fundamental solutions to a number of challenges, which makes them useful only to a limited group of (mostly) hardware developers, ii) are constrained in their scope, or iii) are ad hoc, i.e., specific to a single usage context, FPGA vendor, or Data Center configuration. Moreover, efforts in these areas have largely been mutually exclusive, which results in incompatibility across development layers; this requires wrappers to be designed to make interfaces compatible. As a result there is significant complexity and effort required to code and deploy efficient custom hardware for FPGAs; effort that may be orders-of-magnitude greater than for analogous software environments. The goal of this dissertation is to create a framework that enables reconfigurable logic in Data Centers to be targeted with the same level of effort as for a single CPU core. The underlying mechanism to this is a framework, which we refer to as Hardware as a Reconfigurable, Elastic and Specialized Service, or HaaRNESS. In this dissertation, we address two of the core challenges of HaaRNESS: reducing the complexity of code generation by constraining High Level Synthesis (HLS) toolflows, and replacing ad hoc models of deployment generation by generalizing and formalizing what is needed for a hardware Operating System. These parts are unified by the back-end of HLS toolflows which link generated compute pipelines with the operating system, and provide appropriate APIs, wrappers, and software runtimes. The contributions of this dissertation are the following: i) an empirically guided set of systematic transformations for generating high quality HLS code; ii) a framework for instrumenting HLS compiler to identify and remove optimization blockers; iii) a framework for RTL simulation and IP generation of HLS kernels for rapid turnaround; and iv) a framework for generalization and formalization of hardware operating systems to address the {\it ad hoc}'ness of existing deployment generation and ensure uniform structure and APIs

Boston University Institutional Repository (OpenBU)