    Quantifying the latency benefits of near-edge and in-network FPGA acceleration

    Transmitting data to cloud datacenters in distributed IoT applications introduces significant communication latency, but is often the only feasible solution when source nodes are computationally limited. To address latency concerns, cloudlets, in-network computing, and more capable edge nodes are all being explored as ways of moving processing capability towards the edge of the network. Hardware acceleration using Field Programmable Gate Arrays (FPGAs) is also seeing increased interest due to its reduced computation latency and improved efficiency. This paper evaluates the implications of these offloading approaches using a neural-network-based image classification application as a case study, quantifying both the computation and communication latency resulting from different platform choices. We consider communication latency including the ingestion of packets for processing on the target platform, showing that this varies significantly with the choice of platform. We demonstrate that emerging in-network accelerator approaches offer markedly improved and more predictable performance, as well as better scaling to support multiple data sources.
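
    To illustrate the kind of end-to-end latency accounting this abstract describes, the following is a minimal sketch, not the authors' code: the platform profiles and all numbers are hypothetical placeholders, chosen only to show how the network transfer, packet ingestion, and computation terms combine into a per-platform total.

    ```python
    # Hypothetical sketch of an end-to-end offloading latency model:
    # total = network transfer + packet ingestion + computation.
    # All platform figures below are illustrative placeholders, not measurements.

    PLATFORMS = {
        # name: (round-trip network ms, ingestion ms per KB, compute ms per inference)
        "cloud_gpu":       (40.0, 0.02, 2.0),
        "edge_fpga":       (2.0,  0.05, 4.0),
        "in_network_fpga": (0.5,  0.01, 4.0),
    }

    def end_to_end_latency_ms(platform: str, payload_kb: float) -> float:
        """Sum the communication, ingestion, and computation terms."""
        network_ms, ingest_ms_per_kb, compute_ms = PLATFORMS[platform]
        return network_ms + ingest_ms_per_kb * payload_kb + compute_ms

    # Compare platforms for a single 150 KB image classification request.
    for name in PLATFORMS:
        print(f"{name}: {end_to_end_latency_ms(name, payload_kb=150):.2f} ms")
    ```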

    Configurable data center switch architectures

    In this thesis, we explore alternative architectures for implementing configurable data center switches, along with the advantages that such switches can provide. Our first contribution centers on determining switch architectures that can be implemented on Field Programmable Gate Arrays (FPGAs) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks for realistically evaluating the performance of switch architectures in data centers, and contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small- to medium-sized switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation compared to an ASIC implementation. In addition to the hardware prototype, we contribute a lightweight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate the ability of our novel switch architecture and lightweight congestion control protocol to accelerate the training time of machine learning jobs by up to 1.34x and to benefit other latency-sensitive applications by reducing their 99th-percentile completion time by up to 4.5x. As our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures.
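
    The fair-sharing objective of the congestion control contribution can be illustrated with a generic max-min fair allocation. This is a standard textbook technique, not the protocol proposed in the thesis, and the job demands below are hypothetical.

    ```python
    # Generic max-min fair bandwidth allocation across competing jobs.
    # Standard algorithm used only to illustrate the fair-sharing objective;
    # it is NOT the lightweight protocol proposed in the thesis.

    def max_min_fair_share(capacity: float, demands: dict[str, float]) -> dict[str, float]:
        """Allocate capacity so no job can gain without reducing a smaller allocation."""
        allocation: dict[str, float] = {}
        remaining = capacity
        pending = dict(demands)
        while pending:
            fair_share = remaining / len(pending)
            # Jobs demanding less than the current fair share are fully satisfied.
            satisfied = {job: d for job, d in pending.items() if d <= fair_share}
            if not satisfied:
                # Every remaining job wants more than the fair share: split evenly.
                for job in pending:
                    allocation[job] = fair_share
                return allocation
            for job, demand in satisfied.items():
                allocation[job] = demand
                remaining -= demand
                del pending[job]
        return allocation

    # Example: three ML training jobs sharing a 100 Gb/s link.
    print(max_min_fair_share(100.0, {"job_a": 20.0, "job_b": 60.0, "job_c": 60.0}))
    # -> {'job_a': 20.0, 'job_b': 40.0, 'job_c': 40.0}
    ```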

    Computing in the blink of an eye: Current possibilities for edge computing and hardware-agnostic programming

    With the rapid advancement of the Internet of Things, systems combining sensing, communication, and computation are becoming ubiquitous. The systems built with these technologies are increasingly complex and therefore require more automation and intelligent decision-making, while often interacting with humans. It is thus critical that such interactions run smoothly in real time, and that the automation strategies do not introduce significant delays, typically no larger than 100 milliseconds, roughly the duration of a human eye blink. Pushing the deployment of algorithms onto embedded devices, closer to where data is collected, to avoid such delays is one of the main motivations of edge computing. Further advantages of edge computing include improved reliability and data privacy management. This work showcases the possibilities of different embedded platforms that are often used as edge computing nodes: embedded microcontrollers, embedded microprocessors, FPGAs, and embedded GPUs. The embedded solutions are compared with respect to their cost, complexity, energy consumption, and computing speed, establishing valuable guidelines for designers of complex systems that need to make use of edge computing. Furthermore, this paper shows the possibilities of hardware-agnostic programming using OpenCL, illustrating the efficiency price paid when software can be easily deployed on different hardware platforms.
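
    To give a flavor of the hardware-agnostic OpenCL programming the abstract mentions, here is a minimal vector-addition sketch using the pyopencl bindings. It is an illustrative example, not code from the paper; the same kernel source can run unchanged on a CPU, GPU, or FPGA that provides an OpenCL runtime.

    ```python
    # Minimal hardware-agnostic OpenCL example using pyopencl (illustrative only).
    # The same kernel source targets any device with an OpenCL runtime.
    import numpy as np
    import pyopencl as cl

    KERNEL_SRC = """
    __kernel void vecadd(__global const float *a,
                         __global const float *b,
                         __global float *out) {
        int gid = get_global_id(0);
        out[gid] = a[gid] + b[gid];
    }
    """

    a = np.random.rand(1024).astype(np.float32)
    b = np.random.rand(1024).astype(np.float32)
    out = np.empty_like(a)

    ctx = cl.create_some_context()   # picks whatever OpenCL device is available
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

    prg = cl.Program(ctx, KERNEL_SRC).build()   # compiled for the chosen device
    prg.vecadd(queue, a.shape, None, a_buf, b_buf, out_buf)
    cl.enqueue_copy(queue, out, out_buf)

    assert np.allclose(out, a + b)
    ```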