8 research outputs found

    Design tradeoffs for hard and soft FPGA-based Networks-on-Chip

    FPGAs has the potential not only to improve the efficiency of the interconnect, but also to increase designer productivity and reduce compile time by raising the abstraction level of communication. By comparing NoC components on FPGAs and ASICs we quantify the efficiency gap between the two platforms and use the results to understand the design tradeoffs in that space. The crossbar has the largest FPGA vs. ASIC gaps: 85× area and 4.4× delay, while the input buffers have the smallest: 17× area and 2.9× delay. For a soft NoC router, these results indicate that wide datapaths, deep buffers and a small number of ports and virtual channels (VCs) are favorable for FPGA implementation. If one hardens a complete state-of-the-art VC router it is on average 30× more area efficient and can achieve 3.6× the maximum frequency of a soft implementation. We show that this hard router can be integrated with the soft FPGA interconnect, and still achieve an area improvement of 22×. A 64-node NoC of hard routers with soft interconnect utilizes area equivalent to 1.6% of the logic modules in the latest FPGAs, compared to 33% for a soft NoC.

    Design Principles for Packet Deparsers on FPGAs

    The P4 language has drastically changed the networking field because it allows new networking applications to be described and implemented quickly. Although a large variety of applications can be described with the P4 language, current programmable switch architectures impose significant constraints on P4 programs. To address this shortcoming, FPGAs have been explored as potential targets for P4 applications. P4 applications are described using three abstractions: a packet parser, match-action tables, and a packet deparser, which reassembles the output packet with the result of the match-action tables. While implementations of packet parsers and match-action tables on FPGAs have been widely covered in the literature, no general design principles have been presented for the packet deparser. Indeed, implementing a high-speed and efficient deparser on FPGAs remains an open issue because it requires a large number of interconnections and the architecture must be tailored to a P4 program. As a result, in several works where a P4 application is implemented on FPGAs, the deparser consumes a significant proportion of chip resources. Hence, in this paper, we address this issue by presenting design principles for efficient and high-speed deparsers on FPGAs. As an artifact, we introduce a tool that generates an efficient vendor-agnostic deparser architecture from a P4 program. Our design has been validated and simulated with a cocotb-based framework. The resulting architecture is implemented on Xilinx UltraScale+ FPGAs and supports a throughput of more than 200 Gbps while reducing resource usage by almost 10× compared to other solutions.
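The deparser role described in the abstract above (reassembling the output packet from the headers left valid by the match-action stage) can be illustrated with a minimal software sketch. This is not the paper's architecture or its generator tool; the `Header` class, the `deparse` function and the byte patterns are hypothetical names and values chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Header:
    """Hypothetical header record: name, raw bytes, and a validity bit."""
    name: str
    data: bytes
    valid: bool  # assumed to be set by the match-action stage


def deparse(headers: list[Header], payload: bytes) -> bytes:
    """Emit the valid headers in declaration order, followed by the payload."""
    out = bytearray()
    for hdr in headers:
        if hdr.valid:  # invalid headers are simply skipped
            out += hdr.data
    out += payload
    return bytes(out)


# Illustrative packet in which the VLAN header was invalidated upstream.
headers = [
    Header("ethernet", b"\xaa" * 14, True),
    Header("vlan", b"\xbb" * 4, False),
    Header("ipv4", b"\xcc" * 20, True),
]
packet = deparse(headers, b"\xdd" * 8)
print(len(packet))  # 14 + 20 + 8 bytes
```

In hardware, the variable offset at which each header lands in the output is what drives the interconnect cost the abstract mentions; the loop above hides that cost behind a byte-array append.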

    The power of communication: Energy-efficient NoCs for FPGAs

    Integrating networks-on-chip (NoCs) on FPGAs can improve device scalability and facilitate design by abstracting communication and simplifying timing closure, not only between modules in the FPGA fabric but also with large “hard” blocks such as high-speed I/O interfaces. We propose mixed and hard NoCs that add less than 1% area to large FPGAs and run 5-6× faster than the soft NoC equivalent. A detailed power analysis, per NoC component, shows that routers consume 14× less power when implemented hard compared to soft, and whether hard or soft most of the router’s power is consumed in the input modules for buffering. For complete systems, hard NoCs consume less than 6% (and as low as 3%) of the FPGA’s dynamic power budget to support 100 GB/s of communication bandwidth. We find that, depending on design choices, hard NoCs consume 4.5-10.4 mJ of energy per GB of data transferred. Surprisingly, this is comparable to the energy efficiency of the simplest traditional interconnect on an FPGA – soft point-to-point links require 4.7 mJ/GB. In many designs, communication must include multiplexing, arbitration and/or pipelining. For all these cases, our results indicate that a hard NoC will be more energy efficient than the conventional FPGA fabric.
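The energy-per-gigabyte figures quoted above can be turned into power at a given bandwidth with a simple unit conversion (mJ/GB × GB/s = mW). The sketch below assumes a sustained 100 GB/s for all three cases, including the soft point-to-point links, which the abstract does not state; the function name `noc_power_watts` is hypothetical.

```python
def noc_power_watts(energy_mj_per_gb: float, bandwidth_gb_per_s: float) -> float:
    """Convert energy per transferred GB into power at a sustained bandwidth."""
    milliwatts = energy_mj_per_gb * bandwidth_gb_per_s  # mJ/GB * GB/s = mJ/s = mW
    return milliwatts / 1000.0


BANDWIDTH_GB_PER_S = 100.0  # bandwidth figure taken from the abstract

hard_low = noc_power_watts(4.5, BANDWIDTH_GB_PER_S)    # about 0.45 W
hard_high = noc_power_watts(10.4, BANDWIDTH_GB_PER_S)  # about 1.04 W
soft_p2p = noc_power_watts(4.7, BANDWIDTH_GB_PER_S)    # about 0.47 W (assumed same bandwidth)

print(f"hard NoC: {hard_low:.2f}-{hard_high:.2f} W, soft links: {soft_p2p:.2f} W")
```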

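    Implementation of an abdominal ECG reading module for pregnant women to estimate the fetal heart rate using an FPGA (Implementação de um módulo de leitura de ECG abdominal em gestantes para estimativa da frequência cardíaca fetal usando FPGA)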

    Undergraduate final project (Trabalho de Conclusão de Curso), Universidade de Brasília, Faculdade UnB Gama, 2017. In the biomedical field, monitoring the fetal heart rate (FHR) has proven to provide meaningful information about the actual condition of the baby inside the womb. One of the non-invasive methods of estimating the FHR uses the fetal electrocardiogram (FECG). Electrodes are placed on the maternal abdomen and the resulting signal is the abdominal ECG (AECG), which is composed of the maternal ECG (MECG), the FECG and noise. The FECG can be extracted by processing the AECG, and the FHR can be obtained by applying estimation algorithms to the FECG. Considering this context, two final course projects in Electronic Engineering were developed at the Faculty of Gama, University of Brasilia. One of them was an FPGA-based FHR estimator module, and the other was the implementation of wireless communication between the FPGA and a mobile device running the Android operating system, which receives the estimated value and issues an alarm when it exceeds the predefined limits. The objective of this work is to complement the first proposal by implementing a reading module that provides previously collected AECG signals to the FPGA, so that it can carry out the processing and send the estimated FHR to the mobile device.
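As a rough illustration of the last two steps described above (estimating the FHR from an extracted FECG and raising an alarm outside predefined limits), the sketch below derives beats per minute from the mean interval between detected fetal R-peaks. It is not the project's estimator: the peak times are made up, and the 110-160 bpm limits are placeholder assumptions rather than values from the source.

```python
def heart_rate_bpm(peak_times_s):
    """Estimate heart rate from R-peak timestamps given in seconds."""
    if len(peak_times_s) < 2:
        raise ValueError("need at least two peaks to measure an interval")
    # Intervals between consecutive peaks (R-R intervals)
    intervals = [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]
    mean_rr = sum(intervals) / len(intervals)
    return 60.0 / mean_rr


def out_of_limits(fhr_bpm, low=110.0, high=160.0):
    """Return True when the estimate falls outside the (assumed) alarm limits."""
    return fhr_bpm < low or fhr_bpm > high


# Hypothetical fetal R-peaks spaced 0.4 s apart, i.e. roughly 150 bpm
fhr = heart_rate_bpm([0.0, 0.4, 0.8, 1.2])
print(round(fhr), out_of_limits(fhr))
```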

    HopliteBuf FPGA Network-on-Chip: Architecture and Analysis

    We can prove occupancy bounds of stall-free FIFOs used in deflection-free, low-cost, and high-speed FPGA overlay Networks-on-Chip (NoCs). In our work, we build on top of the HopliteRT livelock-free overlay NoC with an FPGA-friendly 2D unidirectional torus topology to propose the novel HopliteBuf NoC. In our new NoC, we strategically introduce stall-free FIFOs in the network and support these FIFOs with static analysis based on network calculus to compute FIFO occupancy, latency, and bandwidth bounds. The microarchitecture of HopliteBuf combines the performance benefits of conventional buffered NoCs (high throughput, low latency) with the cost advantages of deflection-routed NoCs (low FPGA area, high clock frequencies). Specifically, we look at two design variants of the HopliteBuf NoC: (1) Single corner-turn FIFO (W to S), and (2) Dual corner-turn FIFO (W to S+N). The single corner-turn (W to S) design is simpler and only introduces a buffering requirement for packets changing dimension from the X ring to the downhill Y ring (or West to South). The dual corner-turn variant requires two FIFOs for turning packets going downhill (W to S) as well as uphill (W to N). The dual corner-turn design overcomes the mathematical analysis challenges associated with single corner-turn designs for communication workloads with cyclic dependencies between flow traversal paths, at the expense of a small increase in resource cost. Essentially, we resolve an analysis challenge with extra hardware resources. Across a range of 100 synthetically generated workloads on a 5 x 5 NoC, HopliteBuf outperforms HopliteRT by 1.2-2x in terms of latency, 10% in terms of injection rate, and 30-60% in terms of flowset feasibility. These advantages come at the cost of 3-4x higher FPGA resource requirement for buffers and muxes. Our analysis also delivers latency bounds that are not only better than HopliteRT in absolute terms but also tighter by 2-3x, allowing us to provision less hardware to meet our specifications.
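The X-then-Y traversal on a unidirectional torus mentioned above can be sketched as a hop-count calculation. This is a simplified model under stated assumptions, not the HopliteBuf router or its network-calculus analysis: coordinates increase in the direction of travel on each ring, contention and FIFO occupancy are ignored, and the name `torus_route` is hypothetical.

```python
def torus_route(src, dst, n):
    """Hop counts on an n x n unidirectional torus with X-then-Y routing.

    Returns (x_hops, y_hops, turns), where `turns` is True when the packet
    travels on the X ring and then changes dimension onto the Y ring
    (the West-to-South corner turn in the single corner-turn variant).
    """
    sx, sy = src
    dx, dy = dst
    # Rings are one-way, so distance wraps around modulo the ring size
    x_hops = (dx - sx) % n
    y_hops = (dy - sy) % n
    turns = x_hops > 0 and y_hops > 0
    return x_hops, y_hops, turns


# Example on a 5 x 5 NoC: the X distance wraps around the ring
print(torus_route((4, 1), (1, 3), 5))
```

Only packets for which `turns` is True would pass through a corner-turn FIFO in this model, which is why the buffering cost in such a design is concentrated at the dimension change.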

    Survey of FPGA applications in the period 2000 – 2015 (Technical Report)

    Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report). 2017. Since their introduction, FPGAs have appeared in more and more fields of application. Their key advantage is the combination of software-like flexibility with the performance otherwise common to hardware. Nevertheless, every application field imposes special requirements on the computational architecture used. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research and why they have been chosen over other processing units such as CPUs.