8 research outputs found

    JetStream: an open-source high-performance PCI Express 3 streaming library for FPGA-to-host and FPGA-to-FPGA communication

    Many FPGA-based accelerators are constrained by the available resources, and multi-FPGA solutions can be necessary for building more capable systems. Available PCIe solutions provide only FPGA-to-host communication. In this paper we present JetStream, an open-source modular PCIe 3 library that supports not only fast FPGA-to-host communication but also direct FPGA-to-FPGA communication, which fully bypasses the memory subsystem. The direct mode saves memory bandwidth for multicast modes and permits connecting multiple FPGAs in various software-defined topologies. We show the benefits of JetStream with a large FIR filter spanning four FPGA boards, achieving throughputs of up to 7.09 GB/s per link. Utilizing direct FPGA-to-FPGA transfers reduces the required memory bandwidth by up to 75%.
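    The idea of a FIR filter spanning four boards can be illustrated in software. The sketch below is not JetStream code: the function names and the contiguous tap split are assumptions made for illustration. It shows that a filter whose taps are divided into four chunks, each fed a suitably delayed input, produces the same output as the unsplit filter. The transfer count at the end is one possible accounting, also an assumption, under which a four-board chain routed through host memory needs eight memory transfers and a direct chain needs two, a 75% reduction.

```python
def fir(x, taps):
    """Direct-form FIR with zero initial state: y[n] = sum_k taps[k] * x[n-k]."""
    return [sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]


def fir_partitioned(x, taps, boards=4):
    """Hypothetical split of one long FIR across `boards` stages.

    Each stage owns a contiguous chunk of taps and sees the input delayed by
    the number of taps owned by the earlier stages. Partial sums are added up
    along the chain.
    """
    chunk = -(-len(taps) // boards)  # ceiling division
    y = [0] * len(x)
    for b in range(boards):
        part = taps[b * chunk:(b + 1) * chunk]
        delay = b * chunk
        delayed = ([0] * delay + list(x))[:len(x)]  # input delayed by `delay` samples
        partial = fir(delayed, part)
        y = [acc + p for acc, p in zip(y, partial)]
    return y


def memory_transfers(boards, direct):
    """Assumed accounting: via host memory every board reads and writes once;
    in direct mode only the first read and the last write touch memory."""
    return 2 if direct else 2 * boards


samples = list(range(1, 11))
taps = [1, 2, 3, 4, 5, 6, 7, 8]
assert fir_partitioned(samples, taps) == fir(samples, taps)
saving = 1 - memory_transfers(4, direct=True) / memory_transfers(4, direct=False)
```

    With four boards, `saving` evaluates to 0.75 under this accounting.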

    FPGA-based High Performance Diagnostics For Fusion

    High performance diagnostics are an important aspect of fusion research. Increasing shot lengths, paired with the requirement for higher accuracy and speed, make it mandatory to employ new technology to cope with the increasing demands on digitization and data handling. Field programmable gate arrays (FPGAs) are well known in high performance applications. Their ability to handle multiple fast data streams whilst remaining programmable makes them an ideal tool for diagnostic development. Both the improvement of old diagnostics and the design of new ones can benefit from FPGA technology and significantly increase the amount of accessible physics. In this work the developments on two FPGA-based diagnostics are presented. In the first part, a new open-hardware, low-cost FPGA-based digitizer is presented for the MAST-Upgrade (MAST-U) integral electron density interferometer. The system is shown to have an optically limited phase accuracy and a detection bandwidth of over 3.5 MHz. Data is acquired continuously at 20 MS/s and streamed to an acquisition PC via optical fiber. By employing a dual-FPGA approach, real-time processing of the density signal can be achieved despite severely limited resources, thus providing a control signal for the MAST-U plasma control system with less than 8 μs latency. Because MAST-U was still inoperable, in-situ testing was conducted on ASDEX Upgrade, where fast wave physics up to 3.5 MHz could first be observed. The second part presents developments to the Synthetic Aperture Microwave Imaging (SAMI) diagnostic. In addition to improving the utilization of long shot lengths and enabling dual-polarized acquisition, the system has been enhanced to continuously acquire active probing profiles for 2D Doppler back-scattering (DBS), a technique recently developed using SAMI. The aim is to measure pitch angle profiles to derive the edge current density.
SAMI has been transferred to the NSTX-Upgrade and integrated into the experiment’s infrastructure, where it has been acquiring data since May 2016. As part of this move, an investigation into near-field effects on SAMI’s image reconstruction algorithms was conducted.

    Design of a scalable network interface to support enhanced TCP and UDP processing for high speed networks

    Communication networks have advanced rapidly in providing additional services, with improvements made to their bandwidth and the integration of advanced technology. As the speed of networks exceeds 10 Gbps, the time frame for completing the processing of TCP and UDP packets has become extremely short. The design and implementation of high performance Network Interfaces (NIs) that can support offload protocol functions for current and next-generation networks is challenging. In this thesis two software approaches are presented to enhance protocol processing of TCP and UDP in the network interface. A novel software Large Receive Offload (LRO) approach for enhancing the receiving side is proposed. The LRO works by aggregating the incoming TCP and UDP packets into larger packets inside the NI’s buffer. The receiving side software has been improved to support out-of-order packets. The second proposed software solution applies to the Large Send Offload (LSO). The proposed LSO function segments TCP and UDP messages that are larger than the Maximum Transmission Unit into pieces of the Maximum Segment Size. New packet headers are generated for each new outgoing packet. A scalable programmable NI based on a 32-bit RISC core is presented that can support 100 Gbps network speeds. The processing time frame required at the NI has been accelerated by preventing hazards (such as data hazards and control hazards) during the execution of the LRO and the LSO functions. An R2000/3000 RISC has been used to test the LRO and LSO functions and to discover the instruction set that is most suitable. Following this, the VHDL NI was implemented with three pipeline RISC cores, a simple DMA controller and Content Addressable Memory. An evaluation of the RISC clock rate required to process TCP and UDP streams at 100 Gbps was conducted.
It was determined that a RISC core running at 752 MHz with a DMA clock of 3753 MHz was able to process packets 512 bytes or larger fast enough to support 100 Gbps network speeds.
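    The segmentation and aggregation steps described above can be sketched in a few lines. This is an illustrative model only, not the thesis firmware: `Segment`, `segment_message`, `coalesce` and `cycles_per_packet` are hypothetical names, headers are reduced to a single sequence offset, and the timing estimate ignores Ethernet framing overhead such as preamble and inter-frame gap.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int        # byte offset of this segment within the original message
    payload: bytes


def segment_message(payload: bytes, mss: int, start_seq: int = 0):
    """LSO-style split: cut a large message into MSS-sized segments,
    giving each one its own (simplified) header field."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [Segment(start_seq + off, payload[off:off + mss])
            for off in range(0, len(payload), mss)]


def coalesce(segments):
    """LRO-style merge: reorder by sequence offset so out-of-order arrivals
    are tolerated, then join contiguous payloads into one large buffer."""
    merged = bytearray()
    expected = None
    for seg in sorted(segments, key=lambda s: s.seq):
        if expected is not None and seg.seq != expected:
            raise ValueError("gap or overlap in sequence space")
        merged += seg.payload
        expected = seg.seq + len(seg.payload)
    return bytes(merged)


def cycles_per_packet(packet_bytes, line_rate_bps, clock_hz):
    """Clock cycles available per packet if packets arrive back to back."""
    return packet_bytes * 8 / line_rate_bps * clock_hz


message = bytes(range(256)) * 12                 # 3072-byte test message
segments = segment_message(message, mss=1460)    # 1460 + 1460 + 152 bytes
restored = coalesce(reversed(segments))          # deliver in reverse order
budget = cycles_per_packet(512, 100e9, 752e6)    # about 30.8 cycles
```

    Under these simplifications a 512-byte packet occupies a 100 Gbps link for 40.96 ns, which leaves roughly 30.8 cycles of a 752 MHz core per packet.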

    A Practical Hardware Implementation of Systemic Computation

    It is widely accepted that natural computation, such as brain computation, is far superior to typical computational approaches addressing tasks such as learning and parallel processing. As conventional silicon-based technologies are about to reach their physical limits, researchers have drawn inspiration from nature to found new computational paradigms. One such newly conceived paradigm is Systemic Computation (SC). SC is a bio-inspired model of computation. It incorporates natural characteristics and defines a massively parallel non-von Neumann computer architecture that can model natural systems efficiently. This thesis investigates the viability and utility of a Systemic Computation hardware implementation, since prior software-based approaches have proved inadequate in terms of performance and flexibility. This is achieved by addressing three main research challenges: the level of support for the natural properties of SC, the design of its implied architecture, and methods to make the implementation practical and efficient. Various hardware-based approaches to Natural Computation are reviewed and their compatibility and suitability with respect to the SC paradigm are investigated. FPGAs are identified as the most appropriate implementation platform through critical evaluation, and the first prototype Hardware Architecture of Systemic Computation (HAoS) is presented. HAoS is a novel custom digital design, which takes advantage of the inbuilt parallelism of an FPGA and the highly efficient matching capability of a Ternary Content Addressable Memory. It provides basic processing capabilities in order to minimize time-demanding data transfers, while the optional use of a CPU provides high-level processing support. It is optimized and extended to a practical hardware platform accompanied by a software framework to provide an efficient SC programming solution.
The suggested platform is evaluated using three bio-inspired models, and analysis shows that it satisfies the research challenges and provides an effective solution in terms of the efficiency versus flexibility trade-off.
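    The matching capability of a Ternary Content Addressable Memory mentioned above can be modelled in software. The sketch below is a generic illustration, not the HAoS design: `tcam_lookup` and the entry layout are hypothetical, and the sequential loop stands in for a comparison that the hardware performs on all entries in parallel.

```python
def tcam_lookup(entries, key):
    """Return the index of the first entry matching `key`, or None.

    Each entry is a (value, mask) pair. A mask bit of 1 means the bit must
    match; a mask bit of 0 means "don't care", the third state that makes
    the memory ternary.
    """
    for index, (value, mask) in enumerate(entries):
        if (key ^ value) & mask == 0:
            return index
    return None


table = [
    (0b1010, 0b1111),  # exact match on 1010
    (0b1000, 0b1100),  # matches any key of the form 10xx
    (0b0000, 0b0000),  # wildcard: matches every key
]
hit_exact = tcam_lookup(table, 0b1010)      # index 0
hit_prefix = tcam_lookup(table, 0b1011)     # index 1
hit_default = tcam_lookup(table, 0b0111)    # index 2
miss = tcam_lookup(table[:2], 0b0111)       # None, no wildcard entry
```

    Returning the first match gives earlier entries priority, which is a common convention for such tables and is assumed here.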