1,002 research outputs found

    H-SIMD machine : configurable parallel computing for data-intensive applications

    Get PDF
    This dissertation presents a hierarchical single-instruction multiple-data (H-SLMD) configurable computing architecture to facilitate the efficient execution of data-intensive applications on field-programmable gate arrays (FPGAs). H-SIMD targets data-intensive applications for FPGA-based system designs. The H-SIMD machine is associated with a hierarchical instruction set architecture (HISA) which is developed for each application. The main objectives of this work are to facilitate ease of program development and high performance through ease of scheduling operations and overlapping communications with computations. The H-SIMD machine is composed of the host, FPGA and nano-processor layers. They execute host SIMD instructions (HSIs), FPGA SIMD instructions (FSIs) and nano-processor instructions (NPLs), respectively. A distinction between communication and computation instructions is intended for all the HISA layers. The H-SIMD machine also employs a memory switching scheme to bridge the omnipresent large bandwidth gaps in configurable systems. To showcase the proposed high-performance approach, the conditions to fully overlap communications with computations are investigated for important applications. The building blocks in the H-SLMD machine, such as high-performance and area-efficient register files, are presented in detail. The H-SLMD machine hierarchy is implemented on a host Dell workstation and the Annapolis Wildstar II FPGA board. Significant speedups have been achieved for matrix multiplication (MM), 2-dimensional discrete cosine transform (2D DCT) and 2-dimensional fast Fourier transform (2D FFT) which are used widely in science and engineering. In another FPGA-based programming paradigm, a high-level language (here ANSI C) can be used to program the FPGAs in a mode similar to that of the H-SIMD machine in terms of trying to minimize the effect of overheads. More specifically, a multi-threaded overlapping scheme is proposed to reduce as much as possible, or even completely hide, runtime FPGA reconfiguration overheads. Nevertheless, although the HLL-enabled reconfigurable machine allows software developers to customize FPGA functions easily, special architecture techniques are needed to achieve high-performance without significant penalty on area and clock frequency. Two important high-performance applications, matrix multiplication and image edge detection, are tested on the SRC-6 reconfigurable machine. The implemented algorithms are able to exploit the available data parallelism with independent functional units and application-specific cache support. Relevant performance and design tradeoffs are analyzed

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

    Get PDF

    FPGA implementation of a 10 GS/s variable-length FFT for OFDM-based optical communication systems

    Full text link
    [EN] The transmission rate in current passive optical networks can be increased by employing Orthogonal Frequency Division Multiplexing (OFDM) modulation. The computational kernel of this modulation is the fast Fourier transform (FFT) operator, which has to achieve a very high throughput in order to be used in optical networks. This paper presents the implementation in an FPGA device of a variable-length FFT that can be configured in run-time to compute different FFT lengths between 16 and 1024 points. The FFT reaches a throughput of 10 GS/s in a Virtex-7 485T-3 FPGA device and was used to implement a 20 Gb/s optical OFDM receiver. (C) 2018 Elsevier B.V. All rights reserved.This work was supported by the Spanish Ministerio de Economia y Competitividad under project TEC2015-70858-C2-2-R with FEDER funds.Bruno, JS.; Almenar Terre, V.; Valls Coquillat, J. (2019). FPGA implementation of a 10 GS/s variable-length FFT for OFDM-based optical communication systems. Microprocessors and Microsystems. 64:195-204. https://doi.org/10.1016/j.micpro.2018.12.002S1952046

    Ein flexibles, heterogenes Bildverarbeitungs-Framework für weltraumbasierte, rekonfigurierbare Datenverarbeitungsmodule

    Get PDF
    Scientific instruments as payload of current space missions are often equipped with high-resolution sensors. Thereby, especially camera-based instruments produce a vast amount of data. To obtain the desired scientific information, this data usually is processed on ground. Due to the high distance of missions within the solar system, the data rate for downlink to the ground station is strictly limited. The volume of scientific relevant data is usually less compared to the obtained raw data. Therefore, processing already has to be carried out on-board the spacecraft. An example of such an instrument is the Polarimetric and Helioseismic Imager (PHI) on-board Solar Orbiter. For acquisition, storage and processing of images, the instrument is equipped with a Data Processing Module (DPM). It makes use of heterogeneous computing based on a dedicated LEON3 processor in combination with two reconfigurable Xilinx Virtex-4 Field-Programmable Gate Arrays (FPGAs). The thesis will provide an overview of the available space-grade processing components (processors and FPGAs) which fulfill the requirements of deepspace missions. It also presents existing processing platforms which are based upon a heterogeneous system combining processors and FPGAs. This also includes the DPM of the PHI instrument, whose architecture will be introduced in detail. As core contribution of this thesis, a framework will be presented which enables high-performance image processing on such hardware-based systems while retaining software-like flexibility. This framework mainly consists of a variety of modules for hardware acceleration which are integrated seamlessly into the data flow of the on-board software. Supplementary, it makes extensive use of the dynamic in-flight reconfigurability of the used Virtex-4 FPGAs. The flexibility of the presented framework is proven by means of multiple examples from within the image processing of the PHI instrument. The framework is analyzed with respect to processing performance as well as power consumption.Wissenschaftliche Instrumente auf aktuellen Raumfahrtmissionen sind oft mit hochauflösenden Sensoren ausgestattet. Insbesondere kamerabasierte Instrumente produzieren dabei eine große Menge an Daten. Diese werden üblicherweise nach dem Empfang auf der Erde weiterverarbeitet, um daraus wissenschaftlich relevante Informationen zu gewinnen. Aufgrund der großen Entfernung von Missionen innerhalb unseres Sonnensystems ist die Datenrate zur Übertragung an die Bodenstation oft sehr begrenzt. Das Volumen der wissenschaftlich relevanten Daten ist meist deutlich kleiner als die aufgenommenen Rohdaten. Daher ist es vorteilhaft, diese bereits an Board der Sonde zu verarbeiten. Ein Beispiel für solch ein Instrument ist der Polarimetric and Helioseismic Imager (PHI) an Bord von Solar Orbiter. Um die Daten aufzunehmen, zu speichern und zu verarbeiten, ist das Instrument mit einem Data Processing Module (DPM) ausgestattet. Dieses nutzt ein heterogenes Rechnersystem aus einem dedizierten LEON3 Prozessor, zusammen mit zwei rekonfigurierbaren Xilinx Virtex-4 Field-Programmable Gate Arrays (FPGAs). Die folgende Arbeit gibt einen Überblick über verfügbare Komponenten zur Datenverarbeitung (Prozessoren und FPGAs), die den Anforderungen von Raumfahrtmissionen gerecht werden, und stellt einige existierende Plattformen vor, die auf einem heterogenen System aus Prozessor und FPGA basieren. Hierzu gehört auch das Data Processing Module des PHI Instrumentes, dessen Architektur im Verlauf dieser Arbeit beschrieben wird. Als Kernelement der Dissertation wird ein Framework vorgestellt, das sowohl eine performante, als auch eine flexible Bilddatenverarbeitung auf einem solchen System ermöglicht. Dieses Framework besteht aus verschiedenen Modulen zur Hardwarebeschleunigung und bindet diese nahtlos in den Datenfluss der On-Board Software ein. Dabei wird außerdem die Möglichkeit genutzt, die eingesetzten Virtex-4 FPGAs dynamisch zur Laufzeit zu rekonfigurieren. Die Flexibilität des vorgestellten Frameworks wird anhand mehrerer Fallbeispiele aus der Bildverarbeitung von PHI dargestellt. Das Framework wird bezüglich der Verarbeitungsgeschwindigkeit und Energieeffizienz analysiert

    An integrated soft- and hard-programmable multithreaded architecture

    Get PDF

    Domain specific high performance reconfigurable architecture for a communication platform

    Get PDF

    Reconfiguration of field programmable logic in embedded systems

    Get PDF
    corecore