1,330 research outputs found

    Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

    Get PDF
    The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

    Hardware prototyping and validation of a W-ΔDOR digital signal processor

    Get PDF
    Microwave tracking, usually performed by on ground processing of the signals coming from a spacecraft, represents a crucial aspect in every deep-space mission. Various noise sources, including receiver noise, affect these signals, limiting the accuracy of the radiometric measurements obtained from the radio link. There are several methods used for spacecraft tracking, including the Delta-Differential One-Way Ranging (ΔDOR) technique. In the past years, European Space Agency (ESA) missions relied on a narrowband ΔDOR system for navigation in the cruise phase. To limit the adverse effect of nonlinearities in the receiving chain, an innovative wideband approach to ΔDOR measurements has recently been proposed. This work presents the hardware implementation of a new version of the ESA X/Ka Deep Space Transponder based on the new tracking technique named Wideband ΔDOR (W-ΔDOR). The architecture of the new transponder guarantees backward compatibility with narrowband ΔDOR

    Investigation on Mlp Artificial Neural Network Using FPGA For Autonomous Cart Follower System

    Get PDF
    Dengan kos alat pengesan yang semakin rendah, masa depan sistem pedati pengikut autonomi akan dilengkapi dengan lebih banyak alat pengesan. Ini menjadi cabaran rekabentuk dalam mengendalikan data besar dan kerumitan perkukuhan. Kebanyakan sistem yang sedia ada menggunakan papan mikropengawal yang mempunyai prestasi yang terhad dan pengembangan tidak mungkin tanpa penggantian yang lebih baru. Projek ini mencadangkan perlaksanaan alternatif sistem pedati pengikut autonomi dengan model rangkaian neural MLP menggunakan FPGA. Sistem pedati pengikut autonomi yang mengguakan papan mikropengawal telah diubah suai untuk menggunakan papan FPGA dan dilaksanakan melalui Sistem pada Chip (SOC). System rangkaian neural dilatih dalam simulasi dengan vektor latihan yang dikumpul daripada sistem pedati pengikut autonomi yang sedia ada. System rangkaian neural kemudian dilaksanakan sebagai perkukuhan dalam SOC itu. Dalam pemerhatian, jejak perkukuhan model rangkaian neural kekal saiz kecil tanpa mengira saiz rangkaian neural. Hasil kajian menunjukkan bahawa dengan penggunaan sumber tambahan sebanyak 40%, penambahbaikan sistem secara keseluruhan sebanyak 27 kali dicapai dengan penggunaan blok pecutan perkakasan di SOC, berbanding dengan SOC tanpa penggunaan blok pecutan perkakasan. ________________________________________________________________________________________________________________________ The future of the autonomous cart follower system will equipped with lots of sensory data, due to the ever lower cost of sensory device. This provides design challenge on handling large data and firmware complexity. Most of the existing systems are implemented via usage of microcontroller board, which has limited performance and expansion is not possible without replacement of newer board. The project proposes an alternative approach of running the autonomous cart follower systems on neural network model using Field Programmable Gates Array (FPGA). A microcontroller based autonomous cart follower systems is modified to use the FPGA board and implemented via the System on Chip (SOC) approach. The neural network is trained offline in simulation tools with training vector collected from running the existing autonomous cart follower systems. The trained neural network model then implemented as software code in the SOC. By observation the firmware footprint of the neural network model remains small size regardless of the neural network size. The result shows that with 40% more additional resource utilization, the overall system improvement of 27 times is achieved with the usage of hardware acceleration block in SOC compared to SOC without hardware acceleration

    A review of synthetic-aperture radar image formation algorithms and implementations: a computational perspective

    Get PDF
    Designing synthetic-aperture radar image formation systems can be challenging due to the numerous options of algorithms and devices that can be used. There are many SAR image formation algorithms, such as backprojection, matched-filter, polar format, Range–Doppler and chirp scaling algorithms. Each algorithm presents its own advantages and disadvantages considering efficiency and image quality; thus, we aim to introduce some of the most common SAR image formation algorithms and compare them based on these two aspects. Depending on the requisites of each individual system and implementation, there are many device options to choose from, for in stance, FPGAs, GPUs, CPUs, many-core CPUs, and microcontrollers. We present a review of the state of the art of SAR imaging systems implementations. We also compare such implementations in terms of power consumption, execution time, and image quality for the different algorithms used.info:eu-repo/semantics/publishedVersio

    Exploring FPGA Implementation for Binarized Neural Network Inference

    Get PDF
    Deep convolutional neural network has taken an important role in machine learning algorithm. It is widely used in different areas such as computer vision, robotics, and biology. However, the models of deep neural networks become larger and more computation complexity which is a big obstacle for such huge model to implement on embedded systems. Recent works have shown the binarized neural networks (BNN), utilizing binarized (i.e. +1 and -1) convolution kernel and binarized activation function, can significantly reduce the parameter size and computation cost, which makes it hardware-friendly for Field-Programmable Gate Arrays (FPGAs) implementation with efficient energy cost. This thesis proposes to implement a new parallel convolutional binarized neural network (i.e. PC-BNN) on FPGA with accurate inference. The embedded PC-BNN is designed for image classification on CIFAR-10 dataset and explores the hardware architecture and optimization of customized CNN topology. The parallel-convolution binarized neural network has two parallel binarized convolution layers which replaces the original single binarized convolution layer. It achieves around 86% on CIFAR-10 dataset and owns 2.3Mb parameter size. We implement our PC-BNN inference into the Xilinx PYNQ Z1 FPGA board which only has 4.9Mb on-chip Block RAM. Since the ultra-small network parameter, the whole model parameters can be stored on on-chip memory which can greatly reduce energy consumption and computation latency. Meanwhile, we design a new pipeline streaming architecture for PC-BNN hardware inference which can further increase the performance. The experiment results show that our PC-BNN inference on FPGA achieves 930 frames per second and 387.5 FPS/Watt, which are among the best throughput and energy efficiency compared to most recent works

    FPGA structures for high speed and low overhead dynamic circuit specialization

    Get PDF
    A Field Programmable Gate Array (FPGA) is a programmable digital electronic chip. The FPGA does not come with a predefined function from the manufacturer; instead, the developer has to define its function through implementing a digital circuit on the FPGA resources. The functionality of the FPGA can be reprogrammed as desired and hence the name “field programmable”. FPGAs are useful in small volume digital electronic products as the design of a digital custom chip is expensive. Changing the FPGA (also called configuring it) is done by changing the configuration data (in the form of bitstreams) that defines the FPGA functionality. These bitstreams are stored in a memory of the FPGA called configuration memory. The SRAM cells of LookUp Tables (LUTs), Block Random Access Memories (BRAMs) and DSP blocks together form the configuration memory of an FPGA. The configuration data can be modified according to the user’s needs to implement the user-defined hardware. The simplest way to program the configuration memory is to download the bitstreams using a JTAG interface. However, modern techniques such as Partial Reconfiguration (PR) enable us to configure a part in the configuration memory with partial bitstreams during run-time. The reconfiguration is achieved by swapping in partial bitstreams into the configuration memory via a configuration interface called Internal Configuration Access Port (ICAP). The ICAP is a hardware primitive (macro) present in the FPGA used to access the configuration memory internally by an embedded processor. The reconfiguration technique adds flexibility to use specialized ci rcuits that are more compact and more efficient t han t heir b ulky c ounterparts. An example of such an implementation is the use of specialized multipliers instead of big generic multipliers in an FIR implementation with constant coefficients. To specialize these circuits and reconfigure during the run-time, researchers at the HES group proposed the novel technique called parameterized reconfiguration that can be used to efficiently and automatically implement Dynamic Circuit Specialization (DCS) that is built on top of the Partial Reconfiguration method. It uses the run-time reconfiguration technique that is tailored to implement a parameterized design. An application is said to be parameterized if some of its input values change much less frequently than the rest. These inputs are called parameters. Instead of implementing these parameters as regular inputs, in DCS these inputs are implemented as constants, and the application is optimized for the constants. For every change in parameter values, the design is re-optimized (specialized) during run-time and implemented by reconfiguring the optimized design for a new set of parameters. In DCS, the bitstreams of the parameterized design are expressed as Boolean functions of the parameters. For every infrequent change in parameters, a specialized FPGA configuration is generated by evaluating the corresponding Boolean functions, and the FPGA is reconfigured with the specialized configuration. A detailed study of overheads of DCS and providing suitable solutions with appropriate custom FPGA structures is the primary goal of the dissertation. I also suggest different improvements to the FPGA configuration memory architecture. After offering the custom FPGA structures, I investigated the role of DCS on FPGA overlays and the use of custom FPGA structures that help to reduce the overheads of DCS on FPGA overlays. By doing so, I hope I can convince the developer to use DCS (which now comes with minimal costs) in real-world applications. I start the investigations of overheads of DCS by implementing an adaptive FIR filter (using the DCS technique) on three different Xilinx FPGA platforms: Virtex-II Pro, Virtex-5, and Zynq-SoC. The study of how DCS behaves and what is its overhead in the evolution of the three FPGA platforms is the non-trivial basis to discover the costs of DCS. After that, I propose custom FPGA structures (reconfiguration controllers and reconfiguration drivers) to reduce the main overhead (reconfiguration time) of DCS. These structures not only reduce the reconfiguration time but also help curbing the power hungry part of the DCS system. After these chapters, I study the role of DCS on FPGA overlays. I investigate the effect of the proposed FPGA structures on Virtual-Coarse-Grained Reconfigurable Arrays (VCGRAs). I classify the VCGRA implementations into three types: the conventional VCGRA, partially parameterized VCGRA and fully parameterized VCGRA depending upon the level of parameterization. I have designed two variants of VCGRA grids for HPC image processing applications, namely, the MAC grid and Pixie. Finally, I try to tackle the reconfiguration time overhead at the hardware level of the FPGA by customizing the FPGA configuration memory architecture. In this part of my research, I propose to use a parallel memory structure to improve the reconfiguration time of DCS drastically. However, this improvement comes with a significant overhead of hardware resources which will need to be solved in future research on commercial FPGA configuration memory architectures
    corecore