Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles
The combination of machine learning and heterogeneous embedded platforms opens new potential for developing sophisticated control concepts applicable to vehicle dynamics and ADAS. This interdisciplinary work provides enabling solutions, ultimately implementing fast predictions using neural networks (NNs) on field-programmable gate arrays (FPGAs) and graphics processing units (GPUs), while applying them to a challenging application: torque vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The work is motivated by a discussion of multiple domains of the technological context and of the constraints of the automotive field, which contrast with the attractiveness of exploiting new embedded platforms to apply advanced control algorithms to complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle, benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose an NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and an FPGA, following an analysis of the theoretical and practical implications of their different operating paradigms, in order to harness their computing potential efficiently while gaining insight into their peculiarities. The achieved results exceed expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform.
Consequently, having shown the applicability of the proposed solutions, this work also contributes valuable enablers for further developments following similar fundamental principles.
Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union's Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2.
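As a rough illustration of the batch-prediction idea described in this abstract, the following minimal Python sketch evaluates a batch of candidate torque distributions with a tiny feed-forward network and selects the candidate with the lowest predicted cost. The network size, the random weights, the candidate set and the names `mlp_forward` and `best_candidate` are all assumptions made for illustration; none of them is taken from the paper.

```python
import math
import random

def mlp_forward(x, w1, b1, w2, b2):
    """One hidden tanh layer with a scalar output (hypothetical cost predictor)."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def best_candidate(candidates, params):
    """Predict a cost for every candidate in the batch and return the argmin."""
    costs = [mlp_forward(c, *params) for c in candidates]
    idx = min(range(len(costs)), key=costs.__getitem__)
    return idx, costs

random.seed(0)
n_in, n_hidden = 4, 8  # assumed: 4 motor torque shares, 8 hidden units
params = (
    [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
    [random.uniform(-1, 1) for _ in range(n_hidden)],
    [random.uniform(-1, 1) for _ in range(n_hidden)],
    0.0,
)

# Assumed candidate batch: random torque splits that each sum to 1.
candidates = []
for _ in range(64):
    raw = [random.random() for _ in range(n_in)]
    total = sum(raw)
    candidates.append([r / total for r in raw])

idx, costs = best_candidate(candidates, params)
print(idx, costs[idx])
```

Each candidate is evaluated independently of the others, which is the property that makes this kind of batch workload a natural fit for parallel hardware such as GPUs and FPGAs.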
Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices
For many applications in low-power real-time robotics, stereo cameras are the
sensors of choice for depth perception as they are typically cheaper and more
versatile than their active counterparts. Their biggest drawback, however, is
that they do not directly sense depth maps; instead, these must be estimated
through data-intensive processes. Therefore, appropriate algorithm selection
plays an important role in achieving the desired performance characteristics.
Motivated by applications in space and mobile robotics, we implement and
evaluate an FPGA-accelerated adaptation of the ELAS algorithm. Despite offering
one of the best trade-offs between efficiency and accuracy, ELAS has only been
shown to run at 1.5-3 fps on a high-end CPU. Our system preserves all
intriguing properties of the original algorithm, such as the slanted plane
priors, but can achieve a frame rate of 47 fps whilst consuming under 4 W of
power. Unlike previous FPGA-based designs, we take advantage of both components
on the CPU/FPGA System-on-Chip to showcase the strategy necessary to accelerate
more complex and computationally diverse algorithms for such low power,
real-time systems.
Comment: 8 pages, 7 figures, 2 tables
Real-time digital signal processing for new wavelength-to-the-user optical access networks
Nowadays, optical access networks provide high capacity to end users, with a growing availability of multimedia content that can be streamed to fixed or mobile devices. In this regard, one of the most flexible and low-cost approaches is the Passive Optical Network (PON) used in Fiber-to-the-Home (FTTH). Owing to growing bandwidth demands, Wavelength Division Multiplexing (WDM) PON, and later ultra-dense WDM (udWDM) PON with a narrow channel spacing, has been deployed to increase the number of users served through a single fiber.
The udWDM-PON with coherent technology and advanced digital signal processing (DSP) is an attractive solution for next-generation optical access networks. Thanks to the higher sensitivity and improved channel selectivity of coherent detection with efficient DSP, optical networks support a larger number of users over longer distances.
Since cost is the main concern in optical access networks, this thesis presents DSP architectures for the coherent receiver (Rx), based on low-cost, directly phase-modulated commercial DFB lasers. The proposals are fully in agreement with the concept of wavelength-to-the-user, where each client in the optical network is assigned an individual wavelength.
Next, in a 6.25 GHz spaced udWDM grid, high sensitivity is achieved in real-time field-programmable gate array (FPGA) implementations with the optimized DSP techniques and the phase-shift-keying (PSK) modulation format.
Moreover, this thesis reduces the hardware complexity of optical carrier recovery (CR) with two different strategies: first, a CR based on a differential mth-power frequency estimator (FE) using look-up tables (LUTs), and second, a LUT-free CR architecture that optimizes power consumption and hardware resources and improves channel selectivity in terms of speed and robustness.
Furthermore, a very simple but efficient clock recovery design makes possible a symbol-rate DSP architecture for a polarization diversity (POD) structure, which processes data using only one sample per symbol (1-sps). It makes the DSP independent of the state of polarization (SOP), even with a low-cost optical front-end and low-speed analog-to-digital converters (ADCs), and keeps both performance and sensitivity high in real-time implementations on FPGA.
Postprint (published version)
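The differential mth-power frequency estimator mentioned in the abstract can be sketched in a few lines of Python. This is the textbook form of the estimator, written under stated assumptions, and not the thesis's LUT-based or LUT-free hardware architecture: the function name `diff_mth_power_fe`, the modulation order, the symbol period, the frequency offset and the noise-free channel are all hypothetical choices for illustration.

```python
import cmath
import math
import random

def diff_mth_power_fe(samples, m, symbol_period):
    """Estimate the carrier frequency offset (Hz) from 1-sps M-PSK samples."""
    acc = 0j
    for prev, cur in zip(samples, samples[1:]):
        d = cur * prev.conjugate()  # differential product cancels the constant phase
        acc += d ** m               # m-th power strips the M-PSK data modulation
    # The remaining phase is 2*pi*m*f*T, so divide it back out.
    return cmath.phase(acc) / (2 * math.pi * m * symbol_period)

random.seed(1)
m = 2               # assumed modulation order (binary PSK)
T = 1 / 1.25e9      # assumed symbol period in seconds
f_off = 40e6        # assumed true offset, inside the range +/- 1/(2*m*T)

# Noise-free received symbols with a frequency offset and a fixed phase offset.
symbols = [cmath.exp(2j * math.pi * random.randrange(m) / m) for _ in range(2000)]
rx = [s * cmath.exp(1j * (2 * math.pi * f_off * T * k + 0.3))
      for k, s in enumerate(symbols)]

est = diff_mth_power_fe(rx, m, T)
print(est)
```

The estimator works on one sample per symbol and is unambiguous only while the offset stays below 1/(2mT) in magnitude. In hardware, the phase extraction and the power operation are the steps where a look-up table would typically be considered.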
Power Estimation Technique for DSP Architectures.
The main goal of power estimation is to optimize the power consumption of an electronic design. Power is a strongly pattern-dependent function: input statistics greatly influence average power. We solve the pattern dependence problem for intellectual property (IP) designs. In this paper, we present a power macro-modeling technique for digital signal processing (DSP) architectures in terms of the statistical knowledge of their primary inputs. During the power estimation procedure, the sequence of an input stream is generated by a genetic algorithm using input metrics. Then, a Monte Carlo zero-delay simulation is performed and a power dissipation macro-model function is built from the power dissipation results. From then on, this macro-model function can be used to estimate the power dissipation of the system using only the statistics of the macro-block's primary inputs. In experiments with the DSP system, the average error is 26%.
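The macro-modeling flow described above (generate input streams with known statistics, simulate, fit a function of those statistics) can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: the "design" is an 8-bit adder, output bit toggles stand in for power, the input streams come from a plain random generator rather than the paper's genetic algorithm, and the macro-model is a straight line in the input transition density.

```python
import random

def make_stream(n_words, width, toggle_prob, rng):
    """Random word stream whose bits each flip with probability toggle_prob per cycle."""
    word = rng.getrandbits(width)
    stream = [word]
    for _ in range(n_words - 1):
        mask = sum(1 << i for i in range(width) if rng.random() < toggle_prob)
        word ^= mask
        stream.append(word)
    return stream

def toggles(a, b):
    """Number of bit positions that differ between two words."""
    return bin(a ^ b).count("1")

def simulate_power(toggle_prob, rng, n_words=2000, width=8):
    """Zero-delay toy simulation: average output toggles per cycle of an adder."""
    xs = make_stream(n_words, width, toggle_prob, rng)
    ys = make_stream(n_words, width, toggle_prob, rng)
    outs = [(x + y) & ((1 << width) - 1) for x, y in zip(xs, ys)]
    total = sum(toggles(p, c) for p, c in zip(outs, outs[1:]))
    return total / (n_words - 1)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

rng = random.Random(0)
densities = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
powers = [simulate_power(d, rng) for d in densities]
a, b = fit_line(densities, powers)

# Macro-model prediction from the input statistic alone, with no new simulation.
estimate = a + b * 0.25
print(a, b, estimate)
```

Once the coefficients are fitted, estimating power for a new input stream only requires measuring its transition density, which is the property the abstract attributes to the macro-model function.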
REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS
New radar applications need to perform complex algorithms and process a large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, a hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up compute-intensive tasks such as matrix multiplication and matrix inversion, which are essential operations for solving the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms.
Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
In the past two decades, there has been a huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Processing some of their applications, such as medical imaging, is computationally intensive and power hungry and requires a large amount of memory, which creates a high demand for efficient algorithm implementation, low-power architecture and acceleration. Recently, some MAAs such as the Finite Ridgelet Transform (FRIT) and the Haar Wavelet Transform (HWT) have become very popular, and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), and medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms, particularly when addressing large problems, is becoming very challenging and consumes a lot of power, which leads to a number of issues including mobility and reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimisation and awareness at all levels of abstraction in the design flow.
The most important achievements of the work presented in this thesis are summarised here.
Two factorisation methodologies for HWT, called HWT Factorisation Method1 (HWTFM1) and HWT Factorisation Method2 (HWTFM2), have been explored to increase the number of zeros and reduce hardware resources. In addition, two novel, efficient and optimised architectures for the proposed methodologies, based on Distributed Arithmetic (DA) principles, have been proposed. The evaluation of the architectural results has shown that the proposed architectures reduce the arithmetic calculations (additions/subtractions) by 33% and 25% respectively compared to a direct implementation of HWT, and outperform existing results. The proposed HWTFM2 is implemented on advanced and low-power FPGA devices using the Handel-C language. The FPGA implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for the Finite Radon Transform (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient implementation on different FPGA devices. The performance of the proposed FRIT architecture has been evaluated and the results outperformed some other existing architectures. Both FRAT and FRIT architectures have been implemented on FPGAs using the Handel-C language. The evaluation of both architectures has shown that the obtained results outperformed existing results by almost 10% in terms of frequency and area. The proposed architectures are also applied to image data (256 × 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes.
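For readers unfamiliar with the Haar Wavelet Transform, the following minimal Python sketch shows the plain, unnormalised transform built only from additions and subtractions, which are the operations the factorisation methods aim to reduce. It is a direct software form for illustration, not the HWTFM1 or HWTFM2 factorisations and not the DA-based architectures; the function names and the sample data are hypothetical.

```python
def haar_level(signal):
    """One Haar level: pairwise sums (approximation) and differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [signal[i] + signal[i + 1] for i in range(0, len(signal), 2)]
    detail = [signal[i] - signal[i + 1] for i in range(0, len(signal), 2)]
    return approx, detail

def haar_level_inverse(approx, detail):
    """Undo haar_level; the exact division by 2 is a one-bit shift in hardware."""
    signal = []
    for s, d in zip(approx, detail):
        signal.extend([(s + d) // 2, (s - d) // 2])
    return signal

def haar_transform(signal):
    """Full decomposition of a power-of-two-length signal, coarsest detail first."""
    details = []
    approx = list(signal)
    while len(approx) > 1:
        approx, detail = haar_level(approx)
        details.insert(0, detail)
    return approx, details

pixels = [9, 7, 3, 5, 6, 10, 2, 6]  # assumed sample row of integer pixel values
approx, detail = haar_level(pixels)
print(approx, detail)
print(haar_level_inverse(approx, detail))
print(haar_transform(pixels))
```

Because the unnormalised form uses integer sums and differences only, the single-level transform is exactly invertible for integer inputs, which keeps the example free of rounding effects.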
Two architectures for cyclic convolution based on systolic arrays, using parallelism and pipelining, have been proposed; they can be used as the main building block for the proposed FRIT architecture. The first proposed architecture is a linear systolic array with a pipelined process and the second is a systolic array with a parallel process. The second architecture reduces the number of registers by 42% compared to the first architecture, and both architectures outperformed other existing results. The proposed pipelined architecture has been implemented on different FPGA devices with vector sizes (N) of 4, 8, 16 and 32 and a word length (W) of 8. The implementation results have shown a significant improvement and outperformed other existing results.
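The operation these systolic arrays compute can be stated compactly in software. The sketch below is a direct reference implementation of length-N cyclic convolution in Python, useful as a golden model when checking a hardware design; it does not model the pipelined or parallel array structure, and the function name and test vectors are assumptions chosen for illustration (N = 4 matches the smallest vector size mentioned above).

```python
def cyclic_convolution(x, h):
    """y[n] = sum over k of x[k] * h[(n - k) mod N] for two length-N sequences."""
    n_len = len(x)
    if len(h) != n_len:
        raise ValueError("both inputs must have the same length N")
    return [sum(x[k] * h[(n - k) % n_len] for k in range(n_len))
            for n in range(n_len)]

x = [1, 2, 3, 4]
h = [1, 0, 0, 1]
y = cyclic_convolution(x, h)
print(y)  # prints [3, 5, 7, 5]
```

Each output element is an independent sum of N products, which is why the computation maps well onto either a pipelined linear array or a fully parallel one.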
Ultimately, an in-depth evaluation of a high-level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called the functional level power modelling approach, has been presented. The mathematical techniques that form the basis of the proposed power modelling have been validated on a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behaviour of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on various FPGA platforms. Based on the results achieved, the accuracy of the proposed model is almost 99% for the Dynamic Power (DP) components of all IP cores.
Thomas Gerald Gray Charitable Trust
Multi look-up table FPGA implementation of an adaptive digital predistorter for linearizing RF power amplifiers with memory effects
This paper presents a hardware implementation of
a digital predistorter (DPD) for linearizing RF power amplifiers
(PAs) for wideband applications. The proposed predistortion linearizer
is based on a nonlinear auto-regressive moving average
(NARMA) structure, which can be derived from the NARMA PA
behavioral model and then mapped into a set of scalable lookup
tables (LUTs). The linearizer takes advantage of its recursive nature
to relax the LUT count needed to compensate memory effects
in PAs. Experimental support is provided by the implementation
of the proposed NARMA DPD in a field-programmable gate-array
device to linearize a 170-W peak power PA, validating the recursive
DPD NARMA structure for W-CDMA signals and flexible transmission
bandwidth scenarios. To the best of the authors’ knowledge,
it is the first time that a recursive structure is experimentally
validated for DPD purposes. In addition to the results on PA efficiency
and linearity, this paper addresses many practical implementation
issues related to the use of FPGA in DPD applications,
giving an original insight on actual prototyping scenarios. Finally,
this study discusses the possibility of further enhancing the overall
efficiency by degrading the PA operation mode, provided that DPD
may be unavoidable due to the impact of memory effects.
Peer Reviewed