Search CORE

788 research outputs found

Empowering parallel computing with field programmable gate arrays

Author: D'Hollander Erik
Publication venue: 'IOS Press'
Publication date: 01/01/2020
Field of study

After more than 30 years, reconﬁgurable computing has grown from a concept to a mature ﬁeld of science and technology. The cornerstone of this evolution is the ﬁeld programmable gate array, a building block enabling the conﬁguration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural reﬁnements

Ghent University Academic Bibliography

Real-time on-board obstacle avoidance for UAVs based on embedded stereo vision

Author: Grinberg Michael
Kollmann Matthias
Monka Sebastian
Ruf Boitumelo
Publication venue: 'Copernicus GmbH'
Publication date: 01/01/2018
Field of study

In order to improve usability and safety, modern unmanned aerial vehicles (UAVs) are equipped with sensors to monitor the environment, such as laser-scanners and cameras. One important aspect in this monitoring process is to detect obstacles in the flight path in order to avoid collisions. Since a large number of consumer UAVs suffer from tight weight and power constraints, our work focuses on obstacle avoidance based on a lightweight stereo camera setup. We use disparity maps, which are computed from the camera images, to locate obstacles and to automatically steer the UAV around them. For disparity map computation we optimize the well-known semi-global matching (SGM) approach for the deployment on an embedded FPGA. The disparity maps are then converted into simpler representations, the so called U-/V-Maps, which are used for obstacle detection. Obstacle avoidance is based on a reactive approach which finds the shortest path around the obstacles as soon as they have a critical distance to the UAV. One of the fundamental goals of our work was the reduction of development costs by closing the gap between application development and hardware optimization. Hence, we aimed at using high-level synthesis (HLS) for porting our algorithms, which are written in C/C++, to the embedded FPGA. We evaluated our implementation of the disparity estimation on the KITTI Stereo 2015 benchmark. The integrity of the overall realtime reactive obstacle avoidance algorithm has been evaluated by using Hardware-in-the-Loop testing in conjunction with two flight simulators.Comment: Accepted in the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Scienc

arXiv.org e-Print Archive

KITopen

Fraunhofer-ePrints

Programmable photonics : an opportunity for an accessible large-volume PIC ecosystem

Author: Bogaerts Wim
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

We look at the opportunities presented by the new concepts of generic programmable photonic integrated circuits (PIC) to deploy photonics on a larger scale. Programmable PICs consist of waveguide meshes of tunable couplers and phase shifters that can be reconfigured in software to define diverse functions and arbitrary connectivity between the input and output ports. Off-the-shelf programmable PICs can dramatically shorten the development time and deployment costs of new photonic products, as they bypass the design-fabrication cycle of a custom PIC. These chips, which actually consist of an entire technology stack of photonics, electronics packaging and software, can potentially be manufactured cheaper and in larger volumes than application-specific PICs. We look into the technology requirements of these generic programmable PICs and discuss the economy of scale. Finally, we make a qualitative analysis of the possible application spaces where generic programmable PICs can play an enabling role, especially to companies who do not have an in-depth background in PIC technology

Ghent University Academic Bibliography

Real-time implementation of 3D LiDAR point cloud semantic segmentation in an FPGA

Author: Delgado Pedro Paulo Fontes
Publication venue
Publication date: 05/01/2023
Field of study

Dissertação de mestrado em Informatics EngineeringIn the last few years, the automotive industry has relied heavily on deep learning applications for perception solutions. With data-heavy sensors, such as LiDAR, becoming a standard, the task of developing low-power and real-time applications has become increasingly more challenging. To obtain the maximum computational efficiency, no longer can one focus solely on the software aspect of such applications, while disregarding the underlying hardware. In this thesis, a hardware-software co-design approach is used to implement an inference application leveraging the SqueezeSegV3, a LiDAR-based convolutional neural network, on the Versal ACAP VCK190 FPGA. Automotive requirements carefully drive the development of the proposed solution, with real-time performance and low power consumption being the target metrics. A first experiment validates the suitability of Xilinx’s Vitis-AI tool for the deployment of deep convolutional neural networks on FPGAs. Both the ResNet-18 and SqueezeNet neural networks are deployed to the Zynq UltraScale+ MPSoC ZCU104 and Versal ACAP VCK190 FPGAs. The results show that both networks achieve far more than the real-time requirements while consuming low power. Compared to an NVIDIA RTX 3090 GPU, the performance per watt during both network’s inference is 12x and 47.8x higher and 15.1x and 26.6x higher respectively for the Zynq UltraScale+ MPSoC ZCU104 and the Versal ACAP VCK190 FPGA. These results are obtained with no drop in accuracy in the quantization step. A second experiment builds upon the results of the first by deploying a real-time application containing the SqueezeSegV3 model using the Semantic-KITTI dataset. A framerate of 11 Hz is achieved with a peak power consumption of 78 Watts. The quantization step results in a minimal accuracy and IoU degradation of 0.7 and 1.5 points respectively. A smaller version of the same model is also deployed achieving a framerate of 19 Hz and a peak power consumption of 76 Watts. The application performs semantic segmentation over all the point cloud with a field of view of 360°.Nos últimos anos a indústria automóvel tem cada vez mais aplicado deep learning para solucionar problemas de perceção. Dado que os sensores que produzem grandes quantidades de dados, como o LiDAR, se têm tornado standard, a tarefa de desenvolver aplicações de baixo consumo energético e com capacidades de reagir em tempo real tem-se tornado cada vez mais desafiante. Para obter a máxima eficiência computacional, deixou de ser possível focar-se apenas no software aquando do desenvolvimento de uma aplicação deixando de lado o hardware subjacente. Nesta tese, uma abordagem de desenvolvimento simultâneo de hardware e software é usada para implementar uma aplicação de inferência usando o SqueezeSegV3, uma rede neuronal convolucional profunda, na FPGA Versal ACAP VCK190. São os requisitos automotive que guiam o desenvolvimento da solução proposta, sendo a performance em tempo real e o baixo consumo energético, as métricas alvo principais. Uma primeira experiência valida a aptidão da ferramenta Vitis-AI para a implantação de redes neuronais convolucionais profundas em FPGAs. As redes ResNet-18 e SqueezeNet são ambas implantadas nas FPGAs Zynq UltraScale+ MPSoC ZCU104 e Versal ACAP VCK190. Os resultados mostram que ambas as redes ultrapassam os requisitos de tempo real consumindo pouca energia. Comparado com a GPU NVIDIA RTX 3090, a performance por Watt durante a inferência de ambas as redes é superior em 12x e 47.8x e 15.1x e 26.6x respetivamente na Zynq UltraScale+ MPSoC ZCU104 e na Versal ACAP VCK190. Estes resultados foram obtidos sem qualquer perda de accuracy na etapa de quantização. Uma segunda experiência é feita no seguimento dos resultados da primeira, implantando uma aplicação de inferência em tempo real contendo o modelo SqueezeSegV3 e usando o conjunto de dados Semantic-KITTI. Um framerate de 11 Hz é atingido com um pico de consumo energético de 78 Watts. O processo de quantização resulta numa perda mínima de accuracy e IoU com valores de 0.7 e 1.5 pontos respetivamente. Uma versão mais pequena do mesmo modelo é também implantada, atingindo uma framerate de 19 Hz e um pico de consumo energético de 76 Watts. A aplicação desenvolvida executa segmentação semântica sobre a totalidade das nuvens de pontos LiDAR, com um campo de visão de 360°

Universidade do Minho: RepositoriUM

Recommended from our members

FPGA-based multi-sensor relative navigation in space: Preliminary analysis in the framework of the I3DS H2020 project

Author: Aouf N.
Estebanez Camarena M.
Feetham L.
Scannapieco A.
Publication venue
Publication date: 01/01/2018
Field of study

The Horizon 2020 Integrated 3D Sensors (I3DS) project brings together the following entities throughout Europe: THALES ALENIA SPACE - France / Italy / UK / Spain, SINTEF (Norway), TERMA (Denmark), COSINE (Netherlands), PIAP Space (Poland), HERTZ Systems (Poland), and Cranfield University (UK). I3DS is co-funded under the Horizon 2020 EU research and development program and is part of the Strategic Research Cluster on Space Robotics Technologies. The ambition of I3DS is to produce a standardised modular Inspector Sensor Suite (INSES) for autonomous orbital and planetary applications for future space missions. Orbital applications encompass activities such as on-orbit servicing and repair, space rendezvous and docking, collision avoidance and active debris removal (ADR). Simultaneous localisation and surface mapping (SLAM) for planetary exploration and general navigation in an unknown environment for scientific purposes can be considered in planetary applications. These envisaged space applications can be tackled by exploiting the flexibility, high performance and long product life of FPGAs. Conventional FPGAs are subject to Single Event Upsets (SEU) due to space radiation, causing their failure. Therefore, space-graded FPGAs, such as those developed by Xilinx, are targeted within the I3DS project. Currently, the main use of the FPGA within the development of this robust end-to-end multi-sensor suite is for navigation and data preprocessing. The aim of this paper is to assess the capabilities of FPGAs to carry out complex operations, such as running navigation algorithms for space applications. The motivation for the development of the on-board software architecture is as follows: raw data, acquired from the various sensors – including, among others, a High Resolution camera, a stereo camera and a LiDAR – is pre-processed to ensure the provision of robust and optimised inputs to 3D navigation algorithms. Noise reduction and conversion into suitable formats for the successful application of navigation algorithms are therefore the main aims of the data pre-processing. Some techniques adopted in this phase include outlier rejection and data dimensionality reduction for large point clouds, e.g. from LiDAR, and geometric and radiometric correction of the images from the cameras. The pre-processed data will then feed state-of-the-art relative navigation algorithms. Some of the proposed navigation algorithms include Generalised Iterative Closest Point (GICP) for dense 3D point clouds, relative positioning with fiducial markers, and visual odometry. The system environment for the preliminary operation is a test-bench setup formed by a standard desktop computer and a non-space-graded FGA (Xilinx UltraZed-EG FPGA). The choice of FPGA was based on the similarity of this board to other spacegraded ones also provided by Xilinx. Experimental tests on the algorithms are being performed in the framework of the validation campaign for the I3DS project. Preliminary results indicate that the data pre-processing can be efficiently carried out on the FPGA board

City Research Online

Cranfield CERES

R $^3$ SGM: Real-time Raster-Respecting Semi-Global Matching for Power-Constrained Systems

Author: Cavallari Tommaso
Golodetz Stuart
Rahnama Oscar
Torr Philip H. S.
Walker Simon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/10/2018
Field of study

Stereo depth estimation is used for many computer vision applications. Though many popular methods strive solely for depth quality, for real-time mobile applications (e.g. prosthetic glasses or micro-UAVs), speed and power efficiency are equally, if not more, important. Many real-world systems rely on Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but power efficiency is hard to achieve with conventional hardware, making the use of embedded devices such as FPGAs attractive for low-power applications. However, the full SGM algorithm is ill-suited to deployment on FPGAs, and so most FPGA variants of it are partial, at the expense of accuracy. In a non-FPGA context, the accuracy of SGM has been improved by More Global Matching (MGM), which also helps tackle the streaking artifacts that afflict SGM. In this paper, we propose a novel, resource-efficient method that is inspired by MGM's techniques for improving depth quality, but which can be implemented to run in real time on a low-power FPGA. Through evaluation on multiple datasets (KITTI and Middlebury), we show that in comparison to other real-time capable stereo approaches, we can achieve a state-of-the-art balance between accuracy, power efficiency and speed, making our approach highly desirable for use in real-time systems with limited power.Comment: Accepted in FPT 2018 as Oral presentation, 8 pages, 6 figures, 4 table

arXiv.org e-Print Archive

Crossref

Adapting the SpaceCube v2.0 Data Processing System for Mission-Unique Application Requirements

Author: Davis Milton
Flatley Thomas
Gill Nat
Hasouneh Munther
Petrick David
Sparacino Pietro
Stone Robert
Thomas Luke
Winternitz Luke
Publication venue
Publication date
Field of study

The SpaceCube (sup TM) v2.0 system is a superior high performance, reconfigurable, hybrid data processing system that can be used in a multitude of applications including those that require a radiation hardened and reliable solution. This paper provides an overview of the design architecture, flexibility, and the advantages of the modular SpaceCube v2.0 high performance data processing system for space applications. The current state of the proven SpaceCube technology is based on nine years of engineering and operations. Five systems have been successfully operated in space starting in 2008 with four more to be delivered for launch vehicle integration in 2015. The SpaceCube v2.0 system is also baselined as the avionics solution for five additional flight projects and is always a top consideration as the core avionics for new instruments or spacecraft control. This paper will highlight how this multipurpose system is currently being used to solve design challenges of three independent applications. The SpaceCube hardware adapts to new system requirements by allowing for application-unique interface cards that are utilized by reconfiguring the underlying programmable elements on the core processor card. We will show how this system is being used to improve on a heritage NASA GPS technology, enable a cutting-edge LiDAR instrument, and serve as a typical command and data handling (C&DH) computer for a space robotics technology demonstration

NASA Technical Reports Server

A Precise High Count-Rate Multi-Channel Coincidence Counting Instrument for Quantum Photonics Applications

Author: Arabul Ekin
Publication venue
Publication date: 29/09/2020
Field of study

Explore Bristol Research