
    Real-time implementation of 3D LiDAR point cloud semantic segmentation in an FPGA

    Get PDF
    Master's dissertation in Informatics Engineering. In the last few years, the automotive industry has relied heavily on deep learning applications for perception solutions. With data-heavy sensors such as LiDAR becoming standard, developing low-power, real-time applications has become increasingly challenging. To obtain maximum computational efficiency, one can no longer focus solely on the software aspect of such applications while disregarding the underlying hardware. In this thesis, a hardware-software co-design approach is used to implement an inference application leveraging SqueezeSegV3, a LiDAR-based convolutional neural network, on the Versal ACAP VCK190 FPGA. Automotive requirements drive the development of the proposed solution, with real-time performance and low power consumption as the target metrics. A first experiment validates the suitability of Xilinx's Vitis-AI tool for deploying deep convolutional neural networks on FPGAs. Both the ResNet-18 and SqueezeNet networks are deployed to the Zynq UltraScale+ MPSoC ZCU104 and Versal ACAP VCK190 FPGAs. The results show that both networks far exceed the real-time requirements while consuming little power. Compared to an NVIDIA RTX 3090 GPU, the performance per watt during inference of the two networks is 12x and 47.8x higher on the Zynq UltraScale+ MPSoC ZCU104 and 15.1x and 26.6x higher on the Versal ACAP VCK190 FPGA. These results are obtained with no drop in accuracy in the quantization step. A second experiment builds upon the first by deploying a real-time application containing the SqueezeSegV3 model using the Semantic-KITTI dataset. A framerate of 11 Hz is achieved with a peak power consumption of 78 Watts. The quantization step results in a minimal accuracy and IoU degradation of 0.7 and 1.5 points respectively. A smaller version of the same model is also deployed, achieving a framerate of 19 Hz and a peak power consumption of 76 Watts. The application performs semantic segmentation over the entire point cloud with a 360° field of view.
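    As a rough back-of-the-envelope check, the sketch below derives per-frame latency and energy from the framerate and peak-power figures quoted in the abstract. The 10 Hz sensor rate used as the real-time threshold is an assumption (a common spinning-LiDAR rotation rate), not a figure from the thesis.

```python
# Frame budget and energy-per-frame figures for the two SqueezeSegV3 deployments
# reported in the abstract (11 Hz at 78 W peak, 19 Hz at 76 W peak). The
# lidar_rate_hz default of 10 Hz is an assumption, not a value from the thesis.

def efficiency(fps: float, power_w: float, lidar_rate_hz: float = 10.0) -> dict:
    """Return per-frame latency, energy per frame, and a real-time check."""
    return {
        "latency_ms": 1000.0 / fps,            # time spent per frame
        "energy_per_frame_j": power_w / fps,   # joules spent per inference
        "real_time": fps >= lidar_rate_hz,     # keeps up with the sensor rate?
    }

for name, fps, power in [("SqueezeSegV3", 11, 78), ("SqueezeSegV3 (small)", 19, 76)]:
    stats = efficiency(fps, power)
    print(f"{name}: {stats['latency_ms']:.1f} ms/frame, "
          f"{stats['energy_per_frame_j']:.1f} J/frame, real-time={stats['real_time']}")
```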

    AI and ML Accelerator Survey and Trends

    Full text link
    This paper updates the survey of AI accelerators and processors from the past three years. It collects and summarizes the commercial accelerators that have been publicly announced with peak performance and power consumption numbers. The performance and power values are plotted on a scatter graph, and a number of dimensions and observations from the trends on this plot are again discussed and analyzed. Two new trend plots based on accelerator release dates are included in this year's paper, along with the additional trends of some neuromorphic, photonic, and memristor-based inference accelerators. Comment: 10 pages, 4 figures, 2022 IEEE High Performance Extreme Computing (HPEC) Conference. arXiv admin note: substantial text overlap with arXiv:2009.00993, arXiv:2109.0895
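    A minimal sketch of the survey's central visualisation: a log-log scatter of peak performance against peak power. The three accelerator entries are illustrative placeholders, not data from the paper.

```python
# Log-log scatter of peak performance vs. peak power, the core plot style of
# the accelerator survey. Example values below are hypothetical placeholders.
import matplotlib.pyplot as plt

accelerators = [
    # (name, peak TOPS, peak power in W) -- illustrative values only
    ("edge NPU",        4,    5),
    ("FPGA card",     100,   75),
    ("datacenter ASIC", 1000, 400),
]

fig, ax = plt.subplots()
for name, tops, watts in accelerators:
    ax.scatter(watts, tops)
    ax.annotate(name, (watts, tops))

ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("Peak power (W)")
ax.set_ylabel("Peak performance (TOPS)")
ax.set_title("Accelerator peak performance vs. power")
plt.show()
```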

    Multi-modal Experts Network for Autonomous Driving

    Full text link
    End-to-end learning from sensory data has shown promising results in autonomous driving. While employing many sensors enhances world perception and should lead to more robust and reliable behavior of autonomous vehicles, training and deploying such networks is challenging, and at least two problems arise in the considered setting. The first is the increase of computational complexity with the number of sensing devices. The other is the phenomenon of the network overfitting to the simplest and most informative input. We address both challenges with a novel, carefully tailored multi-modal experts network architecture and propose a multi-stage training procedure. The network contains a gating mechanism, which selects the most relevant input at each inference time step using a mixed discrete-continuous policy. We demonstrate the plausibility of the proposed approach on our 1/6 scale truck equipped with three cameras and one LiDAR. Comment: Published at the International Conference on Robotics and Automation (ICRA), 202
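    The gating idea can be sketched as a small mixture-of-experts layer. The PyTorch snippet below is an illustrative reconstruction under assumed feature dimensions, using a Gumbel-softmax gate for the mixed discrete-continuous selection; it is not the authors' exact architecture.

```python
# Hedged sketch of a gated multi-modal experts layer: one expert per sensor
# stream and a gating head that picks (or softly weights) the most relevant
# modality at each step. Dimensions and the Gumbel-softmax gate are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityExperts(nn.Module):
    def __init__(self, feat_dims, hidden=128, out_dim=2):
        super().__init__()
        # One small expert per modality (e.g. three cameras + one LiDAR).
        self.experts = nn.ModuleList(nn.Linear(d, hidden) for d in feat_dims)
        self.gate = nn.Linear(sum(feat_dims), len(feat_dims))
        self.head = nn.Linear(hidden, out_dim)   # e.g. steering + throttle

    def forward(self, features, hard=True):
        # features: list of per-modality feature tensors, each (batch, dim_i).
        logits = self.gate(torch.cat(features, dim=-1))
        # Mixed discrete-continuous selection: hard one-hot pick at inference,
        # differentiable relaxation during training.
        weights = F.gumbel_softmax(logits, tau=1.0, hard=hard)
        expert_out = torch.stack(
            [e(f) for e, f in zip(self.experts, features)], dim=1)  # (B, M, H)
        fused = (weights.unsqueeze(-1) * expert_out).sum(dim=1)
        return self.head(torch.relu(fused)), weights

# Example: three camera feature vectors and one LiDAR feature vector.
model = ModalityExperts(feat_dims=[64, 64, 64, 128])
feats = [torch.randn(8, d) for d in (64, 64, 64, 128)]
control, gate_weights = model(feats)
```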

    NASA SpaceCube Next-Generation Artificial-Intelligence Computing for STP-H9-SCENIC on ISS

    Get PDF
    Recently, Artificial Intelligence (AI) and Machine Learning (ML) capabilities have seen an exponential increase in interest from academia and industry and can be a disruptive, transformative development for future missions. Specifically, AI/ML concepts for edge computing can be integrated into future missions for autonomous operation, constellation missions, and onboard data analysis. However, using commercial AI software frameworks onboard spacecraft is challenging because traditional radiation-hardened processors and common spacecraft processors cannot provide the necessary onboard processing capability to effectively deploy complex AI models. Advantageously, embedded AI microchips being developed for the mobile market demonstrate remarkable capability and follow size, weight, and power constraints similar to those that would be imposed on a space-based system. Unfortunately, many of these devices have not been qualified for use in space. Therefore, Space Test Program - Houston 9 - SpaceCube Edge-Node Intelligent Collaboration (STP-H9-SCENIC) will demonstrate inflight, cutting-edge AI applications on multiple space-based devices for next-generation onboard intelligence. SCENIC will characterize several embedded AI devices in a relevant space environment and will provide NASA and DoD with flight heritage data and lessons learned for developers seeking to enable AI/ML on future missions. Finally, SCENIC also includes new CubeSat form-factor GPS and SDR cards for guidance and navigation.

    Artificial Intelligence Advancements for Digitising Industry

    Get PDF
    In the digital transformation era, when flexibility and know-how in manufacturing complex products become a critical competitive advantage, artificial intelligence (AI) is one of the technologies driving the digital transformation of industry and industrial products. These products, with high complexity based on multi-dimensional requirements, need flexible and adaptive manufacturing lines and novel components, e.g., dedicated CPUs, GPUs, FPGAs, TPUs and neuromorphic architectures, that support AI operations at the edge with reliable sensors and specialised AI capabilities. The change towards AI-driven applications in industrial sectors enables new innovative industrial and manufacturing models. New process management approaches appear and become part of the core competence of organizations and the network of manufacturing sites. In this context, bringing AI from the cloud to the edge and promoting silicon-born AI components by advancing Moore's law and accelerating edge-processing adoption in different industries through reference implementations becomes a priority for digitising industry. This article gives an overview of the ECSEL AI4DI project, which aims to apply AI-based technologies, methods, and algorithms at the edge, integrated with the Industrial Internet of Things (IIoT) and robotics, to enhance industrial processes based on repetitive tasks, focusing on replacing process identification and validation methods with intelligent technologies across the automotive, semiconductor, machinery, food and beverage, and transportation industries.

    A Constraint-based Mission Planning Approach for Reconfigurable Multi-Robot Systems

    Get PDF
    The application of reconfigurable multi-robot systems introduces additional degrees of freedom for designing robotic missions compared to classical multi-robot systems. To allow for autonomous operation of such systems, planning approaches have to be investigated that can not only cope with the combinatorial challenge arising from the increased flexibility of modular systems, but also exploit this flexibility to improve, for example, the safety of operation. While the problem originates in the domain of robotics, it is general in nature and significantly intersects with operations research. This paper suggests a constraint-based mission planning approach and presents a set of revised definitions for reconfigurable multi-robot systems, including the representation of the planning problem using spatially and temporally qualified resource constraints. Planning is performed in multiple stages, combining knowledge-based reasoning, constraint programming, and integer linear programming. The paper concludes with an illustration of the solution of an example mission.
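    To give a flavour of a constraint-based formulation, the following sketch models a toy coalition-assignment problem with Google OR-Tools CP-SAT; the modules, tasks, and constraints are invented for illustration and do not reproduce the paper's model.

```python
# Toy constraint model: robot modules are assigned to mission tasks so that
# each task gets a large enough coalition and no module serves two concurrent
# tasks. All names and requirements below are illustrative assumptions.
from ortools.sat.python import cp_model

modules = ["base_a", "base_b", "manip_arm", "sensor_pod"]
tasks = {                # task -> minimum coalition size (modules required)
    "survey_area": 2,
    "collect_sample": 2,
}

model = cp_model.CpModel()
# assign[t, m] == 1 iff module m joins the coalition performing task t.
assign = {(t, m): model.NewBoolVar(f"{t}_{m}") for t in tasks for m in modules}

# Tasks run concurrently here, so each module joins at most one coalition.
for m in modules:
    model.Add(sum(assign[t, m] for t in tasks) <= 1)

# Each coalition must be large enough for its task.
for t, size in tasks.items():
    model.Add(sum(assign[t, m] for m in modules) >= size)

# Sample collection needs the manipulator arm (a capability constraint).
model.Add(assign["collect_sample", "manip_arm"] == 1)

# Prefer small coalitions, i.e. the cheapest reconfiguration.
model.Minimize(sum(assign.values()))

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for (t, m), var in assign.items():
        if solver.Value(var):
            print(f"{m} -> {t}")
```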