
    Hardware-accelerated data decoding and reconstruction for automotive LiDAR sensors

    The automotive industry is facing an unprecedented technological transformation towards fully autonomous vehicles. Optimists predict that, by 2030, cars will be sufficiently reliable, affordable, and common to displace most current human driving tasks. To cope with these trends, autonomous vehicles require reliable perception systems that can sense their entire surroundings, and light detection and ranging (LiDAR) sensors are a key instrument for recreating a 3D view of the world. However, reliable operation requires LiDAR sensors to provide high-resolution 3D representations of the car's vicinity, which produces millions of data points that must be processed in real time. In this article we propose ALFA-Pi, a data packet decoder and reconstruction system fully deployed on an embedded reconfigurable hardware platform. By resorting to field-programmable gate array (FPGA) technology, ALFA-Pi can interface with different LiDAR sensors simultaneously while providing custom representation outputs to high-level perception systems. By accelerating the LiDAR interface, the proposed system outperforms current software-only approaches, achieving lower latency in the data acquisition and data decoding tasks while reaching high performance ratios.
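
    The abstract does not describe ALFA-Pi's packet format or its decoding logic. As a rough software reference model of the decode-and-reconstruct step (which ALFA-Pi performs in FPGA hardware), the Python sketch below assumes a hypothetical packet made of 5-byte records; `RECORD`, `DIST_UNIT_M`, and `ELEVATION_DEG` are invented for illustration and do not match any particular vendor's format. Each return is converted from spherical (range, azimuth, elevation) to Cartesian coordinates, with azimuth assumed to be measured clockwise from the +y axis, one common LiDAR convention.

```python
import math
import struct

# HYPOTHETICAL record layout, not any vendor's real packet format:
# little-endian uint16 azimuth (0.01 deg), uint16 distance (2 mm units), uint8 channel id.
RECORD = struct.Struct("<HHB")
DIST_UNIT_M = 0.002
# HYPOTHETICAL fixed elevation angle (degrees) of each laser channel.
ELEVATION_DEG = {0: -15.0, 1: 0.0, 2: 15.0}


def decode_packet(payload: bytes) -> list:
    """Decode one packet into (x, y, z) points in metres; zero distance means 'no return'."""
    points = []
    for az_raw, dist_raw, channel in RECORD.iter_unpack(payload):
        if dist_raw == 0:
            continue  # skip empty returns instead of emitting a point at the origin
        r = dist_raw * DIST_UNIT_M
        az = math.radians(az_raw / 100.0)
        el = math.radians(ELEVATION_DEG[channel])
        horizontal = r * math.cos(el)  # projection of the range onto the x-y plane
        # Azimuth measured clockwise from the +y axis (assumed convention).
        points.append((horizontal * math.sin(az), horizontal * math.cos(az), r * math.sin(el)))
    return points


# Usage: one 10 m return at 90 deg azimuth on the horizontal channel, plus one empty return.
packet = RECORD.pack(9000, 5000, 1) + RECORD.pack(0, 0, 0)
print(decode_packet(packet))
```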

    DPTC -- an FPGA-based trace compression

    Recording flash-ADC traces is challenging from both the transmission-bandwidth and storage-cost perspectives. This paper presents a configuration-free lossless compression algorithm that addresses both limitations by compressing the data on the fly in the controlling field-programmable gate array (FPGA). The difference predicted trace compression (DPTC) can thus be used directly in front-end electronics. The method first computes the differences between consecutive samples in a trace, which concentrates the most probable values around zero. The values are then stored in groups of four, keeping only the necessary least-significant bits in a variable-length code, and packed into a stream of 32-bit words. To evaluate the efficiency, the storage cost of compressed traces is modeled as a baseline cost, which includes the ADC noise, plus a cost for pulses that depends on their amplitude and width. The free parameters and the validity of the model are determined by comparing it with the results of compressing a large set of artificial traces with varying characteristics. The compression method was also applied to actual data from different types of detectors, demonstrating its general applicability. The compression efficiency is found to be comparable to that of popular general-purpose compression methods, while remaining implementable on an FPGA with limited resources. A typical storage cost is around 4 to 5 bits per sample. The VHDL code for the FPGA implementation and the C code for the CPU decompression routine of DPTC are available as open-source software; both operate at speeds of several hundred Msamples/s.
    Comment: 9 pages, 7 figures
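
    The encoding steps described in this abstract (difference prediction, groups of four values, only the needed least-significant bits, packing into 32-bit words) can be illustrated with a minimal Python sketch. It is not the published DPTC bit format: the fixed 5-bit width header per group (`WIDTH_BITS`), the LSB-first packing order, and all helper names are assumptions chosen for clarity, and an FPGA encoder would process samples as a stream rather than as Python lists.

```python
GROUP = 4        # values per group, as described in the abstract
WIDTH_BITS = 5   # HYPOTHETICAL fixed-size header per group storing the bit width (0..31)


def bits_needed(values):
    """Smallest n such that every value fits an n-bit two's-complement field (0 if all zero)."""
    n = 0
    for v in values:
        if v != 0:
            magnitude = v if v >= 0 else ~v         # ~v == -v - 1 for negative v
            n = max(n, magnitude.bit_length() + 1)  # +1 for the sign bit
    return n


class BitWriter:
    """Packs variable-width fields LSB-first into a list of 32-bit words."""

    def __init__(self):
        self.words, self.acc, self.nacc = [], 0, 0

    def write(self, value, nbits):
        self.acc |= (value & ((1 << nbits) - 1)) << self.nacc  # masking yields two's complement
        self.nacc += nbits
        while self.nacc >= 32:
            self.words.append(self.acc & 0xFFFFFFFF)
            self.acc >>= 32
            self.nacc -= 32

    def flush(self):
        if self.nacc:
            self.words.append(self.acc & 0xFFFFFFFF)
            self.acc, self.nacc = 0, 0
        return self.words


class BitReader:
    """Reads fields back in the same LSB-first order (bit by bit, for clarity)."""

    def __init__(self, words):
        self.words, self.pos = words, 0

    def read(self, nbits):
        value = 0
        for i in range(nbits):
            word, bit = divmod(self.pos, 32)
            value |= ((self.words[word] >> bit) & 1) << i
            self.pos += 1
        return value


def compress(samples):
    """samples: list of ints. Returns a list of 32-bit words."""
    # Difference prediction: each sample minus its predecessor (the first minus 0).
    deltas = [s - p for s, p in zip(samples, [0] + samples[:-1])]
    out = BitWriter()
    for i in range(0, len(deltas), GROUP):
        group = deltas[i:i + GROUP]
        group += [0] * (GROUP - len(group))  # pad the final group
        n = bits_needed(group)
        assert n < (1 << WIDTH_BITS), "delta too large for the width header"
        out.write(n, WIDTH_BITS)
        for d in group:
            out.write(d, n)
    return out.flush()


def decompress(words, count):
    reader, samples, prev = BitReader(words), [], 0
    while len(samples) < count:
        n = reader.read(WIDTH_BITS)
        for _ in range(GROUP):
            d = reader.read(n)
            if n and d >> (n - 1):  # sign-extend negative values
                d -= 1 << n
            prev += d               # undo the difference prediction
            if len(samples) < count:
                samples.append(prev)
    return samples


# Usage: a short synthetic trace with a baseline around 100 and one pulse.
trace = [100, 101, 99, 100, 100, 102, 180, 400, 350, 200, 120, 101, 100]
words = compress(trace)
assert decompress(words, len(trace)) == trace
print(32 * len(words) / len(trace), "bits per sample")
```

    For this short, pulse-dominated trace the sketch needs roughly 10 bits per sample; the 4 to 5 bits per sample quoted in the abstract refers to DPTC itself on realistic traces, not to this simplified encoding.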

    ReS2tAC -- UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices

    With the emergence of low-cost robotic systems, such as unmanned aerial vehicles, embedded high-performance image processing has become increasingly important. For a long time, FPGAs were the only processing hardware capable of high-performance computing while preserving the low power consumption that is essential for embedded systems. However, the growing availability of embedded GPU-based systems, such as the NVIDIA Jetson series, which combine an ARM CPU with an NVIDIA Tegra GPU, allows for massively parallel embedded computing on graphics hardware. With this in mind, we propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, based on the popular and widely used Semi-Global Matching (SGM) algorithm. We optimize the algorithm for embedded CUDA GPUs through massively parallel computing, and for embedded ARM CPUs through vectorized SIMD processing using NEON intrinsics. We have evaluated our approach with different configurations on two public stereo benchmark datasets, demonstrating an error rate as low as 3.3%. Furthermore, our experiments show that the fastest configuration of our approach reaches up to 46 FPS at VGA resolution. Finally, in a use-case-specific qualitative evaluation, we measured the power consumption of our approach and deployed it on a DJI Manifold 2-G attached to a DJI Matrice 210v2 RTK unmanned aerial vehicle (UAV), demonstrating its suitability for real-time stereo processing onboard a UAV.
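
    The abstract does not spell out Semi-Global Matching itself. In Hirschmüller's original formulation, its core is a cost aggregation along 1-D paths r, in which a disparity change of one pixel costs a penalty P1 and any larger jump costs P2: L_r(p, d) = C(p, d) + min(L_r(p−r, d), L_r(p−r, d±1) + P1, min_k L_r(p−r, k) + P2) − min_k L_r(p−r, k). The Python sketch below aggregates along a single left-to-right path only, whereas full SGM sums several path directions (commonly 8 or 16) before the winner-takes-all step. It is a scalar illustration of the recurrence, not the ReS2tAC CUDA or NEON code, and the function names are hypothetical.

```python
def aggregate_path(cost, p1, p2):
    """Aggregate cost[x][d] along one left-to-right scanline path (Hirschmüller's recurrence)."""
    num_d = len(cost[0])
    agg = [list(cost[0])]  # the first pixel on the path has no predecessor
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_d):
            best = min(prev[d], prev_min + p2)  # same disparity, or any jump (penalty P2)
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # disparity change of 1 (penalty P1)
            if d < num_d - 1:
                best = min(best, prev[d + 1] + p1)
            # Subtracting prev_min keeps values bounded without changing the argmin.
            row.append(cost[x][d] + best - prev_min)
        agg.append(row)
    return agg


def winner_takes_all(volume):
    """Pick, for each pixel, the disparity with the lowest (aggregated) cost."""
    return [min(range(len(c)), key=c.__getitem__) for c in volume]


# Usage: pixel 2 has a noisy minimum at d=3; its neighbours agree on d=1.
cost = [[10, 0, 10, 10], [10, 0, 10, 10], [10, 3, 10, 2], [10, 0, 10, 10], [10, 0, 10, 10]]
print(winner_takes_all(cost))                              # [1, 1, 3, 1, 1]
print(winner_takes_all(aggregate_path(cost, p1=1, p2=8)))  # [1, 1, 1, 1, 1]
```

    Within one path the recurrence is sequential along the scanline, but different scanlines, and the disparities at each pixel, are independent of one another; that independence is what GPU threads or SIMD lanes can be applied to.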

    Embedded System Optimization of Radar Post-processing in an ARM CPU Core

    Algorithms executed on the radar processor system contribute significantly to the performance bottleneck of the overall radar system. One key performance concern is the latency of target detection in hard-deadline systems. Research has shown software optimization to be a major contributor to radar system performance improvements. This thesis pursues software optimization through both manual and automatic approaches and analyzes the results to inform future decisions when working with an ARM processor system. To assess the optimized implementation, the question posed was whether the algorithms on the ARM processor could support a 6-antenna implementation without a decline in performance. An answer would also help estimate how many additional algorithms could still be added without performance decline. The manual optimization was guided by a quantitative analysis of the software execution time. It applied a vectorization strategy using the NEON vector registers of the ARM CPU to reimplement the initial Constant False Alarm Rate (CFAR) detection algorithm. An additional optimization eliminated redundant loops over the range gates and Doppler filters. To determine the best compiler for automatic code optimization of the radar algorithms on the ARM processor, both GCC and Clang were used to compile the initial algorithms and the optimized implementation of the radar post-processing stage. Analysis of the optimization results showed that the radar post-processing algorithms can run on the ARM processor in the 6-antenna implementation without system load stress, with an excellent headroom margin for the defined scenario. The analysis further revealed that the effect of dynamic memory allocation should not be underestimated where performance is a significant concern. The results also showed that GCC and Clang each have their strengths and weaknesses when used for compilation. One limiting factor of the NEON-based optimization is the effect of the sample size on the implementation. Although it suits the test samples used in the defined scenario, results may vary with different window cell sizes, where the optimization might not necessarily improve the timing constraints.
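
    The thesis does not state which CFAR variant it reimplements. Assuming the common cell-averaging variant (CA-CFAR) applied along the range gates, the Python sketch below shows one way to remove a redundant inner loop: a prefix-sum array turns each training-window sum into two subtractions, so the work per cell under test no longer grows with the window size. The parameter names (`num_train`, `num_guard`, `alpha`) are illustrative, and the sketch is scalar Python rather than the NEON-vectorized code described in the thesis.

```python
from itertools import accumulate


def ca_cfar(power, num_train, num_guard, alpha):
    """1-D cell-averaging CFAR over range gates; returns indices of detected cells.

    Cells closer to either edge than num_train + num_guard are skipped (no full window).
    """
    half = num_train + num_guard
    # prefix[k] == sum(power[:k]); each window sum becomes one subtraction instead of
    # a fresh inner loop over the training cells for every cell under test.
    prefix = [0] + list(accumulate(power))
    detections = []
    for i in range(half, len(power) - half):
        leading = prefix[i - num_guard] - prefix[i - half]          # cells i-half .. i-num_guard-1
        lagging = prefix[i + half + 1] - prefix[i + num_guard + 1]  # cells i+num_guard+1 .. i+half
        noise = (leading + lagging) / (2 * num_train)
        if power[i] > alpha * noise:
            detections.append(i)
    return detections


# Usage: flat noise floor of 1 with one strong target at gate 20 and a weak one at gate 30.
profile = [1.0] * 40
profile[20], profile[30] = 20.0, 3.0
print(ca_cfar(profile, num_train=4, num_guard=1, alpha=4.0))  # [20]
```

    In a C implementation the prefix buffer would typically be allocated once and reused across frames, which also avoids the repeated dynamic allocation whose cost the thesis highlights.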