    Sliding window support for image processing in autonomous vehicles

    Camera-based autonomous driving extensively manipulates images for object detection, object tracking, or camera-based localization tasks. Therefore, efficient and fast image processing is crucial in those systems. Unfortunately, current solutions either do not meet AD's constraints for real-time performance and energy efficiency or are domain-specific and, thus, not general [14]. In this work, we introduce Sliding Window Processing (SWP), a SIMD execution model that natively operates on sliding windows of image pixels. We illustrate the benefits of SWP through a novel ISA extension called SLIDEX that achieves high performance and energy efficiency while maintaining programmability. We demonstrate the benefits of SLIDEX for the image processing tasks of ORB-SLAM [17] [18], a state-of-the-art camera-based localization system. SLIDEX achieves an average end-to-end speedup of ~1.65× and ~1.2× compared to equivalent scalar and vector baselines, respectively. Compared with the vector implementation, our solution reduces end-to-end energy consumption by 22% on average. This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, the ICREA Academia program and the FPU grant FPU18/04413.
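    The core idea behind SWP — that consecutive windows of pixels overlap almost entirely, so loaded data can be reused instead of refetched — can be illustrated with a plain-Python sketch (a software analogue only; the paper's SLIDEX ISA realizes this in SIMD hardware):

```python
def sliding_windows(pixels, width):
    """Every contiguous window of `width` pixels.
    Adjacent windows share width-1 pixels: the data reuse
    that SWP-style execution exploits natively."""
    return [pixels[i:i + width] for i in range(len(pixels) - width + 1)]

def window_sum_naive(pixels, width):
    # scalar-baseline behavior: recompute every window from scratch
    return [sum(w) for w in sliding_windows(pixels, width)]

def window_sum_incremental(pixels, width):
    # exploit the overlap: update a running sum as the window slides,
    # touching only the pixel that enters and the one that leaves
    s = sum(pixels[:width])
    out = [s]
    for i in range(width, len(pixels)):
        s += pixels[i] - pixels[i - width]
        out.append(s)
    return out
```

    Both functions produce identical results; the incremental version does O(1) work per window instead of O(width), which is the arithmetic saving that a sliding-window execution model turns into throughput and energy gains.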

    LOCATOR: Low-power ORB accelerator for autonomous cars

    Simultaneous Localization And Mapping (SLAM) is crucial for autonomous navigation. ORB-SLAM is a state-of-the-art Visual SLAM system based on cameras used for self-driving cars. In this paper, we propose a high-performance, energy-efficient, and functionally accurate hardware accelerator for ORB-SLAM, focusing on its most time-consuming stage: Oriented FAST and Rotated BRIEF (ORB) feature extraction. The Rotated BRIEF (rBRIEF) descriptor generation is the main bottleneck in ORB computation, as it exhibits highly irregular access patterns to local on-chip memories, causing a high performance penalty due to bank conflicts. We introduce a technique to find an optimal static pattern to perform parallel accesses to banks based on a genetic algorithm. Furthermore, we propose the combination of an rBRIEF pixel duplication cache, selective ports replication, and pipelining to reduce latency without compromising cost. The accelerator achieves a reduction in energy consumption of 14597× and 9609× with respect to high-end CPU and GPU platforms, respectively. This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, the ICREA Academia program and the FPU grant FPU18/04413.
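    The genetic-algorithm search described above can be sketched in plain Python. This is a hedged illustration, not the paper's actual algorithm: the bank-interleaving function, group size, and the encoding of a "pattern" as a permutation of descriptor-test accesses are all hypothetical simplifications.

```python
import random

NUM_BANKS = 8   # hypothetical on-chip memory banking
GROUP = 8       # parallel accesses issued per cycle

def bank(coord):
    # hypothetical address interleaving: pixel (x, y) -> bank
    x, y = coord
    return (x + y) % NUM_BANKS

def conflicts(order, coords):
    """Count accesses serialized by bank conflicts: within each
    group of GROUP parallel accesses, every repeated bank costs
    an extra cycle."""
    total = 0
    for g in range(0, len(order), GROUP):
        banks = [bank(coords[i]) for i in order[g:g + GROUP]]
        total += len(banks) - len(set(banks))
    return total

def evolve(coords, pop=30, gens=200, seed=0):
    """Search for an access order (a static pattern) minimizing
    bank conflicts, using selection plus swap mutation. The naive
    order seeds the population, so the result can only improve on it."""
    rng = random.Random(seed)
    n = len(coords)
    population = [list(range(n))] + \
                 [rng.sample(range(n), n) for _ in range(pop - 1)]
    for _ in range(gens):
        population.sort(key=lambda o: conflicts(o, coords))
        parents = population[:pop // 2]   # keep the fitter half
        children = []
        for p in parents:
            c = p[:]
            i, j = rng.randrange(n), rng.randrange(n)
            c[i], c[j] = c[j], c[i]       # swap mutation
            children.append(c)
        population = parents + children
    return min(population, key=lambda o: conflicts(o, coords))
```

    Because the pattern is fixed offline (static), the hardware can issue the reordered accesses every frame with no runtime arbitration cost — the point of searching for it ahead of time.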

    High-Throughput and Area-Optimized Architecture for rBRIEF Feature Extraction

    Feature matching is a fundamental step in many real-time computer vision applications such as simultaneous localization and mapping, motion analysis, and stereo correspondence. The performance of these applications depends on the distinctiveness of the visual feature descriptors used, and the speed at which they can be extracted from video frames. When combined with standard key-point detectors, the rotation-aware binary robust independent elementary features (rBRIEF) descriptor has been shown to outperform its counterparts. In this paper, we present a deep-pipelined stream processing architecture that is capable of extracting rBRIEF features from high-throughput video frames. To achieve a high processing rate and low-complexity hardware, the proposed architecture incorporates an enhanced moving summation strategy to calculate the key-points' patch moments and employs approximate computations to achieve patch rotation. Multiplier-less circuitry is introduced throughout the architecture to avoid the use of costly multipliers. Implementation on the Altera Arria V device demonstrates that the proposed architecture leads to a 53.3% reduction in hardware resources (adaptive logic modules), while achieving 50% higher accuracy (in terms of average Hamming distance) when compared to the state-of-the-art architecture. In addition, the proposed architecture is able to process high-resolution (1920 × 1080) images at 60 fps, while consuming only 456.15 mW power. This research project is partially funded by the National Research Foundation Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.
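    The patch moments mentioned above are the quantities ORB uses to orient a key-point before rotating the BRIEF pattern: m10 = Σ x·I(x,y), m01 = Σ y·I(x,y), with the angle given by atan2(m01, m10). A plain-Python sketch of the moments, the orientation, and the kind of incremental update a moving-summation strategy exploits (function names and the column-wise update are illustrative assumptions, not the paper's circuit):

```python
import math

def patch_moments(img, cx, cy, r):
    """Naive patch moments m00, m10, m01 over the (2r+1)x(2r+1)
    patch centered at (cx, cy); dx, dy are center-relative."""
    m00 = m10 = m01 = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            p = img[cy + dy][cx + dx]
            m00 += p
            m10 += dx * p
            m01 += dy * p
    return m00, m10, m01

def orientation(img, cx, cy, r):
    # intensity-centroid angle used to rotate the BRIEF test pattern
    _, m10, m01 = patch_moments(img, cx, cy, r)
    return math.atan2(m01, m10)

def slide_m00_right(img, cx, cy, r, m00):
    """Moving-summation idea: when the patch center moves from cx
    to cx+1, subtract the column that leaves and add the column
    that enters, instead of re-summing the whole patch."""
    for dy in range(-r, r + 1):
        m00 -= img[cy + dy][cx - r]       # leaving left column
        m00 += img[cy + dy][cx + 1 + r]   # entering right column
    return m00
```

    The naive moments cost O(r²) per key-point; the sliding update costs O(r) per step, which is the reduction a streaming hardware pipeline turns into throughput at a fixed clock rate.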