12 research outputs found

    A Vocabulary Forest-based object matching processor with 2.07M-vec/s throughput and 13.3nJ/vector energy in full-HD resolution

    A Vocabulary Forest-based object matching processor is proposed to speed up the feature matching stage of an object recognition system while keeping high accuracy. A Reusable-Vocabulary Tree architecture with hardware sharing reduces area, while a propagate-and-compute-array architecture in the combiner and the elimination of the external database improve the matching speed by more than 16x compared with Approximate Nearest Neighbor searching processors. The proposed Vocabulary Forest processor, implemented in a 65nm CMOS process, achieves 2.07M-vec/s throughput and 13.3nJ/vector energy efficiency, and successfully matches 100 objects with 95.7% matching accuracy.

    A Vocabulary Forest Object Matching Processor With 2.07 M-Vector/s Throughput and 13.3 nJ/Vector Per-Vector Energy for Full-HD 60 fps Video Object Recognition

    Approximate nearest neighbor searching has been studied as the keypoint matching algorithm for object recognition systems, and its hardware realization has reduced the external memory access that is the main bottleneck in the object recognition process. However, external memory access reduction alone cannot satisfy the ever-increasing memory bandwidth requirement caused by the rapid growth of image resolution and frame rate in many recent applications such as advanced driver assistance systems. In this paper, a vocabulary forest (VF) processor is proposed that achieves both high accuracy and high speed by integrating an on-chip database (DB) to remove external memory access. The area-efficient reusable-vocabulary tree architecture is proposed to reduce area, and the propagate-and-compute-array architecture is proposed to enhance the processing speed of the VF. The proposed VF processor speeds up the object matching stage by 16.4x compared with the state-of-the-art matching processor [Hong et al., Symp. VLSIC, 2013] for high-resolution (Full-HD) and real-time (60 fps) video object recognition. It is fabricated using 65 nm CMOS technology and integrated into an object recognition SoC. The proposed VF chip achieves 2.07 M-vector/s throughput and 13.3 nJ/vector per-vector energy with 95.7% matching accuracy for 100 objects.
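
    To make the lookup structure described above concrete, the following is a minimal Python sketch of vocabulary-tree matching: each descriptor is quantized by descending a hierarchical tree of centroids, and the reached leaf votes for database objects through an on-chip inverted file, so no per-descriptor external-memory distance search is needed. The branch factor, tree depth, descriptor size, and voting scheme are illustrative assumptions, not values taken from the paper.

    import numpy as np

    BRANCH, DEPTH, DIM = 4, 3, 64          # assumed tree shape and descriptor size
    rng = np.random.default_rng(0)

    # One centroid table per level: level d holds BRANCH**(d+1) centroids, grouped so
    # that the children of node i occupy the slice [i*BRANCH, (i+1)*BRANCH).
    levels = [rng.standard_normal((BRANCH ** (d + 1), DIM)) for d in range(DEPTH)]
    n_leaves = BRANCH ** DEPTH

    # Inverted file: each leaf keeps (object_id, weight) votes, filled at DB build time.
    inverted_file = [[] for _ in range(n_leaves)]

    def quantize(desc):
        """Descend the tree: at each level pick the nearest of the current node's children."""
        node = 0
        for level in levels:
            children = level[node * BRANCH:(node + 1) * BRANCH]
            node = node * BRANCH + int(np.argmin(np.linalg.norm(children - desc, axis=1)))
        return node                        # leaf index in [0, n_leaves)

    def add_object(obj_id, descriptors):
        for d in descriptors:
            inverted_file[quantize(d)].append((obj_id, 1.0))

    def match(query_descriptors, n_objects):
        """Accumulate leaf votes into per-object scores (the combiner stage)."""
        scores = np.zeros(n_objects)
        for d in query_descriptors:
            for obj_id, w in inverted_file[quantize(d)]:
                scores[obj_id] += w
        return int(np.argmax(scores))

    # Toy usage: register 3 objects, then match a noisy view of object 1.
    objs = [rng.standard_normal((50, DIM)) for _ in range(3)]
    for i, o in enumerate(objs):
        add_object(i, o)
    print(match(objs[1] + 0.05 * rng.standard_normal(objs[1].shape), 3))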

    A multi-modal and tunable Radial-Basis-Function circuit with supply and temperature compensation

    We propose an analog Radial-Basis-Function (RBF) circuit that generates 4 different types of RBFs: spline, Gaussian, multi-quadratic, and log-like spline curves. Moreover, the proposed RBF circuit is designed to be highly tunable in its centers, heights, and widths. The proposed RBF circuit is also robust to both temperature variation (-37~87°C) and supply voltage variation (1~2V). The combined area and power consumption of the corresponding RBFs from 3 different previous works are 13,622µm² and 121µW, respectively, whereas the proposed circuit occupies only 1,050µm² and consumes 10.5µW, which are only 13% and 11.5% of those figures. For verification, an analog/digital mixed-mode RBF Neural Network (RBFNN) classifier that adopts the proposed RBF circuit is designed.
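
    As a numerical illustration of the four RBF shapes named above (spline, Gaussian, multi-quadratic, log-like spline), each tunable in center c, width w, and height h, here is a short Python sketch. The exact analog transfer curves of the chip are not reproduced here; the standard textbook forms below are stand-ins, and the RBFNN example uses arbitrary centers and weights.

    import numpy as np

    def gaussian(x, c, w, h):      return h * np.exp(-((x - c) / w) ** 2)
    def multiquadric(x, c, w, h):  return h * np.sqrt(1.0 + ((x - c) / w) ** 2)
    def spline(x, c, w, h):        # thin-plate-spline style: r^2 * log(r)
        r = np.abs(x - c) / w + 1e-12
        return h * r ** 2 * np.log(r)
    def log_spline(x, c, w, h):    return h * np.log(1.0 + ((x - c) / w) ** 2)

    # A one-input RBF network stage: weighted sum of Gaussian kernels, as a mixed-mode
    # RBFNN classifier would combine the analog RBF outputs.
    def rbfnn(x, centers, widths, weights):
        return sum(wt * gaussian(x, c, w, 1.0)
                   for c, w, wt in zip(centers, widths, weights))

    x = np.linspace(-2.0, 2.0, 9)
    print(np.round(rbfnn(x, centers=[-1.0, 0.0, 1.0],
                            widths=[0.5, 0.5, 0.5],
                            weights=[0.3, 1.0, -0.4]), 3))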

    A 1.22 TOPS and 1.52 mW/MHz Augmented Reality Multicore Processor With Neural Network NoC for HMD Applications

    Real-time augmented reality (AR) is actively studied as a future user interface and experience for high-performance head-mounted display (HMD) systems. The small battery size and limited computing power of current HMDs, however, fail to support real-time markerless AR. In this paper, we propose a real-time and low-power AR processor for advanced 3D-AR HMD applications. For high throughput, the processor adopts task-level pipelined SIMD-PE clusters and a congestion-aware network-on-chip (NoC). Both features exploit the high data-level parallelism (DLP) and task-level parallelism (TLP) of the pipelined multicore architecture. For low power consumption, it employs a vocabulary forest accelerator and a mixed-mode support vector machine (SVM)-based DVFS control to reduce unnecessary external memory accesses and core activation. The proposed 4 mm × 8 mm HMD AR processor is fabricated using 65 nm CMOS technology for a battery-powered HMD platform with real-time AR operation. It consumes 381 mW average power and 778 mW peak power at a 250 MHz operating frequency and 1.2 V supply voltage. It achieves 1.22 TOPS peak performance and 1.57 TOPS/W energy efficiency, both higher than the state of the art.
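
    The following Python sketch illustrates the idea behind classifier-guided DVFS as described above: a lightweight classifier predicts whether the next frame's AR workload will be light or heavy, and the frequency/voltage pair is set ahead of time so the cores do not sit at peak voltage for easy frames. The features, the trained weights, and the operating points are illustrative assumptions only, not the chip's SVM or its DVFS table.

    import numpy as np

    # Assumed operating points (frequency MHz, supply V); the real table differs.
    DVFS_LEVELS = {"light": (100, 0.8), "heavy": (250, 1.2)}

    # Linear SVM decision function f(x) = w.x + b, weights assumed trained offline on
    # per-frame features: [keypoint count, attention-region area, previous load].
    W = np.array([0.8, 0.6, 0.4])
    B = -1.0

    def predict_level(features):
        return "heavy" if float(W @ features + B) > 0.0 else "light"

    def dvfs_step(features, set_clock, set_vdd):
        level = predict_level(features)
        freq_mhz, vdd = DVFS_LEVELS[level]
        set_clock(freq_mhz)          # hooks into the clock / regulator controllers
        set_vdd(vdd)
        return level

    # Toy usage with normalized features for an easy and a busy frame.
    for frame in (np.array([0.2, 0.1, 0.3]), np.array([0.9, 0.8, 0.9])):
        print(dvfs_step(frame, set_clock=lambda f: None, set_vdd=lambda v: None))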

    A multi-granularity parallelism object recognition processor with content-aware fine-grained task scheduling

    A multiple-granularity parallel core architecture is proposed to accelerate object recognition with low area and energy consumption. By adopting task-level optimized cores with different parallelism and complexity, the proposed processor achieves real-time object recognition with 271.4 GOPS peak performance. In addition, content-aware fine-grained task scheduling is proposed to enable low-power real-time object recognition on 30fps 720p HD video streams. As a result, the object recognition processor achieves 9.4nJ/pixel energy efficiency and 25.8 GOPS/(W·mm²) power-area efficiency in 0.13µm CMOS technology.
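
    A minimal Python sketch of what content-aware fine-grained task scheduling can look like: the frame is split into small tiles, each tile's expected workload is estimated from its content (here, gradient energy as a crude proxy for feature density), and tiles are assigned so that per-core load stays balanced. The tile size, the workload proxy, and the greedy policy are assumptions for illustration, not the processor's scheduler.

    import numpy as np

    def tile_workloads(frame, tile=32):
        """Estimate per-tile work from image content (gradient energy)."""
        gy, gx = np.gradient(frame.astype(float))
        energy = gx ** 2 + gy ** 2
        h, w = frame.shape
        loads = {}
        for y in range(0, h - tile + 1, tile):
            for x in range(0, w - tile + 1, tile):
                loads[(y, x)] = float(energy[y:y + tile, x:x + tile].sum())
        return loads

    def schedule(loads, n_cores=4):
        """Greedy longest-processing-time assignment: heaviest tile goes to the
        currently least-loaded core."""
        core_load = [0.0] * n_cores
        plan = {c: [] for c in range(n_cores)}
        for tile_id, cost in sorted(loads.items(), key=lambda kv: -kv[1]):
            c = int(np.argmin(core_load))
            plan[c].append(tile_id)
            core_load[c] += cost
        return plan, core_load

    frame = np.random.default_rng(0).integers(0, 256, size=(128, 128))
    plan, core_load = schedule(tile_workloads(frame))
    print([round(l, 1) for l in core_load])   # roughly balanced per-core load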

    An Augmented Reality Processor with a Congestion-Aware Network-on-Chip Scheduler

    For a markerless augmented reality system that can operate all day, the authors implemented a low-power Basic On-Chip Network-Augmented Reality (BONE-AR) processor to execute object recognition, camera pose estimation, and 3D graphics rendering in real time on HD-resolution video input. BONE-AR employs six clusters of heterogeneous SIMD processors distributed on a mesh-topology network-on-chip (NoC) to exploit data- and task-level parallelism. A visual attention algorithm reduces the overall workload by removing background clutter from the input video frames, but it also incurs NoC congestion because of the dynamically fluctuating workload. The authors propose a congestion-aware scheduler that detects and resolves the NoC congestion to prevent throughput degradation of the task-level pipeline.
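
    The toy Python model below captures the gist of a congestion-aware scheduler in the spirit of the abstract: the scheduler watches per-cluster NoC ingress occupancy, and when a cluster crosses a threshold it steers the next ready task to the least-congested cluster so the task-level pipeline does not stall. The six-cluster layout, thresholds, and occupancy model are illustrative assumptions, not the BONE-AR design.

    import random

    CLUSTERS = ["C0", "C1", "C2", "C3", "C4", "C5"]   # six SIMD clusters on a mesh
    CONGESTION_THRESHOLD = 0.8                        # fraction of buffer in use

    # Simulated per-cluster ingress-link occupancy (would come from hardware counters).
    occupancy = {c: 0.0 for c in CLUSTERS}

    def place_task(task_cost):
        """Pick the least-congested cluster, skipping any above the threshold."""
        candidates = [c for c in CLUSTERS if occupancy[c] < CONGESTION_THRESHOLD]
        target = min(candidates or CLUSTERS, key=lambda c: occupancy[c])
        occupancy[target] = min(1.0, occupancy[target] + task_cost)
        return target

    def drain(rate=0.2):
        """Model the NoC delivering flits each step, freeing buffer space."""
        for c in CLUSTERS:
            occupancy[c] = max(0.0, occupancy[c] - rate)

    random.seed(0)
    for step in range(10):
        task_cost = random.uniform(0.1, 0.5)   # fluctuating attention-driven workload
        print(step, place_task(task_cost),
              {c: round(o, 2) for c, o in occupancy.items()})
        drain()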

    A 646GOPS/W multi-classifier many-core processor with cortex-like architecture for super-resolution recognition

    Object recognition processors have been reported for applications in autonomous vehicle navigation, smart surveillance, and unmanned air vehicles (UAVs) [1-3]. Most of these processors adopt a single classifier rather than multiple classifiers, even though multi-classifier systems (MCSs) offer more accurate recognition with higher robustness [4]. In addition, MCSs can incorporate the human vision system (HVS) recognition architecture to reduce computational requirements and enhance recognition accuracy. For example, HMAX models the exact hierarchical architecture of the HVS for improved recognition accuracy [5]. Compared with SIFT, known to have the best recognition accuracy based on local features extracted from the object [6], HMAX can recognize an object based on global features by template matching and a maximum-pooling operation without feature segmentation. In this paper, we present a multi-classifier many-core processor combining the HMAX and SIFT approaches on a single chip. Through the combined approach, the system can: 1) pay attention to the target object directly with global context consideration, even with a complicated background or camouflaging obstacles, 2) utilize a super-resolution algorithm to recognize highly blurred or small objects, and 3) recognize more than 200 objects in real time by context-aware feature matching.
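
    To clarify how an HMAX-like global path and a SIFT-like local path can be combined into one decision, here is a small Python sketch: the global path scores objects by template responses followed by max pooling, the local path scores them by ratio-test keypoint matches, and the two scores are fused. The template sizes, the equal-weight fusion rule, and all data are invented for illustration; this is not the chip's classifier.

    import numpy as np

    rng = np.random.default_rng(1)

    def hmax_score(image_patches, templates):
        """S-layer: template responses; C-layer: max pool over positions."""
        responses = image_patches @ templates.T            # (patches, templates)
        return float(np.mean(np.max(responses, axis=0)))   # max over positions, mean over templates

    def sift_score(query_desc, object_desc, ratio=0.8):
        """Count ratio-test matches between query and object local descriptors."""
        votes = 0
        for q in query_desc:
            d = np.linalg.norm(object_desc - q, axis=1)
            i, j = np.argsort(d)[:2]
            if d[i] < ratio * d[j]:
                votes += 1
        return float(votes)

    def fused_decision(query_patches, query_desc, db):
        scores = []
        for templates, local_desc in db:
            s = (0.5 * hmax_score(query_patches, templates)
                 + 0.5 * sift_score(query_desc, local_desc) / max(len(query_desc), 1))
            scores.append(s)
        return int(np.argmax(scores))

    # Toy DB of 3 objects; the query is a noisy copy of object 2's features.
    db = [(rng.standard_normal((8, 16)), rng.standard_normal((20, 16))) for _ in range(3)]
    query = (db[2][0] + 0.1 * rng.standard_normal((8, 16)),
             db[2][1] + 0.1 * rng.standard_normal((20, 16)))
    print(fused_decision(*query, db))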

    A 502GOPS and 0.984mW dual-mode ADAS SoC with RNN-FIS engine for intention prediction in automotive black-box system

    Advanced driver-assistance systems (ADAS) are being adopted in automobiles for forward-collision warning, advanced emergency braking, adaptive cruise control, and lane-keeping assistance. Recently, automotive black boxes have also been installed in cars to record accidents or theft. In this paper, a dual-mode ADAS SoC is proposed to support both high-performance ADAS functionality in driving-mode (d-mode) and an ultra-low-power black box in parking-mode (p-mode). In p-mode, surveillance recording can be triggered intelligently by our intention-prediction engine (IPE) instead of relying on always-on recording, which extends battery life and prevents battery discharge.
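
    A loose Python sketch of the p-mode triggering idea: instead of always-on recording, a small recurrent model scores recent motion/proximity features and recording starts only when the predicted intention score crosses a fuzzy threshold. The network sizes, weights, features, and membership function are invented for illustration; the chip's RNN-FIS engine is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(0)
    W_in = rng.standard_normal((8, 3)) * 0.5     # input weights: 3 features -> 8 hidden units
    W_h = rng.standard_normal((8, 8)) * 0.1      # recurrent weights
    W_out = rng.standard_normal(8)               # readout to a single intention score

    def rnn_step(h, x):
        """One recurrent update of hidden state h given feature vector x."""
        return np.tanh(W_in @ x + W_h @ h)

    def intention_score(h):
        """Sigmoid readout: probability-like score that an event is imminent."""
        return float(1.0 / (1.0 + np.exp(-(W_out @ h))))

    def fuzzy_trigger(score, low=0.3, high=0.7):
        """Ramp membership in the 'threat' fuzzy set; record once it passes 0.5."""
        mu = float(np.clip((score - low) / (high - low), 0.0, 1.0))
        return mu >= 0.5

    h = np.zeros(8)
    for t, features in enumerate(rng.standard_normal((6, 3))):   # e.g. motion, distance, angle
        h = rnn_step(h, features)
        s = intention_score(h)
        print(t, round(s, 2), "RECORD" if fuzzy_trigger(s) else "sleep")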

    A 502-GOPS and 0.984-mW Dual-Mode Intelligent ADAS SoC With Real-Time Semiglobal Matching and Intention Prediction for Smart Automotive Black Box System

    The advanced driver assistance system (ADAS) for adaptive cruise control and collision avoidance depends strongly on robust image recognition technology such as lane detection, vehicle/pedestrian detection, and traffic sign recognition. However, conventional ADAS cannot realize more advanced collision evasion in real environments due to the absence of intelligent vehicle/pedestrian behavior analysis. Moreover, accurate distance estimation is essential in ADAS applications and semiglobal matching (SGM) is most widely adopted for high accuracy, but its system-on-chip (SoC) implementation is difficult due to the massive external memory bandwidth. In this paper, an ADAS SoC with artificial-intelligence-based behavior analysis and a hardware implementation of SGM is proposed. The proposed SoC has dual-mode operation: high-performance operation for intelligent ADAS with real-time SGM in driving-mode (d-mode) and ultralow-power operation for the black box system in parking-mode. It features: 1) a task-level pipelined SGM processor to reduce external memory bandwidth by 85.8%; 2) a region-of-interest generation processor to reduce computation by 86.2%; 3) a mixed-mode intention prediction engine for dual-mode intelligence; and 4) dynamic voltage and frequency scaling control to save 36.2% of power in d-mode. The proposed ADAS processor achieves 862 GOPS/W energy efficiency and 31.4 GOPS/mm² area efficiency, which are 1.53x and 1.75x improvements over the state of the art, with 30 frames/s throughput for 720p stereo inputs.
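
    For reference, here is a Python sketch of the core of semiglobal matching, the algorithm the abstract says is implemented on-chip: a per-pixel matching cost followed by dynamic-programming cost aggregation along a path. It is shown for a single scanline and a single (left-to-right) path with an absolute-difference cost; full SGM sums several path directions. The P1/P2 penalties and the toy disparity range are assumptions for the example, not the SoC's parameters.

    import numpy as np

    def matching_cost(left, right, max_disp):
        """C[x, d] = |left[x] - right[x - d]| along one scanline."""
        n = len(left)
        cost = np.full((n, max_disp), 255.0)
        for d in range(max_disp):
            cost[d:, d] = np.abs(left[d:] - right[:n - d])
        return cost

    def aggregate_lr(cost, p1=10.0, p2=120.0):
        """One SGM path: L[x,d] = C[x,d] + min(L[x-1,d], L[x-1,d±1]+P1,
        min_d L[x-1] + P2) - min_d L[x-1]."""
        n, D = cost.shape
        L = np.zeros_like(cost)
        L[0] = cost[0]
        for x in range(1, n):
            prev = L[x - 1]
            best_prev = prev.min()
            up = np.concatenate(([np.inf], prev[:-1])) + p1     # from disparity d-1
            down = np.concatenate((prev[1:], [np.inf])) + p1    # from disparity d+1
            L[x] = cost[x] + np.minimum(np.minimum(prev, np.minimum(up, down)),
                                        best_prev + p2) - best_prev
        return L

    # Toy scanline pair with a true disparity of 3 pixels.
    left = np.array([10, 10, 10, 80, 90, 100, 10, 10, 10, 10, 10, 10], dtype=float)
    right = np.roll(left, -3)
    L = aggregate_lr(matching_cost(left, right, max_disp=6))
    print(np.argmin(L, axis=1))   # winner-take-all disparity per pixel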

    A 1.22 TOPS and 1.52mW/MHz Augmented Reality Multi-Core Processor with Neural Network NoC for HMD Applications

    Augmented reality (AR) is being investigated in advanced displays for the augmentation of images in a real-world environment. Wearable systems, such as head-mounted display (HMD) systems, have attempted to support real-time AR as a next-generation UI/UX [1-2], but have failed due to their limited computing power. In a prior work, a chip with limited AR functionality was reported that could perform AR with the help of markers placed in the environment (usually 1D or 2D bar codes) [3]. However, for a seamless visual experience, 3D objects should be rendered directly on the natural video image without any markers. Unlike marker-based AR, markerless AR requires natural feature extraction, general object recognition, 3D reconstruction, and camera-pose estimation to be performed in parallel. For instance, markerless AR on a VGA test video consumes ~1.3W at 0.2fps throughput on TI's OMAP4430, which exceeds the power limits of wearable devices. Consequently, a high-performance, energy-efficient markerless AR processor is needed to realize a real-time AR system, especially for HMD applications.
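
    To show the shape of the parallel, task-level pipeline such a markerless-AR flow implies, here is a small Python sketch in which feature extraction, object recognition, and pose estimation run as concurrent stages connected by queues, so a new frame enters the pipeline while earlier frames are still in flight downstream. The stage functions are placeholders, not the processor's kernels, and the queue depths are arbitrary.

    import queue
    import threading

    def stage(name, fn, q_in, q_out):
        """Run one pipeline stage: pull items, process, push downstream."""
        while True:
            item = q_in.get()
            if item is None:              # poison pill: shut the stage down
                if q_out is not None:
                    q_out.put(None)
                break
            result = fn(item)
            if q_out is not None:
                q_out.put(result)
            else:
                print(f"{name}: {result}")

    # Placeholder per-stage work (the real stages would be feature extraction,
    # vocabulary-forest matching, and camera-pose estimation).
    def extract(frame):
        return frame, f"features({frame})"

    def recognize(item):
        frame, feats = item
        return frame, feats, "object-id"

    def estimate(item):
        frame, _feats, obj = item
        return f"frame {frame}: pose for {obj}"

    q1, q2, q3 = queue.Queue(2), queue.Queue(2), queue.Queue(2)
    stages = [
        threading.Thread(target=stage, args=("extract", extract, q1, q2)),
        threading.Thread(target=stage, args=("recognize", recognize, q2, q3)),
        threading.Thread(target=stage, args=("pose", estimate, q3, None)),
    ]
    for t in stages:
        t.start()
    for frame in range(5):                # feed five frames through the pipeline
        q1.put(frame)
    q1.put(None)                          # drain and stop
    for t in stages:
        t.join()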