
    A 1.22 TOPS and 1.52 mW/MHz Augmented Reality Multicore Processor With Neural Network NoC for HMD Applications

    Real-time augmented reality (AR) is actively studied as the future user interface and experience for high-performance head-mounted display (HMD) systems. The small battery and limited computing power of current HMDs, however, prevent real-time markerless AR on the HMD. In this paper, we propose a real-time, low-power AR processor for advanced 3D-AR HMD applications. For high throughput, the processor adopts task-level pipelined SIMD-PE clusters and a congestion-aware network-on-chip (NoC); both features exploit the high data-level parallelism (DLP) and task-level parallelism (TLP) of the pipelined multicore architecture. For low power consumption, it employs a vocabulary forest accelerator and a mixed-mode support vector machine (SVM)-based DVFS control to reduce unnecessary external memory accesses and core activation. The proposed 4 mm × 8 mm HMD AR processor is fabricated in 65 nm CMOS technology for a battery-powered HMD platform with real-time AR operation. It consumes 381 mW average power and 778 mW peak power at a 250 MHz operating frequency and 1.2 V supply voltage. It achieves 1.22 TOPS peak performance and 1.57 TOPS/W energy efficiency, which are, respectively, 88% and 76% higher than the state of the art.
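    The SVM-based DVFS control described above can be sketched in software. This is a minimal illustration, not the chip's mixed-mode circuit: the workload features (keypoint count, NoC traffic), the linear-classifier weights, and the operating points are all assumed values for demonstration.

    ```python
    # Hedged sketch of SVM-guided DVFS: linear decision functions classify the
    # upcoming frame's workload from runtime statistics and pick a (V, f) pair.
    # All features, weights, and operating points below are illustrative.

    OPERATING_POINTS = {           # hypothetical (voltage V, frequency MHz) pairs
        "light":  (0.9, 100),
        "medium": (1.0, 175),
        "heavy":  (1.2, 250),
    }

    def svm_decision(features, weights, bias):
        """Linear SVM decision value: w . x + b."""
        return sum(w * x for w, x in zip(weights, features)) + bias

    def select_dvfs(keypoint_count, noc_traffic):
        """Pick an operating point from two illustrative workload features."""
        # Normalize features to roughly [0, 1] against assumed maxima.
        x = (keypoint_count / 1000.0, noc_traffic / 500.0)
        # Two one-vs-rest linear classifiers with illustrative weights.
        heavy = svm_decision(x, (2.0, 1.5), -2.0)
        light = svm_decision(x, (-2.0, -1.5), 0.5)
        if heavy > 0:
            return OPERATING_POINTS["heavy"]
        if light > 0:
            return OPERATING_POINTS["light"]
        return OPERATING_POINTS["medium"]
    ```

    The point of the structure is that a linear decision function is cheap enough to evaluate per frame, so the core can be scaled down whenever the predicted workload is light.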

    An Augmented Reality Processor with a Congestion-Aware Network-on-Chip Scheduler

    For a markerless augmented reality system that can operate all day, the authors implemented a low-power Basic On-Chip Network-Augmented Reality (BONE-AR) processor to execute object recognition, camera pose estimation, and 3D graphics rendering in real time on HD-resolution video input. BONE-AR employs six clusters of heterogeneous SIMD processors distributed over a mesh-topology network-on-chip (NoC) to exploit data- and task-level parallelism. A visual attention algorithm reduces the overall workload by removing background clutter from the input video frames, but also incurs NoC congestion because of the dynamically fluctuating workload. The authors propose a congestion-aware scheduler that detects and resolves NoC congestion to prevent throughput degradation of the task-level pipeline.
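    The dispatch policy of a congestion-aware scheduler can be sketched as follows. This is an illustrative software model, not the BONE-AR hardware mechanism: it assumes each cluster exposes an input-queue occupancy counter and that tasks carry a capability set naming which clusters can run them.

    ```python
    # Illustrative congestion-aware dispatch: send each new pipeline task to the
    # least-congested cluster that is capable of running it, using per-cluster
    # queue occupancy as the congestion signal (assumed telemetry, not the
    # paper's exact metric).

    def dispatch(task, clusters, occupancy, capable):
        """Pick the eligible cluster with the lowest queue occupancy.

        occupancy: dict cluster -> currently queued work items
        capable:   dict task -> set of clusters able to execute it
        """
        eligible = [c for c in clusters if c in capable[task]]
        best = min(eligible, key=lambda c: occupancy[c])
        occupancy[best] += 1        # model the newly enqueued task's load
        return best
    ```

    For example, with occupancies `{"A": 3, "B": 1, "C": 5}` and a task runnable on A or B, the task goes to B, steering traffic away from the congested clusters.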

    A 646GOPS/W multi-classifier many-core processor with cortex-like architecture for super-resolution recognition

    Object recognition processors have been reported for applications such as autonomous vehicle navigation, smart surveillance, and unmanned air vehicles (UAVs) [1-3]. Most of these processors adopt a single classifier rather than multiple classifiers, even though multi-classifier systems (MCSs) offer more accurate recognition with higher robustness [4]. In addition, MCSs can incorporate the human vision system (HVS) recognition architecture to reduce computational requirements and enhance recognition accuracy. For example, HMAX models the exact hierarchical architecture of the HVS for improved recognition accuracy [5]. Compared with SIFT, known to have the best recognition accuracy based on local features extracted from the object [6], HMAX can recognize an object based on global features by template matching and a maximum-pooling operation without feature segmentation. In this paper we present a multi-classifier many-core processor combining the HMAX and SIFT approaches on a single chip. Through the combined approach, the system can: 1) attend to the target object directly with global context consideration, including complicated backgrounds or camouflaging obstacles, 2) utilize a super-resolution algorithm to recognize highly blurred or small objects, and 3) recognize more than 200 objects in real time by context-aware feature matching.
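    The multi-classifier idea of combining a global-feature path (HMAX-like) with a local-feature path (SIFT-like) can be sketched as score fusion. The weighted-sum rule and the weights below are assumptions for illustration; the paper's actual combination logic is not specified in this abstract.

    ```python
    # Illustrative multi-classifier fusion: combine per-class scores from a
    # global-feature classifier and a local-feature classifier by weighted sum,
    # then pick the top class. Weights are hypothetical.

    def fuse_scores(global_scores, local_scores, w_global=0.4, w_local=0.6):
        """Weighted-sum fusion of two per-class score dictionaries."""
        classes = set(global_scores) | set(local_scores)
        fused = {c: w_global * global_scores.get(c, 0.0)
                    + w_local * local_scores.get(c, 0.0)
                 for c in classes}
        return max(fused, key=fused.get)
    ```

    A fusion rule like this is what lets an MCS outvote a single classifier's mistake: a class only wins when the combined evidence from both feature types supports it.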

    A multi-granularity parallelism object recognition processor with content-aware fine-grained task scheduling

    A multi-granularity parallel core architecture is proposed to accelerate object recognition with low area and energy consumption. By adopting task-level optimized cores with different parallelism and complexity, the proposed processor achieves real-time object recognition with 271.4 GOPS peak performance. In addition, content-aware fine-grained task scheduling is proposed to enable low-power real-time object recognition on 30fps 720p HD video streams. As a result, the object recognition processor achieves 9.4 nJ/pixel energy efficiency and 25.8 GOPS/(W·mm²) power-area efficiency in 0.13 µm CMOS technology.

    A keypoint-level parallel pipelined object recognition processor with gaze activation image sensor for mobile smart glasses system

    In this paper, a low-power real-time gaze-activated object recognition processor is proposed for a battery-powered smart glasses system. For high energy efficiency, we propose a keypoint-level pipelined architecture that increases hardware utilization, resulting in a significant power reduction for the real-time recognition processor. In addition, a low-power gaze-activated image sensor with mixed-mode architecture is proposed for estimating the gaze of the glasses user. As a result, only the small image region the user is looking at needs to be processed by the recognition processor, leading to further power reduction. The proposed object recognition processor shows 30fps real-time performance with only 75 mW power consumption, which is 3.5× and 4.4× lower than the state-of-the-art works.
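    The gaze-activation idea, processing only a small window around the estimated gaze point, can be sketched as region-of-interest (ROI) cropping. The window size below is an assumed value for illustration, not the sensor's actual ROI.

    ```python
    # Illustrative gaze-activated ROI selection: only a small window centered on
    # the estimated gaze point is forwarded to the recognition pipeline, which
    # is where the power saving comes from. ROI dimensions are hypothetical.

    def gaze_roi(frame_w, frame_h, gaze_x, gaze_y, roi_w=160, roi_h=120):
        """Return (x0, y0, x1, y1) of a gaze-centered ROI clamped to the frame."""
        x0 = min(max(gaze_x - roi_w // 2, 0), frame_w - roi_w)
        y0 = min(max(gaze_y - roi_h // 2, 0), frame_h - roi_h)
        return (x0, y0, x0 + roi_w, y0 + roi_h)

    def pixel_savings(frame_w, frame_h, roi_w=160, roi_h=120):
        """Fraction of frame pixels skipped by processing only the ROI."""
        return 1.0 - (roi_w * roi_h) / (frame_w * frame_h)
    ```

    With these assumed numbers, a 160×120 window on a 640×480 frame skips 93.75% of the pixels, which illustrates why driving the recognition pipeline from gaze can cut power so sharply.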

    A 2.71nJ/pixel 3D-stacked gaze-activated object-recognition system for low-power mobile HMD applications

    Smart eyeglasses or head-mounted displays (HMDs) have been gaining traction as next-generation mainstream wearable devices. However, previous HMD systems [1] have had limited application, primarily due to their lacking a smart user interface (UI) and user experience (UX). Since HMD systems have a small, compact wearable platform, their UI requires new modalities, rather than a computer mouse or a 2D touch panel. Recent speech-recognition-based UIs require voice input that reveals the user's intention not only to HMD users but also to others, which raises privacy concerns in a public space. In addition, prior works [2-3] attempted to support object recognition (OR) or augmented reality (AR) in smart eyeglasses, but consumed considerable power, >381 mW, resulting in <6 hours operation time with a 2100 mWh battery.

    A 2.71 nJ/Pixel Gaze-Activated Object Recognition System for Low-Power Mobile Smart Glasses

    A low-power object recognition (OR) system with an intuitive gaze user interface (UI) is proposed for battery-powered smart glasses. For the low-power gaze UI, we propose a low-power single-chip gaze estimation sensor, called the gaze image sensor (GIS). In the GIS, a novel column-parallel pupil edge detection circuit (PEDC) with a new pupil edge detection algorithm, XY pupil detection (XY-PD), is proposed, which results in a 2.9× power reduction with 16× larger resolution compared to previous work. Also, a logarithmic SIMD processor is proposed for robust pupil center estimation, <1 pixel error, with a low-power floating-point implementation. For OR, a low-power multicore OR processor (ORP) is implemented. In the ORP, a task-level pipeline with keypoint-level scoring is proposed to reduce the number of cores as well as the operating frequency of the keypoint-matching processor (KMP) for low power consumption. Also, a dual-mode convolutional neural network processor (CNNP) is designed for fast tile selection without external memory accesses. In addition, a pipelined descriptor generation processor (DGP) with LUT-based nonlinear operation is newly proposed for low-power OR. Lastly, dynamic voltage and frequency scaling (DVFS) is applied for dynamic power reduction in the ORP. Combining the GIS and ORP, both fabricated in a 65 nm CMOS logic process, only 75 mW average power consumption is achieved with real-time OR performance, which is 1.2× and 4.4× lower power than previously published work.
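    The keypoint-level scoring idea can be sketched as incremental voting with early termination. This is an assumed software analogy, not the KMP's actual datapath: each matched keypoint updates a per-object vote immediately, so recognition can stop as soon as one object's score passes a threshold instead of matching every descriptor in the frame.

    ```python
    # Illustrative keypoint-level scoring: score per matched keypoint rather
    # than per frame, and exit early once an object accumulates enough votes.
    # The vote threshold is a hypothetical parameter.

    def recognize(keypoint_matches, threshold=5):
        """keypoint_matches: iterable of object labels, one per matched keypoint.

        Returns (label, keypoints_consumed) as soon as a label reaches the
        threshold, or (None, total_consumed) if no label does.
        """
        votes = {}
        consumed = 0
        for label in keypoint_matches:
            consumed += 1
            votes[label] = votes.get(label, 0) + 1
            if votes[label] >= threshold:
                return label, consumed   # early exit: remaining keypoints skipped
        return None, consumed
    ```

    Processing fewer keypoints per frame is what lets the matching engine run with fewer cores and at a lower clock, which is the power argument the abstract makes.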

    A 1.22 TOPS and 1.52mW/MHz Augmented Reality Multi-Core Processor with Neural Network NoC for HMD Applications

    Augmented reality (AR) is being investigated in advanced displays for the augmentation of images in a real-world environment. Wearable systems, such as head-mounted display (HMD) systems, have attempted to support real-time AR as a next-generation UI/UX [1-2], but have failed due to their limited computing power. In a prior work, a chip with limited AR functionality was reported that could perform AR with the help of markers placed in the environment (usually 1D or 2D bar codes) [3]. However, for a seamless visual experience, 3D objects should be rendered directly on the natural video image without any markers. Unlike marker-based AR, markerless AR requires natural feature extraction, general object recognition, 3D reconstruction, and camera-pose estimation to be performed in parallel. For instance, markerless AR on a VGA test video consumes ~1.3 W at 0.2fps throughput on TI's OMAP4430, which exceeds power limits for wearable devices. Consequently, there is a need for a high-performance, energy-efficient markerless AR processor to realize a real-time AR system, especially for HMD applications.

    A task-level pipelined many-SIMD augmented reality processor with congestion-aware network-on-chip scheduler

    A 36-core heterogeneous multicore processor is proposed to accelerate recognition-based markerless augmented reality. To enable real-time operation of the proposed augmented reality, a task-level pipelined multicore architecture with DLP/TLP-optimized SIMD processing elements is implemented. In addition, the multicore employs a congestion-aware network-on-chip scheduler for its 2D-mesh network-on-chip to support the massive internal data transactions caused by the task-level pipeline. As a result, it achieves 1.22 TOPS peak performance and 1.57 TOPS/W energy efficiency, which are 88% and 76% improvements over a state-of-the-art augmented reality processor, for 30fps 720p test input video.