48,040 research outputs found

    EVA²: Exploiting Temporal Redundancy in Live Computer Vision

    Hardware support for deep convolutional neural networks (CNNs) is critical to advanced computer vision in mobile and embedded devices. Current designs, however, accelerate generic CNNs; they do not exploit the unique characteristics of real-time vision. We propose to use the temporal redundancy in natural video to avoid unnecessary computation on most frames. A new algorithm, activation motion compensation, detects changes in the visual input and incrementally updates a previously-computed output. The technique takes inspiration from video compression and applies well-known motion estimation techniques to adapt to visual changes. We use an adaptive key frame rate to control the trade-off between efficiency and vision quality as the input changes. We implement the technique in hardware as an extension to existing state-of-the-art CNN accelerator designs. The new unit reduces the average energy per frame by 54.2%, 61.7%, and 87.6% for three CNNs with less than 1% loss in vision accuracy. Comment: Appears in ISCA 201
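
    The "well-known motion estimation techniques" borrowed from video compression can be illustrated with exhaustive block matching. The sketch below is a generic textbook version, not the paper's activation motion compensation unit; the function names, the block size, the search radius, and the key-frame threshold are all assumptions made for illustration.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def block(frame, top, left, size):
    """Extract a size x size block from a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def estimate_motion(prev, curr, top, left, size=2, radius=1):
    """Find the offset (dy, dx) into `prev` that best matches a block of `curr`.

    Returns (dy, dx, error), where error is the SAD of the best match.
    """
    target = block(curr, top, left, size)
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            # Skip candidate blocks that fall outside the previous frame.
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue
            err = sad(block(prev, t, l, size), target)
            if best is None or err < best[2]:
                best = (dy, dx, err)
    return best


def needs_key_frame(error, threshold):
    """Hypothetical policy: recompute fully when the match error is too high."""
    return error > threshold


# A 2x2 patch moves one pixel to the right between two 4x4 frames.
prev_frame = [[0, 0, 0, 0], [0, 9, 8, 0], [0, 7, 6, 0], [0, 0, 0, 0]]
curr_frame = [[0, 0, 0, 0], [0, 0, 9, 8], [0, 0, 7, 6], [0, 0, 0, 0]]
motion = estimate_motion(prev_frame, curr_frame, top=1, left=2)
```

    A low matching error suggests the previous result can be reused after shifting it by the estimated offset, while a high error is one plausible trigger for a new key frame.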

    Down-Scaling with Learned Kernels in Multi-Scale Deep Neural Networks for Non-Uniform Single Image Deblurring

    The multi-scale approach has been used for blind image and video deblurring and yields excellent performance for both conventional and recent deep-learning-based state-of-the-art methods. Bicubic down-sampling, which reduces the spatial dimension after filtering with a fixed kernel, is a typical choice in multi-scale approaches. However, this fixed kernel may be sub-optimal, since it may destroy information that is important for reliable deblurring, such as strong edges. We propose convolutional neural network (CNN)-based down-scaling methods for multi-scale deep-learning-based non-uniform single image deblurring. We argue that our CNN-based down-scaling effectively reduces the spatial dimension of the original image, while learned kernels with multiple channels may preserve the details necessary for deblurring. For each scale, we adopt RCAN (Residual Channel Attention Networks) as the backbone network to further improve performance. Our proposed method yielded state-of-the-art performance on the GoPro dataset by a large margin, achieving 2.59 dB higher PSNR than the current state-of-the-art method by Tao. The proposed CNN-based down-scaling was the key factor in this performance, since the performance of our network without it decreased by 1.98 dB. The same networks trained on the GoPro set were also evaluated on the large-scale Su dataset, where our proposed method yielded 1.15 dB better PSNR than Tao's method. Qualitative comparisons on the Lai dataset also confirmed the superior performance of our proposed method over other state-of-the-art methods. Comment: 10 pages, 7 figures, 4 tables
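
    The PSNR gains quoted above are easier to interpret with the standard definition in hand, PSNR = 10 log10(MAX² / MSE). The sketch below assumes 8-bit images flattened into lists of pixel values; the function names are illustrative and not taken from the paper.

```python
import math


def mse(a, b):
    """Mean squared error between two equally long pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)
```

    Under this definition, a 2.59 dB gain corresponds to a mean squared error roughly 1.8 times lower, which can be checked by calling `mse_ratio_for_gain(2.59)`.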

    Proposal For Neuromorphic Hardware Using Spin Devices

    We present a design scheme for ultra-low-power neuromorphic hardware using emerging spin devices. We propose device models for a 'neuron' based on lateral spin valves and domain wall magnets that can operate at an ultra-low terminal voltage of ~20 mV, resulting in small computation energy. Magnetic tunnel junctions are employed to interface the spin neurons with charge-based devices such as CMOS for large-scale networks. A device-circuit co-simulation framework is used to simulate such hybrid designs and evaluate system-level performance. We present the design of different classes of neuromorphic architectures using the proposed scheme that can be suitable for applications such as analog data sensing, data conversion, cognitive computing, associative memory, programmable logic, and analog and digital signal processing. We show that the spin-based neuromorphic designs can achieve 15X-300X lower computation energy for these applications compared with state-of-the-art CMOS designs.

    Cubes3D: Neural Network based Optical Flow in Omnidirectional Image Scenes

    Optical flow estimation with convolutional neural networks (CNNs) has recently solved various computer vision tasks successfully. In this paper we adapt a state-of-the-art approach for optical flow estimation to omnidirectional images. We investigate CNN architectures to determine the high motion variations caused by the geometry of fish-eye images. Further, we determine the qualitative influence of the texture on the non-rigid object on the motion vectors. To evaluate the results, we create ground truth motion fields synthetically. The ground truth contains cubes with a static background. We test variations of pre-trained FlowNet 2.0 architectures and report common error metrics. We obtain competitive results for the motion of the foreground with inhomogeneous texture on the moving object. Comment: ICPRAI 201

    HFirst: A Temporal Approach to Object Recognition

    This paper introduces a spiking hierarchical model for object recognition which utilizes the precise timing information inherently present in the output of biologically inspired asynchronous Address Event Representation (AER) vision sensors. The asynchronous nature of these systems frees computation and communication from the rigid predetermined timing enforced by system clocks in conventional systems. Freedom from rigid timing constraints opens the possibility of using true timing to our advantage in computation. We show not only how timing can be used in object recognition, but also how it can in fact simplify computation. Specifically, we rely on a simple temporal-winner-take-all rather than more computationally intensive synchronous operations typically used in biologically inspired neural networks for object recognition. This approach to visual computation represents a major paradigm shift from conventional clocked systems and can find application in other sensory modalities and computational tasks. We showcase the effectiveness of the approach by achieving the highest reported accuracy to date (97.5% ± 3.5%) for a previously published four class card pip recognition task and an accuracy of 84.9% ± 1.9% for a new more difficult 36 class character recognition task. Comment: 13 pages, 10 figures
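
    The idea behind a temporal winner-take-all can be sketched in a few lines: the neuron that spikes first decides the outcome, so no comparison of accumulated activations is needed. The function below is a generic illustration and not the HFirst implementation; the name `temporal_winner_take_all`, the dictionary layout, and the example labels and times are assumptions.

```python
def temporal_winner_take_all(spike_times):
    """Return the label whose neuron spiked first, or None if none spiked.

    `spike_times` maps a label to the time of that neuron's first spike,
    with None meaning the neuron never fired.
    """
    fired = {label: t for label, t in spike_times.items() if t is not None}
    if not fired:
        return None
    # The earliest spike wins; later spikes are ignored.
    return min(fired, key=fired.get)


# Hypothetical first-spike times (in milliseconds) for three output neurons.
winner = temporal_winner_take_all({"heart": 4.2, "spade": 3.1, "club": None})
```

    In an event-driven system the same decision can be made as soon as the first spike arrives, which is what allows the later, slower responses to be skipped.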

    Bioinspired Visual Motion Estimation

    Visual motion estimation is a computationally intensive but important task for sighted animals. Replicating the robustness and efficiency of biological visual motion estimation in artificial systems would significantly enhance the capabilities of future robotic agents. Twenty-five years ago, in this very journal, Carver Mead outlined his argument for replicating biological processing in silicon circuits. His vision served as the foundation for the field of neuromorphic engineering, which has experienced rapid growth in interest over recent years as the ideas and technologies mature. Replicating biological visual sensing was one of the first tasks attempted in the neuromorphic field. In this paper we focus specifically on the task of visual motion estimation. We describe the task itself, present the progression of works from the earliest attempts through to the modern state of the art, and provide an outlook for future directions in the field. Comment: 16 pages, 11 figures, 1 table

    Depth-Adaptive Computational Policies for Efficient Visual Tracking

    Current convolutional neural network algorithms for video object tracking spend the same amount of computation on each object and video frame. However, it is harder to track an object in some frames than in others, due to the varying amount of clutter, scene complexity, amount of motion, and the object's distinctiveness against its background. We propose a depth-adaptive convolutional Siamese network that performs video tracking adaptively at multiple neural network depths. Parametric gating functions are trained to control the depth of the convolutional feature extractor by minimizing a joint loss of computational cost and tracking error. Our network achieves accuracy comparable to the state of the art on the VOT2016 benchmark. Furthermore, our adaptive depth computation achieves higher accuracy for a given computational cost than traditional fixed-structure neural networks. The presented framework extends to other tasks that use convolutional neural networks and enables trading speed for accuracy at runtime. Comment: presented at EMMCVPR 2017 in Venice, Italy

    FutureMapping: The Computational Structure of Spatial AI Systems

    We discuss and predict the evolution of Simultaneous Localisation and Mapping (SLAM) into a general geometric and semantic 'Spatial AI' perception capability for intelligent embodied devices. A big gap remains between the visual perception performance that devices such as augmented reality eyewear or consumer robots will require and what is possible within the constraints imposed by real products. Co-design of algorithms, processors and sensors will be needed. We explore the computational structure of current and future Spatial AI algorithms and consider this within the landscape of ongoing hardware developments.

    Learning Image Matching by Simply Watching Video

    This work presents an unsupervised learning-based approach to the ubiquitous computer vision problem of image matching. We start from the insight that the problem of frame interpolation implicitly solves for inter-frame correspondences. This permits the application of analysis-by-synthesis: we first train and apply a Convolutional Neural Network for frame interpolation, then obtain correspondences by inverting the learned CNN. The key benefit of this strategy is that the CNN for frame interpolation can be trained in an unsupervised manner by exploiting the temporal coherency that is naturally contained in real-world video sequences. The present model therefore learns image matching by simply watching videos. Besides promising to be more generally applicable, the presented approach achieves surprising performance comparable to traditional empirically designed methods. Comment: The second version contains additional quantitative evaluation of frame interpolation

    3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning

    Camera localization is a fundamental component of autonomous vehicles and mobile robots, which must localize themselves globally for further environment perception, path planning and motion control. Recently, end-to-end approaches based on convolutional neural networks have been widely studied as a way to match or even exceed traditional 3D-geometry-based methods. In this work, we propose a compact network for absolute camera pose regression. Inspired by those traditional methods, we also introduce a 3D scene geometry-aware constraint that exploits all available information, including motion, depth and image contents. We add this constraint as a regularization term to our proposed network by defining a pixel-level photometric loss and an image-level structural similarity loss. To benchmark our method, we test it and state-of-the-art methods on challenging scenes, including indoor and outdoor environments. The experimental results demonstrate a significant performance improvement of our method in both prediction accuracy and convergence efficiency. Comment: Accepted for ICRA 202
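
    The two loss terms named above can be illustrated with their common textbook forms: a mean absolute intensity difference for the photometric term and one minus the structural similarity index (SSIM) for the structural term. This is a simplified sketch, not the paper's formulation: it works on flattened pixel lists, computes SSIM once over the whole image rather than over local windows, omits the image warping step, and uses the conventional constants 0.01 and 0.03 as an assumption.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def photometric_loss(a, b):
    """Pixel-level loss: mean absolute intensity difference."""
    return mean([abs(x - y) for x, y in zip(a, b)])


def global_ssim(a, b, max_value=255.0):
    """Structural similarity computed over the whole image at once."""
    c1 = (0.01 * max_value) ** 2  # stabilises the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilises the contrast term
    mu_a, mu_b = mean(a), mean(b)
    var_a = mean([(x - mu_a) ** 2 for x in a])
    var_b = mean([(y - mu_b) ** 2 for y in b])
    cov = mean([(x - mu_a) * (y - mu_b) for x, y in zip(a, b)])
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def structural_loss(a, b):
    """Image-level loss: zero when the two images are identical."""
    return 1.0 - global_ssim(a, b)
```

    In a training setup of this kind, the two terms would typically be weighted and added to the pose regression loss; the weights are a design choice that this sketch does not cover.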