EVA: Exploiting Temporal Redundancy in Live Computer Vision
Hardware support for deep convolutional neural networks (CNNs) is critical to
advanced computer vision in mobile and embedded devices. Current designs,
however, accelerate generic CNNs; they do not exploit the unique
characteristics of real-time vision. We propose to use the temporal redundancy
in natural video to avoid unnecessary computation on most frames. A new
algorithm, activation motion compensation, detects changes in the visual input
and incrementally updates a previously computed output. The technique takes
inspiration from video compression, applying well-known motion estimation
methods to adapt to visual changes. We use an adaptive key frame rate to
control the trade-off between efficiency and vision quality as the input
changes. We implement the technique in hardware as an extension to existing
state-of-the-art CNN accelerator designs. The new unit reduces the average
energy per frame by 54.2%, 61.7%, and 87.6% for three CNNs with less than 1%
loss in vision accuracy. Comment: Appears in ISCA 2018
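The core loop is easy to sketch. The following Python toy is illustrative only: the stride, the fixed key-frame interval, and the global-motion simplification are assumptions, and the paper realizes the idea in accelerator hardware with block-level motion estimation.

    import numpy as np

    STRIDE = 8          # assumed cumulative stride of the CNN prefix
    KEY_INTERVAL = 8    # fixed here for brevity; adaptive in the paper

    def estimate_global_motion(ref, cur, search=4):
        # Exhaustive-search motion estimate (dy, dx) minimizing SAD, in the
        # spirit of video-codec motion estimation (global rather than
        # per-block, to keep the sketch short).
        best, best_mv = np.inf, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                shifted = np.roll(np.roll(ref, dy, axis=0), dx, axis=1)
                sad = np.abs(shifted - cur).sum()
                if sad < best:
                    best, best_mv = sad, (dy, dx)
        return best_mv

    def amc_stream(frames, cnn_prefix, cnn_suffix):
        # On key frames, run the expensive CNN prefix; on all other frames,
        # warp the cached activations by the estimated motion instead.
        cached, key_frame = None, None
        for i, frame in enumerate(frames):
            if i % KEY_INTERVAL == 0:
                cached, key_frame = cnn_prefix(frame), frame
            else:
                dy, dx = estimate_global_motion(key_frame, frame)
                cached = np.roll(np.roll(cached, dy // STRIDE, axis=0),
                                 dx // STRIDE, axis=1)
            yield cnn_suffix(cached)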
Down-Scaling with Learned Kernels in Multi-Scale Deep Neural Networks for Non-Uniform Single Image Deblurring
Multi-scale approaches have been used in blind image/video deblurring to
yield excellent performance for both conventional and recent
deep-learning-based state-of-the-art methods. Bicubic down-sampling is a
typical choice in multi-scale approaches to reduce the spatial dimension after
filtering with a fixed kernel. However, this fixed kernel may be sub-optimal,
since it may destroy information that is important for reliable deblurring,
such as strong edges. We propose convolutional neural network (CNN)-based
down-scaling methods for multi-scale deep-learning-based non-uniform single
image deblurring. We argue that our CNN-based down-scaling effectively reduces
the spatial dimension of the original image, while its learned multi-channel
kernels may better preserve the details necessary for deblurring. For each
scale, we adopt RCAN (Residual Channel Attention Networks) as a backbone
network to further improve performance. Our proposed method yielded
state-of-the-art performance on the GoPro dataset by a large margin, achieving
2.59 dB higher PSNR than the current state-of-the-art method by Tao et al. Our
CNN-based down-scaling was the key factor in this result: without it, the
performance of our network decreased by 1.98 dB. The same networks trained on
the GoPro set were also evaluated on the large-scale Su dataset, where our
proposed method yielded 1.15 dB better PSNR than Tao et al.'s method.
Qualitative comparisons on the Lai dataset also confirmed the superior
performance of our proposed method over other
state-of-the-art methods. Comment: 10 pages, 7 figures, 4 tables
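As a rough illustration of the contrast between a fixed bicubic kernel and a learned multi-channel down-scaler, consider the following PyTorch sketch; the channel count and kernel size are assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LearnedDownscale(nn.Module):
        # Halve spatial resolution with a learned multi-channel kernel, so
        # the network can decide which details (e.g. strong edges) survive.
        def __init__(self, in_ch=3, out_ch=32):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=2, padding=2)

        def forward(self, x):
            return self.conv(x)

    x = torch.randn(1, 3, 256, 256)             # blurry input image
    fixed = F.interpolate(x, scale_factor=0.5,  # fixed-kernel baseline
                          mode='bicubic', align_corners=False)
    learned = LearnedDownscale()(x)             # trained jointly with the deblurrer
    print(fixed.shape, learned.shape)           # (1, 3, 128, 128) vs (1, 32, 128, 128)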
Proposal For Neuromorphic Hardware Using Spin Devices
We present a design scheme for ultra-low-power neuromorphic hardware using
emerging spin devices. We propose device models for 'neurons' based on lateral
spin valves and domain-wall magnets that can operate at an ultra-low terminal
voltage of ~20 mV, resulting in small computation energy. Magnetic tunnel
junctions are employed to interface the spin-neurons with charge-based
devices like CMOS for large-scale networks. A device-circuit
co-simulation framework is used to simulate such hybrid designs and
evaluate system-level performance. We present the design of different classes
of neuromorphic architectures using the proposed scheme, suitable
for applications such as analog data sensing, data conversion,
cognitive computing, associative memory, programmable logic, and analog and
digital signal processing. We show that the spin-based neuromorphic designs can
achieve 15X-300X lower computation energy for these applications compared
to state-of-the-art CMOS designs.
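A toy behavioral model of the spin-neuron abstraction follows; it is purely illustrative: the critical current, weights, and binary firing abstraction are assumptions, and the paper models the devices at circuit level.

    import numpy as np

    I_CRIT = 1.0e-6  # assumed critical switching current (A)

    def spin_neuron(weights, inputs):
        # Current-mode weighted summation followed by magnetic switching:
        # the device 'fires' (the domain wall flips) when the net input
        # current exceeds a critical value, at millivolt-scale terminal
        # voltages. An MTJ reads the state out to a CMOS-level bit.
        i_net = np.dot(weights, inputs)
        return 1 if i_net > I_CRIT else 0

    # Example: a 4-input neuron driven by micro-ampere-scale currents.
    w = np.array([0.5, 0.25, 0.25, -0.5])
    x = np.array([2e-6, 2e-6, 1e-6, 1e-6])
    print(spin_neuron(w, x))  # -> 1 (net current exceeds I_CRIT)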
Cubes3D: Neural Network based Optical Flow in Omnidirectional Image Scenes
Optical flow estimation with convolutional neural networks (CNNs) has
recently been applied successfully to various computer vision tasks. In this
paper we adapt a state-of-the-art approach for optical flow estimation to
omnidirectional images. We investigate CNN architectures to determine the high
motion variations caused by the geometry of fish-eye images. Further, we
qualitatively determine the influence of texture on a non-rigid object on the
resulting motion vectors. For evaluation we create ground-truth motion
fields synthetically; the ground truth contains cubes against a static
background. We test variations of pre-trained FlowNet 2.0 architectures,
reporting common error metrics, and generate competitive results for the
motion of foreground objects with inhomogeneous texture. Comment: ICPRAI 2018
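The "common error metrics" for optical flow are typically the average endpoint error and the average angular error; a minimal sketch of both is given below for reference (this is not the paper's code).

    import numpy as np

    def average_endpoint_error(flow, gt):
        # Mean Euclidean distance between estimated and ground-truth flow
        # vectors; both arrays have shape (H, W, 2).
        return np.linalg.norm(flow - gt, axis=-1).mean()

    def average_angular_error(flow, gt):
        # Mean angle between the 3D-augmented vectors (u, v, 1) -- the
        # classic Barron et al. formulation -- in degrees.
        f = np.concatenate([flow, np.ones(flow.shape[:2] + (1,))], axis=-1)
        g = np.concatenate([gt, np.ones(gt.shape[:2] + (1,))], axis=-1)
        cos = (f * g).sum(-1) / (np.linalg.norm(f, axis=-1) * np.linalg.norm(g, axis=-1))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean()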
HFirst: A Temporal Approach to Object Recognition
This paper introduces a spiking hierarchical model for object recognition
which utilizes the precise timing information inherently present in the output
of biologically inspired asynchronous Address Event Representation (AER) vision
sensors. The asynchronous nature of these systems frees computation and
communication from the rigid predetermined timing enforced by system clocks in
conventional systems. Freedom from rigid timing constraints opens the
possibility of using true timing to our advantage in computation. We show not
only how timing can be used in object recognition, but also how it can in fact
simplify computation. Specifically, we rely on a simple
temporal-winner-take-all rather than more computationally intensive synchronous
operations typically used in biologically inspired neural networks for object
recognition. This approach to visual computation represents a major paradigm
shift from conventional clocked systems and can find application in other
sensory modalities and computational tasks. We showcase the effectiveness of
the approach by achieving the highest reported accuracy to date (97.5% ± 3.5%)
for a previously published four-class card pip recognition task, and an
accuracy of 84.9% ± 1.9% for a new, more difficult 36-class character
recognition task. Comment: 13 pages, 10 figures
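The temporal winner-take-all itself is almost trivially simple, which is the point: with precise spike timing, recognition reduces to a timestamp comparison. A minimal sketch follows; the event format and names are assumptions.

    def temporal_wta(spikes):
        # spikes: iterable of (timestamp_us, neuron_id) events from an AER
        # stream, one neuron per candidate class. The first neuron to reach
        # threshold and spike wins, so no max/normalization stage is needed.
        return min(spikes, key=lambda ev: ev[0])[1]

    # Example: class 2's neuron spikes first, so class 2 wins.
    events = [(1520, 0), (1310, 2), (1900, 1)]
    print(temporal_wta(events))  # -> 2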
Bioinspired Visual Motion Estimation
Visual motion estimation is a computationally intensive, but important task
for sighted animals. Replicating the robustness and efficiency of biological
visual motion estimation in artificial systems would significantly enhance the
capabilities of future robotic agents. 25 years ago, in this very journal,
Carver Mead outlined his argument for replicating biological processing in
silicon circuits. His vision served as the foundation for the field of
neuromorphic engineering, which has experienced a rapid growth in interest over
recent years as the ideas and technologies mature. Replicating biological
visual sensing was one of the first tasks attempted in the neuromorphic field.
In this paper we focus specifically on the task of visual motion estimation. We
describe the task itself, present the progression of works from the early first
attempts through to the modern day state-of-the-art, and provide an outlook for
future directions in the field. Comment: 16 pages, 11 figures, 1 table
Depth-Adaptive Computational Policies for Efficient Visual Tracking
Current convolutional neural network algorithms for video object tracking
spend the same amount of computation on each object and video frame. However,
it is harder to track an object in some frames than in others, due to
varying amounts of clutter, scene complexity, and motion, and the object's
distinctiveness against its background. We propose a depth-adaptive
convolutional Siamese network that performs video tracking adaptively at
multiple neural network depths. Parametric gating functions are trained to
control the depth of the convolutional feature extractor by minimizing a joint
loss of computational cost and tracking error. Our network achieves accuracy
comparable to the state-of-the-art on the VOT2016 benchmark. Furthermore, our
adaptive depth computation achieves higher accuracy for a given computational
cost than traditional fixed-structure neural networks. The presented framework
extends to other tasks that use convolutional neural networks and enables
trading speed for accuracy at runtime. Comment: presented at EMMCVPR 2017 in Venice, Italy
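A schematic PyTorch sketch of the gating idea is given below; the gate placement, the unit cost model, and all names are assumptions, and the paper's Siamese tracker and training procedure are more elaborate.

    import torch
    import torch.nn as nn

    class GatedExtractor(nn.Module):
        # Conv blocks with a learned gate after each one; easy frames exit
        # early, trading a little accuracy for a lot of computation.
        def __init__(self, ch=32, n_blocks=4):
            super().__init__()
            self.blocks = nn.ModuleList(
                nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
                for _ in range(n_blocks))
            self.gates = nn.ModuleList(
                nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                              nn.Linear(ch, 1), nn.Sigmoid())
                for _ in range(n_blocks))

        def forward(self, x):
            cost = x.new_zeros(())
            for block, gate in zip(self.blocks, self.gates):
                x = block(x)
                cost = cost + 1.0              # one unit of compute consumed
                if gate(x).mean() > 0.5:       # gate says features are good enough
                    break                      # hard exit at inference time
            return x, cost

    # Training would relax the hard exit (e.g. into an expected cost) and
    # minimize a joint objective: loss = tracking_error + LAMBDA * cost.
    feat, cost = GatedExtractor()(torch.randn(1, 32, 64, 64))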
FutureMapping: The Computational Structure of Spatial AI Systems
We discuss and predict the evolution of Simultaneous Localisation and Mapping
(SLAM) into a general geometric and semantic `Spatial AI' perception capability
for intelligent embodied devices. A big gap remains between the visual
perception performance that devices such as augmented reality eyewear or
consumer robots will require and what is possible within the constraints
imposed by real products. Co-design of algorithms, processors and sensors will
be needed. We explore the computational structure of current and future Spatial
AI algorithms and consider this within the landscape of ongoing hardware
developments.
Learning Image Matching by Simply Watching Video
This work presents an unsupervised learning based approach to the ubiquitous
computer vision problem of image matching. We start from the insight that the
problem of frame-interpolation implicitly solves for inter-frame
correspondences. This permits the application of analysis-by-synthesis: we
firstly train and apply a Convolutional Neural Network for frame-interpolation,
then obtain correspondences by inverting the learned CNN. The key benefit
behind this strategy is that the CNN for frame-interpolation can be trained in
an unsupervised manner by exploiting the temporal coherency that is naturally
contained in real-world video sequences. The present model therefore learns
image matching by simply watching videos. Besides promising broader
applicability, the presented approach achieves surprisingly strong
performance, comparable to traditional empirically designed methods. Comment:
The second version contains additional quantitative evaluation of frame
interpolation
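A schematic sketch of the analysis-by-synthesis recipe (all names are assumptions; the paper's interpolation network and inversion procedure differ in detail): train an interpolation CNN on unlabeled triplets, then recover a correspondence by asking which input pixel an interpolated pixel is most sensitive to.

    import torch
    import torch.nn as nn

    interp_net = nn.Sequential(            # stand-in for a real interpolation CNN
        nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1))

    def train_step(f0, f1, f2, opt):
        # Unsupervised: the middle frame of any video triplet is a free
        # label. f0, f1, f2 are grayscale frames shaped (1, 1, H, W).
        pred = interp_net(torch.cat([f0, f2], dim=1))
        loss = (pred - f1).abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()
        return loss

    def match(f0, f2, y, x):
        # Invert the trained CNN: the f0 location with the strongest gradient
        # for interpolated pixel (y, x) is taken as its correspondence.
        f0 = f0.clone().requires_grad_(True)
        pred = interp_net(torch.cat([f0, f2], dim=1))
        pred[0, 0, y, x].backward()
        g = f0.grad[0, 0].abs()
        return divmod(int(g.argmax()), g.shape[1])  # (row, col) in f0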
3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning
Camera localization is a fundamental and key component of autonomous driving
vehicles and mobile robots to localize themselves globally for further
environment perception, path planning and motion control. Recently, end-to-end
approaches based on convolutional neural networks have been studied
intensively, achieving or even exceeding traditional 3D-geometry-based
methods. In this work, we propose a compact network for absolute camera pose
regression. Inspired by those traditional methods, a 3D scene geometry-aware
constraint is also introduced, exploiting all available information including
motion, depth and image contents. We add this constraint to our proposed
network as a regularization term, defined by a pixel-level photometric loss
and an image-level structural similarity loss. To benchmark our method,
different challenging scenes, including indoor and outdoor environments, are
tested with our proposed approach and state-of-the-art methods. The
experimental results demonstrate significant improvement of our method in both
prediction accuracy and convergence efficiency. Comment: Accepted for ICRA 2020
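A minimal sketch of such a regularizer follows; the SSIM window, the blend weight, and the omitted depth/pose-based warping step are assumptions.

    import torch
    import torch.nn.functional as F

    def ssim(a, b, c1=0.01**2, c2=0.03**2):
        # Simplified single-scale SSIM over 3x3 neighborhoods.
        mu_a, mu_b = F.avg_pool2d(a, 3, 1), F.avg_pool2d(b, 3, 1)
        var_a = F.avg_pool2d(a * a, 3, 1) - mu_a ** 2
        var_b = F.avg_pool2d(b * b, 3, 1) - mu_b ** 2
        cov = F.avg_pool2d(a * b, 3, 1) - mu_a * mu_b
        num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
        den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
        return (num / den).mean()

    def geometry_aware_loss(frame, warped, alpha=0.85):
        # Pixel-level photometric L1 blended with an image-level 1 - SSIM
        # term, computed between a frame and a neighboring frame warped into
        # its view via the predicted pose and depth (warping omitted here);
        # added to the pose-regression loss as a regularization term.
        photometric = (frame - warped).abs().mean()
        structural = 1.0 - ssim(frame, warped)
        return alpha * structural + (1 - alpha) * photometric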