Memory Based Online Learning of Deep Representations from Video Streams
We present a novel online unsupervised method for face identity learning from
video streams. The method exploits deep face descriptors together with a memory
based learning mechanism that takes advantage of the temporal coherence of
visual data. Specifically, we introduce a discriminative feature matching
solution based on Reverse Nearest Neighbour and a feature forgetting strategy
that detects redundant features and discards them appropriately as time
progresses. It is shown that the proposed learning procedure is asymptotically
stable and can be effectively used in relevant applications like multiple face
identification and tracking from unconstrained video streams. Experimental
results show that, compared with offline approaches that exploit future
information, the proposed method achieves comparable results in multiple face
tracking and better performance in face identification. Code will be publicly
available.
Comment: arXiv admin note: text overlap with arXiv:1708.0361
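The abstract above names Reverse Nearest Neighbour as the basis of its feature matching. The following is a minimal sketch of that general idea only, not the paper's matching procedure: it returns the stored features whose nearest neighbour is the query. The function name, the Euclidean distance, and the toy 2-D features are assumptions made for illustration.

```python
import math


def reverse_nearest_neighbours(query, memory):
    """Return indices of memory items whose nearest neighbour is `query`.

    Each memory item compares its distance to the query against its
    distance to every other memory item (hypothetical helper, Euclidean).
    """
    hits = []
    for i, item in enumerate(memory):
        d_query = math.dist(item, query)
        d_others = [
            math.dist(item, other)
            for j, other in enumerate(memory)
            if j != i
        ]
        # The item "votes" for the query only if nothing in memory is closer.
        if not d_others or d_query < min(d_others):
            hits.append(i)
    return hits


# Toy data: two stored features close to each other, one far away.
memory = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
query = (4.0, 4.0)
print(reverse_nearest_neighbours(query, memory))  # prints [2]
```

Unlike a plain nearest-neighbour lookup, the test is made from the point of view of each stored feature, so a query that lands near a dense cluster of stored features matches few of them.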
Building with Drones: Accurate 3D Facade Reconstruction using MAVs
Automatic reconstruction of 3D models from images using multi-view
Structure-from-Motion methods has been one of the most fruitful outcomes of
computer vision. These advances, combined with the growing popularity of Micro
Aerial Vehicles as an autonomous imaging platform, have made 3D vision tools
ubiquitous for a large number of Architecture, Engineering and Construction
applications among audiences mostly unskilled in computer vision. However, to
obtain high-resolution and accurate reconstructions from a large-scale object
using SfM, there are many critical constraints on the quality of image data,
which often become sources of inaccuracy because current 3D reconstruction
pipelines do not let users assess the fidelity of input data
during image acquisition. In this paper, we present and advocate a
closed-loop interactive approach that performs incremental reconstruction in
real-time and gives users online feedback on quality parameters such as
Ground Sampling Distance (GSD) and image redundancy on a surface mesh. We
also propose a novel multi-scale camera network design to prevent scene drift
caused by incremental map building, and release the first multi-scale image
sequence dataset as a benchmark. Further, we evaluate our system on real
outdoor scenes, and show that our interactive pipeline combined with a
multi-scale camera network approach provides compelling accuracy in multi-view
reconstruction tasks when compared against the state-of-the-art methods.
Comment: 8 Pages, 2015 IEEE International Conference on Robotics and
Automation (ICRA '15), Seattle, WA, US
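The abstract above mentions Ground Sampling Distance (GSD) as one of the quality parameters shown to the user. As a rough illustration, the sketch below uses the standard pinhole-camera approximation for GSD; it is not taken from the paper, and the function name, parameter names, and example camera values are assumptions.

```python
def ground_sampling_distance(sensor_width_mm, focal_length_mm,
                             distance_m, image_width_px):
    """Approximate surface distance covered by one pixel, in metres per pixel.

    Pinhole approximation: GSD = (sensor width * distance) /
    (focal length * image width). Assumes the camera faces the surface
    head-on at `distance_m`.
    """
    if focal_length_mm <= 0 or image_width_px <= 0:
        raise ValueError("focal length and image width must be positive")
    return (sensor_width_mm * distance_m) / (focal_length_mm * image_width_px)


# Hypothetical camera: 10 mm sensor width, 10 mm lens, 1000 px wide image,
# flown 100 m from the facade.
gsd = ground_sampling_distance(10.0, 10.0, 100.0, 1000)
print(f"GSD: {gsd * 100:.1f} cm/pixel")
```

Under this approximation, halving the distance to the facade halves the GSD, which is why flying closer yields finer detail at the cost of more images.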
ODN: Opening the Deep Network for Open-set Action Recognition
In recent years, the performance of action recognition has been significantly
improved with the help of deep neural networks. Most of the existing action
recognition works hold the closed-set assumption that all action
categories are known beforehand while deep networks can be well trained for
these categories. However, action recognition in the real world is essentially
an open-set problem, namely, it is impossible to know all action
categories beforehand and consequently infeasible to prepare sufficient
training samples for those emerging categories. In this case, applying
closed-set recognition methods will definitely lead to unseen-category errors.
To address this challenge, we propose the Open Deep Network (ODN) for the
open-set action recognition task. Technically, ODN detects new categories
by applying a multi-class triplet thresholding method, and then dynamically
reconstructs the classification layer and "opens" the deep network by adding
predictors for new categories continually. In order to transfer the learned
knowledge to the new category, two novel methods, Emphasis Initialization and
Allometry Training, are adopted to initialize and incrementally train the new
predictor so that only a few samples are needed to fine-tune the model. Extensive
experiments show that ODN can effectively detect and recognize new categories
with little human intervention and is thus applicable to open-set action
recognition tasks in the real world. Moreover, ODN can even achieve comparable
performance to some closed-set methods.
Comment: 6 pages, 3 figures, ICME 201
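The abstract above describes detecting new categories by thresholding and then "opening" the classification layer by adding predictors. The sketch below is a deliberately simplified stand-in, not ODN itself: it uses a single threshold on the top linear score instead of the multi-class triplet thresholding, and the new predictor's initial weights are supplied by the caller rather than by Emphasis Initialization. All names and values are hypothetical.

```python
def class_scores(weights, x):
    """Linear score of feature vector `x` for each class (one row per class)."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]


def predict_open_set(weights, x, threshold):
    """Return the best class index, or None when no score reaches `threshold`.

    Simplification: one global threshold on the top score stands in for
    the paper's multi-class triplet thresholding.
    """
    s = class_scores(weights, x)
    best = max(range(len(s)), key=s.__getitem__)
    return best if s[best] >= threshold else None


def add_category(weights, init_row):
    """Return a new weight matrix with one extra predictor row appended."""
    return weights + [list(init_row)]


weights = [[1.0, 0.0], [0.0, 1.0]]                  # two known categories
print(predict_open_set(weights, [0.9, 0.1], 0.5))   # known sample
print(predict_open_set(weights, [0.2, 0.3], 0.5))   # rejected as unknown
opened = add_category(weights, [-1.0, -1.0])        # placeholder initial weights
print(predict_open_set(opened, [-0.6, -0.6], 0.5))  # matches the new predictor
```

Returning a new matrix from `add_category` leaves the original classifier untouched, which keeps the closed-set model available for comparison.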
Bags of Affine Subspaces for Robust Object Tracking
We propose an adaptive tracking algorithm where the object is modelled as a
continuously updated bag of affine subspaces, with each subspace constructed
from the object's appearance over several consecutive frames. In contrast to
linear subspaces, affine subspaces explicitly model the origin of subspaces.
Furthermore, instead of using a brittle point-to-subspace distance during the
search for the object in a new frame, we propose to use a subspace-to-subspace
distance by representing candidate image areas also as affine subspaces.
Distances between subspaces are then obtained by exploiting the non-Euclidean
geometry of Grassmann manifolds. Experiments on challenging videos (containing
object occlusions, deformations, as well as variations in pose and
illumination) indicate that the proposed method achieves higher tracking
accuracy than several recent discriminative trackers.
Comment: in International Conference on Digital Image Computing: Techniques
and Applications, 201
iCaRL: Incremental Classifier and Representation Learning
A major open problem on the road to artificial intelligence is the
development of incrementally learning systems that learn about more and more
concepts over time from a stream of data. In this work, we introduce a new
training strategy, iCaRL, that allows learning in such a class-incremental way:
only the training data for a small number of classes has to be present at the
same time and new classes can be added progressively. iCaRL learns strong
classifiers and a data representation simultaneously. This distinguishes it
from earlier works that were fundamentally limited to fixed data
representations and therefore incompatible with deep learning architectures. We
show by experiments on CIFAR-100 and ImageNet ILSVRC 2012 data that iCaRL can
learn many classes incrementally over a long period of time where other
strategies quickly fail.
Comment: Accepted paper at CVPR 201