17,995 research outputs found
Unconstrained Face Verification using Deep CNN Features
In this paper, we present an algorithm for unconstrained face verification
based on deep convolutional features and evaluate it on the newly released
IARPA Janus Benchmark A (IJB-A) dataset. The IJB-A dataset includes real-world
unconstrained faces from 500 subjects with full pose and illumination
variations which are much harder than the traditional Labeled Face in the Wild
(LFW) and Youtube Face (YTF) datasets. The deep convolutional neural network
(DCNN) is trained using the CASIA-WebFace dataset. Extensive experiments on the
IJB-A dataset are provided
Siamese Instance Search for Tracking
In this paper we present a tracker, which is radically different from
state-of-the-art trackers: we apply no model updating, no occlusion detection,
no combination of trackers, no geometric matching, and still deliver
state-of-the-art tracking performance, as demonstrated on the popular online
tracking benchmark (OTB) and six very challenging YouTube videos. The presented
tracker simply matches the initial patch of the target in the first frame with
candidates in a new frame and returns the most similar patch by a learned
matching function. The strength of the matching function comes from being
extensively trained generically, i.e., without any data of the target, using a
Siamese deep neural network, which we design for tracking. Once learned, the
matching function is used as is, without any adapting, to track previously
unseen targets. It turns out that the learned matching function is so powerful
that a simple tracker built upon it, coined Siamese INstance search Tracker,
SINT, which only uses the original observation of the target from the first
frame, suffices to reach state-of-the-art performance. Further, we show the
proposed tracker even allows for target re-identification after the target was
absent for a complete video shot.Comment: This paper is accepted to the IEEE Conference on Computer Vision and
Pattern Recognition, 201
Unsupervised Learning of Visual Representations using Videos
Is strong supervision necessary for learning a good visual representation? Do
we really need millions of semantically-labeled images to train a Convolutional
Neural Network (CNN)? In this paper, we present a simple yet surprisingly
powerful approach for unsupervised learning of CNN. Specifically, we use
hundreds of thousands of unlabeled videos from the web to learn visual
representations. Our key idea is that visual tracking provides the supervision.
That is, two patches connected by a track should have similar visual
representation in deep feature space since they probably belong to the same
object or object part. We design a Siamese-triplet network with a ranking loss
function to train this CNN representation. Without using a single image from
ImageNet, just using 100K unlabeled videos and the VOC 2012 dataset, we train
an ensemble of unsupervised networks that achieves 52% mAP (no bounding box
regression). This performance comes tantalizingly close to its
ImageNet-supervised counterpart, an ensemble which achieves a mAP of 54.4%. We
also show that our unsupervised network can perform competitively in other
tasks such as surface-normal estimation
- …