Deep-LK for Efficient Adaptive Object Tracking
In this paper we present a new approach for efficient regression-based object
tracking, which we refer to as Deep-LK. Our approach is closely related to the
Generic Object Tracking Using Regression Networks (GOTURN) framework of Held et
al. We make the following contributions. First, we demonstrate that there is a
theoretical relationship between siamese regression networks like GOTURN and
the classical Inverse-Compositional Lucas & Kanade (IC-LK) algorithm. Further,
we demonstrate that, unlike GOTURN, IC-LK adapts its regressor to the appearance
of the currently tracked frame. We argue that GOTURN's poor performance on
unseen objects and/or viewpoints can be attributed to this missing property.
Second, we propose a novel framework for object tracking - which we refer to as
Deep-LK - that is inspired by the IC-LK framework. Finally, we show impressive
results demonstrating that Deep-LK substantially outperforms GOTURN.
Additionally, we demonstrate tracking performance comparable to current
state-of-the-art deep trackers whilst being an order of magnitude (i.e. 100 FPS)
more computationally efficient.
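For readers unfamiliar with IC-LK, the adaptive-regressor property mentioned in the abstract can be read off the textbook inverse-compositional update. The notation below is generic and an assumption of this note, not taken from the abstract: T is the template, I the current frame, W(x; p) a warp with parameters p.

```latex
\Delta p = \left(J^{\top} J\right)^{-1} J^{\top}
           \left[\, I\big(W(x; p)\big) - T(x) \,\right],
\qquad
J = \nabla T \,\frac{\partial W}{\partial p}\bigg|_{p = 0},
\qquad
W(x; p) \leftarrow W(x; p) \circ W(x; \Delta p)^{-1}
```

Because J is computed from the template T, the linear regressor (J^T J)^{-1} J^T changes whenever the template appearance changes, which is the property a fixed, pre-trained regression network lacks.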
Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion
Recent findings show that deep convolutional neural networks (DCNNs) do not
generalize well under partial occlusion. Inspired by the success of
compositional models at classifying partially occluded objects, we propose to
integrate compositional models and DCNNs into a unified deep model with innate
robustness to partial occlusion. We term this architecture Compositional
Convolutional Neural Network. In particular, we propose to replace the fully
connected classification head of a DCNN with a differentiable compositional
model. The generative nature of the compositional model enables it to localize
occluders and subsequently focus on the non-occluded parts of the object. We
conduct classification experiments on artificially occluded images as well as
real images of partially occluded objects from the MS-COCO dataset. The results
show that DCNNs do not classify occluded objects robustly, even when trained
with data that is strongly augmented with partial occlusions. Our proposed
model outperforms standard DCNNs by a large margin at classifying partially
occluded objects, even when it has not been exposed to occluded objects during
training. Additional experiments demonstrate that CompositionalNets can also
localize the occluders accurately, despite being trained with class labels
only. The code used in this work is publicly available.
Comment: CVPR 2020; Code is available at
https://github.com/AdamKortylewski/CompositionalNets; Supplementary material:
https://adamkortylewski.com/data/compnet_supp.pd
Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model
Amodal completion is a visual task that humans perform easily but which is
difficult for computer vision algorithms. The aim is to segment those object
boundaries which are occluded and hence invisible. This task is particularly
challenging for deep neural networks because data is difficult to obtain and
annotate. Therefore, we formulate amodal segmentation as an out-of-task and
out-of-distribution generalization problem. Specifically, we replace the fully
connected classifier in neural networks with a Bayesian generative model of the
neural network features. The model is trained from non-occluded images using
bounding box annotations and class labels only, but is applied to generalize
out-of-task to object segmentation and to generalize out-of-distribution to
segment occluded objects. We demonstrate how such Bayesian models can naturally
generalize beyond the training task labels when they learn a prior that models
the object's background context and shape. Moreover, by leveraging an outlier
process, Bayesian models can further generalize out-of-distribution to segment
partially occluded objects and to predict their amodal object boundaries. Our
algorithm outperforms alternative methods that use the same supervision by a
large margin, and even outperforms methods where annotated amodal segmentations
are used during training, when the amount of occlusion is large. Code is
publicly available at https://github.com/YihongSun/Bayesian-Amodal
Siamese Instance Search for Tracking
In this paper we present a tracker that is radically different from
state-of-the-art trackers: we apply no model updating, no occlusion detection,
no combination of trackers, no geometric matching, and still deliver
state-of-the-art tracking performance, as demonstrated on the popular online
tracking benchmark (OTB) and six very challenging YouTube videos. The presented
tracker simply matches the initial patch of the target in the first frame with
candidates in a new frame and returns the most similar patch by a learned
matching function. The strength of the matching function comes from being
extensively trained generically, i.e., without any data of the target, using a
Siamese deep neural network, which we design for tracking. Once learned, the
matching function is used as is, without any adapting, to track previously
unseen targets. It turns out that the learned matching function is so powerful
that a simple tracker built upon it, coined Siamese INstance search Tracker,
SINT, which only uses the original observation of the target from the first
frame, suffices to reach state-of-the-art performance. Further, we show the
proposed tracker even allows for target re-identification after the target was
absent for a complete video shot.
Comment: This paper is accepted to the IEEE Conference on Computer Vision and
Pattern Recognition, 201
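The matching step described in the abstract (compare the first-frame patch with candidate patches in a new frame and return the most similar one) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `cosine_similarity` and `track_frame` are hypothetical, and plain cosine similarity on raw pixels stands in for the learned Siamese matching function, which the abstract does not specify.

```python
import numpy as np


def cosine_similarity(a, b):
    # Stand-in for the learned matching function (an assumption of this sketch).
    a = a.ravel().astype(float)
    b = b.ravel().astype(float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def track_frame(template, frame, candidates, match_fn=cosine_similarity):
    """Return the candidate box most similar to the first-frame template.

    candidates: iterable of (x, y) top-left corners; every candidate box
    has the same height and width as the template. No model update is done.
    """
    h, w = template.shape[:2]
    best_box, best_score = None, -np.inf
    for x, y in candidates:
        patch = frame[y:y + h, x:x + w]
        if patch.shape[:2] != (h, w):
            continue  # skip boxes that fall outside the frame
        score = match_fn(template, patch)
        if score > best_score:
            best_box, best_score = (x, y, w, h), score
    return best_box, best_score


# Toy usage: the template is cut from the frame, so the best match is its source box.
rng = np.random.default_rng(0)
frame = rng.random((40, 40))
template = frame[10:18, 12:20].copy()
candidates = [(x, y) for y in range(0, 32, 2) for x in range(0, 32, 2)]
box, score = track_frame(template, frame, candidates)
```

Keeping the template fixed to the first frame mirrors the "no model updating" design in the abstract; how candidates are sampled in the actual tracker is not stated there, so the regular grid used here is only a placeholder.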