Recurrent Autoregressive Networks for Online Multi-Object Tracking
The main challenge of online multi-object tracking is to reliably associate
object trajectories with detections in each video frame based on their tracking
history. In this work, we propose the Recurrent Autoregressive Network (RAN), a
temporal generative modeling framework to characterize the appearance and
motion dynamics of multiple objects over time. The RAN couples an external
memory and an internal memory. The external memory explicitly stores previous
inputs of each trajectory in a time window, while the internal memory learns to
summarize long-term tracking history and associate detections by processing the
external memory. We conduct experiments on the MOT 2015 and 2016 datasets to
demonstrate the robustness of our tracking method in highly crowded and
occluded scenes. Our method achieves top-ranked results on the two benchmarks.
Comment: 10 pages, 3 figures, 6 table
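The external memory described in this abstract can be pictured as a fixed-length window per trajectory. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the class `ExternalMemory`, the function `associate`, and the fixed weight list are hypothetical names introduced for illustration. In the RAN the combination weights would come from the learned internal memory, whereas here they are hard-coded.

```python
from collections import deque


class ExternalMemory:
    """Hypothetical fixed-length window of recent feature vectors for one trajectory."""

    def __init__(self, window):
        # deque(maxlen=...) silently drops the oldest entry once the window is full
        self.buffer = deque(maxlen=window)

    def write(self, feature):
        self.buffer.append(list(feature))

    def predict(self, weights):
        """Autoregressive estimate: normalized weighted sum of the stored features.

        Assumption: `weights` are fixed here; a learned recurrent model would
        produce them in a real tracker.
        """
        stored = list(self.buffer)
        if not stored:
            raise ValueError("external memory is empty")
        used = weights[-len(stored):]  # align weights with the most recent entries
        total = sum(used)
        dim = len(stored[0])
        return [sum(w * f[d] for w, f in zip(used, stored)) / total
                for d in range(dim)]


def associate(prediction, detections):
    """Return the index of the detection closest to the prediction (squared distance)."""
    def sq_dist(det):
        return sum((p - x) ** 2 for p, x in zip(prediction, det))
    return min(range(len(detections)), key=lambda j: sq_dist(detections[j]))


memory = ExternalMemory(window=3)
for feature in ([0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]):
    memory.write(feature)  # the first feature falls out of the 3-step window

estimate = memory.predict([0.2, 0.3, 0.5])
best = associate(estimate, [[0.0, 0.0], [2.5, 2.5], [9.0, 9.0]])
```

Nearest-distance matching is used here purely to keep the sketch short; the abstract does not state how association scores are computed.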
Recurrent Filter Learning for Visual Tracking
Convolutional neural networks (CNNs) have recently gained popularity in
visual tracking because of their robust feature representations of images. Recent
methods perform online tracking by fine-tuning a pre-trained CNN model to the
specific target object using stochastic gradient descent (SGD)
back-propagation, which is usually time-consuming. In this paper, we propose a
recurrent filter generation method for visual tracking. We directly feed the
target's image patch to a recurrent neural network (RNN) to estimate an
object-specific filter for tracking. As a video sequence is spatiotemporal
data, we extend the matrix multiplications of the fully-connected layers of the
RNN to a convolution operation on feature maps, which preserves the target's
spatial structure and is also memory-efficient. The tracked object in the
subsequent frames will be fed into the RNN to adapt the generated filters to
appearance variations of the target. Note that once the off-line training
process of our network is finished, there is no need to fine-tune the network
for specific objects, which makes our approach more efficient than methods that
use iterative fine-tuning to learn the target online. Extensive experiments
conducted on widely used benchmarks, OTB and VOT, demonstrate encouraging
results compared to other recent methods.
Comment: ICCV2017 Workshop on VO
Deep Recurrent Neural Network for Multi-target Filtering
This paper addresses the problem of fixed motion and measurement models for
multi-target filtering using an adaptive learning framework. This is performed
by defining target tuples with random finite set terminology and utilisation of
recurrent neural networks with a long short-term memory architecture. A novel
data association algorithm compatible with the predicted tracklet tuples is
proposed, enabling the update of occluded targets, in addition to assigning
birth, survival and death of targets. The algorithm is evaluated over a
commonly used filtering simulation scenario, with highly promising results.
Comment: The 25th International Conference on MultiMedia Modeling (MMM
An In-Depth Analysis of Visual Tracking with Siamese Neural Networks
This survey presents a deep analysis of the learning and inference
capabilities in nine popular trackers. It is neither intended to study the
whole literature nor is it an attempt to review all kinds of neural networks
proposed for visual tracking. We focus instead on Siamese neural networks which
are a promising starting point for studying the challenging problem of
tracking. These networks efficiently integrate feature learning and
temporal matching and have so far shown state-of-the-art performance. In
particular, the branches of Siamese networks, the layers connecting these
branches, specific aspects of training and the embedding of these networks into
the tracker are highlighted. Quantitative results from existing papers are
compared with the conclusion that the current evaluation methodology shows
problems with the reproducibility and the comparability of results. The paper
proposes a novel Lisp-like formalism for a better comparison of trackers. This
assumes a certain functional design and functional decomposition of trackers.
The paper tries to give a foundation for tracker design by a formulation of the
problem based on the theory of machine learning and by the interpretation of a
tracker as a decision function. The work concludes with promising lines of
research and suggests future work.
Comment: submitted to IEEE TPAM
RATM: Recurrent Attentive Tracking Model
We present an attention-based modular neural framework for computer vision.
The framework uses a soft attention mechanism allowing models to be trained
with gradient descent. It consists of three modules: a recurrent attention
module controlling where to look in an image or video frame, a
feature-extraction module providing a representation of what is seen, and an
objective module formalizing why the model learns its attentive behavior. The
attention module allows the model to focus computation on task-related
information in the input. We apply the framework to several object tracking
tasks and explore various design choices. We experiment with three data sets:
bouncing ball, moving digits, and the real-world KTH data set. The proposed
Recurrent Attentive Tracking Model performs well on all three tasks and can
generalize to related but previously unseen sequences from a challenging
tracking data set.
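The reason soft attention permits training with gradient descent is that the readout is a smooth function of the attention parameters. The sketch below illustrates this on a 1-D signal with a normalized Gaussian window. It is an illustrative assumption rather than the RATM attention module itself, whose exact form the abstract does not specify; the function names `gaussian_attention_weights` and `soft_glimpse` are hypothetical.

```python
import math


def gaussian_attention_weights(length, center, sigma):
    """Normalized Gaussian weights over positions 0..length-1.

    Every position receives a non-zero weight, so the readout changes
    smoothly when `center` or `sigma` changes.
    """
    raw = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(length)]
    total = sum(raw)
    return [r / total for r in raw]


def soft_glimpse(signal, center, sigma):
    """Attention readout: weighted average of the signal under the window."""
    weights = gaussian_attention_weights(len(signal), center, sigma)
    return sum(w * x for w, x in zip(weights, signal))


# A signal with a single bright position at index 3.
signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

on_target = soft_glimpse(signal, center=3.0, sigma=0.5)   # window over the spike
off_target = soft_glimpse(signal, center=0.0, sigma=0.5)  # window far from it
```

Moving `center` toward index 3 increases the readout continuously. A hard crop, by contrast, would change in discrete jumps and offer no useful gradient with respect to the window position.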
First Step toward Model-Free, Anonymous Object Tracking with Recurrent Neural Networks
In this paper, we propose and study a novel visual object tracking approach
based on convolutional networks and recurrent networks. The proposed approach
is distinct from the existing approaches to visual object tracking, such as
filtering-based ones and tracking-by-detection ones, in the sense that the
tracking system is explicitly trained off-line to track anonymous objects in a
noisy environment. The proposed visual tracking model is end-to-end trainable,
minimizing any adversarial effect from mismatches in object representation and
between the true underlying dynamics and learning dynamics. We empirically show
that the proposed tracking approach works well in various scenarios by
generating artificial video sequences with varying conditions: the number of
objects, the amount of noise, and the match between the training shapes and test
shapes.
Machine Learning Methods for Data Association in Multi-Object Tracking
Data association is a key step within the multi-object tracking pipeline that
is notoriously challenging due to its combinatorial nature. A popular and
general way to formulate data association is as the NP-hard multidimensional
assignment problem (MDAP). Over the last few years, data-driven approaches to
assignment have become increasingly prevalent as these techniques have started
to mature. We focus this survey solely on learning algorithms for the
assignment step of multi-object tracking, and we attempt to unify various
methods by highlighting their connections to linear assignment as well as to
the MDAP. First, we review probabilistic and end-to-end optimization approaches
to data association, followed by methods that learn association affinities from
data. We then compare the performance of the methods presented in this survey,
and conclude by discussing future research directions.
Comment: Accepted for publication in ACM Computing Survey
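The linear assignment formulation mentioned in this abstract can be made concrete with a tiny two-frame example: given a cost for pairing each track with each detection, choose the one-to-one pairing with minimum total cost. The sketch below solves it by exhaustive search over permutations, which is only feasible for a handful of objects. It assumes a square cost matrix, and the function name `linear_assignment` and the cost values are made up for illustration. Practical trackers typically use a polynomial-time solver such as the Hungarian algorithm instead.

```python
from itertools import permutations


def linear_assignment(cost):
    """Brute-force minimum-cost one-to-one assignment.

    cost[i][j] is the cost of matching track i to detection j.
    Assumes a square matrix; runtime grows factorially with its size.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost


# Illustrative costs, e.g. distances between predicted and detected positions.
cost = [
    [1.0, 4.0, 5.0],
    [3.0, 2.0, 6.0],
    [7.0, 8.0, 0.5],
]

matches, total = linear_assignment(cost)
# matches[i] is the detection index assigned to track i
```

The multidimensional assignment problem generalizes this from two frames to several at once, which is where the NP-hardness noted in the abstract arises.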
Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking
In this paper, we develop a new approach of spatially supervised recurrent
convolutional neural networks for visual object tracking. Our recurrent
convolutional network exploits the history of locations as well as the
distinctive visual features learned by the deep neural networks. Inspired by
recent bounding box regression methods for object detection, we study the
regression capability of Long Short-Term Memory (LSTM) in the temporal domain,
and propose to concatenate high-level visual features produced by convolutional
networks with region information. In contrast to existing deep learning based
trackers that use binary classification for region candidates, we use
regression for direct prediction of the tracking locations both at the
convolutional layer and at the recurrent unit. Our extensive experimental
results and performance comparison with state-of-the-art tracking methods on
challenging benchmark video tracking datasets show that our tracker is more
accurate and robust while maintaining low computational cost. For most test
video sequences, our method achieves the best tracking performance, often
outperforming the second best by a large margin.
Comment: 10 pages, 9 figures, conferenc
Differentiating Objects by Motion: Joint Detection and Tracking of Small Flying Objects
While generic object detection has achieved large improvements with rich
feature hierarchies from deep nets, detecting small objects with poor visual
cues remains challenging. Motion cues from multiple frames may be more
informative for detecting such hard-to-distinguish objects in each frame.
However, how to encode discriminative motion patterns, such as deformations and
pose changes that characterize objects, has remained an open question. To learn
them and thereby realize small object detection, we present a neural model
called the Recurrent Correlational Network, where detection and tracking are
jointly performed over a multi-frame representation learned through a single,
trainable, and end-to-end network. A convolutional long short-term memory
network is utilized for learning informative appearance change for detection,
while the learned representation is shared in tracking for enhancing its
performance. In experiments with datasets containing images of scenes with
small flying objects, such as birds and unmanned aerial vehicles, the proposed
method yielded consistent improvements in detection performance over deep
single-frame detectors and existing motion-based detectors. Furthermore, our
network performed as well as state-of-the-art generic object trackers when
evaluated as a tracker on the bird dataset.
Comment: 10 pages, 8 figure
Perceiving and Reasoning About Liquids Using Fully Convolutional Networks
Liquids are an important part of many common manipulation tasks in human
environments. If we wish to have robots that can accomplish these types of
tasks, they must be able to interact with liquids in an intelligent manner. In
this paper, we investigate ways for robots to perceive and reason about
liquids. That is, a robot asks the questions "What in the visual data stream is
liquid?" and "How can I use that to infer all the potential places where liquid
might be?" We collected two datasets to evaluate these questions, one using a
realistic liquid simulator and another on our robot. We used fully
convolutional neural networks to learn to detect and track liquids across
pouring sequences. Our results show that these networks are able to perceive
and reason about liquids, and that integrating temporal information is
important to performing such tasks well.
Comment: In The International Journal of Robotics Research (to appear