Combined Image- and World-Space Tracking in Traffic Scenes
Tracking in urban street scenes plays a central role in autonomous systems
such as self-driving cars. Most of the current vision-based tracking methods
perform tracking in the image domain. Other approaches, e.g. those based on LIDAR and
radar, track purely in 3D. While some vision-based tracking methods invoke 3D
information in parts of their pipeline, and some 3D-based methods utilize
image-based information in components of their approach, we propose to use
image- and world-space information jointly throughout our method. We present
our tracking pipeline as a 3D extension of image-based tracking. From enhancing
the detections with 3D measurements to the reported positions of every tracked
object, we use world-space 3D information at every stage of processing. We
accomplish this by our novel coupled 2D-3D Kalman filter, combined with a
conceptually clean and extendable hypothesize-and-select framework. Our
approach matches the current state-of-the-art on the official KITTI benchmark,
which performs evaluation in the 2D image domain only. Further experiments show
significant improvements in 3D localization precision by enabling our coupled
2D-3D tracking. Comment: 8 pages, 7 figures, 2 tables. ICRA 2017 paper.
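The coupled 2D-3D filtering idea can be illustrated with a minimal constant-velocity Kalman filter whose state stacks the 2D image-plane position (u, v) and the 3D world position (x, y, z), so one filter updates both jointly. Dimensions, time step, and noise magnitudes below are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def make_filter(dt=0.1, n_pos=5):
    # State: 5 positions (u, v, x, y, z) plus their 5 velocities.
    n = 2 * n_pos
    F = np.eye(n)
    F[:n_pos, n_pos:] = dt * np.eye(n_pos)      # pos += vel * dt
    H = np.hstack([np.eye(n_pos), np.zeros((n_pos, n_pos))])  # observe positions only
    Q = 1e-3 * np.eye(n)                         # process noise (assumed)
    R = 1e-1 * np.eye(n_pos)                     # measurement noise (assumed)
    return F, H, Q, R

def kf_step(x, P, z, F, H, Q, R):
    # Predict with the constant-velocity model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with one joint (u, v, x, y, z) measurement, so image- and
    # world-space evidence correct the same state vector.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

Feeding the filter a stream of joint 2D-3D measurements lets each domain regularize the other, which is the intuition behind the coupling described above.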
Real-time Prediction of Automotive Collision Risk from Monocular Video
Many automotive applications, such as Advanced Driver Assistance Systems
(ADAS) for collision avoidance and warnings, require estimating the future
automotive risk of a driving scene. We present a low-cost system that predicts
the collision risk over an intermediate time horizon from a monocular video
source, such as a dashboard-mounted camera. The modular system includes
components for object detection, object tracking, and state estimation. We
introduce solutions to the object tracking and distance estimation problems.
Advanced approaches to the other tasks are used to produce real-time
predictions of the automotive risk for the next 10 s at over 5 Hz. The system
is designed such that alternative components can be substituted with minimal
effort. It is demonstrated on common physical hardware, specifically an
off-the-shelf gaming laptop and a webcam. We extend the framework to support
absolute speed estimation and more advanced risk estimation techniques. Comment: Submitted to IV2019. 7 pages, 4 figures, 3 tables.
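The modular, swap-friendly design described above can be sketched as a pipeline of callables, one per stage; every component name and the toy "risk" definition here are illustrative stand-ins, not the system's actual interfaces.

```python
# Minimal sketch of a modular risk pipeline: each stage (detection,
# tracking, state estimation, risk prediction) is a callable that can be
# replaced with minimal effort.

def run_pipeline(frames, detect, track, estimate_state, predict_risk):
    risks = []
    tracks = {}
    for frame in frames:
        detections = detect(frame)
        tracks = track(tracks, detections)
        states = estimate_state(tracks)
        risks.append(predict_risk(states))
    return risks

# Toy stand-in components: each "frame" already holds distances, and the
# "risk" is simply the inverse distance to the nearest tracked object.
detect = lambda frame: frame
track = lambda tracks, dets: {i: d for i, d in enumerate(dets)}
estimate_state = lambda tracks: list(tracks.values())
predict_risk = lambda states: 1.0 / min(states) if states else 0.0

print(run_pipeline([[10.0, 20.0], [5.0]],
                   detect, track, estimate_state, predict_risk))  # → [0.1, 0.2]
```

Because each stage only sees its predecessor's output, substituting a stronger detector or tracker does not disturb the rest of the chain.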
Saliency-Driven Object Recognition in Egocentric Videos with Deep CNN
The problem of object recognition in natural scenes has recently been
successfully addressed with Deep Convolutional Neural Networks, yielding a
significant breakthrough in recognition scores. The computational efficiency
of Deep CNNs as a function of their depth allows for their use in real-time
applications. One of the key issues here is to reduce the number of windows
selected from images to be submitted to a Deep CNN. This is usually solved by
preliminary segmentation and selection of specific windows with outstanding
"objectness" or other indicators of possible object locations.
In this paper we propose a Deep CNN approach and a general framework for
recognition of objects in a real-time scenario and from an egocentric
perspective. Here the window of interest is built on the basis of a visual
attention map computed over gaze fixations measured by a glass-worn
eye-tracker. The application of this set-up is an interactive, user-friendly
environment for upper-limb amputees. Vision has to help the subject control
a worn neuro-prosthesis when few muscles remain and EMG control becomes
inefficient. The recognition results on a specifically recorded corpus of 151
videos with simple geometrical objects show an mAP of 64.6% and a
computational time at generalization lower than the duration of a visual
fixation on the object of interest. Comment: 20 pages, 8 figures, 3 tables.
Submitted to the journal Computer Vision and Image Understanding.
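Building the window of interest from gaze fixations, as described above, can be sketched as follows: accumulate a Gaussian around each fixation into an attention map, then crop a fixed-size window around its peak. The Gaussian form, sigma, and window size are assumptions for illustration.

```python
import numpy as np

def attention_map(shape, fixations, sigma=10.0):
    # Sum one isotropic Gaussian per gaze fixation (fy, fx).
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    amap = np.zeros(shape)
    for fy, fx in fixations:
        amap += np.exp(-((ys - fy) ** 2 + (xs - fx) ** 2) / (2 * sigma ** 2))
    return amap

def window_of_interest(amap, size=32):
    # Crop a size x size box around the attention peak; this is the single
    # window submitted to the Deep CNN instead of many proposals.
    cy, cx = np.unravel_index(np.argmax(amap), amap.shape)
    half = size // 2
    y0, x0 = max(cy - half, 0), max(cx - half, 0)
    return (y0, x0, y0 + size, x0 + size)
```

Reducing the CNN's input to one gaze-anchored window is what makes the real-time budget of the setup plausible.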
An In-Depth Analysis of Visual Tracking with Siamese Neural Networks
This survey presents a deep analysis of the learning and inference
capabilities in nine popular trackers. It is neither intended to study the
whole literature nor is it an attempt to review all kinds of neural networks
proposed for visual tracking. We focus instead on Siamese neural networks which
are a promising starting point for studying the challenging problem of
tracking. These networks efficiently integrate feature learning and temporal
matching, and have so far shown state-of-the-art performance. In particular,
the branches of Siamese networks, the layers connecting these branches,
specific aspects of training, and the embedding of these networks into the
tracker are highlighted. Quantitative results from existing papers are
compared, leading to the conclusion that the current evaluation methodology
exhibits problems with the reproducibility and comparability of results. The paper
proposes a novel Lisp-like formalism for a better comparison of trackers. This
assumes a certain functional design and functional decomposition of trackers.
The paper attempts to lay a foundation for tracker design through a
formulation of the problem based on the theory of machine learning and
through the interpretation of a tracker as a decision function. The work
concludes with promising lines of research and suggestions for future
work. Comment: Submitted to IEEE TPAMI.
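The survey's view of a tracker as a composed decision function can be sketched functionally (this is an illustrative Python analogue, not the paper's Lisp-like formalism): a Siamese tracker composes an embedding, a similarity, and a decision over candidate windows.

```python
# A tracker as a decision function: given a template and candidate
# windows, embed both, score similarities, and decide on the best index.

def make_tracker(embed, similarity):
    def track(template, candidates):
        z = embed(template)
        scores = [similarity(z, embed(c)) for c in candidates]
        return max(range(len(scores)), key=scores.__getitem__)
    return track

# Toy instantiation: identity embedding, negative absolute difference.
track = make_tracker(embed=lambda x: x,
                     similarity=lambda a, b: -abs(a - b))
print(track(5.0, [1.0, 4.8, 9.0]))  # → 1 (the closest candidate wins)
```

The functional decomposition makes trackers comparable piece by piece: two trackers differing only in `embed` or only in `similarity` can be contrasted directly.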
A Twofold Siamese Network for Real-Time Object Tracking
Observing that Semantic features learned in an image classification task and
Appearance features learned in a similarity matching task complement each
other, we build a twofold Siamese network, named SA-Siam, for real-time object
tracking. SA-Siam is composed of a semantic branch and an appearance branch.
Each branch is a similarity-learning Siamese network. An important design
choice in SA-Siam is to separately train the two branches to keep the
heterogeneity of the two types of features. In addition, we propose a channel
attention mechanism for the semantic branch. Channel-wise weights are computed
according to the channel activations around the target position. While the
inherited architecture from SiamFC \cite{SiamFC} allows our tracker to operate
beyond real-time, the twofold design and the attention mechanism significantly
improve the tracking performance. The proposed SA-Siam outperforms all other
real-time trackers by a large margin on the OTB-2013/50/100 benchmarks. Comment: Accepted by CVPR'18.
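The channel attention idea above, weights per channel derived from activations around the target position, can be sketched as follows; the max-pooling and sigmoid squashing are assumptions for illustration, not SA-Siam's exact design.

```python
import numpy as np

def channel_attention(features, center, radius=2):
    # features: (C, H, W) feature map; center: (y, x) target position.
    # Pool each channel's activations in a crop around the target, squash
    # to (0, 1), and rescale the channels by the resulting weights.
    y, x = center
    crop = features[:, max(y - radius, 0):y + radius + 1,
                       max(x - radius, 0):x + radius + 1]
    response = crop.max(axis=(1, 2))            # max-pool per channel
    weights = 1.0 / (1.0 + np.exp(-response))   # sigmoid squashing (assumed)
    return features * weights[:, None, None]
```

Channels that respond strongly near the target keep their activations; weakly responding channels are suppressed, which is the intended effect of channel-wise attention.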
Tracking with multi-level features
We present a novel formulation of the multiple object tracking problem which
integrates low and mid-level features. In particular, we formulate the tracking
problem as a quadratic program coupling detections and dense point
trajectories. Due to the computational complexity of the initial QP, we propose
an approximation via two auxiliary problems, a temporal and a spatial
association, where the temporal subproblem can be efficiently solved by a
linear program and the spatial association by a clustering algorithm. The
objective function of the QP is used to find the optimal number of clusters, where each
cluster ideally represents one person. Evaluation is provided for multiple
scenarios, showing the superiority of our method with respect to classic
tracking-by-detection methods and also other methods that greedily integrate
low-level features. Comment: Submitted as an IEEE PAMI short article.
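The temporal association subproblem above can be sketched as a linear assignment between consecutive frames (whose LP relaxation has an integral optimum, so the Hungarian algorithm solves it exactly); the pairwise-distance cost here is an assumed stand-in for the paper's actual objective.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def temporal_association(prev_boxes, next_boxes):
    # Cost: Euclidean distance between every previous/next detection pair.
    cost = np.linalg.norm(prev_boxes[:, None, :] - next_boxes[None, :, :],
                          axis=2)
    # Hungarian algorithm: the optimal one-to-one frame-to-frame matching.
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```

Splitting off this cheap temporal step is what makes the original quadratic program tractable; the spatial clustering step then groups the resulting tracklets.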
DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking
Deep neural networks, despite their great success at feature learning in
various computer vision tasks, are usually considered impractical for online
visual tracking because they require a very long training time and a large number
of training samples. In this work, we present an efficient and very robust
tracking algorithm using a single Convolutional Neural Network (CNN) for
learning effective feature representations of the target object, in a purely
online manner. Our contributions are multifold: First, we introduce a novel
truncated structural loss function that maintains as many training samples as
possible and reduces the risk of tracking error accumulation. Second, we
enhance the ordinary Stochastic Gradient Descent approach in CNN training with
a robust sample selection mechanism. The sampling mechanism draws positive
and negative samples from different temporal distributions, constructed by
taking temporal relations and label noise into account.
Finally, a lazy yet effective updating scheme is designed for CNN training.
Equipped with this novel updating algorithm, the CNN model is robust to some
long-existing difficulties in visual tracking such as occlusion or incorrect
detections, without loss of effective adaptation to significant appearance
changes. In the experiment, our CNN tracker outperforms all compared
state-of-the-art methods on two recently proposed benchmarks, which in total
involve over 60 video sequences. The remarkable performance improvement over
existing trackers illustrates the superiority of the learned feature
representations. Comment: 12 pages.
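One plausible form of the truncated loss described above clips per-sample errors below a margin to zero, so already well-fit samples contribute no gradient and easy examples cannot dominate training; this particular form is an illustration, not necessarily DeepTrack's exact loss.

```python
import numpy as np

def truncated_loss(pred, target, margin=0.1):
    # Squared error per sample, truncated: errors below the margin are
    # zeroed, keeping many samples in the pool while limiting the
    # accumulation of tracking error from near-perfect fits.
    err = (pred - target) ** 2
    return np.maximum(err - margin, 0.0).mean()
```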
Leveraging Shape Completion for 3D Siamese Tracking
Point clouds are challenging to process due to their sparsity; therefore,
autonomous vehicles rely more on appearance attributes than on pure geometric
features. However, 3D LIDAR perception can provide crucial information for
urban navigation in challenging light or weather conditions. In this paper, we
investigate the versatility of Shape Completion for 3D Object Tracking in LIDAR
point clouds. We design a Siamese tracker that encodes model and candidate
shapes into a compact latent representation. We regularize the encoding by
enforcing the latent representation to decode into an object model shape. We
observe that 3D object tracking and 3D shape completion complement each other.
Learning a more meaningful latent representation shows better discriminatory
capabilities, leading to improved tracking performance. We test our method on
the KITTI Tracking set using car 3D bounding boxes. Our model reaches a 76.94%
Success rate and 81.38% Precision for 3D Object Tracking, with the shape
completion regularization leading to an improvement of 3% in both metrics. Comment: Accepted in CVPR19.
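The regularization described above, training the latent code both to match (tracking) and to decode back into the model shape (completion), can be sketched as a two-term loss; the specific distances and weight below are illustrative assumptions.

```python
import numpy as np

def joint_loss(z_model, z_candidate, decoded, model_shape, lam=1e-2):
    # Tracking term: model and candidate encodings should be close when
    # the candidate contains the tracked object.
    tracking = np.sum((z_model - z_candidate) ** 2)
    # Completion term: the latent code must decode into the object's
    # model shape, regularizing the representation.
    completion = np.mean((decoded - model_shape) ** 2)
    return tracking + lam * completion
```

Because the completion term forces the latent space to capture full object geometry, the same representation becomes more discriminative for matching, which is the complementarity the abstract reports.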
Deformable Parts Correlation Filters for Robust Visual Tracking
Deformable parts models show great potential in tracking by principally
addressing non-rigid object deformations and self-occlusions, but according
to recent benchmarks they often lag behind holistic approaches. The reason is
that a potentially large number of degrees of freedom has to be estimated for
object localization, and simplifications of the constellation topology are
often assumed to make inference tractable. We present a new formulation of the
constellation model with correlation filters that treats the geometric and
visual constraints within a single convex cost function and derive a highly
efficient optimization for MAP inference of a fully-connected constellation. We
propose a tracker that models the object at two levels of detail. The coarse
level corresponds to a root correlation filter and a novel color model for
approximate object localization, while the mid-level representation is composed
of the new deformable constellation of correlation filters that refine the
object location. The resulting tracker is rigorously analyzed on the highly
challenging OTB, VOT2014, and VOT2015 benchmarks, exhibits state-of-the-art
performance, and runs in real time. Comment: 14 pages. First submission to
journal: 9.11.2015; re-submission on 11.5.2016.
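The single convex cost described above, combining visual and geometric constraints over the part constellation, can be sketched as follows; the quadratic spring-like penalty and the cost's exact form are assumptions for illustration.

```python
import numpy as np

def constellation_cost(positions, anchors, responses, k=1.0):
    # Visual term: stronger correlation-filter responses at the chosen
    # part positions lower the cost.
    visual = -float(np.sum(responses))
    # Geometric term: spring-like quadratic penalty for parts deviating
    # from their anchor offsets in the (fully connected) constellation.
    geometric = k * sum(float(np.sum((p - a) ** 2))
                        for p, a in zip(positions, anchors))
    return visual + geometric
```

MAP inference then amounts to minimizing this cost jointly over all part positions, trading filter evidence against geometric consistency.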
COMET: Context-Aware IoU-Guided Network for Small Object Tracking
We consider the problem of tracking an unknown small target from aerial
videos of medium to high altitudes. This is a challenging problem, which is
even more pronounced in unavoidable scenarios of drastic camera motion and high
density. To address this problem, we introduce a context-aware IoU-guided
tracker (COMET) that exploits a multitask two-stream network and an offline
reference proposal generation strategy. The proposed network fully exploits
target-related information by multi-scale feature learning and attention
modules. The proposal generation strategy uses efficient sampling to
generalize the network to the target and its parts without imposing extra
computational complexity during online tracking. These strategies contribute
considerably to handling significant occlusions and viewpoint changes.
Empirically, COMET outperforms the state of the art on a range of aerial-view
datasets focusing on tracking small objects. Specifically, COMET outperforms
the celebrated ATOM tracker by an average margin of 6.2% (and 7%) in
precision (and success) score on the challenging UAVDT, VisDrone-2019, and
Small-90 benchmarks. Comment: Accepted manuscript at ACCV 2020.