Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving
We propose a stereo vision-based approach for tracking the camera ego-motion
and 3D semantic objects in dynamic autonomous driving scenarios. Instead of
directly regressing the 3D bounding box using end-to-end approaches, we propose
to use easy-to-label 2D detection and discrete viewpoint classification
together with a light-weight semantic inference method to obtain rough 3D
object measurements. Building on object-aware camera pose tracking, which
is robust in dynamic environments, and combining it with our novel dynamic object
bundle adjustment (BA) approach, which fuses temporal sparse feature correspondences
and the semantic 3D measurement model, we obtain 3D object pose, velocity and
anchored dynamic point cloud estimates with instance accuracy and temporal
consistency. The performance of our proposed method is demonstrated in diverse
scenarios. Both the ego-motion estimation and object localization are compared
with the state-of-the-art solutions.
Comment: 14 pages, 9 figures, eccv201
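The abstract does not spell out how the rough 3D measurements or the dynamic object BA residuals are computed. As a minimal sketch, assuming a rectified stereo pair with known intrinsics and baseline, the snippet below triangulates a 2D box center from its disparity and evaluates the reprojection residual of a feature anchored in an object frame, the kind of term an object-level BA would minimise. The function names and the object-to-camera pose parametrisation (R_co, t_co) are hypothetical and not taken from the paper.

```python
import numpy as np


def triangulate_rectified(u_left, v, u_right, fx, fy, cx, cy, baseline):
    """Back-project a pixel seen in a rectified stereo pair into the left camera frame."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("a point in front of the rig needs a positive disparity")
    z = fx * baseline / disparity      # depth from disparity: Z = f * B / d
    x = (u_left - cx) * z / fx         # pinhole back-projection
    y = (v - cy) * z / fy
    return np.array([x, y, z])


def project(K, p_cam):
    """Pinhole projection of a 3D point (camera frame) to pixel coordinates."""
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]


def anchored_residual(K, R_co, t_co, p_obj, observed_uv):
    """Reprojection residual of a point stored in an object's own frame.

    R_co, t_co: hypothetical object-to-camera rotation and translation for one frame.
    Because p_obj is anchored to the object, only the object pose changes over time.
    """
    p_cam = R_co @ p_obj + t_co
    return project(K, p_cam) - observed_uv


if __name__ == "__main__":
    K = np.array([[700.0, 0.0, 640.0], [0.0, 700.0, 360.0], [0.0, 0.0, 1.0]])
    center = triangulate_rectified(710.0, 430.0, 675.0, 700.0, 700.0, 640.0, 360.0, 0.5)
    print(center)  # box center in metres, left camera frame
    print(anchored_residual(K, np.eye(3), center, np.zeros(3), np.array([710.0, 430.0])))
```

Because anchored points are expressed in the object frame, a moving car only changes R_co and t_co between frames; its point cloud moves with it instead of being rejected as an outlier.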
Unsupervised Object Discovery and Tracking in Video Collections
This paper addresses the problem of automatically localizing dominant objects
as spatio-temporal tubes in a noisy collection of videos with minimal or even
no supervision. We formulate the problem as a combination of two complementary
processes: discovery and tracking. The first one establishes correspondences
between prominent regions across videos, and the second one associates
successive similar object regions within the same video. Interestingly, our
algorithm also discovers the implicit topology of frames associated with
instances of the same object class across different videos, a role normally
left to supervisory information in the form of class labels in conventional
image and video understanding methods. Indeed, as demonstrated by our
experiments, our method can handle video collections featuring multiple object
classes, and substantially outperforms the state of the art in colocalization,
even though it tackles a broader problem with much less supervision.
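The abstract only summarises the tracking step. One generic way to associate successive object regions within a video is dynamic programming over per-frame region candidates, trading a per-region score against the overlap between consecutive picks. The sketch below is that generic Viterbi-style linker, not the paper's formulation; the score inputs, the IoU-based transition term and the weight lam are assumptions, and link_tube is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tube(boxes: Sequence[Sequence[Box]],
              scores: Sequence[Sequence[float]],
              lam: float = 1.0) -> List[int]:
    """Pick one candidate index per frame maximising
    sum(scores) + lam * sum(IoU between consecutive picks).
    Assumes every frame has at least one candidate."""
    best = [list(scores[0])]                       # best path score ending at each candidate
    back: List[List[int]] = [[-1] * len(boxes[0])] # back-pointers for path recovery
    for t in range(1, len(boxes)):
        row, ptr = [], []
        for j, bj in enumerate(boxes[t]):
            cands = [best[t - 1][i] + lam * iou(bi, bj) for i, bi in enumerate(boxes[t - 1])]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            row.append(cands[i_star] + scores[t][j])
            ptr.append(i_star)
        best.append(row)
        back.append(ptr)
    # Backtrack from the best candidate in the last frame.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for t in range(len(boxes) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]
```

With lam = 0 the linker reduces to picking the best-scoring region in each frame, which can jump to a distractor; a positive lam favours spatially coherent tubes.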
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking
With efficient appearance learning models, Discriminative Correlation Filter
(DCF) has been proven to be very successful in recent video object tracking
benchmarks and competitions. However, the existing DCF paradigm suffers from
two major issues, i.e., spatial boundary effect and temporal filter
degradation. To mitigate these challenges, we propose a new DCF-based tracking
method. The key innovations of the proposed method include adaptive spatial
feature selection and temporal consistency constraints, with which the new
tracker enables joint spatial-temporal filter learning in a lower dimensional
discriminative manifold. More specifically, we apply structured spatial
sparsity constraints to multi-channel filters. Consequently, the process of
learning spatial filters can be approximated by lasso regularisation. To
encourage temporal consistency, the filter model is restricted to lie around
its historical value and updated locally to preserve the global structure in
the manifold. Finally, a unified optimisation framework is proposed to jointly
select temporal consistency preserving spatial features and learn
discriminative filters with the augmented Lagrangian method. Qualitative and
quantitative evaluations have been conducted on a number of well-known
benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and
VOT2018. The experimental results demonstrate the superiority of the proposed
method over the state-of-the-art approaches.
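The two ingredients named in the abstract can be illustrated separately. The first function below is the proximal (group soft-thresholding) step for a group lasso in which each spatial position, across all channels, forms one group, which is how structured spatial sparsity switches features on or off. The second is a closed-form single-channel correlation filter with an extra term mu * ||h - h_prev||^2 that keeps the filter close to its previous value. This is a simplified sketch, assuming a single channel and an already-selected feature map; the paper instead solves the joint multi-channel problem with the augmented Lagrangian method, and all function names here are hypothetical.

```python
import numpy as np


def group_soft_threshold(filt: np.ndarray, tau: float) -> np.ndarray:
    """Proximal step for a group lasso over spatial positions.

    filt has shape (H, W, C); each (h, w) location is one group holding its C channel
    weights, so a location is shrunk as a whole or zeroed in every channel at once.
    """
    norms = np.linalg.norm(filt, axis=2, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return filt * scale


def gaussian_label(shape, center, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the target position."""
    rows, cols = np.indices(shape)
    return np.exp(-((rows - center[0]) ** 2 + (cols - center[1]) ** 2) / (2 * sigma ** 2))


def train_filter(x, y, h_prev_hat, lam, mu):
    """Closed-form filter (Fourier domain) with a temporal-consistency term.

    Per frequency k it minimises
        |conj(X_k) H_k - Y_k|^2 + lam |H_k|^2 + mu |H_k - P_k|^2,
    with P = h_prev_hat; the zero-gradient condition gives
        H_k = (X_k Y_k + mu P_k) / (|X_k|^2 + lam + mu).
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return (X * Y + mu * h_prev_hat) / (np.abs(X) ** 2 + lam + mu)


def response(z, h_hat):
    """Response map of a search patch z under the same convention as train_filter."""
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(z)) * h_hat))
```

Large mu makes the update almost ignore the current frame (strong temporal consistency), while mu = 0 recovers a plain ridge-regression correlation filter.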
DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments
Simultaneous Localization and Mapping (SLAM) is considered to be a
fundamental capability for intelligent mobile robots. Over the past decades,
many impressive SLAM systems have been developed and have achieved good performance
under certain circumstances. However, some problems are still not well solved,
for example, how to handle moving objects in dynamic environments and how
to make robots truly understand their surroundings and accomplish advanced
tasks. In this paper, a robust semantic visual SLAM towards dynamic
environments named DS-SLAM is proposed. Five threads run in parallel in
DS-SLAM: tracking, semantic segmentation, local mapping, loop closing, and
dense semantic map creation. DS-SLAM combines a semantic segmentation network
with a moving consistency check method to reduce the impact of dynamic objects,
and thus the localization accuracy is greatly improved in dynamic environments.
Meanwhile, a dense semantic octo-tree map is produced, which could be employed
for high-level tasks. We conduct experiments both on TUM RGB-D dataset and in
the real-world environment. The results show that the absolute trajectory
accuracy of DS-SLAM can be improved by one order of magnitude compared with
ORB-SLAM2. It is one of the state-of-the-art SLAM systems in high-dynamic
environments. The code is available on our GitHub:
https://github.com/ivipsourcecode/DS-SLAM
Comment: 7 pages, accepted at the 2018 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2018).
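A moving consistency check of this kind can be built on epipolar geometry: for a static point, its match in the current frame should lie on the epipolar line induced by the fundamental matrix F, so a large point-to-line distance suggests motion. The sketch below assumes F has already been estimated (for example with RANSAC on the matches) and that points are given as homogeneous coordinates; the threshold, the per-object ratio rule and all function names are hypothetical and not taken from the DS-SLAM code.

```python
import numpy as np


def epipolar_distance(F: np.ndarray, p1: np.ndarray, p2: np.ndarray) -> float:
    """Distance from p2 to the epipolar line F @ p1.

    p1, p2: matched points as homogeneous 3-vectors (x, y, 1) in the previous and
    current frame; the result is in the same units as the point coordinates.
    """
    line = F @ p1  # line a*x + b*y + c = 0 in the current frame
    return abs(float(p2 @ line)) / float(np.hypot(line[0], line[1]))


def moving_mask(F, pts1, pts2, threshold):
    """Flag matches whose epipolar distance exceeds a (hypothetical) threshold."""
    return np.array([epipolar_distance(F, a, b) > threshold for a, b in zip(pts1, pts2)])


def object_is_dynamic(mask_in_object: np.ndarray, min_ratio: float = 0.5) -> bool:
    """Hypothetical rule: treat a segmented object as moving if enough of its
    matched points fail the epipolar check."""
    return bool(mask_in_object.size > 0 and mask_in_object.mean() >= min_ratio)
```

In a semantic SLAM pipeline, flags like these can be combined with the segmentation masks so that all features on an object judged to be moving are discarded before pose estimation.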
GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB
We address the highly challenging problem of real-time 3D hand tracking based
on a monocular RGB-only sequence. Our tracking method combines a convolutional
neural network with a kinematic 3D hand model, such that it generalizes well to
unseen data, is robust to occlusions and varying camera viewpoints, and leads
to anatomically plausible as well as temporally smooth hand motions. For
training our CNN we propose a novel approach for the synthetic generation of
training data that is based on a geometrically consistent image-to-image
translation network. To be more specific, we use a neural network that
translates synthetic images to "real" images, such that the so-generated images
follow the same statistical distribution as real-world hand images. For
training this translation network we combine an adversarial loss and a
cycle-consistency loss with a geometric consistency loss in order to preserve
geometric properties (such as hand pose) during translation. We demonstrate
that our hand tracking system outperforms the current state-of-the-art on
challenging RGB-only footage.
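The translation network's objective combines the three losses named in the abstract. A minimal NumPy sketch of such a weighted sum follows, assuming a non-saturating adversarial term, an L1 cycle-consistency term and, for the geometric term, a binary cross-entropy between a silhouette of the synthetic input and a silhouette predicted on the translated image. The silhouette formulation, the weights w_cyc and w_geo, and the function names are assumptions for illustration, not the authors' exact losses; the networks themselves are omitted and only the loss arithmetic on arrays is shown.

```python
import numpy as np

EPS = 1e-7  # numerical floor for the logarithms


def generator_adversarial_loss(d_fake: np.ndarray) -> float:
    """Non-saturating GAN loss for the generator: -mean(log D(G(x)))."""
    return float(-np.mean(np.log(np.clip(d_fake, EPS, 1.0))))


def cycle_loss(x: np.ndarray, x_cycled: np.ndarray) -> float:
    """L1 cycle consistency: synthetic -> "real" -> synthetic should return the input."""
    return float(np.mean(np.abs(x - x_cycled)))


def geometric_loss(mask_src: np.ndarray, mask_pred: np.ndarray) -> float:
    """Binary cross-entropy between the source silhouette (0/1) and silhouette
    probabilities predicted on the translated image (assumed geometric term)."""
    p = np.clip(mask_pred, EPS, 1.0 - EPS)
    return float(-np.mean(mask_src * np.log(p) + (1.0 - mask_src) * np.log(1.0 - p)))


def total_generator_loss(d_fake, x, x_cycled, mask_src, mask_pred,
                         w_cyc: float = 10.0, w_geo: float = 1.0) -> float:
    """Weighted sum of the three terms; the default weights are hypothetical."""
    return (generator_adversarial_loss(d_fake)
            + w_cyc * cycle_loss(x, x_cycled)
            + w_geo * geometric_loss(mask_src, mask_pred))
```

The geometric term is what ties the translated image back to the synthetic hand pose: if translation changed the hand's shape, the predicted silhouette would no longer match the source one and this loss would grow.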