Occlusion-Robust MVO: Multimotion Estimation Through Occlusion Via Motion Closure
Visual motion estimation is an integral and well-studied challenge in
autonomous navigation. Recent work has focused on addressing multimotion
estimation, which is especially challenging in highly dynamic environments.
Such environments not only comprise multiple, complex motions but also tend to
exhibit significant occlusion.
Previous work in object tracking focuses on maintaining the integrity of
object tracks but usually relies on specific appearance-based descriptors or
constrained motion models. These approaches are very effective in specific
applications but do not generalize to the full multimotion estimation problem.
This paper presents a pipeline for estimating multiple motions, including the
camera egomotion, in the presence of occlusions. This approach uses an
expressive motion prior to estimate the SE(3) trajectory of every motion in
the scene, even during temporary occlusions, and identify the reappearance of
motions through motion closure. The performance of this occlusion-robust
multimotion visual odometry (MVO) pipeline is evaluated on real-world data and
the Oxford Multimotion Dataset.
Comment: To appear at the 2020 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS). An earlier version of this work first
appeared at the Long-term Human Motion Planning Workshop (ICRA 2019). 8
pages, 5 figures. Video available at
https://www.youtube.com/watch?v=o_N71AA6FR
Real-time detection and tracking of multiple objects with partial decoding in H.264/AVC bitstream domain
In this paper, we show that we can apply probabilistic spatiotemporal
macroblock filtering (PSMF) and partial decoding processes to effectively
detect and track multiple objects in real time in H.264/AVC bitstreams with
a stationary background. Our contribution is that our method not only achieves
fast processing time but also handles multiple moving objects that are
articulated, changing in size, or internally monotonous in color, even though
they contain a chaotic set of non-homogeneous motion vectors inside. In
addition, our partial decoding process for H.264/AVC bitstreams improves the
accuracy of object trajectories and overcomes long occlusions by
using extracted color information.
Comment: SPIE Real-Time Image and Video Processing Conference 200
Real-Time Seamless Single Shot 6D Object Pose Prediction
We propose a single-shot approach for simultaneously detecting an object in
an RGB image and predicting its 6D pose without requiring multiple stages or
having to examine multiple hypotheses. Unlike a recently proposed single-shot
technique for this task (Kehl et al., ICCV'17) that only predicts an
approximate 6D pose that must then be refined, ours is accurate enough not to
require additional post-processing. As a result, it is much faster - 50 fps on
a Titan X (Pascal) GPU - and more suitable for real-time processing. The key
component of our method is a new CNN architecture inspired by the YOLO network
design that directly predicts the 2D image locations of the projected vertices
of the object's 3D bounding box. The object's 6D pose is then estimated using a
PnP algorithm.
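To make the role of the projected bounding-box vertices concrete, the following is a minimal sketch of the forward pinhole projection that a PnP solver inverts. It is not the authors' implementation: the function names, the box half-extents, and the camera intrinsics (`fx`, `fy`, `cx`, `cy`) are hypothetical values chosen for illustration.

```python
def box_corners(half_extents):
    """Return the 8 corners of an axis-aligned 3D box centred on the object origin."""
    hx, hy, hz = half_extents
    return [(sx * hx, sy * hy, sz * hz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def project_points(points, rotation, translation, fx, fy, cx, cy):
    """Project object-frame 3D points into the image with a pinhole camera model.

    rotation is a 3x3 matrix (list of rows) and translation a 3-vector; together
    they form the 6D pose (object frame to camera frame) that PnP would estimate.
    """
    pixels = []
    for point in points:
        # Transform the point into the camera frame: X_cam = R * X_obj + t.
        cam = [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
               for i in range(3)]
        if cam[2] <= 0:
            raise ValueError("point is behind the camera")
        # Perspective division followed by the intrinsic scaling and offset.
        pixels.append((fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy))
    return pixels


# Hypothetical example: a 20 cm cube placed 2 m in front of the camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
corners_3d = box_corners((0.1, 0.1, 0.1))
corners_2d = project_points(corners_3d, identity, (0.0, 0.0, 2.0),
                            fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Given the predicted 2D locations and the known 3D corners, a PnP solver (for example OpenCV's `cv2.solvePnP`, named here only as one possible choice) searches for the rotation and translation under which this projection reproduces the predictions.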
For single object and multiple object pose estimation on the LINEMOD and
OCCLUSION datasets, our approach substantially outperforms other recent
CNN-based approaches when they are all used without post-processing. During
post-processing, a pose refinement step can be used to boost the accuracy of
the existing methods, but at 10 fps or less, they are much slower than our
method.
Comment: CVPR 201
Observation-switching linear dynamic systems for tracking humans through unexpected partial occlusions by scene objects
This paper focuses on the problem of tracking people through occlusions by scene objects. Rather than relying on models of the scene to predict when occlusions will occur, as other researchers have done, this paper proposes a linear dynamic system that switches between two alternative position measurements in order to handle occlusions as they occur. The filter automatically switches from a foot-based measure of position (assuming z = 0) to a head-based position measure (given the person's height) when an occlusion of the person's lower body occurs. No knowledge of the scene or its occluding objects is used. Unlike similar research [2, 14], the approach does not assume a fixed height for people and so is able to track humans through occlusions even when they change height during the occlusion. The approach is evaluated on three furnished scenes containing tables, chairs, desks and partitions. Occlusions range from occlusions of the legs and occlusions while seated to near-total occlusions where only the person's head is visible. Results show that the approach provides a significant reduction in false-positive tracks in a multi-camera environment, and more than halves the number of lost tracks in single monocular camera views.
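The measurement switch described above can be sketched geometrically: a viewing ray through the feet is intersected with the ground plane (z = 0), while a ray through the head is intersected with the horizontal plane at the person's height. This is only an illustration of the two alternative measurements, not the paper's filter; the function names, the camera placement, and the assumption that the height is supplied as a known value are all hypothetical.

```python
def intersect_horizontal_plane(camera_center, ray_direction, plane_z):
    """Return the (x, y) point where a viewing ray meets the plane z = plane_z."""
    cx, cy, cz = camera_center
    dx, dy, dz = ray_direction
    if dz == 0:
        raise ValueError("ray is parallel to the plane")
    scale = (plane_z - cz) / dz
    if scale <= 0:
        raise ValueError("plane lies behind the camera along this ray")
    return (cx + scale * dx, cy + scale * dy)


def position_measurement(camera_center, foot_ray, head_ray, height, feet_visible):
    """Select the position measurement depending on lower-body visibility.

    Feet visible: back-project the foot ray onto the ground plane (z = 0).
    Feet occluded: back-project the head ray onto the plane z = height.
    """
    if feet_visible:
        return intersect_horizontal_plane(camera_center, foot_ray, 0.0)
    return intersect_horizontal_plane(camera_center, head_ray, height)


# Hypothetical scene: camera 3 m above the origin, person 1.8 m tall at (4, 0).
camera = (0.0, 0.0, 3.0)
foot_ray = (4.0, 0.0, -3.0)   # points from the camera to the feet at (4, 0, 0)
head_ray = (4.0, 0.0, -1.2)   # points from the camera to the head at (4, 0, 1.8)

visible = position_measurement(camera, foot_ray, head_ray, 1.8, feet_visible=True)
occluded = position_measurement(camera, foot_ray, head_ray, 1.8, feet_visible=False)
```

Both measurements land on the same ground position when the height is correct, which is why the head-based measure can stand in for the foot-based one while the lower body is hidden. In a full tracker these values would feed the update step of the linear dynamic system, with the height itself treated as an estimated quantity rather than a constant.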
RGB-D datasets using Microsoft Kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, and these are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in selecting suitable datasets for evaluating their algorithms.