3,546 research outputs found
Improved Depth Map Estimation from Stereo Images based on Hybrid Method
In this paper, a stereo matching algorithm based on image segments is presented. We propose the hybrid segmentation algorithm that is based on a combination of the Belief Propagation and Mean Shift algorithms with aim to refine the disparity and depth map by using a stereo pair of images. This algorithm utilizes image filtering and modified SAD (Sum of Absolute Differences) stereo matching method. Firstly, a color based segmentation method is applied for segmenting the left image of the input stereo pair (reference image) into regions. The aim of the segmentation is to simplify representation of the image into the form that is easier to analyze and is able to locate objects in images. Secondly, results of the segmentation are used as an input of the local window-based matching method to determine the disparity estimate of each image pixel. The obtained experimental results demonstrate that the final depth map can be obtained by application of segment disparities to the original images. Experimental results with the stereo testing images show that our proposed Hybrid algorithm HSAD gives a good performance
Blending Learning and Inference in Structured Prediction
In this paper we derive an efficient algorithm to learn the parameters of
structured predictors in general graphical models. This algorithm blends the
learning and inference tasks, which results in a significant speedup over
traditional approaches, such as conditional random fields and structured
support vector machines. For this purpose we utilize the structures of the
predictors to describe a low dimensional structured prediction task which
encourages local consistencies within the different structures while learning
the parameters of the model. Convexity of the learning task provides the means
to enforce the consistencies between the different parts. The
inference-learning blending algorithm that we propose is guaranteed to converge
to the optimum of the low dimensional primal and dual programs. Unlike many of
the existing approaches, the inference-learning blending allows us to learn
efficiently high-order graphical models, over regions of any size, and very
large number of parameters. We demonstrate the effectiveness of our approach,
while presenting state-of-the-art results in stereo estimation, semantic
segmentation, shape reconstruction, and indoor scene understanding
ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems
In this paper we present ActiveStereoNet, the first deep learning solution
for active stereo systems. Due to the lack of ground truth, our method is fully
self-supervised, yet it produces precise depth with a subpixel precision of
of a pixel; it does not suffer from the common over-smoothing issues;
it preserves the edges; and it explicitly handles occlusions. We introduce a
novel reconstruction loss that is more robust to noise and texture-less
patches, and is invariant to illumination changes. The proposed loss is
optimized using a window-based cost aggregation with an adaptive support weight
scheme. This cost aggregation is edge-preserving and smooths the loss function,
which is key to allow the network to reach compelling results. Finally we show
how the task of predicting invalid regions, such as occlusions, can be trained
end-to-end without ground-truth. This component is crucial to reduce blur and
particularly improves predictions along depth discontinuities. Extensive
quantitatively and qualitatively evaluations on real and synthetic data
demonstrate state of the art results in many challenging scenes.Comment: Accepted by ECCV2018, Oral Presentation, Main paper + Supplementary
Material
A graphical model based solution to the facial feature point tracking problem
In this paper a facial feature point tracker that is motivated by applications
such as human-computer interfaces and facial expression analysis systems is
proposed. The proposed tracker is based on a graphical model framework. The
facial features are tracked through video streams by incorporating statistical relations in time as well as spatial relations between feature points. By exploiting the spatial relationships between feature points, the proposed method provides robustness in real-world conditions such as arbitrary head movements and occlusions. A Gabor feature-based occlusion detector is developed and used to handle occlusions. The performance of the proposed tracker has been evaluated
on real video data under various conditions including occluded facial gestures
and head movements. It is also compared to two popular methods, one based
on Kalman filtering exploiting temporal relations, and the other based on active
appearance models (AAM). Improvements provided by the proposed approach
are demonstrated through both visual displays and quantitative analysis
Fast Multi-frame Stereo Scene Flow with Motion Segmentation
We propose a new multi-frame method for efficiently computing scene flow
(dense depth and optical flow) and camera ego-motion for a dynamic scene
observed from a moving stereo camera rig. Our technique also segments out
moving objects from the rigid scene. In our method, we first estimate the
disparity map and the 6-DOF camera motion using stereo matching and visual
odometry. We then identify regions inconsistent with the estimated camera
motion and compute per-pixel optical flow only at these regions. This flow
proposal is fused with the camera motion-based flow proposal using fusion moves
to obtain the final optical flow and motion segmentation. This unified
framework benefits all four tasks - stereo, optical flow, visual odometry and
motion segmentation leading to overall higher accuracy and efficiency. Our
method is currently ranked third on the KITTI 2015 scene flow benchmark.
Furthermore, our CPU implementation runs in 2-3 seconds per frame which is 1-3
orders of magnitude faster than the top six methods. We also report a thorough
evaluation on challenging Sintel sequences with fast camera and object motion,
where our method consistently outperforms OSF [Menze and Geiger, 2015], which
is currently ranked second on the KITTI benchmark.Comment: 15 pages. To appear at IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2017). Our results were submitted to KITTI 2015 Stereo
Scene Flow Benchmark in November 201
General Dynamic Scene Reconstruction from Multiple View Video
This paper introduces a general approach to dynamic scene reconstruction from
multiple moving cameras without prior knowledge or limiting constraints on the
scene structure, appearance, or illumination. Existing techniques for dynamic
scene reconstruction from multiple wide-baseline camera views primarily focus
on accurate reconstruction in controlled environments, where the cameras are
fixed and calibrated and background is known. These approaches are not robust
for general dynamic scenes captured with sparse moving cameras. Previous
approaches for outdoor dynamic scene reconstruction assume prior knowledge of
the static background appearance and structure. The primary contributions of
this paper are twofold: an automatic method for initial coarse dynamic scene
segmentation and reconstruction without prior knowledge of background
appearance or structure; and a general robust approach for joint segmentation
refinement and dense reconstruction of dynamic scenes from multiple
wide-baseline static or moving cameras. Evaluation is performed on a variety of
indoor and outdoor scenes with cluttered backgrounds and multiple dynamic
non-rigid objects such as people. Comparison with state-of-the-art approaches
demonstrates improved accuracy in both multiple view segmentation and dense
reconstruction. The proposed approach also eliminates the requirement for prior
knowledge of scene structure and appearance
Accurate Optical Flow via Direct Cost Volume Processing
We present an optical flow estimation approach that operates on the full
four-dimensional cost volume. This direct approach shares the structural
benefits of leading stereo matching pipelines, which are known to yield high
accuracy. To this day, such approaches have been considered impractical due to
the size of the cost volume. We show that the full four-dimensional cost volume
can be constructed in a fraction of a second due to its regularity. We then
exploit this regularity further by adapting semi-global matching to the
four-dimensional setting. This yields a pipeline that achieves significantly
higher accuracy than state-of-the-art optical flow methods while being faster
than most. Our approach outperforms all published general-purpose optical flow
methods on both Sintel and KITTI 2015 benchmarks.Comment: Published at the Conference on Computer Vision and Pattern
Recognition (CVPR 2017
SIFT Flow: Dense Correspondence across Scenes and its Applications
While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes. The SIFT flow algorithm consists of matching densely sampled, pixel-wise SIFT features between two images, while preserving spatial discontinuities. The SIFT features allow robust matching across different scene/object appearances, whereas the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach robustly aligns complex scene pairs containing significant spatial differences. Based on SIFT flow, we propose an alignment-based large database framework for image analysis and synthesis, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence. This framework is demonstrated through concrete applications, such as motion field prediction from a single image, motion synthesis via object transfer, satellite image registration and face recognition
- …