Deep Depth Completion of a Single RGB-D Image
The goal of our work is to complete the depth channel of an RGB-D image.
Commodity-grade depth cameras often fail to sense depth for shiny, bright,
transparent, and distant surfaces. To address this problem, we train a deep
network that takes an RGB image as input and predicts dense surface normals and
occlusion boundaries. Those predictions are then combined with raw depth
observations provided by the RGB-D camera to solve for depths for all pixels,
including those missing in the original observation. This method was chosen
over others (e.g., inpainting depths directly) as the result of extensive
experiments with a new depth completion benchmark dataset, where holes are
filled in training data through the rendering of surface reconstructions
created from multiview RGB-D scans. Experiments with different network inputs,
depth representations, loss functions, optimization methods, inpainting
methods, and deep depth estimation networks show that our proposed approach
provides better depth completions than these alternatives.
Comment: Accepted by CVPR2018 (Spotlight). Project webpage:
http://deepcompletion.cs.princeton.edu/ This version includes supplementary
materials that provide more implementation details, quantitative evaluation,
and qualitative results. Due to the file size limit, please check the project
website for the high-resolution paper.
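The completion step, which combines predicted geometry with raw depth, is a global optimization. As a rough illustration only, the sketch below solves a 1D analogue on a single scanline with NumPy least squares. It keeps observed depths, matches hypothetical predicted depth differences (a stand-in for predicted surface normals), and down-weights those differences wherever a predicted occlusion boundary is likely. The function name, the weights, and the use of depth differences instead of normals are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def complete_scanline(raw_depth, grad, boundary, w_data=1000.0, w_grad=1.0, eps=1e-3):
    """Hypothetical 1D depth completion by linear least squares.

    raw_depth: observed depths, NaN where the sensor returned nothing (length n).
    grad:      predicted differences d[i+1] - d[i] (length n-1); an assumed
               1D stand-in for predicted surface normals.
    boundary:  predicted occlusion-boundary probability between i and i+1
               (length n-1); a high value weakens the gradient constraint there.
    """
    n = len(raw_depth)
    rows, rhs = [], []
    # Data term: stay close to the raw observation wherever one exists.
    for i in range(n):
        if not np.isnan(raw_depth[i]):
            r = np.zeros(n)
            r[i] = w_data
            rows.append(r)
            rhs.append(w_data * raw_depth[i])
    # Gradient term: match the predicted differences, weaker across likely boundaries.
    for i in range(n - 1):
        w = w_grad * (1.0 - boundary[i]) + eps
        r = np.zeros(n)
        r[i + 1] = w
        r[i] = -w
        rows.append(r)
        rhs.append(w * grad[i])
    A = np.vstack(rows)
    b = np.array(rhs)
    depth, *_ = np.linalg.lstsq(A, b, rcond=None)
    return depth
```

For example, `complete_scanline(np.array([1.0, np.nan, np.nan, 1.3]), [0.1, 0.1, 0.1], [0.0, 0.0, 0.0])` fills the two holes to give approximately `[1.0, 1.1, 1.2, 1.3]`, because those depths satisfy every term exactly. The small `eps` keeps the system well posed even when a boundary probability reaches 1.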
Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications
Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences compared with conventional two-dimensional (2D) TV. However, its application has been constrained by the lack of essential content, i.e., stereoscopic videos. To alleviate this content shortage, an economical and practical solution is to reuse the vast amount of existing monoscopic 2D media and convert it to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues such as focus blur, motion, and size, the quality of the resulting video may be poor, as such measurements are usually arbitrarily defined and inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed, featuring i) optical-flow-based occlusion reasoning to determine depth ordinals, ii) object segmentation using improved region growing from the masks of the determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (within a small library of true stereo image pairs) and depth-ordinal-based regularization. Comprehensive experiments validate the effectiveness of the proposed 2D-to-3D conversion method in generating stereoscopic videos with consistent depth measurements for 3D-TV applications.
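The abstract does not spell out how the depth-ordinal regularization works. One standard way to make estimated depths respect a known front-to-back order is isotonic regression. The pure-Python sketch below uses the pool-adjacent-violators algorithm to adjust per-layer depth estimates so that layers listed from near to far have non-decreasing depth, changing the estimates as little as possible in the weighted least-squares sense. This is a swapped-in illustration with a hypothetical function name, not the regularization described in the paper.

```python
def enforce_depth_order(depths, weights=None):
    """Hypothetical helper: project per-layer depths (listed near-to-far)
    onto depth[0] <= depth[1] <= ..., minimizing the weighted squared change.
    Implements pool-adjacent-violators (isotonic regression)."""
    if weights is None:
        weights = [1.0] * len(depths)
    # Each block holds [weighted mean, total weight, number of layers pooled].
    blocks = []
    for d, w in zip(depths, weights):
        blocks.append([float(d), float(w), 1])
        # Merge the last two blocks while they violate the depth ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand each pooled block back to one value per layer.
    out = []
    for mean, _, count in blocks:
        out.extend([mean] * count)
    return out
```

For example, `enforce_depth_order([1.0, 3.0, 2.0, 4.0])` returns `[1.0, 2.5, 2.5, 4.0]`: the second and third layers violate the order, so they are pooled to their mean. Weights (for instance, layer pixel counts) let larger or more confident layers move less.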
Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing
Free-viewpoint video conferencing allows a participant to observe the remote
3D scene from any freely chosen viewpoint. An intermediate virtual viewpoint
image is commonly synthesized using two pairs of transmitted texture and depth
maps from two neighboring captured viewpoints via depth-image-based rendering
(DIBR). To maintain high quality of synthesized images, it is imperative to
contain the adverse effects of network packet losses that may arise during
texture and depth video transmission. Towards this end, we develop an
integrated approach that exploits the representation redundancy inherent in the
multiple streamed videos: a voxel in the 3D scene that is visible to two captured views
is sampled and coded twice in the two views. In particular, at the receiver we
first develop an error concealment strategy that adaptively blends
corresponding pixels in the two captured views during DIBR, so that pixels from
the more reliable transmitted view are weighted more heavily. We then couple it
with a sender-side optimization of reference picture selection (RPS) during
real-time video coding, so that blocks containing samples of voxels that are
visible in both views are more error-resiliently coded in one view only, given
that adaptive blending will erase errors in the other view. Further, synthesized
view distortion sensitivities to texture versus depth errors are analyzed, so
that the relative importance of texture and depth code blocks can be computed for
system-wide RPS optimization. Experimental results show that the proposed
scheme can outperform the use of a traditional feedback channel by up to 0.82
dB on average at 8% packet loss rate, and by as much as 3 dB for particular
frames.
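The receiver-side error concealment blends corresponding pixels from the two captured views and weights the more reliable view more heavily. Below is a minimal NumPy sketch, assuming each view has already been warped to the virtual viewpoint by DIBR and that a per-pixel reliability in [0, 1] is available (for example, an estimate of how likely the pixel was decoded without loss-induced errors, with warping holes set to 0). The function name, the reliability inputs, and the normalized-weight rule are assumptions; the paper's actual blending weights may differ.

```python
import numpy as np

def blend_views(left_warped, right_warped, left_reliability, right_reliability, eps=1e-6):
    """Hypothetical reliability-weighted blending of two DIBR-warped views.

    All inputs are arrays of the same shape. Reliabilities are assumed
    per-pixel scores in [0, 1]; how they are estimated is not modeled here.
    """
    wl = np.asarray(left_reliability, dtype=float)
    wr = np.asarray(right_reliability, dtype=float)
    total = wl + wr
    # Normalized weight for the left view; fall back to an equal average
    # where neither view is considered usable (total close to zero).
    alpha = np.where(total > eps, wl / np.maximum(total, eps), 0.5)
    left = np.asarray(left_warped, dtype=float)
    right = np.asarray(right_warped, dtype=float)
    return alpha * left + (1.0 - alpha) * right
```

For example, `blend_views([100, 100], [200, 200], [1.0, 0.0], [1.0, 1.0])` gives `[150.0, 200.0]`: the first pixel is trusted equally in both views, while the second comes entirely from the right view because the left copy is marked unreliable.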