16,602 research outputs found
Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications
Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained due to the lack of essential contents, i.e., stereoscopic videos. To alleviate such content shortage, an economical and practical solution is to reuse the huge media resources that are available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues like focus blur, motion and size, the quality of the resulting video may be poor as such measurements are usually arbitrarily defined and appear inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed which features i) optical-flow based occlusion reasoning in determining depth ordinal, ii) object segmentation using improved region-growing from masks of determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (inside a small library of true stereo image pairs) and depth-ordinal based regularization. Comprehensive experiments have validated the effectiveness of our proposed 2D-to-3D conversion method in generating stereoscopic videos of consistent depth measurements for 3D-TV applications
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world
Low Power Depth Estimation of Rigid Objects for Time-of-Flight Imaging
Depth sensing is useful in a variety of applications that range from
augmented reality to robotics. Time-of-flight (TOF) cameras are appealing
because they obtain dense depth measurements with minimal latency. However, for
many battery-powered devices, the illumination source of a TOF camera is power
hungry and can limit the battery life of the device. To address this issue, we
present an algorithm that lowers the power for depth sensing by reducing the
usage of the TOF camera and estimating depth maps using concurrently collected
images. Our technique also adaptively controls the TOF camera and enables it
when an accurate depth map cannot be estimated. To ensure that the overall
system power for depth sensing is reduced, we design our algorithm to run on a
low power embedded platform, where it outputs 640x480 depth maps at 30 frames
per second. We evaluate our approach on several RGB-D datasets, where it
produces depth maps with an overall mean relative error of 0.96% and reduces
the usage of the TOF camera by 85%. When used with commercial TOF cameras, we
estimate that our algorithm can lower the total power for depth sensing by up
to 73%
SPLODE: Semi-Probabilistic Point and Line Odometry with Depth Estimation from RGB-D Camera Motion
Active depth cameras suffer from several limitations, which cause incomplete
and noisy depth maps, and may consequently affect the performance of RGB-D
Odometry. To address this issue, this paper presents a visual odometry method
based on point and line features that leverages both measurements from a depth
sensor and depth estimates from camera motion. Depth estimates are generated
continuously by a probabilistic depth estimation framework for both types of
features to compensate for the lack of depth measurements and inaccurate
feature depth associations. The framework models explicitly the uncertainty of
triangulating depth from both point and line observations to validate and
obtain precise estimates. Furthermore, depth measurements are exploited by
propagating them through a depth map registration module and using a
frame-to-frame motion estimation method that considers 3D-to-2D and 2D-to-3D
reprojection errors, independently. Results on RGB-D sequences captured on
large indoor and outdoor scenes, where depth sensor limitations are critical,
show that the combination of depth measurements and estimates through our
approach is able to overcome the absence and inaccuracy of depth measurements.Comment: IROS 201
Video Frame Interpolation via Adaptive Separable Convolution
Standard video frame interpolation methods first estimate optical flow
between input frames and then synthesize an intermediate frame guided by
motion. Recent approaches merge these two steps into a single convolution
process by convolving input frames with spatially adaptive kernels that account
for motion and re-sampling simultaneously. These methods require large kernels
to handle large motion, which limits the number of pixels whose kernels can be
estimated at once due to the large memory demand. To address this problem, this
paper formulates frame interpolation as local separable convolution over input
frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D
kernels require significantly fewer parameters to be estimated. Our method
develops a deep fully convolutional neural network that takes two input frames
and estimates pairs of 1D kernels for all pixels simultaneously. Since our
method is able to estimate kernels and synthesizes the whole video frame at
once, it allows for the incorporation of perceptual loss to train the neural
network to produce visually pleasing frames. This deep neural network is
trained end-to-end using widely available video data without any human
annotation. Both qualitative and quantitative experiments show that our method
provides a practical solution to high-quality video frame interpolation.Comment: ICCV 2017, http://graphics.cs.pdx.edu/project/sepconv
Keyframe-based monocular SLAM: design, survey, and future directions
Extensive research in the field of monocular SLAM for the past fifteen years
has yielded workable systems that found their way into various applications in
robotics and augmented reality. Although filter-based monocular SLAM systems
were common at some time, the more efficient keyframe-based solutions are
becoming the de facto methodology for building a monocular SLAM system. The
objective of this paper is threefold: first, the paper serves as a guideline
for people seeking to design their own monocular SLAM according to specific
environmental constraints. Second, it presents a survey that covers the various
keyframe-based monocular SLAM systems in the literature, detailing the
components of their implementation, and critically assessing the specific
strategies made in each proposed solution. Third, the paper provides insight
into the direction of future research in this field, to address the major
limitations still facing monocular SLAM; namely, in the issues of illumination
changes, initialization, highly dynamic motion, poorly textured scenes,
repetitive textures, map maintenance, and failure recovery
- …