Depth Fields: Extending Light Field Techniques to Time-of-Flight Imaging
A variety of techniques such as light field, structured illumination, and
time-of-flight (TOF) are commonly used for depth acquisition in consumer
imaging, robotics, and many other applications. Unfortunately, each technique
suffers from individual limitations that prevent robust depth sensing. In
this paper, we explore the strengths and weaknesses of combining light field
and time-of-flight imaging, particularly the feasibility of an on-chip
implementation as a single hybrid depth sensor. We refer to this combination as
depth field imaging. Depth fields combine light field advantages such as
synthetic aperture refocusing with TOF imaging advantages such as high depth
resolution and coded signal processing to resolve multipath interference. We
show applications including synthesizing virtual apertures for TOF imaging,
improved depth mapping through partial and scattering occluders, and single
frequency TOF phase unwrapping. Utilizing space, angle, and temporal coding,
depth fields can improve depth sensing in the wild and generate new insights
into the dimensions of light's plenoptic function.Comment: 9 pages, 8 figures, Accepted to 3DV 201
Three-Dimensional Integral Imaging for Gesture Recognition Under Occlusions
Over the last years, three-dimensional (3-D) imaging has been applied to human action and gesture recognition, usually in the form of depth maps from RGB-D sensors. An alternative which has not been explored is 3-D integral imaging, aside from a recent preliminary study which shows that it can be an effective sensory modality with some advantages over conventional monocular imaging. Since integral imaging has also been shown to be a powerful tool in other visual tasks (e.g., object reconstruction and recognition) under challenging conditions (e.g., low illumination, occlusions), and its passive long-range operation brings benefits over active close-range devices, a natural question is whether these advantages also hold for gesture recognition. Furthermore, occlusions are present in many real-world gesture recognition scenarios, yet they remain an elusive problem that has scarcely been addressed. As far as we know, this letter analyzes for the first time the potential of integral imaging for gesture recognition under occlusions, by comparing it to monocular imaging and to RGB-D sensory data. Empirical results corroborate the benefits of 3-D integral imaging for gesture recognition, mainly under occlusions.
Synthetic Aperture Anomaly Imaging
Previous research has shown that in the presence of foliage occlusion,
anomaly detection performs significantly better in integral images resulting
from synthetic aperture imaging compared to applying it to conventional aerial
images. In this article, we hypothesize and demonstrate that integrating
detected anomalies is even more effective than detecting anomalies in
integrals. This results in enhanced occlusion removal, outlier suppression, and
higher chances of visually as well as computationally detecting targets that
are otherwise occluded. Our hypothesis was validated through both simulations
and field experiments. We also present a real-time application that makes our
findings practically available for blue-light organizations and others using
commercial drone platforms. It is designed to address use-cases that suffer
from strong occlusion caused by vegetation, such as search and rescue, wildlife
observation, early wildfire detection, and surveillance.
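The distinction the article draws, detecting anomalies in the integral image versus integrating per-view anomaly maps, can be illustrated with a toy simulation. The detector, scene, and occlusion model below are simplified assumptions for illustration, not the authors' pipeline:

```python
import numpy as np

def anomaly_map(img):
    """Per-image anomaly score: deviation from the image mean in units
    of standard deviation (a crude stand-in for RX-style detectors)."""
    return np.abs(img - img.mean()) / (img.std() + 1e-9)

rng = np.random.default_rng(0)
# Hypothetical stack of registered aerial views: a bright target at
# pixel (5, 5), with roughly half of each view hidden by simulated foliage.
views = []
for _ in range(32):
    img = rng.normal(0.0, 1.0, (16, 16))
    img[5, 5] += 6.0                        # the target
    occluded = rng.random((16, 16)) < 0.5   # ~50% foliage occlusion
    img[occluded] = rng.normal(0.0, 1.0, occluded.sum())
    views.append(img)

# (a) Detect in the integral: average the views first, then score.
detect_in_integral = anomaly_map(np.mean(views, axis=0))
# (b) Integrate detections: score each view, then average the maps.
integrated_detections = np.mean([anomaly_map(v) for v in views], axis=0)
```

In this toy setup both orderings highlight the target; the article's hypothesis is that integrating detections suppresses occluders and outliers more effectively as occlusion grows denser.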
Stereoscopic Depth Perception Through Foliage
Both humans and computational methods struggle to discriminate the depths of
objects hidden beneath foliage. However, such discrimination becomes feasible
when we combine computational optical synthetic aperture sensing with the human
ability to fuse stereoscopic images. For object identification tasks, as
required in search and rescue, wildlife observation, surveillance, and early
wildfire detection, depth assists in differentiating true from false findings,
such as people, animals, or vehicles vs. sun-heated patches at the ground level
or in the tree crowns, or ground fires vs. tree trunks. We used video captured
by a drone above dense woodland to test users' ability to discriminate depth.
We found that this is impossible when viewing monoscopic video and relying on
motion parallax. The same was true with stereoscopic video because of the
occlusions caused by foliage. However, when synthetic aperture sensing was used
to reduce occlusions and disparity-scaled stereoscopic video was presented,
human observers successfully discriminated depth, whereas computational
(stereoscopic matching) methods remained unsuccessful. This shows the potential
of systems which exploit the synergy between computational methods and human
vision to perform tasks that neither can perform alone.
Aperture Supervision for Monocular Depth Estimation
We present a novel method to train machine learning algorithms to estimate
scene depths from a single image, by using the information provided by a
camera's aperture as supervision. Prior works use a depth sensor's outputs or
images of the same scene from alternate viewpoints as supervision, while our
method instead uses images from the same viewpoint taken with a varying camera
aperture. To enable learning algorithms to use aperture effects as supervision,
we introduce two differentiable aperture rendering functions that use the input
image and predicted depths to simulate the depth-of-field effects caused by
real camera apertures. We train a monocular depth estimation network end-to-end
to predict the scene depths that best explain these finite aperture images as
defocus-blurred renderings of the input all-in-focus image.
Comment: To appear at CVPR 2018 (updated to camera ready version)
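The forward model behind aperture supervision can be sketched as a depth-dependent blur: blur the all-in-focus image with per-pixel circles of confusion and compare the result against a real finite-aperture photo. The gather-style renderer below is a simplified, non-differentiable stand-in for the paper's differentiable rendering functions, with the thin-lens constant folded into a hypothetical `aperture` parameter:

```python
import numpy as np

def coc_radius(depth, focus_depth, aperture):
    """Circle-of-confusion radius (in pixels) from a simplified
    thin-lens model: blur grows with defocus |1/z - 1/z_f| and with
    aperture size (lens constants folded into `aperture`)."""
    return aperture * np.abs(1.0 / depth - 1.0 / focus_depth)

def render_defocus(sharp, depth, focus_depth, aperture):
    """Gather-style defocus rendering: each output pixel averages a
    window of the all-in-focus image sized by its circle of confusion."""
    h, w = sharp.shape
    out = np.zeros_like(sharp)
    radii = np.rint(coc_radius(depth, focus_depth, aperture)).astype(int)
    for y in range(h):
        for x in range(w):
            k = radii[y, x]
            y0, y1 = max(0, y - k), min(h, y + k + 1)
            x0, x1 = max(0, x - k), min(w, x + k + 1)
            out[y, x] = sharp[y0:y1, x0:x1].mean()
    return out

# Pixels on the focal plane (depth == focus_depth) are left untouched;
# pixels off it are blurred. A depth network trained against real
# finite-aperture photos must predict depths that reproduce this blur.
```

Training then amounts to predicting a depth map, rendering the defocused image, and penalizing its difference from the captured shallow-depth-of-field photo; the paper's contribution is making that rendering step differentiable.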
Occlusion-Aware Multi-View Reconstruction of Articulated Objects for Manipulation
The goal of this research is to develop algorithms using multiple views to automatically recover complete 3D models of articulated objects in unstructured environments and thereby enable a robotic system to facilitate further manipulation of those objects. First, an algorithm called Procrustes-Lo-RANSAC (PLR) is presented. Structure-from-motion techniques are used to capture 3D point cloud models of an articulated object in two different configurations. Procrustes analysis, combined with a locally optimized RANSAC sampling strategy, facilitates a straightforward geometric approach to recovering the joint axes, as well as classifying them automatically as either revolute or prismatic. The algorithm does not require prior knowledge of the object, nor does it make any assumptions about the planarity of the object or scene. Second, with such a resulting articulated model, a robotic system is then able to manipulate the object either along its joint axes at a specified grasp point in order to exercise its degrees of freedom or move its end effector to a particular position even if the point is not visible in the current view. This is one of the main advantages of the occlusion-aware approach, because the models capture all sides of the object, meaning that the robot has knowledge of parts of the object that are not visible in the current view. Experiments with a PUMA 500 robotic arm demonstrate the effectiveness of the approach on a variety of real-world objects containing both revolute and prismatic joints. Third, we improve the proposed approach by using an RGBD sensor (Microsoft Kinect), which yields a depth value for each pixel directly rather than requiring correspondences to establish depth. The KinectFusion algorithm is applied to produce a single high-quality, geometrically accurate 3D model from which rigid links of the object are segmented and aligned, allowing the joint axes to be estimated using the geometric approach. The improved algorithm does not require artificial markers attached to objects, yields much denser 3D models, and reduces the computation time.
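The geometric core of the PLR pipeline, aligning corresponding rigid parts between two configurations, is a classical Procrustes (Kabsch) problem. A minimal sketch of that single step, with the RANSAC sampling and joint-axis classification omitted:

```python
import numpy as np

def procrustes_rigid(P, Q):
    """Least-squares rigid alignment (Kabsch/Procrustes): find R, t
    such that Q ~= P @ R.T + t, given corresponding 3D points."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# Recover a known motion: rotate points about z, then translate them.
theta = 0.5
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
P = np.random.default_rng(1).random((10, 3))
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
R, t = procrustes_rigid(P, Q)   # recovers R ~= Rz, t ~= [1, 2, 3]
```

With the rigid transform of each link in hand, a revolute joint appears as a rotation about a fixed axis and a prismatic joint as a pure translation, which is what PLR classifies.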
EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images
Light field cameras capture both the spatial and the angular properties of
light rays in space. Owing to this property, depth can be computed from light
fields in uncontrolled lighting environments, which is a big advantage over
active sensing devices. Depth computed from light fields can be used for many
applications, including 3D modelling and refocusing. However, light field
images from hand-held cameras have very narrow baselines and are noisy, making
depth estimation difficult. Many approaches have been proposed to overcome
these limitations in light field depth estimation, but there is a clear
trade-off between accuracy and speed in these methods. In this paper,
we introduce a fast and accurate light field depth estimation method based on a
fully-convolutional neural network. Our network is designed by considering the
light field geometry and we also overcome the lack of training data by
proposing light field specific data augmentation methods. We achieved the top
rank in the HCI 4D Light Field Benchmark on most metrics, and we also
demonstrate the effectiveness of the proposed method on real-world light-field
images.
Comment: Accepted to CVPR 2018, Total 10 page
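The epipolar geometry the network exploits can be seen in miniature: a scene point traces a line across an epipolar-plane image (EPI) whose slope is its disparity, so shearing the EPI by the correct disparity aligns the point across all views. The brute-force sketch below illustrates that cue only; the synthetic texture and candidate range are made up, and EPINET itself learns the relationship end-to-end rather than searching:

```python
import numpy as np

def epi_disparity(epi, candidates):
    """Brute-force disparity from an epipolar-plane image: the correct
    shear aligns a scene point across all views, minimizing variance
    along the angular (view) axis."""
    n_views, width = epi.shape
    center = n_views // 2
    xs = np.arange(width, dtype=float)
    margin = int(np.ceil(max(candidates) * center))  # skip edge columns
    best, best_cost = None, np.inf
    for d in candidates:
        sheared = np.stack([np.interp(xs + (v - center) * d, xs, epi[v])
                            for v in range(n_views)])
        cost = sheared[:, margin:width - margin].var(axis=0).mean()
        if cost < best_cost:
            best, best_cost = d, cost
    return best

# Synthetic EPI: a 1D texture shifted per view by a known disparity.
tex = np.sin(np.linspace(0.0, 8.0 * np.pi, 128))
xs = np.arange(128, dtype=float)
d_true = 1.5
epi = np.stack([np.interp(xs - (v - 4) * d_true, xs, tex)
                for v in range(9)])
```

Real light fields break this simple variance cue at occlusions and in textureless regions, which is part of why a learned, geometry-aware network outperforms handcrafted cost volumes.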
Review of constraints on vision-based gesture recognition for human–computer interaction
The ability of computers to recognise hand gestures visually is essential for progress in human-computer interaction. Gesture recognition has applications ranging from sign language to medical assistance to virtual reality. However, gesture recognition is extremely challenging not only because of its diverse contexts, multiple interpretations, and spatio-temporal variations but also because of the complex non-rigid properties of the hand. This study surveys major constraints on vision-based gesture recognition occurring in detection and pre-processing, representation and feature extraction, and recognition. Current challenges are explored in detail.