5,555 research outputs found
Photometric Depth Super-Resolution
This study explores the use of photometric techniques (shape-from-shading and
uncalibrated photometric stereo) for upsampling the low-resolution depth map
from an RGB-D sensor to the higher resolution of the companion RGB image. A
single-shot variational approach is first put forward, which is effective as
long as the target's reflectance is piecewise-constant. It is then shown that
this dependency upon a specific reflectance model can be relaxed by focusing on
a specific class of objects (e.g., faces), and delegate reflectance estimation
to a deep neural network. A multi-shot strategy based on randomly varying
lighting conditions is eventually discussed. It requires no training or prior
on the reflectance, yet this comes at the price of a dedicated acquisition
setup. Both quantitative and qualitative evaluations illustrate the
effectiveness of the proposed methods on synthetic and real-world scenarios.Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence
(T-PAMI), 2019. First three authors contribute equall
Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions
3D action recognition has broad applications in human-computer interaction
and intelligent surveillance. However, recognizing similar actions remains
challenging since previous literature fails to capture motion and shape cues
effectively from noisy depth data. In this paper, we propose a novel two-layer
Bag-of-Visual-Words (BoVW) model, which suppresses the noise disturbances and
jointly encodes both motion and shape cues. First, background clutter is
removed by a background modeling method that is designed for depth data. Then,
motion and shape cues are jointly used to generate robust and distinctive
spatial-temporal interest points (STIPs): motion-based STIPs and shape-based
STIPs. In the first layer of our model, a multi-scale 3D local steering kernel
(M3DLSK) descriptor is proposed to describe local appearances of cuboids around
motion-based STIPs. In the second layer, a spatial-temporal vector (STV)
descriptor is proposed to describe the spatial-temporal distributions of
shape-based STIPs. Using the Bag-of-Visual-Words (BoVW) model, motion and shape
cues are combined to form a fused action representation. Our model performs
favorably compared with common STIP detection and description methods. Thorough
experiments verify that our model is effective in distinguishing similar
actions and robust to background clutter, partial occlusions and pepper noise
Sparse-to-Continuous: Enhancing Monocular Depth Estimation using Occupancy Maps
This paper addresses the problem of single image depth estimation (SIDE),
focusing on improving the quality of deep neural network predictions. In a
supervised learning scenario, the quality of predictions is intrinsically
related to the training labels, which guide the optimization process. For
indoor scenes, structured-light-based depth sensors (e.g. Kinect) are able to
provide dense, albeit short-range, depth maps. On the other hand, for outdoor
scenes, LiDARs are considered the standard sensor, which comparatively provides
much sparser measurements, especially in areas further away. Rather than
modifying the neural network architecture to deal with sparse depth maps, this
article introduces a novel densification method for depth maps, using the
Hilbert Maps framework. A continuous occupancy map is produced based on 3D
points from LiDAR scans, and the resulting reconstructed surface is projected
into a 2D depth map with arbitrary resolution. Experiments conducted with
various subsets of the KITTI dataset show a significant improvement produced by
the proposed Sparse-to-Continuous technique, without the introduction of extra
information into the training stage.Comment: Accepted. (c) 2019 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other work
- …