74,817 research outputs found
Learning sparse representations of depth
This paper introduces a new method for learning and inferring sparse
representations of depth (disparity) maps. The proposed algorithm relaxes the
usual assumption of the stationary noise model in sparse coding. This enables
learning from data corrupted with spatially varying noise or uncertainty,
typically obtained by laser range scanners or structured light depth cameras.
Sparse representations are learned from the Middlebury database disparity maps
and then exploited in a two-layer graphical model for inferring depth from
stereo, by including a sparsity prior on the learned features. Since they
capture higher-order dependencies in the depth structure, these priors can
complement smoothness priors commonly used in depth inference based on Markov
Random Field (MRF) models. Inference on the proposed graph is achieved using an
alternating iterative optimization technique, where the first layer is solved
using an existing MRF-based stereo matching algorithm, then held fixed as the
second layer is solved using the proposed non-stationary sparse coding
algorithm. This leads to a general method for improving solutions of state of
the art MRF-based depth estimation algorithms. Our experimental results first
show that depth inference using learned representations leads to state of the
art denoising of depth maps obtained from laser range scanners and a time of
flight camera. Furthermore, we show that adding sparse priors improves the
results of two depth estimation methods: the classical graph cut algorithm by
Boykov et al. and the more recent algorithm of Woodford et al.Comment: 12 page
Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge
Distributional models provide a convenient way to model semantics using dense
embedding spaces derived from unsupervised learning algorithms. However, the
dimensions of dense embedding spaces are not designed to resemble human
semantic knowledge. Moreover, embeddings are often built from a single source
of information (typically text data), even though neurocognitive research
suggests that semantics is deeply linked to both language and perception. In
this paper, we combine multimodal information from both text and image-based
representations derived from state-of-the-art distributional models to produce
sparse, interpretable vectors using Joint Non-Negative Sparse Embedding.
Through in-depth analyses comparing these sparse models to human-derived
behavioural and neuroimaging data, we demonstrate their ability to predict
interpretable linguistic descriptions of human ground-truth semantic knowledge.Comment: Proceedings of the 22nd Conference on Computational Natural Language
Learning (CoNLL 2018), pages 260-270. Brussels, Belgium, October 31 -
November 1, 2018. Association for Computational Linguistic
LRRU: Long-short Range Recurrent Updating Networks for Depth Completion
Existing deep learning-based depth completion methods generally employ
massive stacked layers to predict the dense depth map from sparse input data.
Although such approaches greatly advance this task, their accompanied huge
computational complexity hinders their practical applications. To accomplish
depth completion more efficiently, we propose a novel lightweight deep network
framework, the Long-short Range Recurrent Updating (LRRU) network. Without
learning complex feature representations, LRRU first roughly fills the sparse
input to obtain an initial dense depth map, and then iteratively updates it
through learned spatially-variant kernels. Our iterative update process is
content-adaptive and highly flexible, where the kernel weights are learned by
jointly considering the guidance RGB images and the depth map to be updated,
and large-to-small kernel scopes are dynamically adjusted to capture
long-to-short range dependencies. Our initial depth map has coarse but complete
scene depth information, which helps relieve the burden of directly regressing
the dense depth from sparse ones, while our proposed method can effectively
refine it to an accurate depth map with less learnable parameters and inference
time. Experimental results demonstrate that our proposed LRRU variants achieve
state-of-the-art performance across different parameter regimes. In particular,
the LRRU-Base model outperforms competing approaches on the NYUv2 dataset, and
ranks 1st on the KITTI depth completion benchmark at the time of submission.
Project page: https://npucvr.github.io/LRRU/.Comment: Published in ICCV 202
Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with Depth Guidance
Image outpainting technology generates visually plausible content regardless
of authenticity, making it unreliable to be applied in practice. Thus, we
propose a reliable image outpainting task, introducing the sparse depth from
LiDARs to extrapolate authentic RGB scenes. The large field view of LiDARs
allows it to serve for data enhancement and further multimodal tasks.
Concretely, we propose a Depth-Guided Outpainting Network to model different
feature representations of two modalities and learn the structure-aware
cross-modal fusion. And two components are designed: 1) The Multimodal Learning
Module produces unique depth and RGB feature representations from the
perspectives of different modal characteristics. 2) The Depth Guidance Fusion
Module leverages the complete depth modality to guide the establishment of RGB
contents by progressive multimodal feature fusion. Furthermore, we specially
design an additional constraint strategy consisting of Cross-modal Loss and
Edge Loss to enhance ambiguous contours and expedite reliable content
generation. Extensive experiments on KITTI and Waymo datasets demonstrate our
superiority over the state-of-the-art method, quantitatively and qualitatively
A Joint Intensity and Depth Co-Sparse Analysis Model for Depth Map Super-Resolution
High-resolution depth maps can be inferred from low-resolution depth
measurements and an additional high-resolution intensity image of the same
scene. To that end, we introduce a bimodal co-sparse analysis model, which is
able to capture the interdependency of registered intensity and depth
information. This model is based on the assumption that the co-supports of
corresponding bimodal image structures are aligned when computed by a suitable
pair of analysis operators. No analytic form of such operators exist and we
propose a method for learning them from a set of registered training signals.
This learning process is done offline and returns a bimodal analysis operator
that is universally applicable to natural scenes. We use this to exploit the
bimodal co-sparse analysis model as a prior for solving inverse problems, which
leads to an efficient algorithm for depth map super-resolution.Comment: 13 pages, 4 figure
Sparse Coding on Stereo Video for Object Detection
Deep Convolutional Neural Networks (DCNN) require millions of labeled
training examples for image classification and object detection tasks, which
restrict these models to domains where such datasets are available. In this
paper, we explore the use of unsupervised sparse coding applied to stereo-video
data to help alleviate the need for large amounts of labeled data. We show that
replacing a typical supervised convolutional layer with an unsupervised
sparse-coding layer within a DCNN allows for better performance on a car
detection task when only a limited number of labeled training examples is
available. Furthermore, the network that incorporates sparse coding allows for
more consistent performance over varying initializations and ordering of
training examples when compared to a fully supervised DCNN. Finally, we compare
activations between the unsupervised sparse-coding layer and the supervised
convolutional layer, and show that the sparse representation exhibits an
encoding that is depth selective, whereas encodings from the convolutional
layer do not exhibit such selectivity. These result indicates promise for using
unsupervised sparse-coding approaches in real-world computer vision tasks in
domains with limited labeled training data
- …