Self-Supervised Intensity-Event Stereo Matching
Event cameras are novel bio-inspired vision sensors that output pixel-level
intensity changes in microsecond accuracy with a high dynamic range and low
power consumption. Despite these advantages, event cameras cannot be directly
applied to computational imaging tasks, because they cannot capture high-quality
intensity frames and events simultaneously. This paper aims to connect a
standalone event camera to a modern intensity camera so that applications can
take advantage of both sensors. We establish this connection through a
multi-modal stereo matching task. We first convert events to a reconstructed
image and extend the existing stereo networks to this multi-modality condition.
We propose a self-supervised method to train the multi-modal stereo network
without using ground truth disparity data. The structure loss calculated on
image gradients is used to enable self-supervised learning on such multi-modal
data. Exploiting the internal stereo constraint between views with different
modalities, we introduce general stereo loss functions, including disparity
cross-consistency loss and internal disparity loss, leading to improved
performance and robustness compared to existing approaches. The experiments
demonstrate the effectiveness of the proposed method, especially the proposed
general stereo loss functions, on both synthetic and real datasets. Finally, we
shed light on employing the aligned events and intensity images in downstream
tasks, e.g., video interpolation.
Comment: This paper has been accepted by the Journal of Imaging Science & Technology
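The losses the abstract names can be made concrete. Below is a minimal numpy sketch (not the authors' implementation) of a structure loss computed on image gradients, which compares edge structure rather than raw intensities and so tolerates the appearance gap between a reconstructed-event image and a camera frame, plus a toy per-pixel disparity cross-consistency check between the two views; the function names and the 1-D scanline formulation are illustrative assumptions.

```python
import numpy as np

def image_gradients(img):
    # Forward differences along x and y; img is a 2-D float array.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy

def structure_loss(a, b):
    # L1 distance between gradient maps: sensitive to edge structure,
    # invariant to constant brightness offsets between modalities.
    ax, ay = image_gradients(a)
    bx, by = image_gradients(b)
    return float(np.mean(np.abs(ax - bx) + np.abs(ay - by)))

def cross_consistency_loss(disp_lr, disp_rl):
    # Disparity cross-consistency: warping each left-view disparity into
    # the right view should land on a matching right-view disparity.
    h, w = disp_lr.shape
    loss, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disp_lr[y, x]))  # matched column in right view
            if 0 <= xr < w:
                loss += abs(disp_lr[y, x] - disp_rl[y, xr])
                count += 1
    return loss / max(count, 1)
```

Because the structure loss depends only on gradients, two images differing by a constant offset score zero, which is the property that makes it usable across modalities.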
There and Back Again: Self-supervised Multispectral Correspondence Estimation
Across a wide range of applications, from autonomous vehicles to medical
imaging, multi-spectral images provide an opportunity to extract additional
information not present in color images. One of the most important steps in
making this information readily available is the accurate estimation of dense
correspondences between different spectra.
Due to the nature of cross-spectral images, most correspondence solving
techniques for the visual domain are simply not applicable. Furthermore, most
cross-spectral techniques utilize spectra-specific characteristics to perform
the alignment. In this work, we aim to address the dense correspondence
estimation problem in a way that generalizes to more than one spectrum. We do
this by introducing a novel cycle-consistency metric that allows us to
self-supervise. This, combined with our spectra-agnostic loss functions, allows
us to train the same network across multiple spectra.
We demonstrate our approach on the challenging task of dense RGB-FIR
correspondence estimation. We also show the performance of our unmodified
network on the cases of RGB-NIR and RGB-RGB, where we achieve higher accuracy
than similar self-supervised approaches. Our work shows that cross-spectral
correspondence estimation can be solved in a common framework that learns to
generalize alignment across spectra.
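The "there and back again" idea admits a compact sketch: map points from image A to image B with the forward correspondence field, map them back with the reverse field, and penalize the round-trip distance. A perfect correspondence returns every point to its start, so this error can supervise training without ground-truth matches. The code below is a hedged illustration with assumed names, not the paper's network or loss.

```python
import numpy as np

def warp_coords(coords, flow):
    # Apply a dense flow field to pixel coordinates.
    # coords: (N, 2) array of (y, x); flow: (H, W, 2) of (dy, dx).
    out = coords.astype(float).copy()
    for i, (y, x) in enumerate(coords):
        out[i] += flow[int(y), int(x)]
    return out

def cycle_consistency_error(flow_ab, flow_ba, coords):
    # Round trip A -> B -> A; the mean return distance is the
    # self-supervision signal (zero for a consistent correspondence).
    h, w = flow_ab.shape[:2]
    fwd = warp_coords(coords, flow_ab)
    # Round and clip so the landed positions index flow_ba validly.
    fwd = np.clip(np.round(fwd), [0, 0], [h - 1, w - 1])
    back = warp_coords(fwd, flow_ba)
    return float(np.mean(np.linalg.norm(back - coords, axis=1)))
```

Nothing here depends on which spectrum produced either image, which is the sense in which such a metric is spectra-agnostic.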
Generating 3D faces using Convolutional Mesh Autoencoders
Learned 3D representations of human faces are useful for computer vision
problems such as 3D face tracking and reconstruction from images, as well as
graphics applications such as character generation and animation. Traditional
models learn a latent representation of a face using linear subspaces or
higher-order tensor generalizations. Due to this linearity, they cannot
capture extreme deformations and non-linear expressions. To address this, we
introduce a versatile model that learns a non-linear representation of a face
using spectral convolutions on a mesh surface. We introduce mesh sampling
operations that enable a hierarchical mesh representation that captures
non-linear variations in shape and expression at multiple scales within the
model. In a variational setting, our model samples diverse realistic 3D faces
from a multivariate Gaussian distribution. Our training data consists of 20,466
meshes of extreme expressions captured over 12 different subjects. Despite
limited training data, our trained model outperforms state-of-the-art face
models with 50% lower reconstruction error, while using 75% fewer parameters.
We also show that replacing the expression space of an existing
state-of-the-art face model with our autoencoder achieves a lower
reconstruction error. Our data, model and code are available at
http://github.com/anuragranj/com
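Spectral convolutions on a mesh are typically realized as Chebyshev polynomial filters of the graph Laplacian, which localize the filter on the vertex neighborhood without an explicit eigendecomposition. The sketch below shows this standard operator in numpy under that assumption; the function names and the dense-matrix formulation are illustrative, not the paper's code.

```python
import numpy as np

def normalized_laplacian(adj):
    # Symmetric normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}
    # built from the mesh's vertex adjacency matrix.
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return np.eye(adj.shape[0]) - (adj * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def chebyshev_conv(x, laplacian, weights, lmax=2.0):
    # K-th order Chebyshev spectral convolution on mesh vertices:
    # y = sum_k T_k(L_scaled) x W_k, using the recurrence
    # T_k = 2 L_scaled T_{k-1} - T_{k-2}.
    # x: (V, F_in) vertex features; weights: (K, F_in, F_out).
    L = (2.0 / lmax) * laplacian - np.eye(laplacian.shape[0])
    t0 = x
    out = t0 @ weights[0]
    if len(weights) > 1:
        t1 = L @ x
        out = out + t1 @ weights[1]
        for k in range(2, len(weights)):
            t2 = 2.0 * (L @ t1) - t0
            out = out + t2 @ weights[k]
            t0, t1 = t1, t2
    return out
```

Stacking such filters with mesh down- and up-sampling between them yields the hierarchical encoder-decoder the abstract describes.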