210 research outputs found
DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image
3D reconstruction from a single image is a key problem in multiple
applications ranging from robotic manipulation to augmented reality. Prior
methods have tackled this problem through generative models which predict 3D
reconstructions as voxels or point clouds. However, these methods can be
computationally expensive and miss fine details. We introduce a new
differentiable layer for 3D data deformation and use it in DeformNet to learn a
model for 3D reconstruction-through-deformation. DeformNet takes an image
input, searches the nearest shape template from a database, and deforms the
template to match the query image. We evaluate our approach on the ShapeNet
dataset and show that - (a) the Free-Form Deformation layer is a powerful new
building block for Deep Learning models that manipulate 3D data (b) DeformNet
uses this FFD layer combined with shape retrieval for smooth and
detail-preserving 3D reconstruction of qualitatively plausible point clouds
with respect to a single query image (c) compared to other state-of-the-art 3D
reconstruction methods, DeformNet quantitatively matches or outperforms their
benchmarks by significant margins. For more information, visit:
https://deformnet-site.github.io/DeformNet-website/ .Comment: 11 pages, 9 figures, NIP
DiME: Maximizing Mutual Information by a Difference of Matrix-Based Entropies
We introduce an information-theoretic quantity with similar properties to
mutual information that can be estimated from data without making explicit
assumptions on the underlying distribution. This quantity is based on a
recently proposed matrix-based entropy that uses the eigenvalues of a
normalized Gram matrix to compute an estimate of the eigenvalues of an
uncentered covariance operator in a reproducing kernel Hilbert space. We show
that a difference of matrix-based entropies (DiME) is well suited for problems
involving the maximization of mutual information between random variables.
While many methods for such tasks can lead to trivial solutions, DiME naturally
penalizes such outcomes. We compare DiME to several baseline estimators of
mutual information on a toy Gaussian dataset. We provide examples of use cases
for DiME, such as latent factor disentanglement and a multiview representation
learning problem where DiME is used to learn a shared representation among
views with high mutual information
To Compress or Not to Compress -- Self-Supervised Learning and Information Theory: A Review
Deep neural networks have demonstrated remarkable performance in supervised
learning tasks but require large amounts of labeled data. Self-supervised
learning offers an alternative paradigm, enabling the model to learn from data
without explicit labels. Information theory has been instrumental in
understanding and optimizing deep neural networks. Specifically, the
information bottleneck principle has been applied to optimize the trade-off
between compression and relevant information preservation in supervised
settings. However, the optimal information objective in self-supervised
learning remains unclear. In this paper, we review various approaches to
self-supervised learning from an information-theoretic standpoint and present a
unified framework that formalizes the \textit{self-supervised
information-theoretic learning problem}. We integrate existing research into a
coherent framework, examine recent self-supervised methods, and identify
research opportunities and challenges. Moreover, we discuss empirical
measurement of information-theoretic quantities and their estimators. This
paper offers a comprehensive review of the intersection between information
theory, self-supervised learning, and deep neural networks
Semi-supervised Tuning from Temporal Coherence
Recent works demonstrated the usefulness of temporal coherence to regularize
supervised training or to learn invariant features with deep architectures. In
particular, enforcing smooth output changes while presenting temporally-closed
frames from video sequences, proved to be an effective strategy. In this paper
we prove the efficacy of temporal coherence for semi-supervised incremental
tuning. We show that a deep architecture, just mildly trained in a supervised
manner, can progressively improve its classification accuracy, if exposed to
video sequences of unlabeled data. The extent to which, in some cases, a
semi-supervised tuning allows to improve classification accuracy (approaching
the supervised one) is somewhat surprising. A number of control experiments
pointed out the fundamental role of temporal coherence.Comment: Under review as a conference paper at ICLR 201
Learning to Reconstruct Shapes from Unseen Classes
From a single image, humans are able to perceive the full 3D shape of an
object by exploiting learned shape priors from everyday life. Contemporary
single-image 3D reconstruction algorithms aim to solve this task in a similar
fashion, but often end up with priors that are highly biased by training
classes. Here we present an algorithm, Generalizable Reconstruction (GenRe),
designed to capture more generic, class-agnostic shape priors. We achieve this
with an inference network and training procedure that combine 2.5D
representations of visible surfaces (depth and silhouette), spherical shape
representations of both visible and non-visible surfaces, and 3D voxel-based
representations, in a principled manner that exploits the causal structure of
how 3D shapes give rise to 2D images. Experiments demonstrate that GenRe
performs well on single-view shape reconstruction, and generalizes to diverse
novel objects from categories not seen during training.Comment: NeurIPS 2018 (Oral). The first two authors contributed equally to
this paper. Project page: http://genre.csail.mit.edu
- …