Diagnosing and Enhancing VAE Models
Although variational autoencoders (VAEs) represent a widely influential deep
generative model, many aspects of the underlying energy function remain poorly
understood. In particular, it is commonly believed that Gaussian
encoder/decoder assumptions reduce the effectiveness of VAEs in generating
realistic samples. In this regard, we rigorously analyze the VAE objective,
differentiating situations where this belief is and is not actually true. We
then leverage the corresponding insights to develop a simple VAE enhancement
that requires no additional hyperparameters or sensitive tuning.
Quantitatively, this proposal produces crisp samples and stable FID scores that
are actually competitive with a variety of GAN models, all while retaining
desirable attributes of the original VAE architecture. A shorter version of
this work will appear in the ICLR 2019 conference proceedings (Dai and Wipf,
2019). The code for our model is available at
https://github.com/daib13/TwoStageVAE
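As background for the Gaussian encoder/decoder discussion in this abstract: with a unit-variance Gaussian decoder, the per-example VAE objective reduces to a squared-error term plus a closed-form KL term. A minimal NumPy sketch (illustrative only; this is not the paper's two-stage model, and the function names are ours):

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def negative_elbo(x, x_recon, mu, log_var):
    """Negative ELBO for a unit-variance Gaussian decoder, up to additive
    constants: reconstruction error plus the KL regularizer."""
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    return recon + gaussian_kl(mu, log_var)

# A posterior that matches the prior (mu = 0, log_var = 0) pays no KL cost.
mu, log_var = np.zeros((1, 4)), np.zeros((1, 4))
print(float(gaussian_kl(mu, log_var)[0]))  # -> 0.0
```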
Two-Manifold Problems
Recently, there has been much interest in spectral approaches to learning
manifolds---so-called kernel eigenmap methods. These methods have had some
successes, but their applicability is limited because they are not robust to
noise. To address this limitation, we look at two-manifold problems, in which
we simultaneously reconstruct two related manifolds, each representing a
different view of the same data. By solving these interconnected learning
problems together and allowing information to flow between them, two-manifold
algorithms are able to succeed where a non-integrated approach would fail: each
view allows us to suppress noise in the other, reducing bias in the same way
that an instrumental variable allows us to remove bias in a linear
dimensionality reduction problem. We propose a class of algorithms for
two-manifold problems, based on spectral decomposition of cross-covariance
operators in Hilbert space. Finally, we discuss situations where two-manifold
problems are useful, and demonstrate that solving a two-manifold problem can
aid in learning a nonlinear dynamical system from limited data.
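The cross-covariance idea behind two-manifold algorithms can be sketched in a few lines of NumPy: two noisy views of the same latent signal are projected onto the leading singular directions of their empirical cross-covariance, which suppresses noise that is independent across views. A toy stand-in, not the authors' Hilbert-space operator formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two noisy views of the same 1-D latent signal (the two "manifolds").
z = rng.normal(size=500)
X = np.outer(z, [1.0, 0.5]) + 0.3 * rng.normal(size=(500, 2))
Y = np.outer(z, [0.8, -1.0]) + 0.3 * rng.normal(size=(500, 2))

# Center each view and take the SVD of the empirical cross-covariance.
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
U, s, Vt = np.linalg.svd(Xc.T @ Yc / len(z))

# Project each view onto its leading singular direction: noise that is
# independent across views cancels out of the cross-covariance, so the
# projections track the shared signal.
x_proj, y_proj = Xc @ U[:, 0], Yc @ Vt[0]
print(abs(np.corrcoef(x_proj, z)[0, 1]) > 0.9)  # -> True (near-unit correlation)
```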
On GANs and GMMs
A longstanding problem in machine learning is to find unsupervised methods
that can learn the statistical structure of high dimensional signals. In recent
years, GANs have gained much attention as a possible solution to the problem,
and in particular have shown the ability to generate remarkably realistic high
resolution sampled images. At the same time, many authors have pointed out that
GANs may fail to model the full distribution ("mode collapse") and that using
the learned models for anything other than generating samples may be very
difficult. In this paper, we examine the utility of GANs in learning
statistical models of images by comparing them to perhaps the simplest
statistical model, the Gaussian Mixture Model. First, we present a simple
method to evaluate generative models based on relative proportions of samples
that fall into predetermined bins. Unlike previous automatic methods for
evaluating models, our method does not rely on an additional neural network nor
does it require approximating intractable computations. Second, we compare the
performance of GANs to GMMs trained on the same datasets. While GMMs have
previously been shown to be successful in modeling small patches of images, we
show how to train them on full sized images despite the high dimensionality.
Our results show that GMMs can generate realistic samples (although less sharp
than those of GANs) but also capture the full distribution, which GANs fail to
do. Furthermore, GMMs allow efficient inference and explicit representation of
the underlying statistical structure. Finally, we discuss how GMMs can be used
to generate sharp images.
Comment: Accepted to NIPS 201
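The evaluation method described here, comparing relative proportions of samples across predetermined bins, can be sketched with 1-D histograms. This is only a toy illustration (the paper's binning of high-dimensional images is more elaborate), but it shows how a mode-collapsed model is caught:

```python
import numpy as np

def bin_proportions(samples, edges):
    """Fraction of samples falling into each predetermined bin."""
    counts, _ = np.histogram(samples, bins=edges)
    return counts / counts.sum()

def bin_divergence(real, fake, edges):
    """Half the L1 distance between binned proportions
    (0 = identical bin usage, 1 = completely disjoint)."""
    return 0.5 * np.abs(bin_proportions(real, edges)
                        - bin_proportions(fake, edges)).sum()

rng = np.random.default_rng(0)
edges = np.linspace(-4, 4, 21)
real = rng.normal(size=10_000)
good = rng.normal(size=10_000)                 # matches the full distribution
collapsed = rng.normal(1.5, 0.1, size=10_000)  # "mode collapse": one narrow mode
print(bin_divergence(real, good, edges)
      < bin_divergence(real, collapsed, edges))  # -> True
```

No auxiliary network and no intractable likelihoods are needed, which is the point the abstract makes.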
Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
Object detection and 6D pose estimation in the crowd (scenes with multiple
object instances, severe foreground occlusions and background distractors) has
become an important problem in many rapidly evolving technological areas such
as robotics and augmented reality. Single shot-based 6D pose estimators with
manually designed features are still unable to tackle the above challenges,
motivating the research towards unsupervised feature learning and
next-best-view estimation. In this work, we present a complete framework for
both single shot-based 6D object pose estimation and next-best-view prediction
based on Hough Forests, the state-of-the-art object pose estimator that
performs classification and regression jointly. Rather than using manually
designed features, we a) propose unsupervised features learnt from
depth-invariant patches using a Sparse Autoencoder and b) offer an extensive
evaluation of various state-of-the-art features. Furthermore, taking advantage
of the clustering performed in the leaf nodes of Hough Forests, we learn to
estimate the reduction of uncertainty in other views, formulating the problem
of selecting the next-best-view. To further improve pose estimation, we propose
an improved joint registration and hypotheses verification module as a final
refinement step to reject false detections. We provide two additional
challenging datasets inspired from realistic scenarios to extensively evaluate
the state of the art and our framework. One is related to domestic environments
and the other depicts a bin-picking scenario mostly found in industrial
settings. We show that our framework significantly outperforms the state of the
art both on public datasets and on our own.
Comment: CVPR 2016 accepted paper, project page:
http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm
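The unsupervised feature learning step relies on a Sparse Autoencoder; its characteristic ingredient is a KL sparsity penalty that pushes each hidden unit's mean activation toward a small target rate. An illustrative fragment (not the authors' full patch-based pipeline; names and values are ours):

```python
import numpy as np

def kl_sparsity_penalty(activations, rho=0.05):
    """Classic sparse-autoencoder penalty: KL divergence between a target
    activation rate rho and each hidden unit's mean activation rho_hat
    (activations assumed in (0, 1), e.g. sigmoid outputs)."""
    rho_hat = np.clip(activations.mean(axis=0), 1e-8, 1 - 1e-8)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))

rng = np.random.default_rng(0)
sparse_acts = rng.uniform(0.0, 0.1, size=(100, 16))  # mean activation ~ 0.05
dense_acts = rng.uniform(0.0, 1.0, size=(100, 16))   # mean activation ~ 0.5
print(kl_sparsity_penalty(sparse_acts)
      < kl_sparsity_penalty(dense_acts))  # -> True
```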
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
Despite the widespread practical success of deep learning methods, our
theoretical understanding of the dynamics of learning in deep neural networks
remains quite sparse. We attempt to bridge the gap between the theory and
practice of deep learning by systematically analyzing learning dynamics for the
restricted case of deep linear neural networks. Despite the linearity of their
input-output map, such networks have nonlinear gradient descent dynamics on
weights that change with the addition of each new hidden layer. We show that
deep linear networks exhibit nonlinear learning phenomena similar to those seen
in simulations of nonlinear networks, including long plateaus followed by rapid
transitions to lower error solutions, and faster convergence from greedy
unsupervised pretraining initial conditions than from random initial
conditions. We provide an analytical description of these phenomena by finding
new exact solutions to the nonlinear dynamics of deep learning. Our theoretical
analysis also reveals the surprising finding that as the depth of a network
approaches infinity, learning speed can nevertheless remain finite: for a
special class of initial conditions on the weights, very deep networks incur
only a finite, depth independent, delay in learning speed relative to shallow
networks. We show that, under certain conditions on the training data,
unsupervised pretraining can find this special class of initial conditions,
while scaled random Gaussian initializations cannot. We further exhibit a new
class of random orthogonal initial conditions on weights that, like
unsupervised pre-training, enjoys depth independent learning times. We further
show that these initial conditions also lead to faithful propagation of
gradients even in deep nonlinear networks, as long as they operate in a special
regime known as the edge of chaos.
Comment: Submission to ICLR 2014. Revised based on reviewer feedback.
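The random orthogonal initialization highlighted in this abstract is easy to generate via a QR decomposition of a Gaussian matrix. A short NumPy sketch (the helper name is ours):

```python
import numpy as np

def orthogonal_init(n, rng):
    """Random n x n orthogonal matrix via QR of a Gaussian matrix, with
    signs fixed so the result is drawn uniformly (Haar measure)."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(0)
W = orthogonal_init(64, rng)

# An orthogonal layer preserves norms, so signals (and gradients) neither
# explode nor vanish no matter how many such layers are stacked.
x = rng.normal(size=64)
print(np.allclose(np.linalg.norm(W @ x), np.linalg.norm(x)))  # -> True
```

Norm preservation is exactly why such initial conditions give faithful gradient propagation at large depth.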
Differentiating Concepts and Instances for Knowledge Graph Embedding
Concepts, which represent a group of different instances sharing common
properties, are essential information in knowledge representation. Most
conventional knowledge embedding methods encode both entities (concepts and
instances) and relations as vectors in a low dimensional semantic space
equally, ignoring the difference between concepts and instances. In this paper,
we propose a novel knowledge graph embedding model named TransC by
differentiating concepts and instances. Specifically, TransC encodes each
concept in the knowledge graph as a sphere and each instance as a vector in the
same semantic space. We use the relative positions to model the relations
between concepts and instances (i.e., instanceOf), and the relations between
concepts and sub-concepts (i.e., subClassOf). We evaluate our model on both
link prediction and triple classification tasks on the dataset based on YAGO.
Experimental results show that TransC outperforms state-of-the-art methods, and
captures the semantic transitivity of the instanceOf and subClassOf relations.
Our code and datasets can be obtained from https://github.com/davidlvxin/TransC
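The sphere-based encoding admits simple geometric tests, which is where the transitivity comes from: if an instance lies inside a concept sphere and that sphere lies inside a larger one, the instance lies inside the larger one too. A toy NumPy sketch of the two predicates (TransC is actually trained with margin-based ranking losses, omitted here; the embeddings below are made up):

```python
import numpy as np

def instance_of(inst, center, radius):
    """instanceOf holds when the instance vector lies inside the concept
    sphere (center, radius) in the shared embedding space."""
    return bool(np.linalg.norm(inst - center) <= radius)

def sub_class_of(center_i, radius_i, center_j, radius_j):
    """subClassOf holds when sphere i is contained in sphere j, which makes
    both relations transitive by construction."""
    return bool(np.linalg.norm(center_i - center_j) <= radius_j - radius_i)

# Toy embeddings: "dog" inside "mammal", and "mammal" inside "animal".
dog = np.array([0.1, 0.2])
mammal_c, mammal_r = np.array([0.0, 0.0]), 0.5
animal_c, animal_r = np.array([0.0, 0.1]), 1.0
print(instance_of(dog, mammal_c, mammal_r),
      sub_class_of(mammal_c, mammal_r, animal_c, animal_r),
      instance_of(dog, animal_c, animal_r))  # -> True True True
```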
Understanding Machine-learned Density Functionals
Kernel ridge regression is used to approximate the kinetic energy of
non-interacting fermions in a one-dimensional box as a functional of their
density. The properties of different kernels and methods of cross-validation
are explored, and highly accurate energies are achieved. Accurate {\em
constrained optimal densities} are found via a modified Euler-Lagrange
constrained minimization of the total energy. A projected gradient descent
algorithm is derived using local principal component analysis. Additionally, a
sparse grid representation of the density can be used without degrading the
performance of the methods. The implications for machine-learned density
functional approximations are discussed.
An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization
We consider maximum likelihood estimation for Gaussian Mixture Models (GMMs).
This task is almost invariably solved (in theory and practice) via the
Expectation Maximization (EM) algorithm. EM owes its success to various
factors, among which its ability to satisfy positive definiteness constraints
in closed form is of key importance. We propose an alternative to EM by
appealing to the rich Riemannian geometry of positive definite matrices, using
which we cast GMM parameter estimation as a Riemannian optimization problem.
Surprisingly, such an out-of-the-box Riemannian formulation completely fails
and proves much inferior to EM. This motivates us to take a closer look at the
problem geometry, and derive a better formulation that is much more amenable to
Riemannian optimization. We then develop (Riemannian) batch and stochastic
gradient algorithms that outperform EM, often substantially. We provide a
non-asymptotic convergence analysis for our stochastic method, which is also
the first (to our knowledge) such global analysis for Riemannian stochastic
gradient. Numerous empirical results are included to demonstrate the
effectiveness of our methods.
Comment: 21 pages, 6 figures
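The key geometric point, that Riemannian updates respect positive definiteness without projections, can be sketched with the exponential map on the manifold of positive definite matrices. A generic sketch, not the paper's reformulated objective or its stochastic algorithm:

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def spd_exp_map(Sigma, G, step):
    """One exponential-map step on the manifold of positive definite
    matrices: S^(1/2) expm(-step * S^(-1/2) G S^(-1/2)) S^(1/2).
    Unlike the Euclidean update Sigma - step * G, the result is positive
    definite for any step size, so no projection is needed."""
    w, V = np.linalg.eigh(Sigma)
    S_half = (V * np.sqrt(w)) @ V.T
    S_ihalf = (V / np.sqrt(w)) @ V.T
    inner = S_ihalf @ G @ S_ihalf
    inner = 0.5 * (inner + inner.T)  # symmetrize for numerical safety
    return S_half @ sym_expm(-step * inner) @ S_half

rng = np.random.default_rng(0)
G = rng.normal(size=(3, 3))
G = 0.5 * (G + G.T)                        # a symmetric "gradient"
new = spd_exp_map(np.eye(3), G, step=5.0)  # deliberately huge step
print(np.all(np.linalg.eigvalsh(new) > 0))  # -> True: still positive definite
```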
Randomized Physics-based Motion Planning for Grasping in Cluttered and Uncertain Environments
Planning motions to grasp an object in cluttered and uncertain environments
is a challenging task, particularly when a collision-free trajectory does not
exist and objects obstructing the way are required to be carefully grasped and
moved out. This paper takes a different approach and proposes to address this
problem by using a randomized physics-based motion planner that permits
robot-object and object-object interactions. The main idea is to avoid an
explicit high-level reasoning of the task by providing the motion planner with
a physics engine to evaluate possible complex multi-body dynamical
interactions. The approach is able to solve the problem in complex scenarios,
also considering uncertainty in the object poses and in the contact dynamics.
The work enhances the state validity checker, the control sampler and the tree
exploration strategy of a kinodynamic motion planner called KPIECE. The
enhanced algorithm, called p-KPIECE, has been validated in simulation and with
real experiments. The results have been compared with an ontological
physics-based motion planner and with task and motion planning approaches,
resulting in a significant improvement in terms of planning time, success rate
and quality of the solution path.
Comment: IEEE Robotics and Automation Letters. Preprint version. Accepted
November, 201
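The core loop, sample a tree node, sample a control, and let a physics engine rather than a collision checker decide what happens on contact, can be caricatured in one dimension. This is a drastic simplification (KPIECE's projection grids and the paper's contact-dynamics uncertainty are omitted, and all names are illustrative):

```python
import random

def physics_step(state, control, dt=0.1):
    """Toy stand-in for the physics engine: a point robot on a line pushes
    a box along whenever it reaches it, instead of treating contact as an
    invalid (colliding) state."""
    robot, box = state
    robot += control * dt
    if robot >= box:  # contact: the box is pushed, not forbidden
        box = robot
    return (robot, box)

def plan(start, goal, max_iters=2000, seed=0):
    """Minimal randomized kinodynamic loop: grow a tree by sampling
    controls and propagating them through the physics step."""
    rng = random.Random(seed)
    tree = [start]
    for _ in range(max_iters):
        # Mix greedy selection with uniform selection over the tree.
        state = (max(tree, key=lambda s: s[0]) if rng.random() < 0.5
                 else rng.choice(tree))
        tree.append(physics_step(state, rng.uniform(-1.0, 1.0)))
        if tree[-1][0] >= goal:
            return tree[-1]
    return None

# The robot starts at 0 and must reach 1.0; a box at 0.5 blocks the way
# and has to be pushed, which a plain collision checker would never allow.
result = plan((0.0, 0.5), goal=1.0)
print(result)  # a state with both robot and box pushed past the goal
```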
Realtime State Estimation with Tactile and Visual Sensing for Inserting a Suction-held Object
We develop a real-time state estimation system to recover the pose and
contact formation of an object relative to its environment. In this paper, we
focus on the application of inserting an object picked by a suction cup into a
tight space, an enabling technology for robotic packaging.
We propose a framework that fuses force and visual sensing for improved
accuracy and robustness. Visual sensing is versatile and non-intrusive, but
suffers from occlusions and limited accuracy, especially for tasks involving
contact. Tactile sensing is local, but provides accuracy and robustness to
occlusions. The proposed algorithm to fuse them is based on iSAM, an on-line
optimization technique, which we use to incorporate kinematic measurements from
the robot, contact geometry of the object and the container, and visual
tracking. In this paper, we generalize previous results in planar settings to a
3D task with more complex contact interactions. A key challenge in using force
sensing is that we do not observe contact point locations directly. We propose
a data-driven method to infer the contact formation, which is then used in
real-time by the state estimator. We demonstrate and evaluate the algorithm in
a setup instrumented to provide ground truth.
Comment: 8 pages, 10 figures, submitted to IROS 201
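The complementary error profiles of vision (versatile but coarse near contact) and touch (local but precise) are what make fusion pay off. The simplest instance is inverse-variance weighting of two Gaussian estimates, a one-variable caricature of what a factor-graph smoother like iSAM does over the whole trajectory (the numbers below are made up):

```python
def fuse_gaussian(mu_a, var_a, mu_b, var_b):
    """Fuse two independent Gaussian estimates of the same quantity by
    inverse-variance weighting: the fused mean leans toward the more
    certain sensor, and the fused variance shrinks below both."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    return (w_a * mu_a + w_b * mu_b) / (w_a + w_b), 1.0 / (w_a + w_b)

# Vision: versatile but coarse. Touch: local but precise.
mu, var = fuse_gaussian(mu_a=0.10, var_a=0.04,    # visual estimate (m)
                        mu_b=0.02, var_b=0.0004)  # tactile estimate (m)
print(round(mu, 4), var < 0.0004)  # -> 0.0208 True
```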