Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization
Pre-conditioning is a well-known concept that can significantly improve the
convergence of optimization algorithms. For noise-free problems, where good
pre-conditioners are not known a priori, iterative linear algebra methods offer
one way to efficiently construct them. For the stochastic optimization problems
that dominate contemporary machine learning, however, this approach is not
readily available. We propose an iterative algorithm inspired by classic
iterative linear solvers that uses a probabilistic model to actively infer a
pre-conditioner in situations where Hessian-projections can only be constructed
with strong Gaussian noise. The algorithm is empirically demonstrated to
efficiently construct effective pre-conditioners for stochastic gradient
descent and its variants. Experiments on problems of comparatively low dimensionality show improved convergence. In very high-dimensional problems,
such as those encountered in deep learning, the pre-conditioner effectively
becomes an automatic learning-rate adaptation scheme, which we also empirically
show to work well.
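To make the setting concrete, here is a minimal sketch of pre-conditioned SGD in which a diagonal pre-conditioner is estimated by averaging noisy Hessian-vector products; this simple averaging stands in for the paper's probabilistic inference, and all names and constants are illustrative assumptions.

```python
import numpy as np

def estimate_diag_preconditioner(noisy_hvp, dim, n_probes=50, jitter=1e-3):
    """Estimate a diagonal pre-conditioner from noisy Hessian-vector products.

    `noisy_hvp(v)` is assumed to return H @ v corrupted by Gaussian noise;
    probing with coordinate vectors and averaging approximates diag(H).
    """
    diag = np.zeros(dim)
    for i in range(dim):
        e = np.zeros(dim)
        e[i] = 1.0
        diag[i] = np.mean([noisy_hvp(e)[i] for _ in range(n_probes)])
    return 1.0 / np.maximum(np.abs(diag), jitter)  # inverse-curvature scaling

def preconditioned_sgd_step(x, grad, precond, lr=0.1):
    """Scale the stochastic gradient coordinate-wise by the pre-conditioner."""
    return x - lr * precond * grad
```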
Deep Closest Point: Learning Representations for Point Cloud Registration
Point cloud registration is a key problem in computer vision, with applications in robotics, medical imaging, and other domains. The task is to find a rigid transformation that aligns one point cloud with another. Iterative Closest Point (ICP) and its variants provide simple and
easily-implemented iterative methods for this task, but these algorithms can
converge to spurious local optima. To address local optima and other
difficulties in the ICP pipeline, we propose a learning-based method, titled
Deep Closest Point (DCP), inspired by recent techniques in computer vision and
natural language processing. Our model consists of three parts: a point cloud
embedding network, an attention-based module combined with a pointer generation layer to approximate combinatorial matching, and a differentiable singular
value decomposition (SVD) layer to extract the final rigid transformation. We
train our model end-to-end on the ModelNet40 dataset and show in several
settings that it performs better than ICP, its variants (e.g., Go-ICP, FGR),
and the recently-proposed learning-based method PointNetLK. Beyond providing a
state-of-the-art registration technique, we evaluate the suitability of our
learned features transferred to unseen objects. We also provide preliminary
analysis of our learned model to help understand whether domain-specific and/or
global features facilitate rigid registration.
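The differentiable SVD layer at the end of such a pipeline solves an orthogonal Procrustes problem. As a point of reference, here is a minimal NumPy sketch of that closed-form step for hard correspondences; the function name is ours, and DCP applies the same computation to soft matches inside the network.

```python
import numpy as np

def rigid_transform_svd(X, Y):
    """Closed-form rigid alignment (Kabsch) of corresponding points.

    X, Y: (N, 3) arrays of matched points; returns R, t with Y ≈ X @ R.T + t.
    """
    cx, cy = X.mean(axis=0), Y.mean(axis=0)  # centroids
    H = (X - cx).T @ (Y - cy)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cy - R @ cx
    return R, t
```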
A VEST of the Pseudoinverse Learning Algorithm
In this paper, we briefly review the basic scheme of the pseudoinverse
learning (PIL) algorithm and present some discussions on the PIL, as well as
its variants. The PIL algorithm, first presented in 1995, is a non-gradient
descent and non-iterative learning algorithm for multi-layer neural networks
and has several advantages compared with gradient descent based algorithms.
Some new viewpoints on the PIL algorithm are presented, and several common pitfalls in practical implementations of neural network learning are also addressed. In addition, we show that the so-called extreme learning machine is a Variant crEated by Simple name alTernation (VEST) of the PIL algorithm for single hidden layer feedforward neural networks.
Towards A Deeper Geometric, Analytic and Algorithmic Understanding of Margins
Given a matrix $A$, a linear feasibility problem (of which linear classification is a special case) aims to find a solution to a primal problem $w : A^\top w > 0$ or a certificate for the dual problem, which is a probability distribution $p \in \Delta_n : Ap = 0$. Inspired by the continued importance of "large-margin classifiers" in machine learning, this paper studies a condition measure of $A$ called its \textit{margin} that determines the difficulty of both the above problems. To aid geometrical intuition, we
first establish new characterizations of the margin in terms of relevant balls,
cones and hulls. Our second contribution is analytical, where we present
generalizations of Gordan's theorem, and variants of Hoffman's theorems, both
using margins. We end by proving some new results on a classical iterative
scheme, the Perceptron, whose convergence rate famously depends on the margin.
Our results are relevant for a deeper understanding of margin-based learning
and proving convergence rates of iterative schemes, apart from providing a
unifying perspective on this vast topic.
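For a concrete instance of the margin's role, recall the classical guarantee the paper refines: on row-normalized data, the Perceptron finds a feasible $w$ within $1/\rho^2$ updates, where $\rho$ is the margin. A minimal sketch, with the update budget as an illustrative assumption:

```python
import numpy as np

def perceptron(A, max_updates=10_000):
    """Classic Perceptron on the rows of A, seeking w with A @ w > 0.

    With unit-norm rows and margin rho > 0, at most 1 / rho**2 updates
    are needed before a feasible w is found.
    """
    A = A / np.linalg.norm(A, axis=1, keepdims=True)  # unit-norm rows
    w = np.zeros(A.shape[1])
    for _ in range(max_updates):
        violated = np.flatnonzero(A @ w <= 0)
        if violated.size == 0:
            return w                 # primal problem solved
        w += A[violated[0]]          # update on the first violated row
    return None                      # budget exhausted (problem may be infeasible)
```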
DropoutDAgger: A Bayesian Approach to Safe Imitation Learning
While imitation learning is becoming common practice in robotics, this
approach often suffers from data mismatch and compounding errors. DAgger is an
iterative algorithm that addresses these issues by continually aggregating
training data from both the expert and novice policies, but does not consider
the impact of safety. We present a probabilistic extension to DAgger that uses the distribution over actions provided by the novice policy for a given observation. Our method, which we call DropoutDAgger, uses dropout to train the novice as a Bayesian neural network that provides insight into its confidence.
Using the distribution over the novice's actions, we estimate a probabilistic
measure of safety with respect to the expert action, tuned to balance
exploration and exploitation. The utility of this approach is evaluated on the
MuJoCo HalfCheetah and in a simple driving experiment, demonstrating improved
performance and safety compared to other DAgger variants and classic imitation
learning.
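A minimal sketch of the Monte Carlo dropout gate such a method relies on: keep dropout active at test time, sample several stochastic forward passes, and defer to the expert when the novice strays too far. The threshold, distance measure, and function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def mc_dropout_action(forward, obs, n_samples=20):
    """Sample actions from a network whose dropout stays on at test time.

    `forward(obs)` is assumed to run one stochastic forward pass; the spread
    of the samples reflects the novice's (un)certainty.
    """
    samples = np.stack([forward(obs) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

def choose_action(novice_forward, expert_action, obs, tau=0.1):
    """Let the novice act only when its mean action stays near the expert's."""
    mean_action, _ = mc_dropout_action(novice_forward, obs)
    if np.linalg.norm(mean_action - expert_action) <= tau:
        return mean_action           # novice deemed safe enough to act
    return expert_action             # otherwise defer to the expert
```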
EnsembleDAgger: A Bayesian Approach to Safe Imitation Learning
While imitation learning is often used in robotics, the approach frequently
suffers from data mismatch and compounding errors. DAgger is an iterative
algorithm that addresses these issues by aggregating training data from both
the expert and novice policies, but does not consider the impact of safety. We
present a probabilistic extension to DAgger, which attempts to quantify the
confidence of the novice policy as a proxy for safety. Our method,
EnsembleDAgger, approximates a Gaussian Process using an ensemble of neural
networks. Using the variance as a measure of confidence, we compute a decision
rule that captures how much we doubt the novice, thus determining when it is
safe to allow the novice to act. With this approach, we aim to maximize the
novice's share of actions, while constraining the probability of failure. We
demonstrate improved safety and learning performance compared to other DAgger
variants and classic imitation learning on an inverted pendulum and in the
MuJoCo HalfCheetah environment.
Comment: Accepted to the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019).
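A minimal sketch of an ensemble-variance decision rule in this spirit: the novice acts only when the ensemble both agrees with itself (low variance) and stays close to the expert. Ensemble size, thresholds, and names are illustrative assumptions.

```python
import numpy as np

def ensemble_stats(ensemble, obs):
    """Mean and variance of actions across an ensemble of policy networks.

    `ensemble` is a list of callables mapping an observation to an action;
    the spread across members stands in for a Gaussian Process's variance.
    """
    actions = np.stack([policy(obs) for policy in ensemble])
    return actions.mean(axis=0), actions.var(axis=0)

def safe_to_act(ensemble, expert_action, obs, var_max=0.05, dev_max=0.1):
    """Allow the novice to act only when confident and close to the expert."""
    mean_action, var = ensemble_stats(ensemble, obs)
    confident = bool(np.all(var <= var_max))
    agrees = np.linalg.norm(mean_action - expert_action) <= dev_max
    return confident and agrees
```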
Convolutional Neural Networks for Non-iterative Reconstruction of Compressively Sensed Images
Traditional algorithms for compressive sensing recovery are computationally
expensive and are ineffective at low measurement rates. In this work, we
propose a data driven non-iterative algorithm to overcome the shortcomings of
earlier iterative algorithms. Our solution, ReconNet, is a deep neural network,
whose parameters are learned end-to-end to map block-wise compressive
measurements of the scene to the desired image blocks. Reconstruction of an
image becomes a simple forward pass through the network and can be done in
real-time. We show empirically that our algorithm yields reconstructions with
higher PSNRs than iterative algorithms at low measurement rates and in the presence of measurement noise. We also propose a variant of ReconNet which uses
adversarial loss in order to further improve reconstruction quality. We discuss
how adding a fully connected layer to the existing ReconNet architecture allows
for jointly learning the measurement matrix and the reconstruction algorithm in
a single network. Experiments on real data obtained from a block compressive
imager show that our networks are robust to unseen sensor noise. Finally,
through an experiment in object tracking, we show that even at very low
measurement rates, reconstructions using our algorithm possess rich semantic
content that can be used for high-level inference.
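The measurement model being inverted is linear per block: y = Φx for a random matrix Φ. Below is a minimal sketch of block-wise measurement with a pseudoinverse standing in for the learned CNN; the block size and measurement rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
B = 33 * 33                  # one flattened 33x33 image block (an assumed size)
m = int(0.10 * B)            # 10% measurement rate

Phi = rng.standard_normal((m, B)) / np.sqrt(m)  # random measurement matrix

def measure(block):
    """Block-wise compressive measurement y = Phi @ x."""
    return Phi @ block.reshape(-1)

# Stand-in for the learned network: the pseudoinverse of Phi. ReconNet
# instead learns a CNN that maps y back to the image block.
W = np.linalg.pinv(Phi)

x = rng.random(B)            # a toy image block
x_hat = W @ measure(x)       # reconstruction as a single forward pass
```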
IHT dies hard: Provable accelerated Iterative Hard Thresholding
We study, both in theory and in practice, the use of momentum in classic iterative hard thresholding (IHT) methods. By simply modifying plain IHT, we investigate its convergence behavior on convex optimization criteria with non-convex constraints, under standard assumptions. In diverse scenarios, we observe that acceleration in IHT leads to significant improvements compared to state-of-the-art projected gradient descent and Frank-Wolfe variants. As a byproduct of our inspection, we study the impact of selecting the momentum parameter: similar to convex settings, two modes of behavior are observed, "rippling" and linear, depending on the level of momentum.
Exploring the Space of Black-box Attacks on Deep Neural Networks
Existing black-box attacks on deep neural networks (DNNs) so far have largely
focused on transferability, where an adversarial instance generated for a
locally trained model can "transfer" to attack other learning models. In this
paper, we propose novel Gradient Estimation black-box attacks for adversaries
with query access to the target model's class probabilities, which do not rely
on transferability. We also propose strategies to decouple the number of
queries required to generate each adversarial sample from the dimensionality of
the input. An iterative variant of our attack achieves close to 100%
adversarial success rates for both targeted and untargeted attacks on DNNs. We
carry out extensive experiments for a thorough comparative evaluation of
black-box attacks and show that the proposed Gradient Estimation attacks
outperform all transferability-based black-box attacks we tested on both the MNIST and CIFAR-10 datasets, achieving adversarial success rates similar to well-known, state-of-the-art white-box attacks. We also apply the Gradient
Estimation attacks successfully against a real-world Content Moderation
classifier hosted by Clarifai. Furthermore, we evaluate black-box attacks
against state-of-the-art defenses. We show that the Gradient Estimation attacks
are very effective even against these defenses.
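The core primitive behind such query-based attacks is a finite-difference gradient estimate built from class-probability queries; a minimal sketch follows, where the loss function, step sizes, and clipping range are illustrative assumptions (the paper's grouping strategies reduce the per-sample query count).

```python
import numpy as np

def estimate_gradient(loss, x, delta=1e-4):
    """Two-sided finite-difference estimate of the gradient of `loss` at x.

    `loss(x)` is assumed to query the target model's class probabilities and
    return a scalar; each input coordinate costs two queries.
    """
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = delta
        grad.flat[i] = (loss(x + e) - loss(x - e)) / (2 * delta)
    return grad

def attack_step(loss, x, eps=0.3):
    """One untargeted FGSM-style step using the estimated gradient's sign."""
    return np.clip(x + eps * np.sign(estimate_gradient(loss, x)), 0.0, 1.0)
```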
Direct Synthesis of Iterative Algorithms With Bounds on Achievable Worst-Case Convergence Rate
Iterative first-order methods such as gradient descent and its variants are
widely used for solving optimization and machine learning problems. There has
been recent interest in analytic or numerically efficient methods for computing
worst-case performance bounds for such algorithms, for example over the class
of strongly convex loss functions. A popular approach is to assume the
algorithm has a fixed size (fixed dimension, or memory) and that its structure
is parameterized by one or two hyperparameters, for example a learning rate and
a momentum parameter. Then, a Lyapunov function is sought to certify robust
stability and subsequent optimization can be performed to find optimal
hyperparameter tunings. In the present work, we instead fix the constraints
that characterize the loss function and apply techniques from robust control
synthesis to directly search over algorithms. This approach yields stronger
results than those previously available, since the bounds produced hold over
algorithms with an arbitrary, but finite, amount of memory rather than just
holding for algorithms with a prescribed structure.
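As a point of reference for the fixed-structure approach described above, the worst-case contraction factor of plain gradient descent on L-smooth, m-strongly convex functions is max(|1 - αm|, |1 - αL|), minimized at α = 2/(m + L) with rate (L - m)/(L + m). This textbook computation, not the paper's synthesis method, is sketched below.

```python
import numpy as np

m, L = 1.0, 10.0      # strong-convexity and smoothness constants

def worst_case_rate(alpha):
    """Worst-case contraction of x+ = x - alpha * grad f(x) over this class."""
    return max(abs(1 - alpha * m), abs(1 - alpha * L))

alphas = np.linspace(1e-3, 2 / L, 1000)
best = min(alphas, key=worst_case_rate)
print(best, worst_case_rate(best))  # ~2/(m+L) = 0.1818..., rate ~ 9/11 = 0.818...
```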