7,152 research outputs found
Q-Learning for Continuous Actions with Cross-Entropy Guided Policies
Off-Policy reinforcement learning (RL) is an important class of methods for
many problem domains, such as robotics, where the cost of collecting data is
high and on-policy methods are consequently intractable. Standard methods for
applying Q-learning to continuous-valued action domains involve iteratively
sampling the Q-function to find a good action (e.g. via hill-climbing), or by
learning a policy network at the same time as the Q-function (e.g. DDPG). Both
approaches make tradeoffs between stability, speed, and accuracy. We propose a
novel approach, called Cross-Entropy Guided Policies, or CGP, that draws
inspiration from both classes of techniques. CGP aims to combine the stability
and performance of iterative sampling policies with the low computational cost
of a policy network. Our approach trains the Q-function using iterative
sampling with the Cross-Entropy Method (CEM), while training a policy network
to imitate CEM's sampling behavior. We demonstrate that our method is more
stable to train than state-of-the-art policy network methods, matches their
inference-time compute cost, and achieves competitive total reward on standard
benchmarks.
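A minimal numpy sketch of the kind of CEM sampling policy the abstract describes: actions are drawn from a Gaussian, the top-scoring samples under the Q-function are kept as elites, and the sampling distribution is refit to them. The toy Q-function and all constants here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def toy_q(state, actions):
    # Hypothetical Q-function for illustration: peaked at action = 0.3 * state.
    return -np.square(actions - 0.3 * state)

def cem_action(state, q_fn, n_iters=10, pop=64, elite_frac=0.25, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0                     # initial sampling distribution
    n_elite = int(pop * elite_frac)
    for _ in range(n_iters):
        actions = rng.normal(mu, sigma, size=pop)
        q_vals = q_fn(state, actions)
        elites = actions[np.argsort(q_vals)[-n_elite:]]  # top-Q samples
        mu, sigma = elites.mean(), elites.std() + 1e-6   # refit distribution
    return mu

best = cem_action(2.0, toy_q)
print(best)   # converges near the optimum 0.3 * 2.0 = 0.6
```

In CGP, a policy network would then be trained to imitate the action this iterative sampler returns, so that inference needs only a single forward pass.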
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
UMAP (Uniform Manifold Approximation and Projection) is a novel manifold
learning technique for dimension reduction. UMAP is constructed from a
theoretical framework based in Riemannian geometry and algebraic topology. The
result is a practical scalable algorithm that applies to real world data. The
UMAP algorithm is competitive with t-SNE for visualization quality, and
arguably preserves more of the global structure with superior run time
performance. Furthermore, UMAP has no computational restrictions on embedding
dimension, making it viable as a general purpose dimension reduction technique
for machine learning.
Comment: Reference implementation available at http://github.com/lmcinnes/uma
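One ingredient of UMAP's construction can be sketched compactly: converting a point's k-nearest-neighbor distances into fuzzy edge weights via the smoothed-kNN calibration (rho is the distance to the nearest neighbor; sigma is found by binary search so the weights sum to log2(k)). This numpy sketch is only an illustration of that step, not the reference implementation.

```python
import numpy as np

def smooth_knn_weights(dists, n_iter=64):
    """dists: sorted positive distances from one point to its k neighbors."""
    k = len(dists)
    rho = dists[0]                 # local connectivity: nearest neighbor
    target = np.log2(k)            # calibration target for the weight sum
    lo, hi, sigma = 0.0, np.inf, 1.0
    for _ in range(n_iter):        # binary search for sigma
        s = np.exp(-np.maximum(dists - rho, 0.0) / sigma).sum()
        if s > target:
            hi = sigma
            sigma = (lo + hi) / 2
        else:
            lo = sigma
            sigma = sigma * 2 if np.isinf(hi) else (lo + hi) / 2
    return np.exp(-np.maximum(dists - rho, 0.0) / sigma)

w = smooth_knn_weights(np.array([0.5, 1.0, 1.5, 2.0]))
print(w)   # the nearest neighbor always gets weight exactly 1.0
```

These per-point weights are then symmetrized into a fuzzy graph whose low-dimensional layout is optimized, which is where the Riemannian-geometry framing enters.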
ADMM-SOFTMAX: An ADMM Approach for Multinomial Logistic Regression
We present ADMM-Softmax, an alternating direction method of multipliers
(ADMM) for solving multinomial logistic regression (MLR) problems. Our method
is geared toward supervised classification tasks with many examples and
features. It decouples the nonlinear optimization problem in MLR into three
steps that can be solved efficiently. In particular, each iteration of
ADMM-Softmax consists of a linear least-squares problem, a set of independent
small-scale smooth, convex problems, and a trivial dual variable update.
Solution of the least-squares problem can be accelerated by pre-computing a
factorization or preconditioner, and the separability in the smooth, convex
problem can be easily parallelized across examples. For two image
classification problems, we demonstrate that ADMM-Softmax leads to improved
generalization compared to Newton-Krylov, quasi-Newton, and stochastic
gradient descent methods.
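The three-step alternating structure described above can be sketched on a toy problem (the data, dimensions, penalty parameter, and inner solver below are all made-up illustrations, not the authors' code): a linear least-squares W-update, independent small smooth convex Z-updates solved here with a few gradient steps, and a trivial dual update.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 60, 5, 3                         # examples, features, classes
X = rng.normal(size=(n, d))
y = rng.integers(0, c, size=n)
Y = np.eye(c)[y]                           # one-hot labels

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def ce_loss(W):
    return -np.mean(np.log(softmax(X @ W)[np.arange(n), y]))

rho = 1.0
W = np.zeros((d, c))
Z = X @ W                                  # auxiliary variable: Z = XW
U = np.zeros_like(Z)                       # scaled dual variable
loss0 = ce_loss(W)
for _ in range(50):
    # Step 1: W-update -- a linear least-squares problem.
    W = np.linalg.lstsq(X, Z + U, rcond=None)[0]
    # Step 2: Z-update -- independent small smooth convex problems,
    # min_z CE(z, y_i) + (rho/2)||z - (W x_i - u_i)||^2, via gradient steps.
    V = X @ W - U
    for _ in range(20):
        Z -= 0.1 * ((softmax(Z) - Y) + rho * (Z - V))
    # Step 3: trivial dual variable update.
    U += Z - X @ W
print(f"CE loss: {loss0:.3f} -> {ce_loss(W):.3f}")
```

Because the Z-update decouples row by row, it is the step that parallelizes across examples, and the least-squares system in step 1 is where a cached factorization pays off.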
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning
Video learning is an important task in computer vision and has experienced
increasing interest over recent years. Since even a small number of videos
easily comprises several million frames, methods that do not rely on a
frame-level annotation are of special importance. In this work, we propose a
novel learning algorithm with a Viterbi-based loss that allows for online and
incremental learning of weakly annotated video data. We moreover show that
explicit context and length modeling leads to huge improvements in video
segmentation and labeling tasks and include these models in our framework. On
several action segmentation benchmarks, we obtain an improvement of up to 10%
compared to current state-of-the-art methods.
Comment: CVPR 201
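The Viterbi decoding at the heart of such weakly supervised pipelines is a short dynamic program: given per-frame label scores from a network and a transition model, it finds the highest-scoring label path. The scores and the sticky transition matrix below are illustrative assumptions.

```python
import numpy as np

def viterbi(frame_log_probs, log_trans):
    """frame_log_probs: (T, L) per-frame scores; log_trans: (L, L) transitions."""
    T, L = frame_log_probs.shape
    dp = np.full((T, L), -np.inf)          # best score ending in each label
    back = np.zeros((T, L), dtype=int)     # backpointers
    dp[0] = frame_log_probs[0]
    for t in range(1, T):
        cand = dp[t - 1][:, None] + log_trans        # (prev label, cur label)
        back[t] = cand.argmax(axis=0)
        dp[t] = cand.max(axis=0) + frame_log_probs[t]
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):          # follow backpointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]

scores = np.log(np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]))
trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))   # sticky transitions
print(viterbi(scores, trans))   # → [0, 0, 0]
```

The example shows the value of explicit context modeling: the sticky transitions keep label 0 even though the last frame's score favors label 1.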
Stochastic Nonconvex Optimization with Large Minibatches
We study stochastic optimization of nonconvex loss functions, which are
typical objectives for training neural networks. We propose stochastic
approximation algorithms which optimize a series of regularized, nonlinearized
losses on large minibatches of samples, using only first-order gradient
information. Our algorithms provably converge to an approximate critical point
of the expected objective with faster rates than minibatch stochastic gradient
descent, and facilitate better parallelization by allowing larger minibatches.
Comment: Accepted by the ALT 201
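The scheme described above can be sketched on a toy nonconvex problem (the loss, constants, and inner solver here are made-up illustrations): each outer step draws a large minibatch and approximately minimizes a regularized minibatch loss with several first-order steps, rather than taking a single SGD step per batch.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=5000)

def grad_f(w, batch):
    # Nonconvex toy loss per sample: 1 - cos(w - x); gradient: sin(w - x).
    return np.sin(w - batch).mean()

w, lam = 3.0, 1.0
for _ in range(30):                        # outer iterations
    batch = rng.choice(data, size=1000)    # large minibatch
    w_ref = w
    for _ in range(25):                    # inner first-order solver on the
        # regularized minibatch loss f_B(w) + (lam/2)(w - w_ref)^2
        w -= 0.2 * (grad_f(w, batch) + lam * (w - w_ref))
full_grad = abs(grad_f(w, data))
print(f"|grad| at solution: {full_grad:.3f}")
```

The inner loop is embarrassingly parallel over the minibatch, which is the parallelization benefit the abstract refers to.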
Algorithm Runtime Prediction: Methods & Evaluation
Perhaps surprisingly, it is possible to predict how long an algorithm will
take to run on a previously unseen input, using machine learning techniques to
build a model of the algorithm's runtime as a function of problem-specific
instance features. Such models have important applications to algorithm
analysis, portfolio-based algorithm selection, and the automatic configuration
of parameterized algorithms. Over the past decade, a wide variety of techniques
have been studied for building such models. Here, we describe extensions and
improvements of existing models, new families of models, and -- perhaps most
importantly -- a much more thorough treatment of algorithm parameters as model
inputs. We also comprehensively describe new and existing features for
predicting algorithm runtime for propositional satisfiability (SAT), travelling
salesperson (TSP) and mixed integer programming (MIP) problems. We evaluate
these innovations through the largest empirical analysis of its kind, comparing
to a wide range of runtime modelling techniques from the literature. Our
experiments consider 11 algorithms and 35 instance distributions; they also
span a very wide range of SAT, MIP, and TSP instances, with the least
structured having been generated uniformly at random and the most structured
having emerged from real industrial applications. Overall, we demonstrate that
our new models yield substantially better runtime predictions than previous
approaches in terms of their generalization to new problem instances, to new
algorithms from a parameterized space, and to both simultaneously.
Comment: 51 pages, 13 figures, 8 tables. Added references, feature cost, and
experiments with subsets of features; reworded Sections 1&
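The core idea can be illustrated with synthetic data (the features, the "true" runtime law, and the linear model below are all assumptions for demonstration, not the paper's models): learn a mapping from problem-instance features to log runtime, then predict runtime for unseen instances.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 200, 50
# Hypothetical instance features, e.g. [n_variables, clause/variable ratio].
feats = rng.uniform([100, 3.0], [1000, 5.0], size=(n_train + n_test, 2))
# Synthetic "true" log runtime: a smooth function of the features plus noise.
log_rt = (1.5 * np.log(feats[:, 0]) + 0.8 * feats[:, 1]
          + rng.normal(0, 0.1, len(feats)))

# Fit a linear model of log runtime on (log n_vars, ratio, bias).
Phi = np.column_stack([np.log(feats[:, 0]), feats[:, 1], np.ones(len(feats))])
coef = np.linalg.lstsq(Phi[:n_train], log_rt[:n_train], rcond=None)[0]

pred = Phi[n_train:] @ coef
rmse = np.sqrt(np.mean((pred - log_rt[n_train:]) ** 2))
print(f"test RMSE in log runtime: {rmse:.3f}")
```

Modeling the logarithm of runtime, as here, is standard in this literature because runtimes span many orders of magnitude; the paper's contribution is richer model families and the treatment of algorithm parameters as additional inputs.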
Gradient Hyperalignment for multi-subject fMRI data alignment
Multi-subject fMRI data analysis is an interesting and challenging problem in
human brain decoding studies. The inherent anatomical and functional
variability across subjects makes it necessary to do both anatomical and
functional alignment before classification analysis. Besides, when it comes to
big data, time complexity becomes a problem that cannot be ignored. This paper
proposes Gradient Hyperalignment (Gradient-HA) as a gradient-based functional
alignment method that is suitable for multi-subject fMRI datasets with large
amounts of samples and voxels. The advantage of Gradient-HA is that it can
solve independence and high dimension problems by using Independent Component
Analysis (ICA) and Stochastic Gradient Ascent (SGA). Validation using
multi-classification tasks on big data demonstrates that the Gradient-HA
method has lower time complexity and better or comparable performance compared
with other state-of-the-art functional alignment methods.
Comment: 15th Pacific Rim International Conference on Artificial Intelligence
(PRICAI 2018), Nanjing, China, August 28-31, 201
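The functional-alignment core that hyperalignment methods build on can be sketched in a few lines: find an orthogonal transform mapping one subject's response matrix onto another's (orthogonal Procrustes, solved in closed form by an SVD). Gradient-HA scales this idea up with ICA and stochastic gradient ascent; the closed-form version below, on synthetic data, is only an illustration of the alignment step itself.

```python
import numpy as np

rng = np.random.default_rng(0)
shared = rng.normal(size=(100, 20))          # shared responses (samples x voxels)
R_true, _ = np.linalg.qr(rng.normal(size=(20, 20)))   # random orthogonal map
subj_a = shared
subj_b = shared @ R_true + 0.01 * rng.normal(size=shared.shape)

# Orthogonal Procrustes: R = U V^T from the SVD of B^T A minimizes ||B R - A||_F.
U, _, Vt = np.linalg.svd(subj_b.T @ subj_a)
R = U @ Vt

err_before = np.linalg.norm(subj_b - subj_a)
err_after = np.linalg.norm(subj_b @ R - subj_a)
print(f"misalignment: {err_before:.2f} -> {err_after:.2f}")
```

The closed-form SVD costs cubically in the voxel count, which is exactly the bottleneck a stochastic-gradient variant like Gradient-HA is designed to avoid on large fMRI datasets.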
The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size
We apply state-of-the-art tools in modern high-dimensional numerical linear
algebra to approximate efficiently the spectrum of the Hessian of modern
deepnets, with tens of millions of parameters, trained on real data. Our
results corroborate previous findings, based on small-scale networks, that the
Hessian exhibits "spiked" behavior, with several outliers isolated from a
continuous bulk. We decompose the Hessian into different components and study
how each component evolves with training and with sample size.
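A compact numpy sketch of the kind of tool involved: the Lanczos iteration approximates extreme eigenvalues of a huge symmetric matrix using only matrix-vector products, which is how deepnet Hessians with millions of parameters become tractable. The small dense matrix with a planted spike below stands in for a Hessian-vector-product oracle; it is an illustration, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=(n, n)); A = (A + A.T) / 2   # symmetric "Hessian" bulk
v = rng.normal(size=n)
A += 0.5 * np.outer(v, v)                        # plant a spiked outlier

def hvp(x):                                      # matvec oracle only
    return A @ x

def lanczos_top_eig(hvp, n, k=40, seed=1):
    rng = np.random.default_rng(seed)
    q = rng.normal(size=n); q /= np.linalg.norm(q)
    Q, alphas, betas = [q], [], []
    beta, q_prev = 0.0, np.zeros(n)
    for _ in range(k):
        w = hvp(q) - beta * q_prev               # three-term recurrence
        alpha = q @ w
        w -= alpha * q
        w -= np.array(Q).T @ (np.array(Q) @ w)   # full reorthogonalization
        beta = np.linalg.norm(w)
        alphas.append(alpha); betas.append(beta)
        if beta < 1e-12:
            break
        q_prev, q = q, w / beta
        Q.append(q)
    # Eigenvalues of the small tridiagonal T approximate extreme eigenvalues of A.
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    return np.linalg.eigvalsh(T).max()

est = lanczos_top_eig(hvp, n)
exact = np.linalg.eigvalsh(A).max()
print(f"Lanczos estimate: {est:.2f}, exact: {exact:.2f}")
```

The planted rank-one spike sits well outside the random bulk, mimicking the "spiked" Hessian spectra the abstract describes; Lanczos locks onto such isolated outliers in very few iterations.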
Automated Synthesis of Safe Digital Controllers for Sampled-Data Stochastic Nonlinear Systems
We present a new method for the automated synthesis of digital controllers
with formal safety guarantees for systems with nonlinear dynamics, noisy output
measurements, and stochastic disturbances. Our method derives digital
controllers such that the corresponding closed-loop system, modeled as a
sampled-data stochastic control system, satisfies a safety specification with
probability above a given threshold. The proposed synthesis method alternates
between two steps: generation of a candidate controller pc, and verification of
the candidate. pc is found by maximizing a Monte Carlo estimate of the safety
probability, and by using a non-validated ODE solver for simulating the system.
Such a candidate is therefore sub-optimal but can be generated very rapidly. To
rule out unstable candidate controllers, we prove and utilize Lyapunov's
indirect method for instability of sampled-data nonlinear systems. In the
subsequent verification step, we use a validated solver based on SMT
(Satisfiability Modulo Theories) to compute a numerically and statistically
valid confidence interval for the safety probability of pc. If the probability
so obtained is not above the threshold, we expand the search space for
candidates by increasing the controller degree. We evaluate our technique on
three case studies: an artificial pancreas model, a powertrain control model,
and a quadruple-tank process.
Comment: 12 pages, 4 figures, 4 tables
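The statistical core of the candidate-generation step can be illustrated in a few lines: estimate a closed-loop safety probability by Monte Carlo simulation and attach a confidence interval. The one-dimensional noisy plant, the controller gain, and the normal-approximation interval below are stand-ins for exposition, not the paper's case studies or its SMT-validated interval.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_episode(k, rng, n_steps=50, dt=0.1):
    x = 1.0
    for _ in range(n_steps):
        x_meas = x + rng.normal(0, 0.05)          # noisy output measurement
        u = -k * x_meas                           # candidate digital controller
        x = x + dt * (0.5 * x + u) + rng.normal(0, 0.02)  # stochastic dynamics
        if abs(x) > 2.0:                          # safety specification violated
            return False
    return True

n = 2000
safe = sum(run_episode(1.5, rng) for _ in range(n))
p_hat = safe / n
half = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)   # normal-approx 95% CI
print(f"estimated safety probability: {p_hat:.3f} +/- {half:.3f}")
```

In the paper this fast, non-validated estimate only proposes candidates; the verification step recomputes a numerically and statistically valid interval with a validated SMT-based solver before accepting a controller.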
Proximal Backpropagation
We propose proximal backpropagation (ProxProp) as a novel algorithm that
takes implicit instead of explicit gradient steps to update the network
parameters during neural network training. Our algorithm is motivated by the
step size limitation of explicit gradient descent, which poses an impediment
for optimization. ProxProp is developed from a general point of view on the
backpropagation algorithm, currently the most common technique to train neural
networks via stochastic gradient descent and variants thereof. Specifically, we
show that backpropagation of a prediction error is equivalent to sequential
gradient descent steps on a quadratic penalty energy, which comprises the
network activations as variables of the optimization. We further analyze
theoretical properties of ProxProp and in particular prove that the algorithm
yields a descent direction in parameter space and can therefore be combined
with a wide variety of convergent algorithms. Finally, we devise an efficient
numerical implementation that integrates well with popular deep learning
frameworks. We conclude by demonstrating promising numerical results and show
that ProxProp can be effectively combined with common first order optimizers
such as Adam.
Comment: Published as a conference paper at ICLR 201
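The step-size limitation that motivates ProxProp can be illustrated on a toy quadratic energy 0.5 x^T A x (this numpy example is a sketch of the implicit-versus-explicit distinction, not the ProxProp algorithm itself): the explicit step x - tau*A@x diverges once tau exceeds 2/L, while the implicit (proximal) step solves (I + tau*A) x_new = x and is stable for every tau > 0.

```python
import numpy as np

A = np.diag([100.0, 1.0])          # ill-conditioned quadratic, L = 100
tau = 0.1                          # far above the explicit limit 2/L = 0.02
I = np.eye(2)

x_exp = np.array([1.0, 1.0])
x_imp = np.array([1.0, 1.0])
for _ in range(100):
    x_exp = x_exp - tau * A @ x_exp              # explicit gradient step
    x_imp = np.linalg.solve(I + tau * A, x_imp)  # implicit (proximal) step

print(np.linalg.norm(x_exp) > 1e6)   # True: explicit iteration blows up
print(np.linalg.norm(x_imp) < 1e-3)  # True: implicit iteration converges
```

Taking implicit steps on the layer-wise penalty terms is what lets ProxProp use step sizes that would destabilize plain backpropagation, while still yielding a descent direction that composes with optimizers such as Adam.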