Stochastic Optimization with Importance Sampling
Uniform sampling of training data has been commonly used in traditional
stochastic optimization algorithms such as Proximal Stochastic Gradient Descent
(prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although
uniform sampling can guarantee that the sampled stochastic quantity is an
unbiased estimate of the corresponding true quantity, the resulting estimator
may have a rather high variance, which negatively affects the convergence of
the underlying optimization procedure. In this paper we study stochastic
optimization with importance sampling, which improves the convergence rate by
reducing the stochastic variance. Specifically, we study prox-SGD (actually,
stochastic mirror descent) with importance sampling and prox-SDCA with
importance sampling. For prox-SGD, instead of adopting uniform sampling
throughout the training process, the proposed algorithm employs importance
sampling to minimize the variance of the stochastic gradient. For prox-SDCA,
the proposed importance sampling scheme aims to achieve higher expected dual
value at each dual coordinate ascent step. We provide extensive theoretical
analysis to show that the convergence rates with the proposed importance
sampling methods can be significantly improved under suitable conditions both
for prox-SGD and for prox-SDCA. Experiments are provided to verify the
theoretical analysis. Comment: 29 pages
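
The variance-reduction idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's prox-SGD or prox-SDCA algorithm: it assumes a plain least-squares loss with no proximal term, and it assumes sampling probabilities proportional to each example's feature norm as a stand-in for a gradient-norm bound. The function names (`sampling_probs`, `weighted_grad`, `sgd_importance`) are hypothetical. The point it shows is that rescaling each sampled gradient by 1/(n p_i) keeps the estimator unbiased under non-uniform sampling.

```python
import numpy as np

def sampling_probs(X):
    """Sampling distribution proportional to per-example feature norms.

    Assumption: the feature norm is used as a cheap proxy for a bound on
    the per-example gradient norm. Rows of X must not be all zero.
    """
    norms = np.linalg.norm(X, axis=1)
    return norms / norms.sum()

def weighted_grad(w, X, y, i, p):
    """Importance-weighted gradient of 0.5 * (x_i . w - y_i)^2.

    Dividing by n * p[i] makes the expectation over i ~ p equal to the
    full-batch gradient.
    """
    n = X.shape[0]
    g_i = (X[i] @ w - y[i]) * X[i]
    return g_i / (n * p[i])

def full_grad(w, X, y):
    """Full-batch gradient of the averaged least-squares loss."""
    return X.T @ (X @ w - y) / X.shape[0]

def sgd_importance(X, y, steps=2000, lr=0.01, seed=0):
    """Plain SGD (no proximal step) with a fixed importance distribution."""
    rng = np.random.default_rng(seed)
    p = sampling_probs(X)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        i = rng.choice(X.shape[0], p=p)  # draw an index non-uniformly
        w -= lr * weighted_grad(w, X, y, i, p)
    return w
```

Averaging `weighted_grad` over all indices with weights `p` reproduces `full_grad` exactly, which is the unbiasedness property; how much the variance drops depends on how well the chosen probabilities track the true gradient norms.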
Online Variance Reduction for Stochastic Optimization
Modern stochastic optimization methods often rely on uniform sampling which
is agnostic to the underlying characteristics of the data. This might degrade
the convergence by yielding estimates that suffer from a high variance. A
possible remedy is to employ non-uniform importance sampling techniques, which
take the structure of the dataset into account. In this work, we investigate a
recently proposed setting which poses variance reduction as an online
optimization problem with bandit feedback. We devise a novel and efficient
algorithm for this setting that finds a sequence of importance sampling
distributions competitive with the best fixed distribution in hindsight, the
first result of this kind. While we present our method for sampling datapoints,
it naturally extends to selecting coordinates or even blocks thereof.
Empirical validations underline the benefits of our method in several settings. Comment: COLT 201
Trajectory-Based Off-Policy Deep Reinforcement Learning
Policy gradient methods are powerful reinforcement learning algorithms and
have been demonstrated to solve many complex tasks. However, these methods are
also data-inefficient, afflicted with high-variance gradient estimates, and
frequently get stuck in local optima. This work addresses these weaknesses by
combining recent improvements in the reuse of off-policy data and exploration
in parameter space with deterministic behavioral policies. The resulting
objective is amenable to standard neural network optimization strategies like
stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo.
Incorporation of previous rollouts via importance sampling greatly improves
data-efficiency, whilst stochastic optimization schemes facilitate the escape
from local optima. We evaluate the proposed approach on a series of continuous
control benchmark tasks. The results show that the proposed algorithm is able
to successfully and reliably learn solutions using fewer system interactions
than standard policy gradient methods. Comment: Includes appendix. Accepted for ICML 201
Reduced Complexity Filtering with Stochastic Dominance Bounds: A Convex Optimization Approach
This paper uses stochastic dominance principles to construct upper and lower
sample path bounds for Hidden Markov Model (HMM) filters. Given an HMM, by using
convex optimization methods for nuclear norm minimization with copositive
constraints, we construct low rank stochastic matrices so that the optimal
filters using these matrices provably lower and upper bound (with respect to a
partially ordered set) the true filtered distribution at each time instant.
Since these matrices are low rank (say R), the computational cost of evaluating
the filtering bounds is O(XR) instead of O(X^2). A Monte-Carlo importance
sampling filter is presented that exploits these upper and lower bounds to
estimate the optimal posterior. Finally, using the Dobrushin coefficient,
explicit bounds are given on the variational norm between the true posterior
and the upper and lower bounds.
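
The O(XR) versus O(X^2) cost claim in the abstract above can be sketched as follows. This is only an illustration of why a rank-R transition matrix makes the filter update cheaper; it does not construct the stochastic dominance bounds or solve the nuclear norm problem. It assumes the low rank matrix is already available in factored form P = A B, with A of shape (X, R) and B of shape (R, X), and the function names are hypothetical.

```python
import numpy as np

def filter_step_full(pi, P, b):
    """One HMM filter update with a dense transition matrix.

    pi: current belief over X states, shape (X,)
    P:  row-stochastic transition matrix, shape (X, X)
    b:  observation likelihoods for the new observation, shape (X,)
    The product P.T @ pi costs O(X^2).
    """
    unnorm = b * (P.T @ pi)
    return unnorm / unnorm.sum()

def filter_step_low_rank(pi, A, B, b):
    """Same update when P = A @ B is given in factored form.

    A: shape (X, R), B: shape (R, X). Since P.T @ pi = B.T @ (A.T @ pi),
    the update needs two matrix-vector products of cost O(XR) each.
    """
    unnorm = b * (B.T @ (A.T @ pi))
    return unnorm / unnorm.sum()
```

Both functions return the same belief vector when `P` equals `A @ B`; the factored version simply never forms the X-by-X matrix.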