Projecting Ising Model Parameters for Fast Mixing
Inference in general Ising models is difficult, due to high treewidth making
tree-based algorithms intractable. Moreover, when interactions are strong,
Gibbs sampling may take exponential time to converge to the stationary
distribution. We present an algorithm to project Ising model parameters onto a
parameter set that is guaranteed to be fast mixing, under several divergences.
We find that Gibbs sampling using the projected parameters is more accurate
than with the original parameters when interaction strengths are strong and
when limited time is available for sampling.
Comment: Advances in Neural Information Processing Systems 201
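The Gibbs sampler this abstract refers to can be sketched as follows. This is a minimal illustration of plain single-site Gibbs sampling on an Ising model with spins in {-1, +1}, not the projection algorithm the paper proposes. The function name `gibbs_ising` and its parameters are hypothetical, chosen for this sketch.

```python
import math
import random


def gibbs_ising(weights, field, n_sweeps, seed=0):
    """Single-site Gibbs sampling for an Ising model with spins in {-1, +1}.

    Assumed model (an illustration, not taken from the paper):
    p(x) is proportional to
        exp(sum_{i<j} weights[i][j] * x_i * x_j + sum_i field[i] * x_i),
    where `weights` is symmetric with a zero diagonal.
    """
    rng = random.Random(seed)
    n = len(field)
    x = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(n_sweeps):
        for i in range(n):
            # Local field on spin i: bias plus the pull of all other spins.
            local = field[i] + sum(
                weights[i][j] * x[j] for j in range(n) if j != i
            )
            # Conditional probability that spin i is +1 given the rest.
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * local))
            x[i] = 1 if rng.random() < p_plus else -1
    return x
```

With large entries in `weights` the conditional probabilities sit close to 0 or 1, so the chain rarely flips a spin. That is the slow-mixing regime the abstract describes.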
Discussion
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/106905/1/insr12033.pd
Robust Data-Driven Accelerated Mirror Descent
Learning-to-optimize is an emerging framework that leverages training data to
speed up the solution of certain optimization problems. One such approach is
based on the classical mirror descent algorithm, where the mirror map is
modelled using input-convex neural networks. In this work, we extend this
functional parameterization approach by introducing momentum into the
iterations, based on the classical accelerated mirror descent. Our approach
combines short-time accelerated convergence with stable long-time behavior. We
empirically demonstrate additional robustness with respect to multiple
parameters on denoising and deconvolution experiments.
Comment: Note inconsistency with ICASSP paper for step-size choice in (4c) and
associated Alg. 1, this version is correct with step-size kt/
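For orientation, classical mirror descent with a fixed mirror map can be sketched as below. The sketch assumes the negative-entropy mirror map on the probability simplex, which yields the exponentiated-gradient update. It is neither the learned input-convex network map nor the accelerated variant described in the abstract, and the name `entropic_mirror_descent` is hypothetical.

```python
import math


def entropic_mirror_descent(grad, x0, step, n_iters):
    """Mirror descent on the probability simplex.

    Uses the negative-entropy mirror map (an assumption of this sketch),
    so each iteration is a multiplicative update followed by normalisation.
    `grad` maps a point on the simplex to the gradient of the objective.
    """
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x)
        # Gradient step in the dual (log) coordinates, mapped back to the primal.
        w = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        total = sum(w)
        # Normalising is the Bregman projection onto the simplex for this map.
        x = [wi / total for wi in w]
    return x
```

For a linear objective with cost vector `c`, passing `grad=lambda x: c` drives the iterate toward the vertex with the smallest cost.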
Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization
Stochastic optimization naturally arises in machine learning. Efficient
algorithms with provable guarantees, however, are still largely missing, when
the objective function is nonconvex and the data points are dependent. This
paper studies this fundamental challenge through a streaming PCA problem for
stationary time series data. Specifically, our goal is to estimate the
principal component of time series data with respect to the covariance matrix
of the stationary distribution. Computationally, we propose a variant of Oja's
algorithm combined with downsampling to control the bias of the stochastic
gradient caused by the data dependency. Theoretically, we quantify the
uncertainty of our proposed stochastic algorithm based on diffusion
approximations. This allows us to prove the asymptotic rate of convergence and
further implies near optimal asymptotic sample complexity. Numerical
experiments are provided to support our analysis.
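The idea of pairing Oja's update with downsampling can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact algorithm: the step size is constant, the downsampling gap is fixed, and the helper `ar1_stream` (a two-dimensional AR(1) process) along with all parameter names are hypothetical.

```python
import math
import random


def ar1_stream(n, coeff=0.8, scales=(3.0, 0.3), seed=0):
    """Yield n samples of a vector AR(1) process (hypothetical test data).

    Consecutive samples are dependent; the process starts at zero and is
    approximately stationary after a short burn-in.
    """
    rng = random.Random(seed)
    x = [0.0] * len(scales)
    for _ in range(n):
        x = [coeff * xi + s * rng.gauss(0.0, 1.0) for xi, s in zip(x, scales)]
        yield list(x)


def oja_downsampled(stream, dim, step, gap, seed=0):
    """Oja's streaming PCA update applied to every `gap`-th sample."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for t, x in enumerate(stream):
        if t % gap != 0:
            continue  # skip samples so the ones used are less dependent
        proj = sum(xi * wi for xi, wi in zip(x, w))
        # Oja step: w <- w + step * (x x^T) w, then renormalise to unit length.
        w = [wi + step * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w
```

A call such as `oja_downsampled(ar1_stream(20000), dim=2, step=0.01, gap=5)` returns a unit vector that should align, up to sign, with the first coordinate axis, since that axis carries most of the variance in this toy process.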
Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise
In this work, we study the concentration behavior of a stochastic
approximation (SA) algorithm under a contractive operator with respect to an
arbitrary norm. We consider two settings where the iterates are potentially
unbounded: (1) bounded multiplicative noise, and (2) additive sub-Gaussian
noise. We obtain maximal concentration inequalities on the convergence errors,
and show that these errors have sub-Gaussian tails in the additive noise
setting, and super-polynomial tails (faster than polynomial decay) in the
multiplicative noise setting. In addition, we provide an impossibility result
showing that it is in general not possible to achieve sub-exponential tails for
SA with multiplicative noise. To establish these results, we develop a novel
bootstrapping argument that involves bounding the moment generating function of
the generalized Moreau envelope of the error and the construction of an
exponential supermartingale to enable using Ville's maximal inequality.
To demonstrate the applicability of our theoretical results, we use them to
provide maximal concentration bounds for a large class of reinforcement
learning algorithms, including but not limited to on-policy TD-learning with
linear function approximation, off-policy TD-learning with generalized
importance sampling factors, and Q-learning. To the best of our knowledge,
super-polynomial concentration bounds for off-policy TD-learning have not been
established in the literature due to the challenge of handling the combination
of unbounded iterates and multiplicative noise.
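The additive-noise setting in this abstract can be illustrated with a scalar toy recursion. The sketch assumes a hypothetical contractive operator F(x) = 0.5x + 1 (contraction factor 0.5, fixed point 2), Gaussian noise, and a diminishing step size 2/(k+2). It shows the form of the iteration only and does not reproduce the paper's concentration bounds.

```python
import random


def contractive_sa(n_iters, noise_std=1.0, seed=0):
    """Stochastic approximation x <- x + step * (F(x) + noise - x).

    F(x) = 0.5 * x + 1 is a hypothetical contraction chosen for this
    sketch; its unique fixed point is x* = 2.
    """
    rng = random.Random(seed)
    x = 0.0
    for k in range(n_iters):
        step = 2.0 / (k + 2)  # diminishing step size (an assumed schedule)
        noise = rng.gauss(0.0, noise_std)  # additive sub-Gaussian noise
        target = 0.5 * x + 1.0 + noise  # noisy evaluation of F at x
        x = x + step * (target - x)
    return x
```

Running `contractive_sa(10000)` returns an iterate near the fixed point 2. In the multiplicative setting the noise would instead scale with the iterate, which is the harder case the abstract highlights.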