SI-ADMM: A Stochastic Inexact ADMM Framework for Stochastic Convex Programs
We consider the structured stochastic convex program requiring the
minimization of $\mathbb{E}[\tilde{f}(x,\xi)] + \mathbb{E}[\tilde{g}(y,\xi)]$
subject to the constraint $Ax + By = b$. Motivated by the need for
decentralized schemes and structure, we propose a stochastic inexact ADMM
(SI-ADMM) framework where subproblems are solved inexactly via stochastic
approximation schemes. Based on this framework, we prove the following: (i)
under suitable assumptions on the associated batch-size of samples utilized at
each iteration, the SI-ADMM scheme produces a sequence that converges to the
unique solution almost surely; (ii) if the number of gradient steps (or
equivalently, the number of sampled gradients) utilized for solving the
subproblems in each iteration increases at a geometric rate, the mean-squared
error diminishes to zero at a prescribed geometric rate; (iii) the overall
iteration complexity in terms of gradient steps (or equivalently samples) is
found to be consistent with the canonical level of $\mathcal{O}(1/\epsilon)$.
Preliminary applications on LASSO and distributed regression suggest that the
scheme performs well compared to its competitors.
Comment: 37 pages, 2 figures, 3 tables
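To make the idea of inexact subproblem solves concrete, here is a minimal sketch, not the authors' implementation, of an ADMM loop for LASSO in which the smooth subproblem is solved only approximately by stochastic gradient steps, with the number of steps growing geometrically across outer iterations. The consensus splitting x - y = 0, the function names, the step-size rule, and all default parameters (`rho`, `k0`, `growth`) are assumptions chosen for illustration.

```python
import math
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (exact solution of the y-subproblem)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def si_admm_lasso(A, b, lam=0.1, rho=1.0, outer_iters=20, k0=20, growth=1.3, seed=0):
    """Illustrative stochastic inexact ADMM for
        min_x (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2 + lam * ||y||_1  s.t.  x - y = 0.
    The x-subproblem is solved inexactly by sampled-gradient steps; the number
    of steps at outer iteration k is ceil(k0 * growth**k) (assumed schedule).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x, y, u = np.zeros(d), np.zeros(d), np.zeros(d)  # u is the scaled dual variable
    # Conservative smoothness bound for a single sampled loss plus the penalty term.
    smooth = np.max(np.sum(A * A, axis=1)) + rho
    for k in range(outer_iters):
        inner_steps = math.ceil(k0 * growth ** k)
        for t in range(inner_steps):
            i = rng.integers(n)  # draw one sample uniformly at random
            # Unbiased gradient of the augmented Lagrangian in x at the sample.
            grad = A[i] * (A[i] @ x - b[i]) + rho * (x - y + u)
            x = x - grad / (smooth + rho * t)  # diminishing step size (assumed rule)
        y = soft_threshold(x + u, lam / rho)   # exact proximal y-update
        u = u + x - y                          # dual update on the constraint residual
    return x, y
```

Calling `si_admm_lasso(A, b, lam=0.05)` on a synthetic sparse regression problem returns the primal pair `(x, y)`; the sparse estimate is `y`, and `np.linalg.norm(x - y)` gives the remaining constraint violation.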
An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path
In this paper, we consider a modified version of the control problem in a
model-free Markov decision process (MDP) setting with large state and action
spaces. The control problem most commonly addressed in the contemporary
literature is to find an optimal policy which maximizes the value function,
i.e., the long run discounted reward of the MDP. The current settings also
assume access to a generative model of the MDP with the hidden premise that
observations of the system behaviour in the form of sample trajectories can be
obtained with ease from the model. In this paper, we consider a modified
version, where the cost function is the expectation of a non-convex function of
the value function without access to the generative model. Rather, we assume
that a sample trajectory generated using an a priori chosen behaviour policy is
made available. In this restricted setting, we solve the modified control
problem in its true sense, i.e., to find the best possible policy given this
limited information. We propose a stochastic approximation algorithm based on
the well-known cross-entropy method that is data-efficient (in its use of the
sample trajectory), stable, and robust, as well as computationally and storage
efficient. We
provide a proof of convergence of our algorithm to a policy which is globally
optimal relative to the behaviour policy. We also present experimental results
to corroborate our claims, and we demonstrate that the solution produced by our
algorithm is superior to those of state-of-the-art algorithms under an
appropriately chosen behaviour policy.
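For readers unfamiliar with the cross-entropy method on which the algorithm is based, the following is a minimal sketch of the textbook method with a smoothed (stochastic-approximation style) parameter update. It is not the paper's off-policy algorithm: `objective` is a hypothetical black-box score standing in for an estimate of policy performance, and the Gaussian sampling family, the function name, and all default parameters are assumptions.

```python
import numpy as np

def cross_entropy_search(objective, dim, iters=60, pop=200,
                         elite_frac=0.1, smooth=0.7, seed=0):
    """Maximise a black-box objective over R^dim with the cross-entropy method.

    Each iteration samples candidates from a diagonal Gaussian, keeps the
    best-scoring fraction (the elite set), and moves the Gaussian's mean and
    standard deviation part of the way toward the elite statistics.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, 2.0)  # broad initial search distribution (assumed)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = mean + std * rng.standard_normal((pop, dim))
        scores = np.array([objective(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # highest-scoring candidates
        # Smoothed update: a convex combination of old and elite statistics.
        mean = (1.0 - smooth) * mean + smooth * elite.mean(axis=0)
        std = (1.0 - smooth) * std + smooth * elite.std(axis=0)
    return mean
```

The smoothing weight `smooth` plays the role of a step size: values below 1 damp the update so that a single unlucky batch of samples cannot collapse the search distribution. The method needs only function evaluations, so `objective` may be non-convex.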