31 research outputs found

Shifting Regret, Mirror Descent, and Matrices

Author: Gyorgy A
Szepesvari C
Publication venue: Journal of Machine Learning Research
Publication date: 24/04/2016

We consider the problem of online prediction in changing environments. In this framework the performance of a predictor is evaluated as the loss relative to an arbitrarily changing predictor, whose individual components come from a base class of predictors. Typical results in the literature consider different base classes (experts, linear predictors on the simplex, etc.) separately. Introducing an arbitrary mapping inside the mirror descent algorithm, we provide a framework that unifies and extends existing results. As an example, we prove new shifting regret bounds for matrix prediction problems.

Spiral - Imperial College Digital Repository

Classification with Margin Constraints: A Unification with Applications to Optimization

Author: Gyorgy A
Joulani P
Szepesvari C
Publication date: 02/11/2015

This paper introduces Classification with Margin Constraints (CMC), a simple generalization of cost-sensitive classification that unifies several learning settings. In particular, we show that a CMC classifier can be used, out of the box, to solve regression, quantile estimation, and several anomaly detection formulations. On the one hand, our reductions to CMC are at the loss level: the optimization problem to solve under the equivalent CMC setting is exactly the same as the optimization problem under the original (e.g. regression) setting. On the other hand, due to the close relationship between CMC and standard binary classification, the ideas proposed for efficient optimization in binary classification naturally extend to CMC. As such, any improvement in CMC optimization immediately transfers to the domains reduced to CMC, without the need for new derivations or programs. To our knowledge, this unified view has been overlooked by the existing practice in the literature, where an optimization technique (such as SMO or PEGASOS) is first developed for binary classification and then extended to other problem domains on a case-by-case basis. We demonstrate the flexibility of CMC by reducing two recent anomaly detection and quantile learning methods to CMC

Spiral - Imperial College Digital Repository

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles

Author: Gyorgy A
Hu X
Prashanth LA
Szepesvari C
Publication date: 02/11/2015

For bandit convex optimization we propose a model, where a gradient estimation oracle acts as an intermediary between a noisy function evaluation oracle and the algorithms. The algorithms can control the bias-variance tradeoff in the gradient estimates. We prove lower and upper bounds for the minimax error of algorithms that interact with the objective function by controlling this oracle. The upper bounds replicate many existing results (capturing the essence of existing proofs) while the lower bounds put a limit on the achievable performance in this setup. In particular, our results imply that no algorithm can achieve the optimal minimax error rate in stochastic bandit smooth convex optimization

Spiral - Imperial College Digital Repository

SDP Relaxation with Randomized Rounding for Energy Disaggregation

Author: Gyorgy A
Shaloudegi K
Szepesvari C
Xu W
Publication venue: Neural Information Processing Systems Foundation, Inc.
Publication date: 12/08/2016

We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring. In this problem the goal is to estimate the energy consumption of each appliance over time based on the total energy-consumption signal of a household. The current state of the art is to model the problem as inference in factorial HMMs, and use quadratic programming to find an approximate solution to the resulting quadratic integer program. Here we take a more principled approach, better suited to integer programming problems, and find an approximate optimum by combining convex semidefinite relaxations, randomized rounding, and a scalable ADMM method that exploits the special structure of the resulting semidefinite program. Simulation results on both synthetic and real-world datasets demonstrate the superiority of our method.

Spiral - Imperial College Digital Repository

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

Author: Gyorgy A
Huang R
Lattimore T
Szepesvari C
Publication venue: Neural Information Processing Systems Foundation, Inc.
Publication date: 12/08/2016

The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are positively curved. In this paper we ask whether there are other “lucky” settings when FTL achieves sublinear, “small” regret. In particular, we study the fundamental problem of linear prediction over a non-empty convex, compact domain. Amongst other results, we prove that the curvature of the boundary of the domain can act as if the losses were curved: In this case, we prove that as long as the mean of the loss vectors have positive lengths bounded away from zero, FTL enjoys a logarithmic growth rate of regret, while, e.g., for polyhedral domains and stochastic data it enjoys finite expected regret. Building on a previously known meta-algorithm, we also get an algorithm that simultaneously enjoys the worst-case guarantees and the bound available for FTL

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

SDP Relaxation with Randomized Rounding for Energy Disaggregation

Author: Gyorgy A
Shaloudegi K
Szepesvari C
Xu W
Publication venue: Neural Information Processing Systems Foundation, Inc.
Publication date: 12/08/2016

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Following the Leader and Fast Rates in Online Linear Prediction: Curved Constraint Sets and Other Regularities

Author: Gyorgy A
Huang R
Lattimore T
Szepesvari C
Publication venue: Microtome Publishing
Publication date: 24/11/2017

Follow the leader (FTL) is a simple online learning algorithm that is known to perform well when the loss functions are convex and positively curved. In this paper we ask whether there are other settings when FTL achieves low regret. In particular, we study the fundamental problem of linear prediction over a convex, compact domain with non-empty interior. Amongst other results, we prove that the curvature of the boundary of the domain can act as if the losses were curved: In this case, we prove that as long as the mean of the loss vectors have positive lengths bounded away from zero, FTL enjoys logarithmic regret, while for polytope domains and stochastic data it enjoys finite expected regret. The former result is also extended to strongly convex domains by establishing an equivalence between the strong convexity of sets and the minimum curvature of their boundary, which may be of independent interest. Building on a previously known meta-algorithm, we also get an algorithm that simultaneously enjoys the worst-case guarantees and the smaller regret of FTL when the data is ‘easy’. Finally, we show that such guarantees are achievable directly (e.g., by the follow the regularized leader algorithm or by a shrinkage-based variant of FTL) when the constraint set is an ellipsoid
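
The setting in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: it assumes the domain is the unit Euclidean ball (one example of a curved constraint set), linear losses <w, l_t>, and that FTL plays the origin while the cumulative loss vector is zero. The function names `ftl_unit_ball` and `sample_losses` are hypothetical.

```python
import math
import random


def ftl_unit_ball(losses):
    """Run follow the leader on the unit Euclidean ball.

    Each round's loss is linear: <w, l_t>. The leader minimizes the
    cumulative loss <w, L_{t-1}> over the ball, which gives
    w_t = -L_{t-1} / ||L_{t-1}|| (assumption: play the origin if L_{t-1} = 0).
    Returns the regret against the best fixed point in hindsight.
    """
    d = len(losses[0])
    cum = [0.0] * d          # cumulative loss vector L_{t-1}
    total = 0.0              # total loss suffered by FTL
    for l in losses:
        norm = math.sqrt(sum(c * c for c in cum))
        w = [-c / norm for c in cum] if norm > 0 else [0.0] * d
        total += sum(wi * li for wi, li in zip(w, l))
        cum = [c + li for c, li in zip(cum, l)]
    # Best fixed point in hindsight is -L_T / ||L_T||, with loss -||L_T||.
    best = -math.sqrt(sum(c * c for c in cum))
    return total - best


def sample_losses(T, mean, noise, seed=0):
    """Hypothetical data generator: i.i.d. loss vectors around a nonzero mean."""
    rng = random.Random(seed)
    return [[m + rng.uniform(-noise, noise) for m in mean] for _ in range(T)]


if __name__ == "__main__":
    for T in (100, 1000, 10000):
        regret = ftl_unit_ball(sample_losses(T, mean=[1.0, 0.0], noise=0.5))
        print(T, round(regret, 3))
```

With the mean loss vector bounded away from zero, as in this generator, the printed regret should grow far more slowly than T, in line with the logarithmic rate the abstract states for curved domains.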

Spiral - Imperial College Digital Repository

Approximate policy iteration: A survey and some new methods

Author: A. G. Barto
A. Gosavi
A. L. Samuel
A. Nedić
B. Martinet
B. Roy Van
C. A. J. Fletcher
C. Szepesvari
C. Thiery
D. D. Castro
D. P. Bertsekas
D. S. Choi
D. White
Dimitri P. Bertsekas
E. V. Denardo
F. L. Lewis
F. Pineda
G. J. Gordon
G. J. Tesauro
G. Strang
H. Chang
H. Yu
I. Menache
I. Szita
J. A. Boyan
J. Liu
J. N. Tsitsiklis
L. Busoniu
L. C. Baird
L. Gurvits
L. N. Trefethen
L. S. Shapley
M. A. Krasnoselskii
M. G. Lagoudakis
M. L. Puterman
M. Wang
N. Polydorides
P. J. Werbos
P. J. Werbös
P. T. Boer de
R. J. Williams
R. S. Sutton
R. T. Rockafellar
R. Y. Rubinstein
S. J. Bradtke
S. Meyn
S. P. Singh
T. Jaakkola
T. Jung
V. F. Farias
V. S. Borkar
W. B. Powell
X. R. Cao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms and aims to unify the available methods in the light of recent research developments and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods, such as least-squares temporal difference (LSTD), and iterative methods, such as least-squares policy evaluation (LSPE) and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation when done by the projected equation/TD approach may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.
Funding: National Science Foundation (U.S.) (No. ECCS-0801549); Los Alamos National Laboratory, Information Science and Technology Institute; United States Air Force (No. FA9550-10-1-0412)

CiteSeerX

DSpace@MIT

Crossref

Institute of Mathematics AS CR, v. v. i.

A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds

Author: Gyorgy A
Joulani P
Szepesvari C
Publication date: 24/07/2017

Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms. In this paper we contribute to this effort in two ways: First, based on a new regret decomposition and a generalization of Bregman divergences, we provide a self-contained, modular analysis of the two workhorses of online learning: (general) adaptive versions of Mirror Descent (MD) and the Follow-the-Regularized-Leader (FTRL) algorithms. The analysis is done with extra care so as not to introduce assumptions not needed in the proofs, and it allows one to combine, in a straightforward way, different algorithmic ideas (e.g., adaptivity, optimism, implicit updates) and learning settings (e.g., strongly convex or composite objectives). This way we are able to reprove, extend and refine a large body of the literature, while keeping the proofs concise. The second contribution is a byproduct of this careful analysis: We present algorithms with improved variational bounds for smooth, composite objectives, including a new family of optimistic MD algorithms with only one projection step per round. Furthermore, we provide a simple extension of adaptive regret bounds to practically relevant non-convex problem settings with essentially no extra effort.
Comment: Accepted to The 28th International Conference on Algorithmic Learning Theory (ALT 2017). 40 page

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository