182 research outputs found
A Unified View of Large-scale Zero-sum Equilibrium Computation
The task of computing approximate Nash equilibria in large zero-sum
extensive-form games has received a tremendous amount of attention due mainly
to the Annual Computer Poker Competition. Immediately after its inception, two
competing and seemingly different approaches emerged---one an application of
no-regret online learning, the other a sophisticated gradient method applied to
a convex-concave saddle-point formulation. Since then, both approaches have
grown in relative isolation, with advances on one side not affecting the
other. In this paper, we rectify this by dissecting and, in a sense, unifying
the two views.
Comment: AAAI Workshop on Computer Poker and Imperfect Information
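The no-regret learning view can be illustrated with a minimal toy sketch: regret matching in self-play on a zero-sum normal-form game, whose average strategies converge to a Nash equilibrium. This is only an illustrative normal-form example, not the extensive-form, competition-scale algorithms the abstract refers to; all names here are made up for the sketch.

```python
import numpy as np

def _positive_part_strategy(regrets):
    """Play proportionally to positive cumulative regret (uniform if none)."""
    r = np.maximum(regrets, 0.0)
    s = r.sum()
    return r / s if s > 0 else np.full(r.size, 1.0 / r.size)

def regret_matching(A, iters=20000):
    """Self-play regret matching on a zero-sum matrix game.

    A[i, j] is the row player's payoff; the column player receives -A[i, j].
    The *average* strategies (not the iterates) converge to equilibrium.
    """
    m, n = A.shape
    r_row, r_col = np.zeros(m), np.zeros(n)
    s_row, s_col = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        x = _positive_part_strategy(r_row)
        y = _positive_part_strategy(r_col)
        s_row += x
        s_col += y
        u_row = A @ y               # row player's utility per pure action
        u_col = -(A.T @ x)          # column player's utility per pure action
        r_row += u_row - x @ u_row  # regret vs. current expected utility
        r_col += u_col - y @ u_col
    return s_row / iters, s_col / iters

# Rock-paper-scissors: the unique equilibrium is the uniform mixture.
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x_avg, y_avg = regret_matching(A)
```

The saddle-point view solves the same problem as the bilinear program min_x max_y x^T A y; the unification in the paper connects updates like the above to first-order methods for that formulation.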
Proximal Gradient methods with Adaptive Subspace Sampling
Many applications in machine learning or signal processing involve nonsmooth
optimization problems. This nonsmoothness brings a low-dimensional structure to
the optimal solutions. In this paper, we propose a randomized proximal gradient
method harnessing this underlying structure. We introduce two key components:
i) a random subspace proximal gradient algorithm; ii) an identification-based
sampling of the subspaces. Their interplay brings a significant performance
improvement on typical learning problems in terms of dimensions explored.
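The first component can be sketched on the lasso, where the prox is soft-thresholding and a "subspace" is a random set of coordinates. This is a toy version of component (i) only, with plain uniform sampling rather than the paper's identification-based sampling, and the function names are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def rand_subspace_prox_grad(A, b, lam=0.1, step=None, iters=500, k=5, seed=0):
    """Toy random-subspace proximal gradient for the lasso:
        min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.
    Each iteration updates only a random set of k coordinates (a random
    coordinate subspace) with a prox-gradient step; soft-thresholding
    tends to identify the sparse support of the solution along the way.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = ||A||_2^2
    x = np.zeros(n)
    for _ in range(iters):
        S = rng.choice(n, size=k, replace=False)  # sampled subspace
        g = A.T @ (A @ x - b)   # full gradient for simplicity (a real
                                # implementation would only form g[S])
        x[S] = soft_threshold(x[S] - step * g[S], step * lam)
    return x
```

The identification-based sampling of component (ii) would bias `S` toward coordinates currently in the (apparent) support, exploiting the low-dimensional structure the abstract mentions.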
Low-rank approximate inverse for preconditioning tensor-structured linear systems
In this paper, we propose an algorithm for the construction of low-rank
approximations of the inverse of an operator given in low-rank tensor format.
The construction relies on an updated greedy algorithm for the minimization of
a suitable distance to the inverse operator. It provides a sequence of
approximations that are defined as the projections of the inverse operator in
an increasing sequence of linear subspaces of operators. These subspaces are
obtained by the tensorization of bases of operators that are constructed from
successive rank-one corrections. In order to handle high-order tensors,
approximate projections are computed in low-rank Hierarchical Tucker subsets of
the successive subspaces of operators. Some desired properties such as symmetry
or sparsity can be imposed on the approximate inverse operator during the
correction step, where an optimal rank-one correction is searched as the tensor
product of operators with the desired properties. Numerical examples illustrate
the ability of this algorithm to provide efficient preconditioners for linear
systems in tensor format that improve the convergence of iterative solvers and
also the quality of the resulting low-rank approximations of the solution.
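A dense toy analogue of the greedy construction may help fix ideas: build an approximate inverse as a sum of rank-one corrections, each chosen by alternating least squares to reduce the residual ||I - A M||_F. This sketch works with small dense matrices and even uses an exact solve inside the correction step, which the paper specifically avoids by exploiting the low-rank Hierarchical Tucker tensor format; it is an illustration of the greedy rank-one idea only.

```python
import numpy as np

def greedy_rank_one_inverse(A, rank=8, als_iters=20, seed=0):
    """Toy greedy approximate inverse: M ~= A^{-1} as a sum of rank-one
    corrections u v^T, each minimizing ||R - A u v^T||_F for the current
    residual R = I - A M, via alternating least squares.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    M = np.zeros((n, n))
    AtA = A.T @ A
    for _ in range(rank):
        R = np.eye(n) - A @ M            # current residual
        v = rng.normal(size=n)
        for _ in range(als_iters):
            # Optimal u for fixed v (exact solve: for illustration only;
            # the paper's tensor-format construction avoids this).
            u = np.linalg.solve(AtA, A.T @ (R @ v)) / (v @ v)
            Au = A @ u
            # Optimal v for fixed u.
            v = R.T @ Au / (Au @ Au)
        M += np.outer(u, v)              # greedy rank-one correction
    return M
```

In the paper's setting, symmetry or sparsity would be imposed by restricting the factors of each rank-one correction to operators with those properties.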
An Efficient Primal-Dual Prox Method for Non-Smooth Optimization
We study the non-smooth optimization problems in machine learning, where both
the loss function and the regularizer are non-smooth functions. Previous
studies on efficient empirical loss minimization assume either a smooth loss
function or a strongly convex regularizer, making them unsuitable for
non-smooth optimization. We develop a simple yet efficient method for a family
of non-smooth optimization problems where the dual form of the loss function is
bilinear in primal and dual variables. We cast a non-smooth optimization
problem into a minimax optimization problem, and develop a primal-dual prox
method that solves the minimax optimization problem at a rate of $O(1/T)$
(assuming that the proximal step can be efficiently solved), significantly
faster than a standard subgradient descent method that has an $O(1/\sqrt{T})$
convergence rate. Our empirical study verifies the efficiency of the proposed
method for various non-smooth optimization problems that arise ubiquitously in
machine learning by comparing it to state-of-the-art first-order methods.
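The minimax reformulation can be sketched on a problem where both the loss and the regularizer are non-smooth, e.g. min_x ||Ax - b||_1 + lam ||x||_1, rewritten as min_x max_{||y||_inf <= 1} y^T (A x - b) + lam ||x||_1 so the coupling is bilinear in the primal and dual variables. The updates below follow a generic Chambolle-Pock-style primal-dual prox template, not necessarily the paper's exact scheme; each step only needs cheap proxes (soft-thresholding and a clip).

```python
import numpy as np

def primal_dual_prox(A, b, lam=0.1, iters=2000):
    """Generic primal-dual prox sketch for min_x ||Ax - b||_1 + lam||x||_1,
    via the saddle point min_x max_{||y||_inf <= 1} y^T(Ax - b) + lam||x||_1.
    """
    m, n = A.shape
    L = np.linalg.norm(A, 2)
    tau = sigma = 1.0 / L          # step sizes satisfying tau*sigma*L^2 <= 1
    x = np.zeros(n)
    y = np.zeros(m)
    x_bar = x.copy()
    for _ in range(iters):
        # Dual prox: project onto the inf-norm ball (a clip).
        y = np.clip(y + sigma * (A @ x_bar - b), -1.0, 1.0)
        # Primal prox: soft-thresholding for the l1 regularizer.
        x_new = x - tau * (A.T @ y)
        x_new = np.sign(x_new) * np.maximum(np.abs(x_new) - tau * lam, 0.0)
        x_bar = 2 * x_new - x      # extrapolation step
        x = x_new
    return x
```

Both per-iteration proxes are closed-form, which is exactly the "proximal step can be efficiently solved" assumption in the stated rate.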
Highly-Smooth Zero-th Order Online Optimization
Vianney Perchet
The minimization of convex functions which are only available through partial
and noisy information is a key methodological problem in many disciplines. In
this paper we consider convex optimization with noisy zero-th order
information, that is noisy function evaluations at any desired point. We focus
on problems with high degrees of smoothness, such as logistic regression. We
show that as opposed to gradient-based algorithms, high-order smoothness may be
used to improve estimation rates, with a precise dependence of our upper-bounds
on the degree of smoothness. In particular, we show that for infinitely
differentiable functions, we recover the same dependence on sample size as
gradient-based algorithms, with an extra dimension-dependent factor. This is
done for both convex and strongly-convex functions, with finite horizon and
anytime algorithms. Finally, we also recover similar results in the online
optimization setting.
Comment: Conference on Learning Theory (COLT), Jun 2016, New York, United States.
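The gradient-free setting the abstract starts from can be sketched with the classical two-point estimator: the algorithm only evaluates the function, never its gradient. The paper's contribution is to build more refined estimators that exploit higher-order smoothness; the baseline below is only the standard first-order-accurate scheme, with illustrative names.

```python
import numpy as np

def zeroth_order_gd(f, x0, step=0.05, delta=1e-3, iters=2000, seed=0):
    """Two-point zeroth-order gradient descent: the only access to f is
    through function evaluations. The directional finite difference
    d * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u
    is an unbiased-in-expectation estimate of the gradient (up to
    O(delta^2) smoothing bias) when u is uniform on the sphere.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    for _ in range(iters):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)           # uniform direction on the sphere
        g = d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u
        x = x - step * g
    return x

# Minimize a smooth convex quadratic from function values only.
x = zeroth_order_gd(lambda z: np.sum((z - 1.0) ** 2), np.zeros(5))
```

For highly smooth functions such as logistic regression, the paper shows that higher-order finite-difference estimators improve on the rates achievable by this baseline.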