From Averaging to Acceleration, There is Only a Step-size
We show that accelerated gradient descent, averaged gradient descent and the
heavy-ball method for non-strongly-convex problems may be reformulated as
constant parameter second-order difference equation algorithms, where stability
of the system is equivalent to convergence at rate O(1/n^2), where n is the
number of iterations. We provide a detailed analysis of the eigenvalues of the
corresponding linear dynamical system, showing various oscillatory and
non-oscillatory behaviors, together with a sharp stability result with explicit
constants. We also consider the situation where noisy gradients are available,
where we extend our general convergence result, which suggests an alternative
algorithm (i.e., with different step sizes) that exhibits the good aspects of
both averaging and acceleration
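A minimal sketch of the kind of recursion the abstract refers to: a constant-parameter second-order difference equation theta_{n+1} = theta_n - gamma * grad f(theta_n) + delta * (theta_n - theta_{n-1}), run here on a toy quadratic. The particular values of gamma and delta are illustrative choices, not the parameterizations analyzed in the paper.

import numpy as np

def second_order_recursion(H, b, gamma, delta, n_iter=1000):
    # Constant-parameter two-step recursion on f(theta) = 0.5 * theta' H theta - b' theta.
    theta_prev = np.zeros(len(b))
    theta = np.zeros(len(b))
    for _ in range(n_iter):
        grad = H @ theta - b
        theta, theta_prev = theta - gamma * grad + delta * (theta - theta_prev), theta
    return theta

# Toy quadratic whose minimizer is the all-ones vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
H = A.T @ A / 50.0
b = H @ np.ones(5)
gamma = 1.0 / np.linalg.eigvalsh(H).max()   # step size of order 1/L (illustrative)
delta = 0.9                                  # momentum-like coefficient (illustrative)
print(np.linalg.norm(second_order_recursion(H, b, gamma, delta) - np.ones(5)))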
Local Component Analysis
Kernel density estimation, a.k.a. Parzen windows, is a popular density
estimation method, which can be used for outlier detection or clustering. With
multivariate data, its performance is heavily reliant on the metric used within
the kernel. Most earlier work has focused on learning only the bandwidth of the
kernel (i.e., a scalar multiplicative factor). In this paper, we propose to
learn a full Euclidean metric through an expectation-minimization (EM)
procedure, which can be seen as an unsupervised counterpart to neighbourhood
component analysis (NCA). In order to avoid overfitting with a fully
nonparametric density estimator in high dimensions, we also consider a
semi-parametric Gaussian-Parzen density model, where some of the variables are
modelled through a jointly Gaussian density, while others are modelled through
Parzen windows. For these two models, EM leads to simple closed-form updates
based on matrix inversions and eigenvalue decompositions. We show empirically
that our method leads to density estimators with higher test-likelihoods than
natural competing methods, and that the metrics may be used within most
unsupervised learning techniques that rely on such metrics, such as spectral
clustering or manifold learning methods. Finally, we present a stochastic
approximation scheme which allows for the use of this method in a large-scale
setting.
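For concreteness, the sketch below evaluates the leave-one-out log-likelihood of a Gaussian Parzen window estimator under a given full metric (covariance matrix). This is the kind of criterion a metric-learning procedure such as the one described above would optimize; the EM updates themselves are not reproduced here, and the identity metric in the example is an arbitrary placeholder.

import numpy as np

def row_logsumexp(A):
    # Numerically stable log-sum-exp over each row.
    m = A.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(A - m).sum(axis=1, keepdims=True))).ravel()

def loo_parzen_log_likelihood(X, Sigma):
    # Average leave-one-out log-density of a Gaussian Parzen estimator with covariance Sigma.
    n, d = X.shape
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    diffs = X[:, None, :] - X[None, :, :]                        # pairwise differences, shape (n, n, d)
    sq = np.einsum('ijk,kl,ijl->ij', diffs, Sigma_inv, diffs)    # squared Mahalanobis distances
    log_kernel = -0.5 * (sq + d * np.log(2 * np.pi) + logdet)
    np.fill_diagonal(log_kernel, -np.inf)                        # exclude each point from its own estimate
    return (row_logsumexp(log_kernel) - np.log(n - 1)).mean()

X = np.random.default_rng(0).standard_normal((100, 3))
print(loo_parzen_log_likelihood(X, np.eye(3)))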
Pricing in networks
This paper studies optimal pricing in networks in the presence of local consumption or price externalities. It analyzes the relation between prices and nodal centrality measures. Using an asymptotic approach, it shows that the ranking of optimal prices and strategies can be reduced to the lexicographic ranking of a specific vector of nodal characteristics. In particular, this result shows that with positive consumption externalities, prices are higher at nodes with higher degree, and with relative price externalities, prices are higher at nodes which have more neighbors of smaller degree.
Keywords: Social Networks, Network Externalities, Oligopolies
Optimal Assignment of Durable Objects to Successive Agents
This paper analyzes the assignment of durable objects to successive generations of agents who live for two periods. The optimal assignment rule is stationary, favors old agents and is determined by a selectivity function which satisfies an iterative functional differential equation. More patient social planners are more selective, as are social planners facing distributions of types with higher probabilities for higher types. The paper also characterizes optimal assignment rules when monetary transfers are allowed and agents face a recovery cost, when agents' types are private information and when agents can invest to improve their types.
Keywords: Dynamic Assignment; Durable Objects; Revenue Management; Dynamic Mechanism Design; Overlapping Generations; Promotions and Intertemporal Assignments
Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
We consider the optimization of a quadratic objective function whose
gradients are only accessible through a stochastic oracle that returns the
gradient at any given point plus a zero-mean finite variance random error. We
present the first algorithm that achieves jointly the optimal prediction error
rates for least-squares regression, both in terms of forgetting of initial
conditions in O(1/n^2), and in terms of dependence on the noise and dimension d
of the problem, as O(d/n). Our new algorithm is based on averaged accelerated
regularized gradient descent, and may also be analyzed through finer
assumptions on initial conditions and the Hessian matrix, leading to
dimension-free quantities that may still be small while the "optimal" terms
above are large. In order to characterize the tightness of these new bounds, we
consider an application to non-parametric regression and use the known lower
bounds on the statistical performance (without computational limits), which
happen to match our bounds obtained from a single pass on the data and thus
show optimality of our algorithm in a wide variety of particular trade-offs
between bias and variance.
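A single-pass sketch in the spirit of the algorithm described above: an accelerated stochastic gradient recursion for least-squares with a running (Polyak-Ruppert) average on top. The extrapolation coefficient, the constant step size and the absence of regularization are simplifying assumptions, not the paper's tuning.

import numpy as np

def averaged_accelerated_lsq(X, y, step):
    # One pass over the data: Nesterov-style extrapolation plus averaging of the iterates.
    n, d = X.shape
    theta = np.zeros(d)
    theta_prev = np.zeros(d)
    avg = np.zeros(d)
    for i in range(n):
        momentum = theta + (i / (i + 3.0)) * (theta - theta_prev)   # extrapolated point
        grad = (X[i] @ momentum - y[i]) * X[i]                      # stochastic gradient at one sample
        theta_prev, theta = theta, momentum - step * grad
        avg += (theta - avg) / (i + 1)                              # running average of the iterates
    return avg

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + 0.1 * rng.standard_normal(5000)
print(np.linalg.norm(averaged_accelerated_lsq(X, y, step=0.01) - w_true))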
Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
We consider the problem of optimizing the sum of a smooth convex function and
a non-smooth convex function using proximal-gradient methods, where an error is
present in the calculation of the gradient of the smooth term or in the
proximity operator with respect to the non-smooth term. We show that both the
basic proximal-gradient method and the accelerated proximal-gradient method
achieve the same convergence rate as in the error-free case, provided that the
errors decrease at appropriate rates. Using these rates, we perform as well as
or better than a carefully chosen fixed error level on a set of structured
sparsity problems.
Comment: Neural Information Processing Systems (2011)
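To make the error condition concrete, the sketch below runs a basic proximal-gradient (ISTA-style) iteration on a lasso problem and injects a gradient error whose magnitude decreases like 1/k^2. The objective, the error model and the constants are illustrative assumptions, not the setting of the paper's experiments.

import numpy as np

def soft_threshold(v, t):
    # Proximity operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inexact_prox_gradient(X, y, lam, n_iter=200):
    # Proximal-gradient on (1/2n)||X theta - y||^2 + lam ||theta||_1 with a decaying gradient error.
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n                 # Lipschitz constant of the smooth part
    theta = np.zeros(d)
    rng = np.random.default_rng(0)
    for k in range(1, n_iter + 1):
        grad = X.T @ (X @ theta - y) / n
        error = rng.standard_normal(d) / k ** 2       # gradient error decreasing at rate O(1/k^2)
        theta = soft_threshold(theta - (grad + error) / L, lam / L)
    return theta

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(200)
print(inexact_prox_gradient(X, y, lam=0.1)[:6])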
Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing the
sum of a finite number of smooth convex functions. Like stochastic gradient
(SG) methods, the SAG method's iteration cost is independent of the number of
terms in the sum. However, by incorporating a memory of previous gradient
values the SAG method achieves a faster convergence rate than black-box SG
methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in
general, and when the sum is strongly-convex the convergence rate is improved
from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for
p < 1. Further, in many cases the convergence rate of the new method
is also faster than black-box deterministic gradient methods, in terms of the
number of gradient evaluations. Numerical experiments indicate that the new
algorithm often dramatically outperforms existing SG and deterministic gradient
methods, and that the performance may be further improved through the use of
non-uniform sampling strategies.
Comment: Revision from January 2015 submission. Major changes: updated
literature review and discussion of subsequent work, additional lemma showing
the validity of one of the formulas, somewhat simplified presentation of the
Lyapunov bound, included code needed for checking proofs rather than the
polynomials generated by the code, added error regions to the numerical
experiments.
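A minimal sketch of the SAG iteration for l2-regularized logistic regression: each step refreshes the stored gradient of one randomly sampled term and moves along the average of all stored gradients, so the per-iteration cost does not depend on the number of terms. The step size, regularization and zero initialization of the gradient memory are illustrative choices rather than the settings recommended in the paper.

import numpy as np

def sag_logistic(X, y, step, lam, n_iter=5000):
    # SAG for (1/n) sum_i log(1 + exp(-y_i x_i' theta)) + (lam/2) ||theta||^2, with y_i in {-1, +1}.
    n, d = X.shape
    stored = np.zeros((n, d))         # memory of the last gradient evaluated for each term
    grad_sum = np.zeros(d)
    theta = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(n_iter):
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-y[i] * (X[i] @ theta)))
        g = -(1.0 - p) * y[i] * X[i] + lam * theta
        grad_sum += g - stored[i]     # update the running sum with the refreshed gradient
        stored[i] = g
        theta -= step * grad_sum / n
    return theta

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
w_true = rng.standard_normal(10)
y = np.sign(X @ w_true)
theta_hat = sag_logistic(X, y, step=0.1, lam=1e-3)
print(np.mean(np.sign(X @ theta_hat) == y))   # training accuracy on the toy problem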
Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Our goal is to improve variance reducing stochastic methods through better
control variates. We first propose a modification of SVRG which uses the
Hessian to track gradients over time, rather than to recondition, increasing
the correlation of the control variates and leading to faster theoretical
convergence close to the optimum. We then propose accurate and computationally
efficient approximations to the Hessian, both using a diagonal and a low-rank
matrix. Finally, we demonstrate the effectiveness of our method on a wide range
of problems.
Comment: 17 pages, 2 figures, 1 table
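A rough sketch of the idea for logistic regression: the SVRG-style control variate grad_i(theta_ref) is replaced by grad_i(theta_ref) + H_i(theta_ref) (theta - theta_ref), so it keeps tracking the current gradient as the iterate moves away from the snapshot. For simplicity this sketch uses the exact averaged d-by-d Hessian at the snapshot rather than the diagonal or low-rank approximations proposed in the paper, and the step size and snapshot schedule are illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hessian_tracked_svrg(X, y, step, n_epochs=15):
    # Variance-reduced stochastic gradient with a Hessian-corrected control variate; y in {-1, +1}.
    n, d = X.shape
    theta = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        theta_ref = theta.copy()
        p_ref = sigmoid(X @ theta_ref)                             # per-sample sigmoids at the snapshot
        grads_ref = (p_ref - (y > 0))[:, None] * X                 # per-sample gradients at the snapshot
        full_grad = grads_ref.mean(axis=0)
        H_bar = (X * (p_ref * (1 - p_ref))[:, None]).T @ X / n     # averaged Hessian at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            diff = theta - theta_ref
            g_i = (sigmoid(X[i] @ theta) - (y[i] > 0)) * X[i]      # gradient of the sampled term
            cv_i = grads_ref[i] + (p_ref[i] * (1 - p_ref[i])) * (X[i] @ diff) * X[i]
            mean_cv = full_grad + H_bar @ diff                     # expectation of the control variate
            theta -= step * (g_i - cv_i + mean_cv)                 # unbiased, variance-reduced step
    return theta

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(300))
theta_hat = hessian_tracked_svrg(X, y, step=0.2)
print(np.mean(np.sign(X @ theta_hat) == y))   # training accuracy on the toy problem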