From Averaging to Acceleration, There is Only a Step-size
We show that accelerated gradient descent, averaged gradient descent and the
heavy-ball method for non-strongly-convex problems may be reformulated as
constant parameter second-order difference equation algorithms, where stability
of the system is equivalent to convergence at rate O(1/n^2), where n is the
number of iterations. We provide a detailed analysis of the eigenvalues of the
corresponding linear dynamical system, showing various oscillatory and
non-oscillatory behaviors, together with a sharp stability result with explicit
constants. We also consider the situation where noisy gradients are available,
where we extend our general convergence result, which suggests an alternative
algorithm (i.e., with different step sizes) that exhibits the good aspects of
both averaging and acceleration
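A minimal sketch of the kind of recursion the abstract refers to: a constant-parameter second-order difference equation theta_{n+1} = theta_n - gamma * grad f(theta_n) + delta * (theta_n - theta_{n-1}), run here on a toy quadratic. The particular values of gamma and delta are illustrative choices, not the parameterizations analyzed in the paper.

import numpy as np

def second_order_recursion(H, b, gamma, delta, n_iter=1000):
    # Constant-parameter two-step recursion on f(theta) = 0.5 * theta' H theta - b' theta.
    theta_prev = np.zeros(len(b))
    theta = np.zeros(len(b))
    for _ in range(n_iter):
        grad = H @ theta - b
        theta, theta_prev = theta - gamma * grad + delta * (theta - theta_prev), theta
    return theta

# Toy quadratic whose minimizer is the all-ones vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
H = A.T @ A / 50.0
b = H @ np.ones(5)
gamma = 1.0 / np.linalg.eigvalsh(H).max()   # step size of order 1/L (illustrative)
delta = 0.9                                  # momentum-like coefficient (illustrative)
print(np.linalg.norm(second_order_recursion(H, b, gamma, delta) - np.ones(5)))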
Local Component Analysis
Kernel density estimation, a.k.a. Parzen windows, is a popular density
estimation method, which can be used for outlier detection or clustering. With
multivariate data, its performance is heavily reliant on the metric used within
the kernel. Most earlier work has focused on learning only the bandwidth of the
kernel (i.e., a scalar multiplicative factor). In this paper, we propose to
learn a full Euclidean metric through an expectation-minimization (EM)
procedure, which can be seen as an unsupervised counterpart to neighbourhood
component analysis (NCA). In order to avoid overfitting with a fully
nonparametric density estimator in high dimensions, we also consider a
semi-parametric Gaussian-Parzen density model, where some of the variables are
modelled through a jointly Gaussian density, while others are modelled through
Parzen windows. For these two models, EM leads to simple closed-form updates
based on matrix inversions and eigenvalue decompositions. We show empirically
that our method leads to density estimators with higher test-likelihoods than
natural competing methods, and that the metrics may be used within most
unsupervised learning techniques that rely on such metrics, such as spectral
clustering or manifold learning methods. Finally, we present a stochastic
approximation scheme which allows for the use of this method in a large-scale
setting.
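For concreteness, the sketch below evaluates the leave-one-out log-likelihood of a Gaussian Parzen window estimator under a given full metric (covariance matrix). This is the kind of criterion a metric-learning procedure such as the one described above would optimize; the EM updates themselves are not reproduced here, and the identity metric in the example is an arbitrary placeholder.

import numpy as np

def row_logsumexp(A):
    # Numerically stable log-sum-exp over each row.
    m = A.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(A - m).sum(axis=1, keepdims=True))).ravel()

def loo_parzen_log_likelihood(X, Sigma):
    # Average leave-one-out log-density of a Gaussian Parzen estimator with covariance Sigma.
    n, d = X.shape
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    diffs = X[:, None, :] - X[None, :, :]                        # pairwise differences, shape (n, n, d)
    sq = np.einsum('ijk,kl,ijl->ij', diffs, Sigma_inv, diffs)    # squared Mahalanobis distances
    log_kernel = -0.5 * (sq + d * np.log(2 * np.pi) + logdet)
    np.fill_diagonal(log_kernel, -np.inf)                        # exclude each point from its own estimate
    return (row_logsumexp(log_kernel) - np.log(n - 1)).mean()

X = np.random.default_rng(0).standard_normal((100, 3))
print(loo_parzen_log_likelihood(X, np.eye(3)))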
Pricing in networks
This paper studies optimal pricing in networks in the presence of local consumption or price externalities. It analyzes the relation between prices and nodal centrality measures. Using an asymptotic approach, it shows that the ranking of optimal prices and strategies can be reduced to the lexicographic ranking of a specific vector of nodal characteristics. In particular, this result shows that with positive consumption externalities, prices are higher at nodes with higher degree, and with relative price externalities, prices are higher at nodes which have more neighbors of smaller degree.
Keywords: Social Networks, Network Externalities, Oligopolies
Optimal Assignment of Durable Objects to Successive Agents
This paper analyzes the assignment of durable objects to successive generations of agents who live for two periods. The optimal assignment rule is stationary, favors old agents and is determined by a selectivity function which satisfies an iterative functional differential equation. More patient social planners are more selective, as are social planners facing distributions of types with higher probabilities for higher types. The paper also characterizes optimal assignment rules when monetary transfers are allowed and agents face a recovery cost, when agents' types are private information and when agents can invest to improve their types.
Keywords: Dynamic Assignment; Durable Objects; Revenue Management; Dynamic Mechanism Design; Overlapping Generations; Promotions and Intertemporal Assignments
Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
We consider the optimization of a quadratic objective function whose
gradients are only accessible through a stochastic oracle that returns the
gradient at any given point plus a zero-mean finite variance random error. We
present the first algorithm that achieves jointly the optimal prediction error
rates for least-squares regression, both in terms of forgetting of initial
conditions in O(1/n^2), and in terms of dependence on the noise and dimension d
of the problem, as O(d/n). Our new algorithm is based on averaged accelerated
regularized gradient descent, and may also be analyzed through finer
assumptions on initial conditions and the Hessian matrix, leading to
dimension-free quantities that may still be small while the "optimal" terms
above are large. In order to characterize the tightness of these new bounds, we
consider an application to non-parametric regression and use the known lower
bounds on the statistical performance (without computational limits), which
happen to match our bounds obtained from a single pass on the data and thus
show optimality of our algorithm in a wide variety of particular trade-offs
between bias and variance.
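A single-pass sketch in the spirit of the algorithm described above: an accelerated stochastic gradient recursion for least-squares with a running (Polyak-Ruppert) average on top. The extrapolation coefficient, the constant step size and the absence of regularization are simplifying assumptions, not the paper's tuning.

import numpy as np

def averaged_accelerated_lsq(X, y, step):
    # One pass over the data: Nesterov-style extrapolation plus averaging of the iterates.
    n, d = X.shape
    theta = np.zeros(d)
    theta_prev = np.zeros(d)
    avg = np.zeros(d)
    for i in range(n):
        momentum = theta + (i / (i + 3.0)) * (theta - theta_prev)   # extrapolated point
        grad = (X[i] @ momentum - y[i]) * X[i]                      # stochastic gradient at one sample
        theta_prev, theta = theta, momentum - step * grad
        avg += (theta - avg) / (i + 1)                              # running average of the iterates
    return avg

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + 0.1 * rng.standard_normal(5000)
print(np.linalg.norm(averaged_accelerated_lsq(X, y, step=0.01) - w_true))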
Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
We consider the problem of optimizing the sum of a smooth convex function and
a non-smooth convex function using proximal-gradient methods, where an error is
present in the calculation of the gradient of the smooth term or in the
proximity operator with respect to the non-smooth term. We show that both the
basic proximal-gradient method and the accelerated proximal-gradient method
achieve the same convergence rate as in the error-free case, provided that the
errors decrease at appropriate rates. Using these rates, we perform as well as
or better than a carefully chosen fixed error level on a set of structured
sparsity problems.
Comment: Neural Information Processing Systems (2011)
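To make the error condition concrete, the sketch below runs a basic proximal-gradient (ISTA-style) iteration on a lasso problem and injects a gradient error whose magnitude decreases like 1/k^2. The objective, the error model and the constants are illustrative assumptions, not the setting of the paper's experiments.

import numpy as np

def soft_threshold(v, t):
    # Proximity operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inexact_prox_gradient(X, y, lam, n_iter=200):
    # Proximal-gradient on (1/2n)||X theta - y||^2 + lam ||theta||_1 with a decaying gradient error.
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n                 # Lipschitz constant of the smooth part
    theta = np.zeros(d)
    rng = np.random.default_rng(0)
    for k in range(1, n_iter + 1):
        grad = X.T @ (X @ theta - y) / n
        error = rng.standard_normal(d) / k ** 2       # gradient error decreasing at rate O(1/k^2)
        theta = soft_threshold(theta - (grad + error) / L, lam / L)
    return theta

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(200)
print(inexact_prox_gradient(X, y, lam=0.1)[:6])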
Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing the
sum of a finite number of smooth convex functions. Like stochastic gradient
(SG) methods, the SAG method's iteration cost is independent of the number of
terms in the sum. However, by incorporating a memory of previous gradient
values the SAG method achieves a faster convergence rate than black-box SG
methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in
general, and when the sum is strongly-convex the convergence rate is improved
from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for
p < 1. Further, in many cases the convergence rate of the new method
is also faster than black-box deterministic gradient methods, in terms of the
number of gradient evaluations. Numerical experiments indicate that the new
algorithm often dramatically outperforms existing SG and deterministic gradient
methods, and that the performance may be further improved through the use of
non-uniform sampling strategies.
Comment: Revision from January 2015 submission. Major changes: updated
literature review and discussion of subsequent work, additional lemma showing
the validity of one of the formulas, somewhat simplified presentation of the
Lyapunov bound, included code needed for checking proofs rather than the
polynomials generated by the code, added error regions to the numerical
experiments.
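A minimal sketch of the SAG iteration for l2-regularized logistic regression: each step refreshes the stored gradient of one randomly sampled term and moves along the average of all stored gradients, so the per-iteration cost does not depend on the number of terms. The step size, regularization and zero initialization of the gradient memory are illustrative choices rather than the settings recommended in the paper.

import numpy as np

def sag_logistic(X, y, step, lam, n_iter=5000):
    # SAG for (1/n) sum_i log(1 + exp(-y_i x_i' theta)) + (lam/2) ||theta||^2, with y_i in {-1, +1}.
    n, d = X.shape
    stored = np.zeros((n, d))         # memory of the last gradient evaluated for each term
    grad_sum = np.zeros(d)
    theta = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(n_iter):
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-y[i] * (X[i] @ theta)))
        g = -(1.0 - p) * y[i] * X[i] + lam * theta
        grad_sum += g - stored[i]     # update the running sum with the refreshed gradient
        stored[i] = g
        theta -= step * grad_sum / n
    return theta

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
w_true = rng.standard_normal(10)
y = np.sign(X @ w_true)
theta_hat = sag_logistic(X, y, step=0.1, lam=1e-3)
print(np.mean(np.sign(X @ theta_hat) == y))   # training accuracy on the toy problem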
Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Our goal is to improve variance reducing stochastic methods through better
control variates. We first propose a modification of SVRG which uses the
Hessian to track gradients over time, rather than to recondition, increasing
the correlation of the control variates and leading to faster theoretical
convergence close to the optimum. We then propose accurate and computationally
efficient approximations to the Hessian, both using a diagonal and a low-rank
matrix. Finally, we demonstrate the effectiveness of our method on a wide range
of problems.
Comment: 17 pages, 2 figures, 1 table
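A rough sketch of the idea for logistic regression: the SVRG-style control variate grad_i(theta_ref) is replaced by grad_i(theta_ref) + H_i(theta_ref) (theta - theta_ref), so it keeps tracking the current gradient as the iterate moves away from the snapshot. For simplicity this sketch uses the exact averaged d-by-d Hessian at the snapshot rather than the diagonal or low-rank approximations proposed in the paper, and the step size and snapshot schedule are illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hessian_tracked_svrg(X, y, step, n_epochs=15):
    # Variance-reduced stochastic gradient with a Hessian-corrected control variate; y in {-1, +1}.
    n, d = X.shape
    theta = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        theta_ref = theta.copy()
        p_ref = sigmoid(X @ theta_ref)                             # per-sample sigmoids at the snapshot
        grads_ref = (p_ref - (y > 0))[:, None] * X                 # per-sample gradients at the snapshot
        full_grad = grads_ref.mean(axis=0)
        H_bar = (X * (p_ref * (1 - p_ref))[:, None]).T @ X / n     # averaged Hessian at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            diff = theta - theta_ref
            g_i = (sigmoid(X[i] @ theta) - (y[i] > 0)) * X[i]      # gradient of the sampled term
            cv_i = grads_ref[i] + (p_ref[i] * (1 - p_ref[i])) * (X[i] @ diff) * X[i]
            mean_cv = full_grad + H_bar @ diff                     # expectation of the control variate
            theta -= step * (g_i - cv_i + mean_cv)                 # unbiased, variance-reduced step
    return theta

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(300))
theta_hat = hessian_tracked_svrg(X, y, step=0.2)
print(np.mean(np.sign(X @ theta_hat) == y))   # training accuracy on the toy problem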