Second-Order Stochastic Optimization for Machine Learning in Linear Time
First-order stochastic methods are the state of the art in large-scale
machine learning optimization owing to efficient per-iteration complexity.
Second-order methods, while able to provide faster convergence, have been much
less explored due to the high cost of computing the second-order information.
In this paper we develop second-order stochastic methods for optimization
problems in machine learning that match the per-iteration cost of
gradient-based methods, and in certain settings improve upon the overall running time
over popular first-order methods. Furthermore, our algorithm has the desirable
property of being implementable in time linear in the sparsity of the input
data.
A deep cut ellipsoid algorithm for convex programming
This paper proposes a deep cut version of the ellipsoid algorithm for solving a general class of continuous convex programming problems. In each step the algorithm does not require more computational effort to construct these deep cuts than its corresponding central cut version. Rules that prevent some of the numerical instabilities and theoretical drawbacks usually associated with the algorithm are also provided. Moreover, for a large class of convex programs a simple proof of its rate of convergence is given and the relation with previously known results is discussed. Finally, some computational results of the deep and central cut versions of the algorithm applied to a min-max stochastic queue location problem are reported.
Keywords: location theory; convex programming; deep cut ellipsoid algorithm; min-max programming; rate of convergence
Minimax and Adaptive Inference in Nonparametric Function Estimation
Since Stein's 1956 seminal paper, shrinkage has played a fundamental role in
both parametric and nonparametric inference. This article discusses minimaxity
and adaptive minimaxity in nonparametric function estimation. Three
interrelated problems, function estimation under global integrated squared
error, estimation under pointwise squared error, and nonparametric confidence
intervals, are considered. Shrinkage is pivotal in the development of both the
minimax theory and the adaptation theory. While the three problems are closely
connected and the minimax theories bear some similarities, the adaptation
theories are strikingly different. For example, in sharp contrast to adaptive
point estimation, in many common settings there do not exist nonparametric
confidence intervals that adapt to the unknown smoothness of the underlying
function. A concise account of these theories is given. The connections as well
as differences among these problems are discussed and illustrated through
examples.
Comment: Published at http://dx.doi.org/10.1214/11-STS355 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
A joint time-invariant filtering approach to the linear Gaussian relay problem
In this paper, the linear Gaussian relay problem is considered. Under the
linear time-invariant (LTI) model the problem is formulated in the frequency
domain based on the Toeplitz distribution theorem. Under the further assumption
of realizable input spectra, the LTI Gaussian relay problem is converted to a
joint design problem of source and relay filters under two power constraints,
one at the source and the other at the relay, and a practical solution to this
problem is proposed based on the projected subgradient method. Numerical
results show that the proposed method yields a noticeable gain over the
instantaneous amplify-and-forward (AF) scheme in inter-symbol interference
(ISI) channels. Also, the optimality of the AF scheme within the class of
one-tap relay filters is established in flat-fading channels.
Comment: 30 pages, 10 figures
The CMA Evolution Strategy: A Tutorial
This tutorial introduces the CMA Evolution Strategy (ES), where CMA stands
for Covariance Matrix Adaptation. The CMA-ES is a stochastic, or randomized,
method for real-parameter (continuous domain) optimization of non-linear,
non-convex functions. We try to motivate and derive the algorithm from
intuitive concepts and from requirements of non-linear, non-convex search in
continuous domain.
Comment: ArXiv e-prints, arXiv:1604.xxxx
Random Coordinate Descent Methods for Minimizing Decomposable Submodular Functions
Submodular function minimization is a fundamental optimization problem that
arises in several applications in machine learning and computer vision. The
problem is known to be solvable in polynomial time, but general purpose
algorithms have high running times and are unsuitable for large-scale problems.
Recent work has used convex optimization techniques to obtain very practical
algorithms for minimizing functions that are sums of "simple" functions. In
this paper, we use random coordinate descent methods to obtain algorithms with
faster linear convergence rates and cheaper iteration costs. Compared to
alternating projection methods, our algorithms do not rely on full-dimensional
vector operations and they converge in significantly fewer iterations.