Overviews of Optimization Techniques for Geometric Estimation
We summarize techniques for optimal geometric estimation from noisy observations for computer
vision applications. We first discuss the interpretation of optimality and point out that geometric
estimation is different from the standard statistical estimation. We also describe our noise
modeling and a theoretical accuracy limit called the KCR lower bound. Then, we formulate estimation
techniques based on minimization of a given cost function: least squares (LS), maximum
likelihood (ML), which includes reprojection error minimization as a special case, and Sampson
error minimization. We describe bundle adjustment and the FNS scheme for numerically solving
them and the hyperaccurate correction that improves the accuracy of ML. Next, we formulate
estimation techniques not based on minimization of any cost function: iterative reweight, renormalization,
and hyper-renormalization. Finally, we show numerical examples to demonstrate that
hyper-renormalization has higher accuracy than ML, which has widely been regarded as the most
accurate method of all. We conclude that hyper-renormalization is robust to noise and is currently
the best method.
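As a minimal illustration of reprojection error minimization, consider the simplest geometric estimation problem: fitting a line to noisy 2-D points. The sketch below is not taken from the paper. It assumes independent, isotropic noise on the points, in which case minimizing the summed squared orthogonal distances coincides with ML. The function name `fit_line_orthogonal` and the parametrization a x + b y + c = 0 with a^2 + b^2 = 1 are illustrative choices.

```python
import math


def fit_line_orthogonal(points):
    """Fit a*x + b*y + c = 0 (with a^2 + b^2 = 1) by minimizing the
    sum of squared orthogonal distances to the given 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 scatter matrix about the centroid.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # Angle of the direction of largest scatter (the line direction).
    phi = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # The unit normal is perpendicular to the line direction.
    a, b = -math.sin(phi), math.cos(phi)
    # The optimal line passes through the centroid.
    c = -(a * mean_x + b * mean_y)
    return a, b, c
```

For conics, fundamental matrices, and homographies the constraint is nonlinear in the data, so no closed form of this kind is available; that is where the iterative schemes named in the abstract come in.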
Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis
Database theory and database practice are typically the domain of computer
scientists who adopt what may be termed an algorithmic perspective on their
data. This perspective is very different than the more statistical perspective
adopted by statisticians, scientific computers, machine learners, and others who
work on what may be broadly termed statistical data analysis. In this article,
I will address fundamental aspects of this algorithmic-statistical disconnect,
with an eye to bridging the gap between these two very different approaches. A
concept that lies at the heart of this disconnect is that of statistical
regularization, a notion that has to do with how robust the output of an
algorithm is to the noise properties of the input data. Although it is nearly
completely absent from computer science, which historically has taken the input
data as given and modeled algorithms discretely, regularization in one form or
another is central to nearly every application domain that applies algorithms
to noisy data. By using several case studies, I will illustrate, both
theoretically and empirically, the nonobvious fact that approximate
computation, in and of itself, can implicitly lead to statistical
regularization. This and other recent work suggests that, by exploiting in a
more principled way the statistical properties implicit in worst-case
algorithms, one can in many cases satisfy the bicriteria of having algorithms
that are scalable to very large-scale databases and that also have good
inferential or predictive properties.
Comment: To appear in the Proceedings of the 2012 ACM Symposium on Principles of Database Systems (PODS 2012)
Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
We present a canonical way to turn any smooth parametric family of
probability distributions on an arbitrary search space $X$ into a
continuous-time black-box optimization method on $X$, the
\emph{information-geometric optimization} (IGO) method. Invariance as a design
principle minimizes the number of arbitrary choices. The resulting \emph{IGO
flow} conducts the natural gradient ascent of an adaptive, time-dependent,
quantile-based transformation of the objective function. It makes no
assumptions on the objective function to be optimized.
The IGO method produces explicit IGO algorithms through time discretization.
It naturally recovers versions of known algorithms and offers a systematic way
to derive new ones. The cross-entropy method is recovered in a particular case,
and can be extended into a smoothed, parametrization-independent maximum
likelihood update (IGO-ML). For Gaussian distributions on $\mathbb{R}^d$, IGO
is related to natural evolution strategies (NES) and recovers a version of the
CMA-ES algorithm. For Bernoulli distributions on $\{0,1\}^d$, we recover the
PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm
for optimization on $\{0,1\}^d$. All these algorithms are unified under a
single information-geometric optimization framework.
Thanks to its intrinsic formulation, the IGO method achieves invariance under
reparametrization of the search space $X$, under a change of parameters of the
probability distributions, and under increasing transformations of the
objective function.
Theory strongly suggests that IGO algorithms have minimal loss in diversity
during optimization, provided the initial diversity is high. First experiments
using restricted Boltzmann machines confirm this insight. Thus IGO seems to
provide, from information theory, an elegant way to spontaneously explore
several valleys of a fitness landscape in a single run.
Comment: Final published version
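To make the Bernoulli case concrete, here is a minimal PBIL-like sketch: sample bit strings from independent Bernoulli parameters, rank them by the objective, and move each parameter a small step toward the mean of the best samples. It is an illustration under stated assumptions, not the authors' implementation. The truncation weights (equal weight on the top fraction), the clamping margin, the default settings, and all function names are hypothetical choices, and the objective is maximized.

```python
import random


def sample_bits(theta, rng):
    """Draw one bit string from independent Bernoulli(theta[i]) variables."""
    return [1 if rng.random() < p else 0 for p in theta]


def pbil_like_step(theta, objective, rng, n_samples=20, elite_frac=0.25,
                   step=0.1, margin=0.05):
    """One update: move theta toward the mean of the best-ranked samples."""
    samples = [sample_bits(theta, rng) for _ in range(n_samples)]
    # Only the ranking matters, so any increasing transformation of the
    # objective yields the same update.
    samples.sort(key=objective, reverse=True)
    n_elite = max(1, int(elite_frac * n_samples))
    elite = samples[:n_elite]
    updated = []
    for i, p in enumerate(theta):
        elite_mean = sum(x[i] for x in elite) / n_elite
        p_new = p + step * (elite_mean - p)
        # Assumption: clamp away from 0 and 1 so sampling never degenerates.
        updated.append(min(max(p_new, margin), 1.0 - margin))
    return updated


def run_pbil_sketch(dim=10, iterations=200, seed=0):
    """Maximize the number of ones (OneMax) as a toy objective."""
    rng = random.Random(seed)
    theta = [0.5] * dim
    for _ in range(iterations):
        theta = pbil_like_step(theta, sum, rng)
    return theta
```

Because the update uses ranks rather than objective values, the sketch shares the invariance under increasing transformations of the objective that the abstract describes.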
Distributed Robust Learning
We propose a framework for distributed robust statistical learning on {\em
big contaminated data}. The Distributed Robust Learning (DRL) framework can
reduce the computational time of traditional robust learning methods by several
orders of magnitude. We analyze the robustness property of DRL, showing that
DRL not only preserves the robustness of the base robust learning method, but
also tolerates contaminations on a constant fraction of results from computing
nodes (node failures). More precisely, even in the presence of the most adversarial
outlier distribution over computing nodes, DRL still achieves a breakdown point
of at least $\lambda^*/2$, where $\lambda^*$ is the breakdown point of the
corresponding centralized algorithm. This is in stark contrast with a naive
division-and-averaging implementation, which may reduce the breakdown point by
a factor of $k$ when $k$ computing nodes are used. We then specialize the
DRL framework for two concrete cases: distributed robust principal component
analysis and distributed robust regression. We demonstrate the efficiency and
the robustness advantages of DRL through comprehensive simulations and
predicting image tags on a large-scale image set.
Comment: 18 pages, 2 figures
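The contrast between robust aggregation and naive division-and-averaging can be seen in a toy one-dimensional location problem. The sketch below is not the DRL algorithm, whose aggregation rule the abstract does not specify. As a stand-in it uses the median as the base robust estimator and the median of per-node medians as the aggregate. All function names, the synthetic data, and the simulated node failure are hypothetical.

```python
import statistics


def split_across_nodes(data, k):
    """Deal the samples round-robin to k hypothetical computing nodes."""
    return [data[i::k] for i in range(k)]


def robust_distributed_location(data, k, failed_nodes=()):
    """Median of per-node medians; failed nodes report an arbitrary value."""
    estimates = [statistics.median(chunk) for chunk in split_across_nodes(data, k)]
    for j in failed_nodes:
        estimates[j] = 1e9  # simulated adversarial output of node j
    return statistics.median(estimates)


def naive_distributed_location(data, k):
    """Division-and-averaging baseline: mean of per-node means."""
    estimates = [statistics.mean(chunk) for chunk in split_across_nodes(data, k)]
    return statistics.mean(estimates)


def make_contaminated_data(n=100, n_outliers=10):
    """Values near 10.0, with the first n_outliers replaced by gross outliers."""
    clean = [10.0 + 0.01 * ((i % 7) - 3) for i in range(n)]
    return [1e6 if i < n_outliers else v for i, v in enumerate(clean)]
```

With 10% contaminated samples and one of five nodes failing, the robust aggregate stays near 10, while the averaged estimate is dominated by the outliers.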
Implicit Langevin Algorithms for Sampling From Log-concave Densities
For sampling from a log-concave density, we study implicit integrators
resulting from $\theta$-method discretization of the overdamped Langevin
diffusion stochastic differential equation. Theoretical and algorithmic
properties of the resulting sampling methods for $\theta \in [0,1]$ and a
range of step sizes are established. Our results generalize and extend prior
works in several directions. In particular, for $\theta \ge 1/2$, we prove
geometric ergodicity and stability of the resulting methods for all step sizes.
We show that obtaining subsequent samples amounts to solving a strongly-convex
optimization problem, which is readily achievable using one of numerous
existing methods. Numerical examples supporting our theoretical analysis are
also presented.
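A one-dimensional Gaussian target makes the implicit step explicit enough to inspect. The sketch below assumes the target N(0, sigma2), so the potential is f(x) = x^2 / (2 sigma2), and the standard theta-method form x_new = x - h [theta f'(x_new) + (1 - theta) f'(x)] + sqrt(2h) xi with xi standard normal. For this quadratic potential the implicit equation is linear and can be solved in closed form; for a general log-concave density it would be solved numerically as a strongly convex problem. Function names and parameters are illustrative, not taken from the paper.

```python
import math
import random


def theta_step(x, h, theta, sigma2, xi):
    """One theta-method step for the target N(0, sigma2)."""
    # Explicit part: gradient at the current point plus injected noise.
    rhs = x - h * (1.0 - theta) * x / sigma2 + math.sqrt(2.0 * h) * xi
    # Implicit part: solve x_new * (1 + h * theta / sigma2) = rhs.
    return rhs / (1.0 + h * theta / sigma2)


def contraction_factor(h, theta, sigma2):
    """Multiplier applied to x per step; the chain is stable iff |factor| < 1."""
    return (1.0 - h * (1.0 - theta) / sigma2) / (1.0 + h * theta / sigma2)


def stationary_variance(h, theta, sigma2):
    """Variance of the chain's stationary law (requires |factor| < 1)."""
    c = contraction_factor(h, theta, sigma2)
    b = 1.0 + h * theta / sigma2
    return (2.0 * h / b ** 2) / (1.0 - c ** 2)


def sample_chain(n, h, theta, sigma2=1.0, seed=0):
    """Run n steps from x = 0 and return the visited states."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x = theta_step(x, h, theta, sigma2, rng.gauss(0.0, 1.0))
        path.append(x)
    return path
```

In this toy case the explicit scheme (theta = 0) is unstable once h exceeds 2 sigma2, whereas for theta >= 1/2 the contraction factor stays below 1 in magnitude for every step size, in line with the stability result stated in the abstract.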
Objective Improvement in Information-Geometric Optimization
Information-Geometric Optimization (IGO) is a unified framework of stochastic
algorithms for optimization problems. Given a family of probability
distributions, IGO turns the original optimization problem into a new
maximization problem on the parameter space of the probability distributions.
IGO updates the parameter of the probability distribution along the natural
gradient, taken with respect to the Fisher metric on the parameter manifold,
aiming at maximizing an adaptive transform of the objective function. IGO
recovers several known algorithms as particular instances: for the family of
Bernoulli distributions IGO recovers PBIL, for the family of Gaussian
distributions the pure rank-mu CMA-ES update is recovered, and for exponential
families in expectation parametrization the cross-entropy/ML method is
recovered. This article provides a theoretical justification for the IGO
framework, by proving that any step size not greater than 1 guarantees monotone
improvement over the course of optimization, in terms of q-quantile values of
the objective function f. The range of admissible step sizes is independent of
f and its domain. We extend the result to cover the case of different step
sizes for blocks of the parameters in the IGO algorithm. Moreover, we prove
that expected fitness improves over time when fitness-proportional selection is
applied, in which case the RPP algorithm is recovered.
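For orientation, a natural-gradient update of the kind described above can be written as follows. The notation is an assumption introduced here, not taken from the abstract: $P_\theta$ is the probability distribution with parameter $\theta$, $I(\theta)$ its Fisher information matrix, $W^f_{\theta^t}$ the adaptive quantile-based transform of the objective $f$, and $\delta t$ the step size.

```latex
\theta^{t+\delta t}
  = \theta^{t}
  + \delta t \, I(\theta^{t})^{-1} \,
    \mathbb{E}_{x \sim P_{\theta^{t}}}\!\left[
      W^{f}_{\theta^{t}}(x) \,
      \nabla_{\theta} \ln P_{\theta}(x) \Big|_{\theta = \theta^{t}}
    \right]
```

Here $\delta t$ plays the role of the step size that the abstract bounds by 1. The precise update and parametrization to which that bound applies are specified in the paper, not in this sketch.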