Online Local Learning via Semidefinite Programming
In many online learning problems we are interested in predicting local
information about some universe of items. For example, we may want to know
whether two items are in the same cluster rather than computing an assignment
of items to clusters; we may want to know which of two teams will win a game
rather than computing a ranking of teams. Although finding the optimal
clustering or ranking is typically intractable, it may be possible to predict
the relationships between items as well as if one could solve the global
optimization problem exactly.
Formally, we consider an online learning problem in which a learner
repeatedly guesses a pair of labels (l(x), l(y)) and receives an adversarial
payoff depending on those labels. The learner's goal is to receive a payoff
nearly as good as the best fixed labeling of the items. We show that a simple
algorithm based on semidefinite programming can obtain asymptotically optimal
regret in the case where the number of possible labels is O(1), resolving an
open problem posed by Hazan, Kale, and Shalev-Shwartz. Our main technical
contribution is a novel use and analysis of the log determinant regularizer,
exploiting the observation that log det(A + I) upper bounds the entropy of any
distribution with covariance matrix A.
Comment: 10 pages
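The regret notion in the abstract above can be illustrated on a toy instance. The following is a minimal sketch, not the paper's semidefinite-programming algorithm: it assumes a hypothetical clustering-style payoff (1 if the guessed labels agree with the adversary's "same cluster" bit, 0 otherwise) and finds the best fixed labeling by brute force, which is only feasible for a handful of items. The names `payoff`, `best_fixed_payoff`, and `regret` are illustrative.

```python
from itertools import product


def payoff(label_x, label_y, same_cluster):
    # Assumed toy payoff: 1 if the guess "same label / different label"
    # matches the adversary's bit, else 0.
    return 1.0 if (label_x == label_y) == same_cluster else 0.0


def best_fixed_payoff(rounds, n_items, n_labels):
    # Brute force over all n_labels ** n_items fixed labelings.
    best = 0.0
    for labeling in product(range(n_labels), repeat=n_items):
        total = sum(payoff(labeling[x], labeling[y], same)
                    for x, y, same in rounds)
        best = max(best, total)
    return best


def regret(rounds, guesses, n_items, n_labels):
    # Regret = payoff of the best fixed labeling minus the learner's payoff.
    earned = sum(payoff(gx, gy, same)
                 for (_, _, same), (gx, gy) in zip(rounds, guesses))
    return best_fixed_payoff(rounds, n_items, n_labels) - earned


# Three items, two labels; the three constraints cannot all hold at once,
# so the best fixed labeling satisfies only two of them.
rounds = [(0, 1, True), (1, 2, True), (0, 2, False)]
guesses = [(0, 0), (0, 0), (0, 0)]
print(regret(rounds, guesses, n_items=3, n_labels=2))
```

The learner here is compared against a labeling chosen in hindsight, which is why a learner that only ever makes local guesses can still have zero regret on an inconsistent instance.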
Estimation of the Rate-Distortion Function
Motivated by questions in lossy data compression and by theoretical
considerations, we examine the problem of estimating the rate-distortion
function of an unknown (not necessarily discrete-valued) source from empirical
data. Our focus is the behavior of the so-called "plug-in" estimator, which is
simply the rate-distortion function of the empirical distribution of the
observed data. Sufficient conditions are given for its consistency, and
examples are provided to demonstrate that in certain cases it fails to converge
to the true rate-distortion function. The analysis of its performance is
complicated by the fact that the rate-distortion function is not continuous in
the source distribution; the underlying mathematical problem is closely related
to the classical problem of establishing the consistency of maximum likelihood
estimators. General consistency results are given for the plug-in estimator
applied to a broad class of sources, including all stationary and ergodic ones.
A more general class of estimation problems is also considered, arising in the
context of lossy data compression when the allowed class of coding
distributions is restricted; analogous results are developed for the plug-in
estimator in that case. Finally, consistency theorems are formulated for
modified (e.g., penalized) versions of the plug-in, and for estimating the
optimal reproduction distribution.
Comment: 18 pages, no figures [v2: removed an example with an error; corrected
typos; a shortened version will appear in IEEE Trans. Inform. Theory]
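For a finite-alphabet source, the plug-in estimator described above can be sketched as follows: form the empirical distribution of the samples, then compute a point on its rate-distortion curve. The abstract does not say how that curve is computed; the sketch assumes the standard Blahut-Arimoto iteration at a fixed slope parameter `beta`, and the helper names are hypothetical.

```python
import math
from collections import Counter


def empirical_distribution(samples):
    # Relative frequencies of the observed symbols, in sorted symbol order.
    counts = Counter(samples)
    symbols = sorted(counts)
    return symbols, [counts[s] / len(samples) for s in symbols]


def blahut_arimoto(p, dist, beta, n_iter=200):
    """Return (rate in nats, distortion) for source p at slope beta.

    dist[x][y] is the distortion between source symbol x and
    reproduction symbol y. Blahut-Arimoto is an assumption of this
    sketch, not something stated in the abstract.
    """
    n_x, n_y = len(dist), len(dist[0])
    q = [1.0 / n_y] * n_y  # reproduction marginal, start uniform
    for _ in range(n_iter):
        # Conditional Q(y|x) proportional to q(y) * exp(-beta * d(x, y)).
        cond = []
        for x in range(n_x):
            w = [q[y] * math.exp(-beta * dist[x][y]) for y in range(n_y)]
            z = sum(w)
            cond.append([wi / z for wi in w])
        # Update the marginal so it matches the current conditional.
        q = [sum(p[x] * cond[x][y] for x in range(n_x)) for y in range(n_y)]
    distortion = sum(p[x] * cond[x][y] * dist[x][y]
                     for x in range(n_x) for y in range(n_y))
    rate = sum(p[x] * cond[x][y] * math.log(cond[x][y] / q[y])
               for x in range(n_x) for y in range(n_y)
               if cond[x][y] > 0.0)
    return rate, distortion


# Plug-in estimate for binary data under Hamming distortion.
symbols, p_hat = empirical_distribution([0, 1, 1, 0])
hamming = [[0.0, 1.0], [1.0, 0.0]]
rate, distortion = blahut_arimoto(p_hat, hamming, beta=2.0)
print(symbols, p_hat, rate, distortion)
```

Sweeping `beta` over a range of positive values traces out the estimated curve; larger `beta` corresponds to lower distortion and higher rate.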
An informational approach to the global optimization of expensive-to-evaluate functions
In many global optimization problems motivated by engineering applications,
the number of function evaluations is severely limited by time or cost. To
ensure that each evaluation contributes to the localization of good candidates
for the role of global minimizer, a sequential choice of evaluation points is
usually carried out. In particular, when Kriging is used to interpolate past
evaluations, the uncertainty associated with the lack of information on the
function can be expressed and used to compute a number of criteria accounting
for the interest of an additional evaluation at any given point. This paper
introduces minimizer entropy as a new Kriging-based criterion for the
sequential choice of points at which the function should be evaluated. Based on
stepwise uncertainty reduction, it accounts for the informational gain
on the minimizer expected from a new evaluation. The criterion is approximated
using conditional simulations of the Gaussian process model behind Kriging, and
then inserted into an algorithm similar in spirit to the Efficient Global
Optimization (EGO) algorithm. An empirical comparison is carried out between
our criterion and expected improvement, one of the reference criteria in
the literature. Experimental results indicate major evaluation savings over
EGO. Finally, the method, which we call IAGO (for Informational Approach to
Global Optimization), is extended to robust optimization problems, where both
the factors to be tuned and the function evaluations are corrupted by noise.
Comment: Accepted for publication in the Journal of Global Optimization (This
is the revised version, with additional details on computational problems,
and some grammatical changes)
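One ingredient of the criterion described above, the entropy of the minimizer estimated from simulated sample paths, can be sketched as below. This is a simplified illustration and not the IAGO algorithm itself: it assumes the conditional simulations have already been drawn and are given as lists of function values on a common grid, and the function name `minimizer_entropy` is hypothetical.

```python
import math
from collections import Counter


def minimizer_entropy(paths):
    """Shannon entropy (nats) of the empirical argmin distribution.

    Each element of `paths` is one simulated sample path, given as a
    list of function values on a shared grid of candidate points.
    """
    # Grid index of the minimizer of each simulated path.
    argmins = [min(range(len(path)), key=path.__getitem__) for path in paths]
    counts = Counter(argmins)
    n = len(paths)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


# Four hypothetical sample paths on a three-point grid: three of them
# attain their minimum at index 1 and one at index 0.
paths = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 0.0, 1.0]]
print(minimizer_entropy(paths))
```

A sequential strategy in this spirit would prefer the next evaluation point whose outcome is expected to reduce this entropy the most; estimating that expectation requires re-simulating paths conditioned on each candidate outcome, which is omitted here.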
A variational approach to path estimation and parameter inference of hidden diffusion processes
We consider a hidden Markov model, where the signal process, given by a
diffusion, is only indirectly observed through some noisy measurements. The
article develops a variational method for approximating the hidden states of
the signal process given the full set of observations. This, in particular,
leads to systematic approximations of the smoothing densities of the signal
process. The paper then demonstrates how an efficient inference scheme, based
on this variational approach to the approximation of the hidden states, can be
designed to estimate the unknown parameters of stochastic differential
equations. Two examples at the end illustrate the efficacy and the accuracy of
the presented method.
Comment: 37 pages, 2 figures, revised
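The model setting above (a diffusion observed only through noisy measurements) can be illustrated with a small simulation. This sketch covers only the data-generating model, not the variational method; it assumes an Ornstein-Uhlenbeck signal dX = -theta X dt + sigma dW, an Euler-Maruyama discretization, and additive Gaussian observation noise at every step. All of these choices, and the function name, are assumptions made for illustration.

```python
import math
import random


def simulate_hidden_diffusion(theta, sigma, obs_noise, dt, n_steps,
                              x0=0.0, seed=0):
    """Simulate an assumed OU signal and its noisy observations."""
    rng = random.Random(seed)
    x = x0
    states, observations = [x0], []
    for _ in range(n_steps):
        # Euler-Maruyama step for dX = -theta * X dt + sigma dW.
        x = x - theta * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        states.append(x)
        # The signal is seen only through an additive-noise measurement.
        observations.append(x + obs_noise * rng.gauss(0.0, 1.0))
    return states, observations


states, observations = simulate_hidden_diffusion(
    theta=1.0, sigma=0.5, obs_noise=0.1, dt=0.01, n_steps=1000, x0=1.0)
print(len(states), len(observations))
```

An inference scheme of the kind the abstract describes would take `observations` as input and aim to recover both the hidden `states` and the parameters (here `theta` and `sigma`).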