Entropic Projections and Dominating Points
Generalized entropic projections and dominating points are solutions to
convex minimization problems related to conditional laws of large numbers. They
appear in many areas of applied mathematics such as statistical physics,
information theory, mathematical statistics, ill-posed inverse problems or
large deviation theory. By means of convex conjugate duality and functional
analysis, criteria are derived for their existence. Representations of the
generalized entropic projections are obtained: they are the ``measure
component'' of some extended entropy minimization problem. Comment: ESAIM P&S (2011), to appear.
Minimization of entropy functionals
Entropy functionals (i.e. convex integral functionals) and extensions of
these functionals are minimized on convex sets. This paper aims to weaken the
assumptions on the constraint set as far as possible. Dual equalities and
characterizations of the minimizers are obtained under weak constraint
qualifications.
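As an illustrative sketch of the kind of problem these abstracts treat (not the papers' own constructions), the simplest entropic projection minimizes a Kullback-Leibler divergence subject to a moment constraint; the minimizer is an exponential tilt of the reference measure, with the dual parameter found by moment matching. The distributions and target mean below are hypothetical toy choices.

```python
import numpy as np

# Toy entropic projection: minimize KL(q || p) over distributions q on
# {0,...,5} subject to the moment constraint E_q[X] = m.  The minimizer is
# an exponential tilt q_theta(x) proportional to p(x) exp(theta x); we
# solve the dual (moment-matching) equation for theta by bisection.
x = np.arange(6.0)
p = np.ones(6) / 6.0          # reference distribution (fair die)
m = 4.0                       # target mean, inside the range (0, 5)

def tilt(theta):
    """Exponentially tilted distribution q_theta."""
    w = p * np.exp(theta * x)
    return w / w.sum()

# The tilted mean is increasing in theta, so bisection converges.
lo, hi = -10.0, 10.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if float((tilt(mid) * x).sum()) < m:
        lo = mid
    else:
        hi = mid
q = tilt(0.5 * (lo + hi))
print(q, float((q * x).sum()))  # projected distribution, mean close to 4
```

The same dual picture (solve for multipliers, recover the minimizer in closed form) underlies the more general results described above, where the constraint set and the integrand are far less well behaved.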
Generalized minimizers of convex integral functionals, Bregman distance, Pythagorean identities
Integral functionals based on convex normal integrands are minimized subject
to finitely many moment constraints. The integrands are finite on the positive
and infinite on the negative numbers, strictly convex but not necessarily
differentiable. The minimization is viewed as a primal problem and studied
together with a dual one in the framework of convex duality. The effective
domain of the value function is described by a conic core, a modification of
the earlier concept of convex core. Minimizers and generalized minimizers are
explicitly constructed from solutions of modified dual problems, not assuming
the primal constraint qualification. A generalized Pythagorean identity is
presented using Bregman distance and a correction term for lack of essential
smoothness in integrands. Results are applied to minimization of Bregman
distances. Existence of a generalized dual solution is established whenever the
dual value is finite, assuming the dual constraint qualification. Examples of
`irregular' situations are included, pointing to the limitations of generality
of certain key results.
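For readers unfamiliar with the Bregman distance used in the Pythagorean identity above, a minimal numerical sketch (not the paper's construction) is the following: for the negative-entropy integrand, the Bregman distance reduces to the generalized Kullback-Leibler divergence.

```python
import numpy as np

def bregman(f, grad_f, x, y):
    """Bregman distance D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return float(f(x) - f(y) - np.dot(grad_f(y), x - y))

# Negative entropy on the positive orthant; its Bregman distance is the
# generalized Kullback-Leibler divergence sum(x log(x/y) - x + y).
neg_entropy = lambda v: np.sum(v * np.log(v))
grad_neg_entropy = lambda v: np.log(v) + 1.0

def kl(a, b):
    return float(np.sum(a * np.log(a / b) - a + b))

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])
print(bregman(neg_entropy, grad_neg_entropy, x, y), kl(x, y))  # the two agree
```

The identity can be checked symbolically as well: expanding the inner product cancels the `y log y` terms and leaves exactly the generalized KL expression.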
Entropy: The Markov Ordering Approach
The focus of this article is on entropy and Markov processes. We study the
properties of functionals which are invariant with respect to monotonic
transformations and analyze two invariant "additivity" properties: (i)
existence of a monotonic transformation which makes the functional additive
with respect to the joining of independent systems and (ii) existence of a
monotonic transformation which makes the functional additive with respect to
the partitioning of the space of states. All Lyapunov functionals for Markov
chains which have properties (i) and (ii) are derived. We describe the most
general ordering of the distribution space, with respect to which all
continuous-time Markov processes are monotonic (the {\em Markov order}). The
solution differs significantly from the ordering given by the inequality of
entropy growth. For inference, this approach results in a convex compact set of
conditionally "most random" distributions. Comment: 50 pages, 4 figures, postprint version. A more detailed discussion of
the various entropy additivity properties and of separation of variables for
independent subsystems in the MaxEnt problem is added in Section 4.2. The
bibliography is extended.
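The classical Lyapunov-functional property that this abstract generalizes can be demonstrated in a few lines: relative entropy to the stationary distribution is non-increasing along any Markov chain trajectory. The 3-state chain below is a hypothetical toy example, not taken from the paper.

```python
import numpy as np

# A toy 3-state Markov chain (row-stochastic transition matrix).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

def kl(p, q):
    """Relative entropy D(p || q), the classical Lyapunov functional."""
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.8, 0.1, 0.1])
divs = []
for _ in range(20):
    divs.append(kl(p, pi))
    p = p @ P                 # one Markov step

print(divs[0], divs[-1])      # D(p_t || pi) shrinks toward 0
```

The Markov order studied in the paper asks a stronger question: which orderings of the distribution space are respected by *all* such chains, not just by the relative-entropy functional.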
Relative entropy and the multi-variable multi-dimensional moment problem
Entropy-like functionals on operator algebras have been studied since the
pioneering work of von Neumann, Umegaki, Lindblad, and Lieb. The most
well-known are the von Neumann entropy and a
generalization of the Kullback-Leibler distance, referred to as the quantum relative entropy and used to quantify
distance between states of a quantum system. The purpose of this paper is to
explore these as regularizing functionals in seeking solutions to
multi-variable and multi-dimensional moment problems. It will be shown that
extrema can be effectively constructed via a suitable homotopy. The homotopy
approach leads naturally to a further generalization and a description of all
the solutions to such moment problems. This is accomplished by a
renormalization of a Riemannian metric induced by entropy functionals. As an
application we discuss the inverse problem of describing power spectra which
are consistent with second-order statistics, which has been the main motivation
behind the present work. Comment: 24 pages, 3 figures.
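The two functionals named at the start of this abstract are easy to compute numerically. As a hedged sketch (the density matrices below are arbitrary examples, not from the paper), the von Neumann entropy is $-\mathrm{Tr}\,\rho\log\rho$ and the quantum relative entropy is $\mathrm{Tr}\,\rho(\log\rho - \log\sigma)$:

```python
import numpy as np

def logm_psd(a):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(a)
    return V @ np.diag(np.log(w)) @ V.T

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho), from the eigenvalues of rho."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]       # 0 log 0 = 0 convention
    return float(-np.sum(ev * np.log(ev)))

def quantum_relative_entropy(rho, sigma):
    """S(rho || sigma) = Tr rho (log rho - log sigma)."""
    return float(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma))))

# Two single-qubit density matrices (symmetric PSD, unit trace).
rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])
sigma = 0.5 * np.eye(2)       # maximally mixed state

print(von_neumann_entropy(rho))
print(quantum_relative_entropy(rho, sigma))
```

A useful sanity check: against the maximally mixed state, $S(\rho\,\|\,I/2) = \log 2 - S(\rho)$, so the two printed quantities sum to $\log 2$.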
From Stochastic Mixability to Fast Rates
Empirical risk minimization (ERM) is a fundamental learning rule for
statistical learning problems where the data is generated according to some
unknown distribution $P$ and returns a hypothesis $f$ chosen from a
fixed class $\mathcal{F}$ with small loss $\ell$. In the parametric setting,
depending upon $(\ell, \mathcal{F}, P)$, ERM can have slow ($1/\sqrt{n}$)
or fast ($1/n$) rates of convergence of the excess risk as a
function of the sample size $n$. There exist several results that give
sufficient conditions for fast rates in terms of joint properties of $\ell$,
$\mathcal{F}$, and $P$, such as the margin condition and the Bernstein
condition. In the non-statistical prediction with expert advice setting, there
is an analogous slow and fast rate phenomenon, and it is entirely characterized
in terms of the mixability of the loss $\ell$ (there being no role there for
$\mathcal{F}$ or $P$). The notion of stochastic mixability builds a
bridge between these two models of learning, reducing to classical mixability
in a special case. The present paper presents a direct proof of fast rates for
ERM in terms of stochastic mixability of $(\ell, \mathcal{F}, P)$, and
in so doing provides new insight into the fast-rates phenomenon. The proof
exploits an old result of Kemperman on the solution to the general moment
problem. We also show a partial converse that suggests a characterization of
fast rates for ERM in terms of stochastic mixability is possible. Comment: 21 pages, accepted to NIPS 2014.
An informational approach to the global optimization of expensive-to-evaluate functions
In many global optimization problems motivated by engineering applications,
the number of function evaluations is severely limited by time or cost. To
ensure that each evaluation contributes to the localization of good candidates
for the role of global minimizer, a sequential choice of evaluation points is
usually carried out. In particular, when Kriging is used to interpolate past
evaluations, the uncertainty associated with the lack of information on the
function can be expressed and used to compute a number of criteria accounting
for the interest of an additional evaluation at any given point. This paper
introduces minimizer entropy as a new Kriging-based criterion for the
sequential choice of points at which the function should be evaluated. Based on
\emph{stepwise uncertainty reduction}, it accounts for the informational gain
on the minimizer expected from a new evaluation. The criterion is approximated
using conditional simulations of the Gaussian process model behind Kriging, and
then inserted into an algorithm similar in spirit to the \emph{Efficient Global
Optimization} (EGO) algorithm. An empirical comparison is carried out between
our criterion and \emph{expected improvement}, one of the reference criteria in
the literature. Experimental results indicate major evaluation savings over
EGO. Finally, the method, which we call IAGO (for Informational Approach to
Global Optimization) is extended to robust optimization problems, where both
the factors to be tuned and the function evaluations are corrupted by noise. Comment: Accepted for publication in the Journal of Global Optimization. (This
is the revised version, with additional details on computational problems
and some grammatical changes.)
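The reference criterion that IAGO is compared against, \emph{expected improvement}, has a simple closed form under the Gaussian posterior that Kriging provides. The sketch below implements that standard formula (minimization convention); it is background for the comparison, not the paper's minimizer-entropy criterion.

```python
import math

def expected_improvement(mu, sigma, f_min):
    """EI at a candidate point with Kriging posterior mean mu and standard
    deviation sigma, given current best observed value f_min (minimization):
    E[max(f_min - F, 0)] for F ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(f_min - mu, 0.0)   # no posterior uncertainty left
    z = (f_min - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf
    return (f_min - mu) * cdf + sigma * pdf

# At a point whose posterior mean equals the current best, EI is sigma/sqrt(2*pi).
print(expected_improvement(0.0, 1.0, 0.0))
```

EI trades off exploitation (low `mu`) against exploration (high `sigma`); the minimizer-entropy criterion of the paper instead measures how much a new evaluation is expected to reduce uncertainty about the location of the global minimizer.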
Lossy compression of discrete sources via Viterbi algorithm
We present a new lossy compressor for discrete-valued sources. For coding a
sequence $x^n$, the encoder starts by assigning a certain cost to each possible
reconstruction sequence. It then finds the one that minimizes this cost and
describes it losslessly to the decoder via a universal lossless compressor. The
cost of each sequence is a linear combination of its distance from the sequence
$x^n$ and a linear function of its $k$-th order empirical distribution.
The structure of the cost function allows the encoder to employ the Viterbi
algorithm to recover the minimizer of the cost. We identify a choice of the
coefficients comprising the linear function of the empirical distribution used
in the cost function which ensures that the algorithm universally achieves the
optimum rate-distortion performance of any stationary ergodic source in the
limit of large $n$, provided that $k$ diverges as $o(\log n)$. Iterative
techniques for approximating the coefficients, which alleviate the
computational burden of finding the optimal coefficients, are proposed and
studied. Comment: 26 pages, 6 figures. Submitted to IEEE Transactions on Information
Theory.
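The key structural point of this abstract, that a cost which is additive over positions and low-order contexts can be minimized exactly by the Viterbi algorithm, can be illustrated with a first-order ($k = 1$) toy version: per-symbol distortion plus a cost on transitions of the reconstruction sequence. This is a hedged sketch with hypothetical costs, not the paper's coefficient choice.

```python
import numpy as np

def viterbi_min_cost(x, alphabet, distortion, trans_cost):
    """Find a reconstruction y minimizing
         sum_t distortion(x[t], y[t]) + sum_t trans_cost(y[t-1], y[t])
    by dynamic programming over the reconstruction alphabet."""
    n, m = len(x), len(alphabet)
    cost = np.full((n, m), np.inf)   # cost[t, j]: best cost ending in symbol j
    back = np.zeros((n, m), dtype=int)
    for j in range(m):
        cost[0, j] = distortion(x[0], alphabet[j])
    for t in range(1, n):
        for j in range(m):
            step = [cost[t - 1, i] + trans_cost(alphabet[i], alphabet[j])
                    for i in range(m)]
            i_best = int(np.argmin(step))
            cost[t, j] = step[i_best] + distortion(x[t], alphabet[j])
            back[t, j] = i_best
    # Trace back the minimizing path.
    j = int(np.argmin(cost[-1]))
    y = [alphabet[j]]
    for t in range(n - 1, 0, -1):
        j = int(back[t, j])
        y.append(alphabet[j])
    return y[::-1], float(cost[-1].min())

# Toy binary source, Hamming distortion, and a penalty for switching symbols.
x = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]
hamming = lambda a, b: 0.0 if a == b else 1.0
switch = lambda a, b: 0.0 if a == b else 1.5
y, c = viterbi_min_cost(x, [0, 1], hamming, switch)
print(y, c)   # isolated flips in x are smoothed away
```

Because the switch penalty exceeds the per-symbol distortion, the optimal reconstruction smooths the two isolated symbols rather than tracking them, which is exactly the rate-for-distortion trade the linear cost encodes.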