Optimal Kullback-Leibler Aggregation via Information Bottleneck
In this paper, we present a method for reducing a regular, discrete-time
Markov chain (DTMC) to another DTMC with a given, typically much smaller number
of states. The cost of reduction is defined as the Kullback-Leibler divergence
rate between a projection of the original process through a partition function
and a DTMC on the correspondingly partitioned state space. Finding the reduced
model with minimal cost is computationally expensive, as it requires an
exhaustive search among all state space partitions, and an exact evaluation of
the reduction cost for each candidate partition. Our approach addresses the
latter problem by minimizing an upper bound on the reduction cost instead of
the exact cost; the proposed upper bound is easy to compute, and it is tight
if the original chain is lumpable with respect to the partition. We then
express the problem as an information bottleneck optimization and propose
using the agglomerative information bottleneck algorithm to search for a
sub-optimal partition greedily rather than exhaustively. The theory is
illustrated with examples and one application scenario in the context of
modeling bio-molecular interactions.
Comment: 13 pages, 4 figures.
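To make the greedy search concrete, here is a minimal Python sketch of one agglomerative aggregation pass, assuming the merge cost takes the weighted Jensen-Shannon form familiar from the agglomerative information bottleneck literature; the function names and the cost expression are illustrative, not the paper's exact upper bound.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for the leading eigenvalue, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over the support of p."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def merge_cost(w1, p1, w2, p2):
    """Agglomerative-IB style merge cost: weighted Jensen-Shannon divergence
    between the conditional next-state distributions of two clusters."""
    pbar = (w1 * p1 + w2 * p2) / (w1 + w2)
    return w1 * kl(p1, pbar) + w2 * kl(p2, pbar)

def greedy_aggregate(P, m):
    """Greedily merge states of a DTMC with transition matrix P into m clusters."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    clusters = [[i] for i in range(n)]
    weights = list(pi)                      # stationary mass of each cluster
    rows = [P[i].copy() for i in range(n)]  # next-state distribution per cluster
    while len(clusters) > m:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                c = merge_cost(weights[a], rows[a], weights[b], rows[b])
                if best is None or c < best[0]:
                    best = (c, a, b)
        _, a, b = best
        w = weights[a] + weights[b]
        rows[a] = (weights[a] * rows[a] + weights[b] * rows[b]) / w
        weights[a] = w
        clusters[a] += clusters[b]
        del clusters[b], weights[b], rows[b]
    return clusters
```

Each iteration merges the pair of clusters whose merge loses the least predictive information about the next state, which mirrors the greedy-rather-than-exhaustive search described above.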
Kullback-Leibler aggregation and misspecified generalized linear models
In a regression setup with deterministic design, we study the pure
aggregation problem and introduce a natural extension from the Gaussian
distribution to distributions in the exponential family. While this extension
bears strong connections with generalized linear models, it does not require
identifiability of the parameter or even that the model on the systematic
component is true. It is shown that this problem can be solved by constrained
and/or penalized likelihood maximization and we derive sharp oracle
inequalities that hold both in expectation and with high probability. Finally,
all the bounds are proved to be optimal in a minimax sense.
Comment: Published at http://dx.doi.org/10.1214/11-AOS961 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
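As a concrete instance of the constrained likelihood maximization described above, the following sketch aggregates candidate predictors for a Bernoulli response with logit link by maximizing the likelihood of their convex combination over the simplex; the Bernoulli choice and all names are illustrative assumptions, since the paper covers general exponential families.

```python
import numpy as np
from scipy.optimize import minimize

def aggregate_logistic(F, y):
    """Aggregation for a Bernoulli GLM with logit link: maximize the
    log-likelihood of the convex combination of the candidate systematic
    components F[:, j] over weights constrained to the simplex.

    F : (n, M) array of candidate predictors evaluated at the design points
    y : (n,) array of 0/1 responses
    """
    n, M = F.shape

    def neg_loglik(theta):
        eta = F @ theta                          # combined systematic component
        # Bernoulli log-likelihood with canonical (logit) link
        return -np.sum(y * eta - np.logaddexp(0.0, eta))

    cons = [{"type": "eq", "fun": lambda t: t.sum() - 1.0}]  # weights sum to 1
    bounds = [(0.0, 1.0)] * M                                # weights nonnegative
    theta0 = np.full(M, 1.0 / M)
    res = minimize(neg_loglik, theta0, bounds=bounds, constraints=cons)
    return res.x
```

Note that nothing here requires the logistic model to be correctly specified: the optimizer simply finds the best convex combination of the candidates in the likelihood sense, which is the misspecification-robust reading of the aggregation problem above.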
Bayesian Nonparametric Calibration and Combination of Predictive Distributions
We introduce a Bayesian approach to predictive density calibration and
combination that accounts for parameter uncertainty and model set
incompleteness through the use of random calibration functionals and random
combination weights. Building on the work of Ranjan and Gneiting (2010) and
Gneiting and Ranjan (2013), we use infinite beta mixtures for the
calibration. The proposed Bayesian nonparametric approach takes advantage of
the flexibility of Dirichlet process mixtures to achieve any continuous
deformation of linearly combined predictive distributions. The inference
procedure is based on Gibbs sampling and allows accounting for uncertainty in
the number of mixture components, mixture weights, and calibration parameters.
Weak posterior consistency of the Bayesian nonparametric calibration is
established under suitable conditions on the unknown true density. We study the
methodology in simulation examples with fat tails and multimodal densities and
apply it to density forecasts of daily S&P returns and daily maximum wind speed
at Frankfurt Airport.
Comment: arXiv admin note: text overlap with arXiv:1305.2026 by other authors.
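A minimal sketch of the calibrated combination, with a finite truncation standing in for the infinite beta mixture: the combined predictive CDF is a linear pool passed through a mixture of beta CDFs. The Gibbs sampler that infers the weights and shape parameters is not shown, and all parameter values below are illustrative.

```python
import numpy as np
from scipy.stats import beta, norm

def calibrated_pool_cdf(y, cdfs, omega, w, a, b):
    """Beta-mixture calibrated linear pool (finite truncation of the
    infinite mixture):
        H(y) = sum_k w_k * BetaCDF( sum_j omega_j F_j(y); a_k, b_k )

    cdfs  : list of component predictive CDFs F_j (callables)
    omega : combination weights over the predictive distributions
    w     : mixture weights of the (truncated) beta mixture
    a, b  : beta shape parameters, one pair per mixture component
    """
    z = sum(om * F(y) for om, F in zip(omega, cdfs))      # linear pool, in [0, 1]
    return sum(wk * beta.cdf(z, ak, bk) for wk, ak, bk in zip(w, a, b))

# Illustrative use with two Gaussian predictive distributions
cdfs = [norm(loc=0.0, scale=1.0).cdf, norm(loc=0.5, scale=2.0).cdf]
H = calibrated_pool_cdf(np.linspace(-4, 4, 9), cdfs,
                        omega=[0.6, 0.4], w=[0.7, 0.3],
                        a=[2.0, 0.5], b=[2.0, 0.5])
```

Because a mixture of beta CDFs can approximate any continuous distribution function on [0, 1], this map can realize any continuous deformation of the linear pool, which is the flexibility claim made above.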
Multiscale likelihood analysis and complexity penalized estimation
We describe here a framework for a certain class of multiscale likelihood
factorizations wherein, in analogy to a wavelet decomposition of an L^2
function, a given likelihood function has an alternative representation as a
product of conditional densities reflecting information in both the data and
the parameter vector localized in position and scale. The framework is
developed as a set of sufficient conditions for the existence of such
factorizations, formulated in analogy to those underlying a standard
multiresolution analysis for wavelets, and hence can be viewed as a
multiresolution analysis for likelihoods. We then consider the use of these
factorizations in the task of nonparametric, complexity penalized likelihood
estimation. We study the risk properties of certain thresholding and
partitioning estimators, and demonstrate their adaptivity and near-optimality,
in a minimax sense over a broad range of function spaces, based on squared
Hellinger distance as a loss function. In particular, our results provide an
illustration of how properties of classical wavelet-based estimators can be
obtained in a single, unified framework that includes models for continuous,
count, and categorical data types.
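To illustrate the partitioning estimators, here is a sketch of a complexity-penalized likelihood estimate for Poisson count data on a dyadic grid, with a penalty proportional to the number of cells; the dynamic program prunes the dyadic tree bottom-up. The penalty form and all names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def penalized_partition(counts, pen):
    """Complexity-penalized ML estimate of a piecewise-constant Poisson
    intensity on a dyadic grid: choose the recursive dyadic partition
    minimizing  -loglik + pen * (number of cells)  by bottom-up dynamic
    programming over the dyadic tree.

    counts : length-2^J array of Poisson counts (length a power of two)
    pen    : complexity penalty per cell (e.g. proportional to log n)
    """
    counts = np.asarray(counts, dtype=float)
    n = len(counts)

    def nll(total, width):
        """Poisson negative log-likelihood of a constant intensity on a
        cell, up to terms that do not depend on the partition."""
        if total == 0:
            return 0.0
        lam = total / width
        return -(total * np.log(lam) - lam * width)

    def best(lo, hi):
        """Return (cost, cells) for the best pruned subtree on [lo, hi)."""
        total = counts[lo:hi].sum()
        keep = nll(total, hi - lo) + pen        # cost of keeping one cell
        if hi - lo == 1:
            return keep, [(lo, hi)]
        mid = (lo + hi) // 2
        lc, lcells = best(lo, mid)
        rc, rcells = best(mid, hi)
        if lc + rc < keep:                      # split only if it pays for itself
            return lc + rc, lcells + rcells
        return keep, [(lo, hi)]

    return best(0, n)[1]
```

The bottom-up pruning is what the multiscale factorization buys: the penalized cost decomposes over parent and child intervals, so the globally optimal partition is found by comparing each parent cell against its two children, scale by scale.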