A multi-resolution, non-parametric, Bayesian framework for identification of spatially-varying model parameters
This paper proposes a hierarchical, multi-resolution framework for the
identification of model parameters and their spatial variability from noisy
measurements of the response or output. Such parameters are frequently
encountered in PDE-based models and correspond to quantities such as density or
pressure fields, elasto-plastic moduli and internal variables in solid
mechanics, conductivity fields in heat diffusion problems, permeability fields
in fluid flow through porous media, etc. The proposed model retains all the
advantages of traditional Bayesian formulations, such as the ability to produce
measures of confidence for the inferences made and to provide not only
predictive estimates but also quantitative measures of the predictive
uncertainty. In contrast to existing approaches, it uses a parsimonious,
non-parametric formulation that favors sparse representations and whose
complexity can be determined from the data. The proposed framework is
non-intrusive and uses a sequence of forward solvers operating at
various resolutions. Inexpensive, coarse solvers are first used to
identify the most salient features of the unknown field(s), which are
subsequently enriched by invoking solvers operating at finer resolutions. This
leads to significant computational savings, particularly in problems involving
computationally demanding forward models, as well as to improvements in accuracy.
The framework relies on a novel, adaptive scheme based on Sequential Monte Carlo
sampling, which is embarrassingly parallelizable and circumvents the slow mixing
often encountered in Markov Chain Monte Carlo schemes.
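The adaptive Sequential Monte Carlo idea can be illustrated with a minimal sketch. The abstract's sampler moves particles through a sequence of solvers of increasing resolution; as a swapped-in, simpler sequence, the sketch below uses likelihood tempering (intermediate targets prior × likelihood^β, with β rising from 0 to 1) and chooses each next β adaptively by bisection so that the effective sample size (ESS) stays above a target fraction. The scalar toy forward model, the constants and all function names (`adaptive_smc`, `next_beta`, ...) are hypothetical and are not the paper's formulation.

```python
import math
import random

# Hypothetical toy inverse problem: infer a scalar theta from one noisy
# observation y = theta + noise. A real application would call a forward solver here.
Y_OBS, SIGMA, PRIOR_SD = 1.5, 0.5, 2.0

def log_prior(theta):
    return -0.5 * (theta / PRIOR_SD) ** 2

def log_likelihood(theta):
    return -0.5 * ((Y_OBS - theta) / SIGMA) ** 2

def ess(log_w):
    """Effective sample size of a set of unnormalized log-weights."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    return sum(w) ** 2 / sum(x * x for x in w)

def next_beta(loglik, beta, target_ess):
    """Bisection for the largest next temperature whose incremental weights keep ESS >= target."""
    if ess([(1.0 - beta) * l for l in loglik]) >= target_ess:
        return 1.0
    lo, hi = beta, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if ess([(mid - beta) * l for l in loglik]) >= target_ess:
            lo = mid
        else:
            hi = mid
    return lo

def adaptive_smc(n=500, ess_frac=0.5, n_moves=5, step=0.5, seed=0):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, PRIOR_SD) for _ in range(n)]  # draws from the prior
    beta, betas = 0.0, [0.0]
    while beta < 1.0:
        loglik = [log_likelihood(p) for p in particles]
        new_beta = next_beta(loglik, beta, ess_frac * n)
        # Reweight by the likelihood increment, then resample multinomially.
        log_w = [(new_beta - beta) * l for l in loglik]
        m = max(log_w)
        weights = [math.exp(lw - m) for lw in log_w]
        particles = rng.choices(particles, weights=weights, k=n)
        beta = new_beta
        betas.append(beta)
        # Random-walk Metropolis moves targeting prior * likelihood**beta.
        # Each particle moves independently, so this loop parallelizes trivially.
        moved = []
        for p in particles:
            for _ in range(n_moves):
                q = p + rng.gauss(0.0, step)
                log_ratio = (log_prior(q) + beta * log_likelihood(q)) - (
                    log_prior(p) + beta * log_likelihood(p))
                if math.log(1.0 - rng.random()) < log_ratio:
                    p = q
            moved.append(p)
        particles = moved
    return particles, betas
```

For this conjugate toy the exact posterior mean is (1.5/0.25)/(1/0.25 + 1/4) ≈ 1.41, which gives a quick sanity check on the particle average. In the multi-resolution setting one would instead reweight particles by the ratio of finer- to coarser-solver likelihoods at each stage.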
Stochastic expansions using continuous dictionaries: Lévy adaptive regression kernels
This article describes a new class of prior distributions for nonparametric
function estimation. The unknown function is modeled as a limit of weighted
sums of kernels or generator functions indexed by continuous parameters that
control local and global features such as their translation, dilation,
modulation and shape. Lévy random fields and their stochastic integrals are
employed to induce prior distributions for the unknown functions or,
equivalently, for the number of kernels and for the parameters governing their
features. Scaling, shape, and other features of the generating functions are
location-specific to allow quite different function properties in different
parts of the space, as with wavelet bases and other methods employing
overcomplete dictionaries. We provide conditions under which the stochastic
expansions converge in specified Besov or Sobolev norms. Under a Gaussian error
model, this may be viewed as a sparse regression problem, with regularization
induced via the Lévy random field prior distribution. Posterior inference
for the unknown functions is based on a reversible jump Markov chain Monte
Carlo algorithm. We compare the Lévy Adaptive Regression Kernel (LARK)
method to wavelet-based methods using some of the standard test functions, and
illustrate its flexibility and adaptability in nonstationary applications.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/11-AOS889
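To make the prior concrete, the hedged sketch below draws one random function f(x) = Σ_j β_j g((x − χ_j)/λ_j) from a finite (compound Poisson) Lévy measure: the number of kernels J is Poisson, and each kernel gets its own translation χ_j, dilation λ_j and signed jump size β_j. The Gaussian generator, the uniform/gamma distributions, the rate and all function names are assumptions chosen for illustration; the article's prior and its reversible jump posterior sampler are not reproduced here.

```python
import math
import random

def gaussian_kernel(x, center, width):
    """Generator function g((x - center) / width); a Gaussian is one hypothetical choice."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def sample_lark_prior(rng, rate=10.0, lo=0.0, hi=1.0):
    """Draw f(x) = sum_j beta_j * g((x - chi_j) / lambda_j) from a compound Poisson
    (finite Levy measure) prior: J ~ Poisson(rate), then i.i.d. kernel parameters."""
    n_kernels = sample_poisson(rng, rate)
    terms = []
    for _ in range(n_kernels):
        center = rng.uniform(lo, hi)              # translation chi_j
        width = rng.gammavariate(2.0, 0.05)       # location-specific dilation lambda_j
        coef = rng.choice((-1.0, 1.0)) * rng.gammavariate(1.0, 1.0)  # jump size beta_j
        terms.append((coef, center, width))

    def f(x):
        return sum(b * gaussian_kernel(x, c, w) for b, c, w in terms)

    return f, terms
```

Because the number of kernels is itself random, posterior inference has to move between representations of different dimension, which is consistent with the article's use of reversible jump MCMC.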
Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches
We demonstrate the effectiveness of multilingual learning for unsupervised
part-of-speech tagging. The central assumption of our work is that by combining
cues from multiple languages, the structure of each becomes more apparent. We
consider two ways of applying this intuition to the problem of unsupervised
part-of-speech tagging: a model that directly merges tag structures for a pair
of languages into a single sequence and a second model which instead
incorporates multilingual context using latent variables. Both approaches are
formulated as hierarchical Bayesian models, using Markov Chain Monte Carlo
sampling techniques for inference. Our results demonstrate that incorporating
multilingual evidence yields substantial performance gains
across a range of scenarios. We also find that performance improves steadily
as the number of available languages increases.
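Both models above are hierarchical Bayesian models fit with MCMC. The multilingual coupling is not reproduced here; as a hedged, monolingual baseline showing the kind of sampler involved, the sketch below is an uncollapsed Gibbs sampler for a Bayesian hidden Markov model tagger with Dirichlet priors. It alternates between drawing the start, transition and emission parameters from their Dirichlet posteriors and resampling each tag given its neighbours. The function names, hyperparameters and integer word encoding are assumptions.

```python
import random

def dirichlet(rng, alphas):
    """Sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def gibbs_hmm(sentences, n_tags, vocab_size, n_iter=100, alpha=0.5, beta=0.1, seed=0):
    """Uncollapsed Gibbs sampler for a Bayesian HMM tagger.

    sentences: list of sentences, each a list of word ids in range(vocab_size).
    Dirichlet(alpha) priors on start/transition rows, Dirichlet(beta) on emissions.
    """
    rng = random.Random(seed)
    tags = [[rng.randrange(n_tags) for _ in s] for s in sentences]
    for _ in range(n_iter):
        # 1) Sample parameters from their Dirichlet posteriors given the current tags.
        start = [alpha] * n_tags
        trans = [[alpha] * n_tags for _ in range(n_tags)]
        emit = [[beta] * vocab_size for _ in range(n_tags)]
        for s, t in zip(sentences, tags):
            if s:
                start[t[0]] += 1
            for i, (w, k) in enumerate(zip(s, t)):
                emit[k][w] += 1
                if i > 0:
                    trans[t[i - 1]][k] += 1
        pi = dirichlet(rng, start)
        A = [dirichlet(rng, row) for row in trans]
        B = [dirichlet(rng, row) for row in emit]
        # 2) Resample each tag given its neighbours, its word and the parameters.
        for s, t in zip(sentences, tags):
            for i, w in enumerate(s):
                weights = []
                for k in range(n_tags):
                    p = (pi[k] if i == 0 else A[t[i - 1]][k]) * B[k][w]
                    if i + 1 < len(s):
                        p *= A[k][t[i + 1]]
                    weights.append(p)
                if sum(weights) == 0.0:  # guard against floating-point underflow
                    weights = [1.0] * n_tags
                t[i] = rng.choices(range(n_tags), weights=weights)[0]
    return tags
```

A multilingual variant would add latent variables or merged tag states linking aligned words across languages, so that each language's tags are resampled conditioned on cues from the others.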
Contributions to probabilistic non-negative matrix factorization - Maximum marginal likelihood estimation and Markovian temporal models
Non-negative matrix factorization (NMF) has become a popular dimensionality reduction technique, and has found applications in many different fields, such as audio signal processing, hyperspectral imaging, or recommender systems. In its simplest form, NMF aims at finding an approximation of a non-negative data matrix (i.e., with non-negative entries) as the product of two non-negative matrices, called the factors. One of these two matrices can be interpreted as a dictionary of characteristic patterns of the data, and the other one as activation coefficients of these patterns. This low-rank approximation is traditionally retrieved by optimizing a measure of fit between the data matrix and its approximation. As it turns out, for many choices of measures of fit, the problem can be shown to be equivalent to the joint maximum likelihood estimation of the factors under a certain statistical model describing the data. This leads us to an alternative paradigm for NMF, where the learning task revolves around probabilistic models whose observation density is parametrized by the product of non-negative factors. This general framework, coined probabilistic NMF, encompasses many well-known latent variable models of the literature, such as models for count data. In this thesis, we consider specific probabilistic NMF models in which a prior distribution is assumed on the activation coefficients, but the dictionary remains a deterministic variable. The objective is then to maximize the marginal likelihood in these semi-Bayesian NMF models, i.e., the integrated joint likelihood over the activation coefficients. This amounts to learning the dictionary only; the activation coefficients may be inferred in a second step if necessary. We proceed to study in greater depth the properties of this estimation process. In particular, two scenarios are considered. In the first one, we assume the independence of the activation coefficients sample-wise.
Previous experimental work showed that dictionaries learned with this approach exhibited a tendency to automatically regularize the number of components, a favorable property which was left unexplained. In the second one, we lift this standard assumption, and consider instead Markov structures to add statistical correlation to the model, in order to better analyze temporal data.
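The two ingredients described above can be made concrete in a small, hedged sketch. It assumes a Poisson observation model V_fn ~ Poisson([WH]_fn), for which the negative log-likelihood equals the generalized KL divergence between V and WH up to a term that depends only on V (one instance of the fit/likelihood equivalence mentioned above). It also assumes a deterministic dictionary W and a gamma prior on the activations H that is either i.i.d. or a gamma Markov chain with E[h_n | h_{n-1}] = h_{n-1}. The marginal likelihood log p(V | W) is estimated by naive Monte Carlo over prior draws of H; this estimator is simple but high-variance and is not the estimation procedure developed in the thesis. All function names and hyperparameters are hypothetical.

```python
import math
import random

def reconstruction(W, H, f, n):
    """Entry [WH]_{fn} of the low-rank approximation."""
    return sum(W[f][k] * H[k][n] for k in range(len(H)))

def poisson_loglik(V, W, H):
    """log p(V | W, H) with V_fn ~ Poisson([WH]_fn)."""
    total = 0.0
    for f, row in enumerate(V):
        for n, v in enumerate(row):
            lam = reconstruction(W, H, f, n)
            if v > 0:
                total += v * math.log(lam)
            total += -lam - math.lgamma(v + 1)
    return total

def kl_divergence(V, W, H):
    """Generalized KL divergence D(V | WH), a common NMF measure of fit for counts."""
    total = 0.0
    for f, row in enumerate(V):
        for n, v in enumerate(row):
            lam = reconstruction(W, H, f, n)
            total += (v * math.log(v / lam) if v > 0 else 0.0) - v + lam
    return total

def sample_activations(rng, K, N, shape=2.0, markov=False):
    """Gamma prior on H: i.i.d. with mean 1, or a gamma Markov chain along n."""
    H = [[0.0] * N for _ in range(K)]
    for k in range(K):
        prev = 1.0
        for n in range(N):
            mean = prev if markov else 1.0
            H[k][n] = rng.gammavariate(shape, mean / shape)  # (shape, scale)
            prev = H[k][n]
    return H

def log_marginal_likelihood(V, W, n_samples=2000, markov=False, seed=0):
    """Naive Monte Carlo estimate of log p(V | W) = log E_{H ~ prior}[p(V | W, H)]."""
    rng = random.Random(seed)
    K, N = len(W[0]), len(V[0])
    logs = [poisson_loglik(V, W, sample_activations(rng, K, N, markov=markov))
            for _ in range(n_samples)]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(l - m) for l in logs) / n_samples)
```

Maximizing `log_marginal_likelihood` over W, with H integrated out, corresponds to the semi-Bayesian objective described above; a practical implementation would likely need a lower-variance estimator or an EM-type algorithm rather than this brute-force average.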