Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians
This paper presents a general and efficient framework for probabilistic
inference and learning from arbitrary uncertain information. It exploits the
computational properties of finite mixture models, conjugate families and
factorization. Both the joint probability density of the variables and the
likelihood function of the (objective or subjective) observation are
approximated by a special mixture model, in such a way that any desired
conditional distribution can be directly obtained without numerical
integration. We have developed an extended version of the expectation
maximization (EM) algorithm to estimate the parameters of mixture models from
uncertain training examples (indirect observations). As a consequence, any
piece of exact or uncertain information about both input and output values is
consistently handled in the inference and learning stages. This ability,
extremely useful in certain situations, is not found in most alternative
methods. The proposed framework is formally justified from standard
probabilistic principles and illustrative examples are provided in the fields
of nonparametric pattern classification, nonlinear regression and pattern
completion. Finally, experiments on a real application and comparative results
over standard databases provide empirical evidence of the utility of the method
in a wide range of applications.
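As a toy illustration of why factorized mixtures make conditioning cheap, here is a minimal Python sketch, with scikit-learn's `GaussianMixture` (diagonal covariances) standing in for the paper's generalized-Gaussian components; the data and component count are illustrative assumptions. Because each component factorizes, conditioning on x only reweights the mixture, so the conditional mean of y falls out in closed form with no numerical integration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Toy regression data: y = sin(x) + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=1000)
y = np.sin(x) + 0.1 * rng.standard_normal(1000)
data = np.column_stack([x, y])

# Factorized (diagonal-covariance) mixture over the joint (x, y) space.
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(data)

def conditional_mean(x0):
    """E[y | x = x0] in closed form: with diagonal covariances each
    component factorizes, so conditioning only reweights the mixture."""
    mu_x, mu_y = gmm.means_[:, 0], gmm.means_[:, 1]
    var_x = gmm.covariances_[:, 0]
    # Responsibility of each component for the observed value x0.
    w = gmm.weights_ * norm.pdf(x0, mu_x, np.sqrt(var_x))
    w /= w.sum()
    return np.dot(w, mu_y)

print(conditional_mean(1.0))  # should be close to sin(1.0) ~ 0.84
```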
Brain covariance selection: better individual functional connectivity models using population prior
Spontaneous brain activity, as observed in functional neuroimaging, has been
shown to display reproducible structure that expresses brain architecture and
carries markers of brain pathologies. An important view of modern neuroscience
is that such large-scale structure of coherent activity reflects modularity
properties of brain connectivity graphs. However, to date, there has been no
demonstration that the limited and noisy data available in spontaneous activity
observations could be used to learn full-brain probabilistic models that
generalize to new data. Learning such models entails two main challenges: i)
modeling full brain connectivity is a difficult estimation problem that faces
the curse of dimensionality and ii) variability between subjects, coupled with
the variability of functional signals between experimental runs, makes the use
of multiple datasets challenging. We describe subject-level brain functional
connectivity structure as a multivariate Gaussian process and introduce a new
strategy to estimate it from group data, by imposing a common structure on the
graphical model in the population. We show that individual models learned from
functional Magnetic Resonance Imaging (fMRI) data using this population prior
generalize better to unseen data than models based on alternative
regularization schemes. To our knowledge, this is the first report of a
cross-validated model of spontaneous brain activity. Finally, we use the
estimated graphical model to explore the large-scale characteristics of
functional architecture and show for the first time that known cognitive
networks appear as integrated communities of the functional connectivity graph.
Comment: in Advances in Neural Information Processing Systems, Vancouver, Canada (2010).
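The population-prior estimator itself is specific to the paper (to our knowledge a group-sparse variant inspired by it is available in nilearn as `GroupSparseCovarianceCV`), but the single-subject building block, a cross-validated sparse inverse covariance (graphical lasso), can be sketched with scikit-learn. The synthetic time series below is an invented stand-in for real fMRI data.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

# Synthetic stand-in for one subject's fMRI time series:
# n_timepoints observations over n_regions brain regions.
rng = np.random.default_rng(42)
n_timepoints, n_regions = 200, 30
mixing = rng.standard_normal((n_regions, n_regions))
mixing *= rng.random((n_regions, n_regions)) < 0.1   # sparse cross-talk
np.fill_diagonal(mixing, 1.0)
series = rng.standard_normal((n_timepoints, n_regions)) @ mixing

# Sparse inverse-covariance (conditional independence graph) estimate,
# with the l1 penalty chosen by cross-validation.
model = GraphicalLassoCV().fit(series)
precision = model.precision_

# Nonzero off-diagonal entries define the functional connectivity graph.
edges = np.count_nonzero(np.triu(precision, k=1))
print(f"{edges} edges among {n_regions} regions")
```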
Optimal Bandwidth Selection for Conditional Efficiency Measures: a Data-driven Approach
In productivity analysis an important issue is to detect how external (environmental) factors, exogenous to the production process and not under the control of the producer, might influence the production process and the resulting efficiency of the firms. Most of the traditional approaches proposed in the literature have serious drawbacks. An alternative approach is to describe the production process as being conditioned by a given value of the environmental variables (Cazals, Florens and Simar, 2002; Daraio and Simar, 2005). This defines conditional efficiency measures where the production set in the input × output space may depend on the value of the external variables. The statistical properties of nonparametric estimators of these conditional measures are now established (Jeong, Park and Simar, 2008). These involve the estimation of a nonstandard conditional distribution function which requires the specification of a smoothing parameter (a bandwidth). So far, only the asymptotic optimal order of this bandwidth has been established, which is of little use to the practitioner. In this paper we fill this gap and propose a data-driven technique for selecting this parameter in practice. The approach, based on a Least Squares Cross Validation (LSCV) procedure, provides an optimal bandwidth that minimizes an appropriate integrated Mean Squared Error (MSE). The method is carefully described and exemplified with simulated data with univariate and multivariate environmental factors. An application on real data (performances of Mutual Funds) illustrates how this new optimal method of bandwidth selection outperforms former methods.
Keywords: nonparametric efficiency estimation, conditional efficiency measures, environmental factors, conditional distribution function, bandwidth.
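To make the LSCV idea concrete, the sketch below applies it to the simplest possible setting: choosing the bandwidth of a one-dimensional Gaussian-kernel density estimator by minimizing an unbiased estimate of the integrated squared error (a closed-form squared-density integral minus twice the leave-one-out fit). The authors' actual criterion targets the nonstandard conditional distribution used in the efficiency measures; the data and search bounds here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
z = rng.standard_normal(300)  # sample of an environmental variable

def lscv(h, x):
    """Least-squares CV criterion for a 1-D Gaussian-kernel density:
    an unbiased estimate of the integrated squared error up to a constant."""
    n = len(x)
    d = x[:, None] - x[None, :]
    # Closed form for the integral of fhat^2 with Gaussian kernels.
    int_f2 = norm.pdf(d, scale=np.sqrt(2) * h).sum() / n**2
    # Leave-one-out density estimate at each sample point.
    k = norm.pdf(d, scale=h)
    np.fill_diagonal(k, 0.0)
    loo = k.sum(axis=1) / (n - 1)
    return int_f2 - 2 * loo.mean()

res = minimize_scalar(lscv, bounds=(0.05, 2.0), args=(z,), method="bounded")
print("LSCV-optimal bandwidth:", res.x)
```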
Metamodel-based importance sampling for structural reliability analysis
Structural reliability methods aim at computing the probability of failure of
systems with respect to some prescribed performance functions. In modern
engineering such functions usually resort to running an expensive-to-evaluate
computational model (e.g. a finite element model). In this respect simulation
methods, which may require a large number of model runs, cannot be used directly. Surrogate
models such as quadratic response surfaces, polynomial chaos expansions or
kriging (which are built from a limited number of runs of the original model)
are then introduced as a substitute of the original model to cope with the
computational cost. In practice it is almost impossible to quantify the error
made by this substitution though. In this paper we propose to use a kriging
surrogate of the performance function as a means to build a quasi-optimal
importance sampling density. The probability of failure is eventually obtained
as the product of an augmented probability computed by substituting the
meta-model for the original performance function and a correction term which
ensures that there is no bias in the estimation even if the meta-model is not
fully accurate. The approach is applied to analytical and finite element
reliability problems and proves efficient up to 100 random variables.
Comment: 20 pages, 7 figures, 2 tables. Preprint submitted to Probabilistic Engineering Mechanics.
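A minimal sketch of the two-stage estimator, assuming an analytic performance function in place of the finite element model and scikit-learn's `GaussianProcessRegressor` as the kriging surrogate (sample sizes and the design of experiments are illustrative choices): the surrogate's predictive distribution gives a probabilistic classifier pi(x) = P[G(x) <= 0], whose mean under the input density is the augmented probability; runs of the true model on points resampled from the quasi-optimal density pi * p supply the correction term that removes the surrogate's bias.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def g(x):
    """Analytic performance function (failure when g <= 0); a stand-in
    for an expensive finite element model. True P_f = Phi(-3) ~ 1.35e-3."""
    return 3.0 - x.sum(axis=1) / np.sqrt(2.0)

# Small design of experiments and a kriging (GP) surrogate of g.
X_doe = rng.standard_normal((40, 2)) * 2.0
gp = GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-6).fit(X_doe, g(X_doe))

# Probabilistic classification by the surrogate: pi(x) = P[G(x) <= 0].
X = rng.standard_normal((20000, 2))          # samples from the input density
mu, sd = gp.predict(X, return_std=True)
pi = norm.cdf(-mu / np.maximum(sd, 1e-12))

p_eps = pi.mean()                            # augmented failure probability
# Draw from the quasi-optimal IS density q ~ pi * p by weighted resampling,
# then correct with true-model runs so the estimate stays unbiased.
idx = rng.choice(len(X), size=500, p=pi / pi.sum())
correction = (g(X[idx]) <= 0).astype(float) / pi[idx]
print("P_f estimate:", p_eps * correction.mean())
```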
Exploring Prediction Uncertainty in Machine Translation Quality Estimation
Machine Translation Quality Estimation is a notoriously difficult task, which
lessens its usefulness in real-world translation environments. Such scenarios
can be improved if quality predictions are accompanied by a measure of
uncertainty. However, models in this task are traditionally evaluated only in
terms of point estimate metrics, which do not take prediction uncertainty into
account. We investigate probabilistic methods for Quality Estimation that can
provide well-calibrated uncertainty estimates and evaluate them in terms of
their full posterior predictive distributions. We also show how this posterior
information can be useful in an asymmetric risk scenario, which aims to capture
typical situations in translation workflows.
Comment: in Proceedings of CoNLL 2016.
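As a hedged sketch of the evaluation and decision steps (the features, data, and cost weights below are invented stand-ins): a Gaussian process regressor yields a full posterior predictive per segment, which can be scored with negative log predictive density rather than a point-estimate metric; under an asymmetric linear loss, e.g. when underestimating post-editing effort is costlier than overestimating it, the expected loss is minimized by a posterior quantile rather than the mean.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Stand-in QE data: 17 sentence-level features -> post-editing effort score.
X = rng.standard_normal((300, 17))
y = 0.5 * X[:, 0] + 0.1 * rng.standard_normal(300)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(0.1)).fit(X_tr, y_tr)
mu, sd = gp.predict(X_te, return_std=True)

# Evaluate the full posterior predictive, not just the point estimate:
# negative log predictive density (lower is better).
nlpd = -norm.logpdf(y_te, mu, sd).mean()
print("NLPD:", nlpd)

# Asymmetric risk: underestimating effort costs w_under per unit of error,
# overestimating costs w_over. The optimal point prediction under this
# asymmetric linear loss is a posterior quantile, not the mean.
w_under, w_over = 3.0, 1.0
q = w_under / (w_under + w_over)             # 0.75 quantile here
y_hat = mu + sd * norm.ppf(q)
print("Risk-adjusted predictions:", y_hat[:3])
```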