Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians
This paper presents a general and efficient framework for probabilistic
inference and learning from arbitrary uncertain information. It exploits the
computational properties of finite mixture models, conjugate families and
factorization. Both the joint probability density of the variables and the
likelihood function of the (objective or subjective) observation are
approximated by a special mixture model, in such a way that any desired
conditional distribution can be directly obtained without numerical
integration. We have developed an extended version of the expectation
maximization (EM) algorithm to estimate the parameters of mixture models from
uncertain training examples (indirect observations). As a consequence, any
piece of exact or uncertain information about both input and output values is
consistently handled in the inference and learning stages. This ability,
extremely useful in certain situations, is not found in most alternative
methods. The proposed framework is formally justified from standard
probabilistic principles and illustrative examples are provided in the fields
of nonparametric pattern classification, nonlinear regression and pattern
completion. Finally, experiments on a real application and comparative results
over standard databases provide empirical evidence of the utility of the method
in a wide range of applications.
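As a toy illustration of why factorized mixtures make conditioning cheap, here is a minimal Python sketch, with scikit-learn's `GaussianMixture` (diagonal covariances) standing in for the paper's generalized-Gaussian components; the data and component count are illustrative assumptions. Because each component factorizes, conditioning on x only reweights the mixture, so the conditional mean of y falls out in closed form with no numerical integration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Toy regression data: y = sin(x) + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=1000)
y = np.sin(x) + 0.1 * rng.standard_normal(1000)
data = np.column_stack([x, y])

# Factorized (diagonal-covariance) mixture over the joint (x, y) space.
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(data)

def conditional_mean(x0):
    """E[y | x = x0] in closed form: with diagonal covariances each
    component factorizes, so conditioning only reweights the mixture."""
    mu_x, mu_y = gmm.means_[:, 0], gmm.means_[:, 1]
    var_x = gmm.covariances_[:, 0]
    # Responsibility of each component for the observed value x0.
    w = gmm.weights_ * norm.pdf(x0, mu_x, np.sqrt(var_x))
    w /= w.sum()
    return np.dot(w, mu_y)

print(conditional_mean(1.0))  # should be close to sin(1.0) ~ 0.84
```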
Brain covariance selection: better individual functional connectivity models using population prior
Spontaneous brain activity, as observed in functional neuroimaging, has been
shown to display reproducible structure that expresses brain architecture and
carries markers of brain pathologies. An important view of modern neuroscience
is that such large-scale structure of coherent activity reflects modularity
properties of brain connectivity graphs. However, to date, there has been no
demonstration that the limited and noisy data available in spontaneous activity
observations could be used to learn full-brain probabilistic models that
generalize to new data. Learning such models entails two main challenges: i)
modeling full brain connectivity is a difficult estimation problem that faces
the curse of dimensionality and ii) variability between subjects, coupled with
the variability of functional signals between experimental runs, makes the use
of multiple datasets challenging. We describe subject-level brain functional
connectivity structure as a multivariate Gaussian process and introduce a new
strategy to estimate it from group data, by imposing a common structure on the
graphical model in the population. We show that individual models learned from
functional Magnetic Resonance Imaging (fMRI) data using this population prior
generalize better to unseen data than models based on alternative
regularization schemes. To our knowledge, this is the first report of a
cross-validated model of spontaneous brain activity. Finally, we use the
estimated graphical model to explore the large-scale characteristics of
functional architecture and show for the first time that known cognitive
networks appear as integrated communities of the functional connectivity graph.
Comment: in Advances in Neural Information Processing Systems, Vancouver, Canada (2010).
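The population-prior estimator itself is specific to the paper (to our knowledge a group-sparse variant inspired by it is available in nilearn as `GroupSparseCovarianceCV`), but the single-subject building block, a cross-validated sparse inverse covariance (graphical lasso), can be sketched with scikit-learn. The synthetic time series below is an invented stand-in for real fMRI data.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

# Synthetic stand-in for one subject's fMRI time series:
# n_timepoints observations over n_regions brain regions.
rng = np.random.default_rng(42)
n_timepoints, n_regions = 200, 30
mixing = rng.standard_normal((n_regions, n_regions))
mixing *= rng.random((n_regions, n_regions)) < 0.1   # sparse cross-talk
np.fill_diagonal(mixing, 1.0)
series = rng.standard_normal((n_timepoints, n_regions)) @ mixing

# Sparse inverse-covariance (conditional independence graph) estimate,
# with the l1 penalty chosen by cross-validation.
model = GraphicalLassoCV().fit(series)
precision = model.precision_

# Nonzero off-diagonal entries define the functional connectivity graph.
edges = np.count_nonzero(np.triu(precision, k=1))
print(f"{edges} edges among {n_regions} regions")
```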
Optimal Bandwidth Selection for Conditional Efficiency Measures: a Data-driven Approach
In productivity analysis an important issue is to detect how external (environmental) factors, exogenous to the production process and not under the control of the producer, might influence the production process and the resulting efficiency of the firms. Most of the traditional approaches proposed in the literature have serious drawbacks. An alternative approach is to describe the production process as being conditioned by a given value of the environmental variables (Cazals, Florens and Simar, 2002; Daraio and Simar, 2005). This defines conditional efficiency measures where the production set in the input × output space may depend on the value of the external variables. The statistical properties of nonparametric estimators of these conditional measures are now established (Jeong, Park and Simar, 2008). These involve the estimation of a nonstandard conditional distribution function which requires the specification of a smoothing parameter (a bandwidth). So far, only the asymptotic optimal order of this bandwidth has been established, which is of little use to the practitioner. In this paper we fill this gap and propose a data-driven technique for selecting this parameter in practice. The approach, based on a Least Squares Cross Validation (LSCV) procedure, provides an optimal bandwidth that minimizes an appropriate integrated Mean Squared Error (MSE). The method is carefully described and exemplified with simulated data with univariate and multivariate environmental factors. An application on real data (performances of Mutual Funds) illustrates how this new optimal method of bandwidth selection outperforms former methods.
Keywords: nonparametric efficiency estimation, conditional efficiency measures, environmental factors, conditional distribution function, bandwidth.
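To make the LSCV idea concrete, the sketch below applies it to the simplest possible setting: choosing the bandwidth of a one-dimensional Gaussian-kernel density estimator by minimizing an unbiased estimate of the integrated squared error (a closed-form squared-density integral minus twice the leave-one-out fit). The authors' actual criterion targets the nonstandard conditional distribution used in the efficiency measures; the data and search bounds here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
z = rng.standard_normal(300)  # sample of an environmental variable

def lscv(h, x):
    """Least-squares CV criterion for a 1-D Gaussian-kernel density:
    an unbiased estimate of the integrated squared error up to a constant."""
    n = len(x)
    d = x[:, None] - x[None, :]
    # Closed form for the integral of fhat^2 with Gaussian kernels.
    int_f2 = norm.pdf(d, scale=np.sqrt(2) * h).sum() / n**2
    # Leave-one-out density estimate at each sample point.
    k = norm.pdf(d, scale=h)
    np.fill_diagonal(k, 0.0)
    loo = k.sum(axis=1) / (n - 1)
    return int_f2 - 2 * loo.mean()

res = minimize_scalar(lscv, bounds=(0.05, 2.0), args=(z,), method="bounded")
print("LSCV-optimal bandwidth:", res.x)
```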
Metamodel-based importance sampling for structural reliability analysis
Structural reliability methods aim at computing the probability of failure of
systems with respect to some prescribed performance functions. In modern
engineering such functions usually resort to running an expensive-to-evaluate
computational model (e.g. a finite element model). In this respect simulation
methods, which may require a large number of model runs, cannot be used directly. Surrogate
models such as quadratic response surfaces, polynomial chaos expansions or
kriging (which are built from a limited number of runs of the original model)
are then introduced as a substitute of the original model to cope with the
computational cost. In practice it is almost impossible to quantify the error
made by this substitution though. In this paper we propose to use a kriging
surrogate of the performance function as a means to build a quasi-optimal
importance sampling density. The probability of failure is eventually obtained
as the product of an augmented probability computed by substituting the
meta-model for the original performance function and a correction term which
ensures that there is no bias in the estimation even if the meta-model is not
fully accurate. The approach is applied to analytical and finite element
reliability problems and proves efficient up to 100 random variables.
Comment: 20 pages, 7 figures, 2 tables. Preprint submitted to Probabilistic Engineering Mechanics.
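A minimal sketch of the two-stage estimator, assuming an analytic performance function in place of the finite element model and scikit-learn's `GaussianProcessRegressor` as the kriging surrogate (sample sizes and the design of experiments are illustrative choices): the surrogate's predictive distribution gives a probabilistic classifier pi(x) = P[G(x) <= 0], whose mean under the input density is the augmented probability; runs of the true model on points resampled from the quasi-optimal density pi * p supply the correction term that removes the surrogate's bias.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def g(x):
    """Analytic performance function (failure when g <= 0); a stand-in
    for an expensive finite element model. True P_f = Phi(-3) ~ 1.35e-3."""
    return 3.0 - x.sum(axis=1) / np.sqrt(2.0)

# Small design of experiments and a kriging (GP) surrogate of g.
X_doe = rng.standard_normal((40, 2)) * 2.0
gp = GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-6).fit(X_doe, g(X_doe))

# Probabilistic classification by the surrogate: pi(x) = P[G(x) <= 0].
X = rng.standard_normal((20000, 2))          # samples from the input density
mu, sd = gp.predict(X, return_std=True)
pi = norm.cdf(-mu / np.maximum(sd, 1e-12))

p_eps = pi.mean()                            # augmented failure probability
# Draw from the quasi-optimal IS density q ~ pi * p by weighted resampling,
# then correct with true-model runs so the estimate stays unbiased.
idx = rng.choice(len(X), size=500, p=pi / pi.sum())
correction = (g(X[idx]) <= 0).astype(float) / pi[idx]
print("P_f estimate:", p_eps * correction.mean())
```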
Exploring Prediction Uncertainty in Machine Translation Quality Estimation
Machine Translation Quality Estimation is a notoriously difficult task, which
lessens its usefulness in real-world translation environments. Such scenarios
can be improved if quality predictions are accompanied by a measure of
uncertainty. However, models in this task are traditionally evaluated only in
terms of point estimate metrics, which do not take prediction uncertainty into
account. We investigate probabilistic methods for Quality Estimation that can
provide well-calibrated uncertainty estimates and evaluate them in terms of
their full posterior predictive distributions. We also show how this posterior
information can be useful in an asymmetric risk scenario, which aims to capture
typical situations in translation workflows.
Comment: in Proceedings of CoNLL 2016.
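As a hedged sketch of the evaluation and decision steps (the features, data, and cost weights below are invented stand-ins): a Gaussian process regressor yields a full posterior predictive per segment, which can be scored with negative log predictive density rather than a point-estimate metric; under an asymmetric linear loss, e.g. when underestimating post-editing effort is costlier than overestimating it, the expected loss is minimized by a posterior quantile rather than the mean.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Stand-in QE data: 17 sentence-level features -> post-editing effort score.
X = rng.standard_normal((300, 17))
y = 0.5 * X[:, 0] + 0.1 * rng.standard_normal(300)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(0.1)).fit(X_tr, y_tr)
mu, sd = gp.predict(X_te, return_std=True)

# Evaluate the full posterior predictive, not just the point estimate:
# negative log predictive density (lower is better).
nlpd = -norm.logpdf(y_te, mu, sd).mean()
print("NLPD:", nlpd)

# Asymmetric risk: underestimating effort costs w_under per unit of error,
# overestimating costs w_over. The optimal point prediction under this
# asymmetric linear loss is a posterior quantile, not the mean.
w_under, w_over = 3.0, 1.0
q = w_under / (w_under + w_over)             # 0.75 quantile here
y_hat = mu + sd * norm.ppf(q)
print("Risk-adjusted predictions:", y_hat[:3])
```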