Epistemic uncertainty quantification in deep learning classification by the Delta method
The Delta method is a classical procedure for quantifying epistemic uncertainty in statistical models, but its direct application to deep neural networks is prevented by the large number of parameters P. We propose a low-cost approximation of the Delta method applicable to L2-regularized deep neural networks based on the top K eigenpairs of the Fisher information matrix. We address efficient computation of full-rank approximate eigendecompositions in terms of the exact inverse Hessian, the inverse outer-products-of-gradients approximation and the so-called Sandwich estimator. Moreover, we provide bounds on the approximation error for the uncertainty of the predictive class probabilities. We show that when the smallest computed eigenvalue of the Fisher information matrix is near the L2-regularization rate, the approximation error will be close to zero even when K ≪ P. A demonstration of the methodology is presented using a TensorFlow implementation, and we show that meaningful rankings of images based on predictive uncertainty can be obtained for two LeNet- and ResNet-based neural networks using the MNIST and CIFAR-10 datasets. Further, we observe that false positives have on average a higher predictive epistemic uncertainty than true positives. This suggests that the uncertainty measure carries supplementary information not captured by the classification alone.
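The core idea of the approximation can be sketched numerically. The following is a minimal toy illustration, not the paper's implementation: it assumes a small random positive-definite matrix standing in for the Fisher information, and attributes the eigenvalue λ (the L2-regularization rate) to all eigendirections beyond the top K, which is the regime in which the abstract states the approximation error vanishes. All names (`delta_variance`, `F`, `lam`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
P, K, lam = 50, 5, 1e-2  # toy sizes: parameters, kept eigenpairs, L2 rate

# Toy Fisher information: symmetric positive definite, with the L2 term
# guaranteeing every eigenvalue is at least lam.
A = rng.standard_normal((P, P))
F = A @ A.T / P + lam * np.eye(P)

# Top-K eigenpairs (in practice these would be computed matrix-free,
# e.g. by a Lanczos-type iteration, never by forming F explicitly).
w, V = np.linalg.eigh(F)                 # ascending eigenvalues
w_top, V_top = w[-K:], V[:, -K:]

def delta_variance(g):
    """Approximate the Delta-method variance g^T F^{-1} g using only the
    top-K eigenpairs; unexplored eigendirections are assigned the
    eigenvalue lam, which is exact when the residual spectrum has
    collapsed onto the L2-regularization rate."""
    c = V_top.T @ g                      # coefficients in the kept eigenbasis
    r2 = g @ g - c @ c                   # squared norm of the residual part
    return float(c**2 @ (1.0 / w_top) + r2 / lam)

g = rng.standard_normal(P)               # stand-in per-example gradient
approx = delta_variance(g)
exact = float(g @ np.linalg.solve(F, g))
```

Since every residual eigenvalue is at least λ, replacing it by λ can only overestimate the variance, so the sketch brackets the exact value from above.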
Epistemic Uncertainty Quantification in Deep Learning by the Delta Method
This thesis explores the Delta method and its application to deep learning image classification. The Delta method is a classical procedure for quantifying uncertainty in statistical models, but its direct application to deep neural networks is prevented by the large number of parameters P. We recognize the Delta method as a measure of epistemic as opposed to aleatoric uncertainty and break it into two components: the eigenvalue spectrum of the inverse Fisher information (i.e. inverse Hessian) of the cost function and the per-example sensitivities (i.e. gradients) of the model function. We mainly focus on the computational aspects, and show how to efficiently compute low- and full-rank approximations of the inverse Fisher information matrix, which in turn reduces the computational complexity of the naïve Delta method from O(P²) space and O(P³) time to O(P) space and time. We provide bounds for the approximation error by a novel error-propagation technique, and validate the developed methodology with a released TensorFlow implementation. By a comparison with the classical Bootstrap, we show that there is a strong linear relationship between the quantified predictive epistemic uncertainty levels obtained from the two methods when applied to a few well-known architectures using the MNIST and CIFAR-10 datasets.
A universal approximate cross-validation criterion and its asymptotic distribution
We consider a general framework in which the estimators of a distribution are
obtained by minimizing one function (the estimating function) and are assessed
through another function (the assessment function). The estimating and
assessment functions generally estimate risks. A classical case is that both
functions estimate an information risk (specifically cross entropy); in that
case the Akaike information criterion (AIC) is relevant. In more general cases,
the assessment risk can be estimated by leave-one-out cross-validation. Since
leave-one-out cross-validation is computationally very demanding, an
approximation formula can be very useful. A universal approximate
cross-validation criterion (UACV) for leave-one-out cross-validation is given.
This criterion can be adapted to different types of estimators, including
penalized likelihood and maximum a posteriori estimators, and to different
assessment risk functions, including information risk functions and the
continuous ranked probability score (CRPS). The formula reduces to the
Takeuchi information criterion (TIC) when cross entropy is the risk for both
estimation and assessment. The asymptotic distribution of UACV and of a
difference of UACVs is given. UACV can be used for comparing estimators of the
distributions of ordered categorical data derived from threshold models and
models based on continuous approximations. A simulation study and an analysis
of real psychometric data are presented.
Comment: 23 pages, 2 figures
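The special case mentioned above, where the criterion reduces to a TIC-style correction, can be illustrated on a deliberately simple model. This is a hedged sketch, not the UACV formula from the paper: it assumes a Gaussian location model with known unit variance, compares exact leave-one-out cross-validated risk (refitting for each held-out point) against the in-sample risk plus a trace(J H⁻¹)/n penalty, and all names (`nll`, `loo`, `tic`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200) + 0.3       # toy sample; model: N(mu, 1), mu by MLE

def nll(mu, data):
    """Per-example negative log-likelihood under N(mu, 1)."""
    return 0.5 * (data - mu) ** 2 + 0.5 * np.log(2 * np.pi)

# Exact leave-one-out cross-validated risk: refit the MLE without each point.
n = len(x)
loo = np.mean([nll((x.sum() - x[i]) / (n - 1), x[i]) for i in range(n)])

# TIC-style approximation: in-sample risk + trace(J H^{-1}) / n, where H is
# the mean Hessian of the NLL and J the mean squared score, both at the MLE.
mu_hat = x.mean()
score = -(x - mu_hat)                    # d nll / d mu, evaluated at mu_hat
H = 1.0                                  # d^2 nll / d mu^2 is constant = 1 here
J = np.mean(score ** 2)
tic = np.mean(nll(mu_hat, x)) + (J / H) / n
```

For this model the two quantities agree to O(1/n²), which is the point of the approximation: one pass instead of n refits.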
Iterative and Recursive Estimation in Structural Non-Adaptive Models
An inference method, called latent backfitting, is proposed. It appears well suited for econometric models where the structural relationships of interest define the observed endogenous variables as a known function of unobserved state variables and unknown parameters. This nonlinear state-space specification paves the way for iterative or recursive EM-like strategies. In the E-steps the state variables are forecasted given the observations and a value of the parameters. In the M-steps these forecasts are used to deduce estimators of the unknown parameters from the statistical model of latent variables. The proposed iterative/recursive estimation is particularly useful for latent regression models and for dynamic equilibrium models involving latent state variables. Practical implementation issues are discussed through the example of term structure models of interest rates.
Keywords: Asset Pricing Models, Latent Variables, Estimation, Iterative or Recursive Algorithms
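The alternating E-step/M-step structure described above can be sketched on a toy latent regression model. This is an illustrative EM-style iteration under assumed specifics, not the paper's estimator: it posits standard-normal latent states s, observations y = θ·s + noise with known noise variance, forecasts the states in the E-step, and re-estimates θ from those forecasts in the M-step. All names (`theta`, `s_hat`, `sigma2`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true, sigma2, n = 1.5, 0.5, 5000
s = rng.standard_normal(n)                        # unobserved latent states, N(0, 1)
y = theta_true * s + np.sqrt(sigma2) * rng.standard_normal(n)

theta = 0.5                                       # crude starting value
for _ in range(200):
    # E-step: forecast the latent states given y and the current parameter.
    s_hat = theta * y / (theta**2 + sigma2)       # posterior mean of each s_i
    s2_hat = s_hat**2 + sigma2 / (theta**2 + sigma2)  # posterior second moment
    # M-step: deduce the parameter estimate from the latent-variable model,
    # i.e. least squares of y on the forecasted states.
    theta = float(np.mean(y * s_hat) / np.mean(s2_hat))
```

In this linear-Gaussian toy the fixed point satisfies θ² = mean(y²) − σ², so the iteration recovers the method-of-moments estimate; the nonlinear state-space models in the abstract require the same loop with model-specific forecasting in the E-step.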