9,719 research outputs found
A universal approximate cross-validation criterion and its asymptotic distribution
A general framework is considered in which the estimators of a distribution are obtained by
minimizing one function (the estimating function) and are assessed through
another (the assessment function). The estimating and assessment functions
generally estimate risks. A classical case is that in which both functions
estimate an information risk (specifically, cross-entropy); in that case the Akaike
information criterion (AIC) is relevant. In more general cases, the assessment
risk can be estimated by leave-one-out cross-validation. Since leave-one-out
cross-validation is computationally very demanding, an approximation formula can
be very useful. A universal approximate cross-validation criterion (UACV) for
leave-one-out cross-validation is given. This criterion can be adapted to
different types of estimators, including penalized likelihood and maximum a
posteriori estimators, and to different assessment risk functions, including information
risk functions and the continuous ranked probability score (CRPS). The formula
reduces to the Takeuchi information criterion (TIC) when cross-entropy is the risk
for both estimation and assessment. The asymptotic distribution of UACV, and of
a difference of UACVs, is given. UACV can be used to compare estimators of the
distributions of ordered categorical data derived from threshold models and from
models based on continuous approximations. A simulation study and an analysis
of real psychometric data are presented.
Comment: 23 pages, 2 figures
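As a rough illustration of the computation that UACV approximates, the following sketch runs a naive leave-one-out cross-validation loop for a simple maximum-likelihood estimator with a cross-entropy (negative log-likelihood) assessment risk. The univariate Gaussian model and the synthetic dataset are illustrative assumptions, not the paper's estimators; the point is only that the exact leave-one-out risk requires n refits, which is what motivates an approximation formula.

```python
# Naive leave-one-out cross-validation (the quantity UACV approximates).
# Illustrative assumptions: a univariate Gaussian ML estimator and a
# negative log-likelihood (cross-entropy-style) assessment risk.
import numpy as np

def fit_gaussian(x):
    """Maximum-likelihood fit of a univariate Gaussian: (mean, variance)."""
    return x.mean(), x.var()

def neg_log_likelihood(x, mu, var):
    """Assessment risk: average negative log-density under the fitted model."""
    return 0.5 * np.mean(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def loo_cv_risk(data):
    """Exact leave-one-out CV: refit n times, assess each held-out point."""
    n = len(data)
    risks = []
    for i in range(n):
        train = np.delete(data, i)          # drop observation i
        mu, var = fit_gaussian(train)       # refit on the remaining n - 1
        risks.append(neg_log_likelihood(data[i:i + 1], mu, var))
    return float(np.mean(risks))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(loo_cv_risk(x))  # near the standard-normal entropy, about 1.42 nats
```

The loop refits the estimator once per observation, so its cost grows linearly in n times the cost of one fit; an approximate criterion such as UACV replaces the n refits with a single fit plus a correction term.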
Bias Reduction via End-to-End Shift Learning: Application to Citizen Science
Citizen science projects are successful at gathering rich datasets for
various applications. However, the data collected by citizen scientists are
often biased: in particular, they align more with the citizens' preferences
than with scientific objectives. We propose the Shift Compensation Network
(SCN), an end-to-end learning scheme which learns the shift from the scientific
objectives to the biased data while compensating for the shift by re-weighting
the training data. Applied to bird observational data from the citizen science
project eBird, we demonstrate how SCN quantifies the data distribution shift
and outperforms supervised learning models that do not address the data bias.
Compared with competing models in the context of covariate shift, we further
demonstrate the advantage of SCN in both its effectiveness and its capability
of handling massive, high-dimensional data.
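The general idea behind compensating for such a shift can be sketched with a standard density-ratio trick for covariate shift: train a classifier to distinguish samples drawn under the scientific objective (target) from the biased citizen-science samples (source), and use its odds as per-example training weights. This is a generic re-weighting sketch, not the SCN architecture itself; the logistic-regression discriminator and the toy 1-D data are illustrative assumptions.

```python
# Importance re-weighting for covariate shift via a domain classifier.
# Illustrative sketch, not the SCN model: a logistic regression estimates
# P(target | x), and its odds approximate w(x) = p_target(x) / p_source(x).
import numpy as np
from sklearn.linear_model import LogisticRegression

def shift_weights(source_X, target_X):
    """Estimate w(x) = p_target(x) / p_source(x) with a domain classifier."""
    X = np.vstack([source_X, target_X])
    y = np.concatenate([np.zeros(len(source_X)), np.ones(len(target_X))])
    clf = LogisticRegression().fit(X, y)
    p = clf.predict_proba(source_X)[:, 1]      # P(target | x) on source points
    # Odds P(target|x) / P(source|x), corrected for the two sample sizes.
    return (p / (1 - p)) * (len(source_X) / len(target_X))

# Toy bias: the source oversamples large x (mean 1) relative to the target.
rng = np.random.default_rng(1)
target = rng.normal(size=(1000, 1))
source = rng.normal(loc=1.0, size=(1000, 1))
w = shift_weights(source, target)
# Source points far from the target mode should receive low weight.
print(w[source[:, 0] > 2].mean() < w[source[:, 0] < 0].mean())
```

Multiplying each training example's loss by its weight `w` then makes the weighted source distribution mimic the target distribution, which is the compensation step the abstract describes in end-to-end form.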
A Kernel-Based Calculation of Information on a Metric Space
Kernel density estimation is a technique for approximating probability
distributions. Here, it is applied to the calculation of mutual information on
a metric space. This is motivated by the problem in neuroscience of calculating
the mutual information between stimuli and spiking responses; the space of
these responses is a metric space. It is shown that kernel density estimation
on a metric space resembles the k-nearest-neighbor approach. This approach is
applied to a toy dataset designed to mimic electrophysiological data.
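A minimal version of the kernel-based calculation can be sketched for the simplest metric space, the real line with distance |r − r′|: estimate p(r) and p(r | s) with Gaussian kernel density estimates and average the log ratio to get the mutual information between a discrete stimulus and a continuous response. The bandwidth, the two-stimulus toy data, and the Gaussian kernel are illustrative assumptions, not the paper's construction.

```python
# Kernel-based mutual information I(S; R) for a discrete stimulus S and a
# continuous response R. Illustrative sketch: Gaussian KDE on the real line
# (metric |r - r'|), with an arbitrary bandwidth h.
import numpy as np

def kde(points, query, h):
    """Gaussian kernel density estimate evaluated at the query points."""
    d = query[:, None] - points[None, :]
    return np.exp(-0.5 * (d / h) ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def mutual_information(stimuli, responses, h=0.3):
    """I(S; R) estimated as the average of log p(r | s) / p(r), in bits."""
    p_r = kde(responses, responses, h)
    mi = 0.0
    for s in np.unique(stimuli):
        mask = stimuli == s
        p_r_given_s = kde(responses[mask], responses[mask], h)
        mi += mask.mean() * np.mean(np.log(p_r_given_s / p_r[mask]))
    return mi / np.log(2)  # convert nats to bits

# Toy data: two stimuli with well-separated response distributions,
# so the information should approach 1 bit.
rng = np.random.default_rng(2)
s = np.repeat([0, 1], 500)
r = np.concatenate([rng.normal(0, 0.2, 500), rng.normal(2, 0.2, 500)])
print(mutual_information(s, r))
```

Because each conditional density is built only from the responses to one stimulus, well-separated response clusters give log p(r | s) / p(r) ≈ log 2 at every sample, and the estimate approaches the 1 bit carried by a binary stimulus.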
- …