9,719 research outputs found
A universal approximate cross-validation criterion and its asymptotic distribution
A general framework is considered in which the estimators of a distribution are obtained by
minimizing one function (the estimating function) and are assessed through
another (the assessment function). The estimating and assessment functions
generally estimate risks. A classical case is that in which both functions
estimate an information risk (specifically, cross-entropy); in that case the Akaike
information criterion (AIC) is relevant. In more general cases, the assessment
risk can be estimated by leave-one-out cross-validation. Since leave-one-out
cross-validation is computationally very demanding, an approximation formula can
be very useful. A universal approximate cross-validation criterion (UACV) for
leave-one-out cross-validation is given. This criterion can be adapted to
different types of estimators, including penalized likelihood and maximum a
posteriori estimators, and to different assessment risk functions, including information
risk functions and the continuous ranked probability score (CRPS). The formula
reduces to the Takeuchi information criterion (TIC) when cross-entropy is the risk
for both estimation and assessment. The asymptotic distribution of UACV, and of
a difference of UACVs, is given. UACV can be used to compare estimators of the
distributions of ordered categorical data derived from threshold models and from
models based on continuous approximations. A simulation study and an analysis
of real psychometric data are presented.
Comment: 23 pages, 2 figures
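As a rough illustration of the computation that UACV approximates, the following sketch runs a naive leave-one-out cross-validation loop for a simple maximum-likelihood estimator with a cross-entropy (negative log-likelihood) assessment risk. The univariate Gaussian model and the synthetic dataset are illustrative assumptions, not the paper's estimators; the point is only that the exact leave-one-out risk requires n refits, which is what motivates an approximation formula.

```python
# Naive leave-one-out cross-validation (the quantity UACV approximates).
# Illustrative assumptions: a univariate Gaussian ML estimator and a
# negative log-likelihood (cross-entropy-style) assessment risk.
import numpy as np

def fit_gaussian(x):
    """Maximum-likelihood fit of a univariate Gaussian: (mean, variance)."""
    return x.mean(), x.var()

def neg_log_likelihood(x, mu, var):
    """Assessment risk: average negative log-density under the fitted model."""
    return 0.5 * np.mean(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def loo_cv_risk(data):
    """Exact leave-one-out CV: refit n times, assess each held-out point."""
    n = len(data)
    risks = []
    for i in range(n):
        train = np.delete(data, i)          # drop observation i
        mu, var = fit_gaussian(train)       # refit on the remaining n - 1
        risks.append(neg_log_likelihood(data[i:i + 1], mu, var))
    return float(np.mean(risks))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(loo_cv_risk(x))  # near the standard-normal entropy, about 1.42 nats
```

The loop refits the estimator once per observation, so its cost grows linearly in n times the cost of one fit; an approximate criterion such as UACV replaces the n refits with a single fit plus a correction term.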
Bias Reduction via End-to-End Shift Learning: Application to Citizen Science
Citizen science projects are successful at gathering rich datasets for
various applications. However, the data collected by citizen scientists are
often biased: in particular, they align more with the citizens' preferences
than with scientific objectives. We propose the Shift Compensation Network
(SCN), an end-to-end learning scheme which learns the shift from the scientific
objectives to the biased data while compensating for the shift by re-weighting
the training data. Applied to bird observational data from the citizen science
project eBird, we demonstrate how SCN quantifies the data distribution shift
and outperforms supervised learning models that do not address the data bias.
Compared with competing models in the context of covariate shift, we further
demonstrate the advantage of SCN in both its effectiveness and its capability
of handling massive, high-dimensional data.
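The general idea behind compensating for such a shift can be sketched with a standard density-ratio trick for covariate shift: train a classifier to distinguish samples drawn under the scientific objective (target) from the biased citizen-science samples (source), and use its odds as per-example training weights. This is a generic re-weighting sketch, not the SCN architecture itself; the logistic-regression discriminator and the toy 1-D data are illustrative assumptions.

```python
# Importance re-weighting for covariate shift via a domain classifier.
# Illustrative sketch, not the SCN model: a logistic regression estimates
# P(target | x), and its odds approximate w(x) = p_target(x) / p_source(x).
import numpy as np
from sklearn.linear_model import LogisticRegression

def shift_weights(source_X, target_X):
    """Estimate w(x) = p_target(x) / p_source(x) with a domain classifier."""
    X = np.vstack([source_X, target_X])
    y = np.concatenate([np.zeros(len(source_X)), np.ones(len(target_X))])
    clf = LogisticRegression().fit(X, y)
    p = clf.predict_proba(source_X)[:, 1]      # P(target | x) on source points
    # Odds P(target|x) / P(source|x), corrected for the two sample sizes.
    return (p / (1 - p)) * (len(source_X) / len(target_X))

# Toy bias: the source oversamples large x (mean 1) relative to the target.
rng = np.random.default_rng(1)
target = rng.normal(size=(1000, 1))
source = rng.normal(loc=1.0, size=(1000, 1))
w = shift_weights(source, target)
# Source points far from the target mode should receive low weight.
print(w[source[:, 0] > 2].mean() < w[source[:, 0] < 0].mean())
```

Multiplying each training example's loss by its weight `w` then makes the weighted source distribution mimic the target distribution, which is the compensation step the abstract describes in end-to-end form.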
A Kernel-Based Calculation of Information on a Metric Space
Kernel density estimation is a technique for approximating probability
distributions. Here, it is applied to the calculation of mutual information on
a metric space. This is motivated by the problem in neuroscience of calculating
the mutual information between stimuli and spiking responses; the space of
these responses is a metric space. It is shown that kernel density estimation
on a metric space resembles the k-nearest-neighbor approach. This approach is
applied to a toy dataset designed to mimic electrophysiological data.
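A minimal version of the kernel-based calculation can be sketched for the simplest metric space, the real line with distance |r − r′|: estimate p(r) and p(r | s) with Gaussian kernel density estimates and average the log ratio to get the mutual information between a discrete stimulus and a continuous response. The bandwidth, the two-stimulus toy data, and the Gaussian kernel are illustrative assumptions, not the paper's construction.

```python
# Kernel-based mutual information I(S; R) for a discrete stimulus S and a
# continuous response R. Illustrative sketch: Gaussian KDE on the real line
# (metric |r - r'|), with an arbitrary bandwidth h.
import numpy as np

def kde(points, query, h):
    """Gaussian kernel density estimate evaluated at the query points."""
    d = query[:, None] - points[None, :]
    return np.exp(-0.5 * (d / h) ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def mutual_information(stimuli, responses, h=0.3):
    """I(S; R) estimated as the average of log p(r | s) / p(r), in bits."""
    p_r = kde(responses, responses, h)
    mi = 0.0
    for s in np.unique(stimuli):
        mask = stimuli == s
        p_r_given_s = kde(responses[mask], responses[mask], h)
        mi += mask.mean() * np.mean(np.log(p_r_given_s / p_r[mask]))
    return mi / np.log(2)  # convert nats to bits

# Toy data: two stimuli with well-separated response distributions,
# so the information should approach 1 bit.
rng = np.random.default_rng(2)
s = np.repeat([0, 1], 500)
r = np.concatenate([rng.normal(0, 0.2, 500), rng.normal(2, 0.2, 500)])
print(mutual_information(s, r))
```

Because each conditional density is built only from the responses to one stimulus, well-separated response clusters give log p(r | s) / p(r) ≈ log 2 at every sample, and the estimate approaches the 1 bit carried by a binary stimulus.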
- …