Extrapolating Expected Accuracies for Large Multi-Class Problems
The difficulty of multi-class classification generally increases with the
number of classes. Using data from a subset of the classes, can we predict how
well a classifier will scale with an increased number of classes? Under the
assumptions that the classes are sampled identically and independently from a
population, and that the classifier is based on independently learned scoring
functions, we show that the expected accuracy when the classifier is trained on
k classes is the (k-1)st moment of a certain distribution that can be estimated
from data. We present an unbiased estimation method based on the theory, and
demonstrate its application on a facial recognition example.
Comment: Submitted to JML
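The moment relationship stated in this abstract can be illustrated with a minimal sketch. It assumes we already have draws `u_samples` from the "certain distribution" the abstract mentions, and that those draws lie in [0, 1]; the abstract specifies neither the distribution nor how to sample from it, so both are assumptions. The function name `accuracy_from_moment` is hypothetical, and the plain empirical moment used here is a stand-in, not the paper's unbiased estimator.

```python
from typing import Sequence


def accuracy_from_moment(u_samples: Sequence[float], k: int) -> float:
    """Empirical (k-1)st moment of the supplied samples.

    Assumption: u_samples are draws from the distribution whose
    (k-1)st moment equals the expected accuracy with k classes.
    This is a plug-in average, not the paper's estimation method.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(u_samples) == 0:
        raise ValueError("u_samples must not be empty")
    # Average of u^(k-1) over the samples.
    return sum(u ** (k - 1) for u in u_samples) / len(u_samples)


# Hypothetical samples, made up purely for illustration.
samples = [0.5, 1.0]
for k in (1, 2, 3):
    print(k, accuracy_from_moment(samples, k))
```

With k = 1 every term is u^0 = 1, so the sketch returns an accuracy of 1, which matches the intuition that a one-class problem is trivial. For samples in [0, 1] the value cannot increase as k grows, consistent with the abstract's remark that difficulty generally increases with the number of classes.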
Statistical Analysis of Data Repeatability Measures
The advent of modern data collection and processing techniques has seen the
size, scale, and complexity of data grow exponentially. A key step in
leveraging these rich datasets for downstream inference is understanding the
characteristics of the data which are repeatable -- the aspects of the data
that can be identified under a duplicated analysis. Yet the utility of
traditional repeatability measures, such as the intraclass correlation
coefficient, is limited in these settings. In recent work, novel
data repeatability measures have been introduced in the context where a set of
subjects are measured twice or more, including: fingerprinting, rank sums, and
generalizations of the intraclass correlation coefficient. However, the
relationships between these measures, and the best practices among them,
remain largely unknown. In this manuscript, we formalize a novel repeatability
measure, discriminability. We show that it is deterministically linked with the
correlation coefficient under univariate random effect models, and has the
desired property of optimal accuracy for inferential tasks using multivariate
measurements. Additionally, we overview and systematically compare
repeatability statistics using both theoretical results and simulations. We
show that the rank sum statistic is deterministically linked to a consistent
estimator of discriminability. The power of permutation tests derived from
these measures is compared numerically under Gaussian and non-Gaussian
settings, with and without simulated batch effects. Motivated by both
theoretical and empirical results, we provide methodological recommendations
for each benchmark setting to serve as a resource for future analyses. We
believe these recommendations will play an important role in improving
repeatability in fields such as functional magnetic resonance imaging,
genomics, and pharmacology.
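The abstract does not define discriminability, so the sketch below rests on an assumption: that it can be read as the probability that two measurements of the same subject are closer to each other than a measurement of that subject is to a measurement of a different subject. Under that assumed reading, a sample estimate averages, over every ordered within-subject pair, the fraction of between-subject distances that exceed the within-subject distance. The function name `discriminability`, the Euclidean distance, and the toy data are illustrative choices, not details taken from the manuscript.

```python
import math
from typing import Hashable, Sequence


def discriminability(
    measurements: Sequence[Sequence[float]],
    subject_ids: Sequence[Hashable],
) -> float:
    """Sample discriminability under the assumed definition above.

    measurements[i] is a feature vector; subject_ids[i] labels the
    subject it was measured on. Euclidean distance is an assumption.
    """
    n = len(measurements)
    if n != len(subject_ids):
        raise ValueError("measurements and subject_ids differ in length")
    total, count = 0.0, 0
    for a in range(n):
        # Indices of measurements taken on other subjects.
        others = [j for j in range(n) if subject_ids[j] != subject_ids[a]]
        if not others:
            continue
        for b in range(n):
            if b == a or subject_ids[b] != subject_ids[a]:
                continue
            within = math.dist(measurements[a], measurements[b])
            # Count between-subject distances larger than the within one.
            farther = sum(
                math.dist(measurements[a], measurements[j]) > within
                for j in others
            )
            total += farther / len(others)
            count += 1
    if count == 0:
        raise ValueError("need repeated measurements and several subjects")
    return total / count


# Toy data: two subjects, each measured twice, well separated.
data = [[0.0], [0.5], [10.0], [10.5]]
ids = ["s1", "s1", "s2", "s2"]
print(discriminability(data, ids))
```

A value near 1 indicates that repeated measurements of a subject sit closer together than measurements of different subjects; values near 0.5 or below suggest that subjects are hard to tell apart from their measurements.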