Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Power Analysis and Sample Size Estimation
This study's first purpose is to provide quantitative evidence that would
encourage researchers to adopt the more robust method of nested
cross-validation. The second purpose is to present methods and MATLAB code for
performing power analysis of machine learning (ML)-based analyses when
designing a study. Monte
Carlo simulations were used to quantify the interactions between the employed
cross-validation method, the discriminative power of features, the
dimensionality of the feature space, and the dimensionality of the model. Four
cross-validation methods (single holdout, 10-fold, train-validation-test,
and nested 10-fold) were compared in terms of the statistical power and
statistical confidence of the resulting ML models. Distributions under the null
and alternative hypotheses were used to determine the minimum sample size
required for obtaining a statistically significant outcome ($\alpha = 0.05$,
$1 - \beta = 0.8$). Statistical confidence of the model was defined as the
probability that the correct features are selected and hence included in the
final model. Our analysis showed that the model built with the single
holdout method had very low statistical power and statistical confidence and
that it significantly overestimated the accuracy. Conversely, the nested
10-fold cross-validation resulted in the highest statistical confidence and the
highest statistical power, while providing an unbiased estimate of the
accuracy. The required sample size with a single holdout could be 50% higher
than what would be needed if nested cross-validation were used. Confidence in
the model based on nested cross-validation was as much as four times higher
than the confidence in the single holdout-based model. A computational model,
MATLAB code, and lookup tables are provided to help researchers estimate the
sample size when designing future studies.

Comment: Under review at JSLH
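The sample-size logic described in the abstract (compare the null and alternative distributions, then find the smallest sample that reaches $\alpha = 0.05$ and $1 - \beta = 0.8$) can be sketched in Python. This is a simplified analytic stand-in, not the authors' Monte Carlo procedure or their MATLAB code. It assumes a balanced binary task, treats the number of correct test predictions as Binomial($n$, $p$) with chance accuracy $p_0 = 0.5$ under the null and a hypothetical true accuracy $p_1$ under the alternative, and uses a one-sided test. All function names are illustrative.

```python
from math import exp, lgamma, log


def binom_pmf(n, p):
    """Binomial(n, p) probability mass function as a list indexed by k (requires 0 < p < 1)."""
    return [
        exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))
        for k in range(n + 1)
    ]


def upper_tail(pmf):
    """tail[k] = P(X >= k) for k = 0..n+1 (tail[n + 1] is 0)."""
    tail = [0.0] * (len(pmf) + 1)
    for k in range(len(pmf) - 1, -1, -1):
        tail[k] = tail[k + 1] + pmf[k]
    return tail


def power_at(n, p1, p0=0.5, alpha=0.05):
    """Power of a one-sided test of 'accuracy > p0' with n independent test predictions."""
    null_tail = upper_tail(binom_pmf(n, p0))
    # Critical value: fewest correct predictions whose p-value under H0 is <= alpha.
    k_crit = next(k for k, t in enumerate(null_tail) if t <= alpha)
    # Power: probability of reaching that count when the true accuracy is p1.
    return upper_tail(binom_pmf(n, p1))[k_crit]


def min_sample_size(p1, p0=0.5, alpha=0.05, target_power=0.8, n_max=2000):
    """Smallest number of test predictions whose power reaches target_power."""
    for n in range(1, n_max + 1):
        if power_at(n, p1, p0, alpha) >= target_power:
            return n
    return None  # target power not reached within n_max
```

For example, `min_sample_size(0.9)` returns 8 under these assumptions, yet `power_at(9, 0.9)` (about 0.775) and `power_at(10, 0.9)` (about 0.736) both fall below 0.8: the power of a discrete test saw-tooths with $n$, so a lookup table should report power across a range of sizes rather than a single threshold. Note also that with a single holdout, $n$ here counts only the held-out samples, not the whole dataset.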
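The abstract defines statistical confidence as the probability that the correct features are selected and included in the final model. A Monte Carlo estimate of that probability can be sketched as follows, under deliberately simple assumptions that are not taken from the paper: one informative feature (index 0, shifted by a hypothetical effect size in standard-deviation units) among `n_features` unit-variance Gaussian features, and a filter that keeps only the single feature with the largest absolute difference in class means. The paper's actual feature-selection and cross-validation steps are not reproduced here.

```python
import random


def selection_confidence(n_per_class, n_features, effect_size, n_trials=500, seed=0):
    """Monte Carlo estimate of P(the informative feature is the one selected)."""
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    hits = 0
    for _ in range(n_trials):
        scores = []
        for j in range(n_features):
            # Hypothetical setup: only feature 0 differs between the two classes.
            shift = effect_size if j == 0 else 0.0
            class_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
            class_b = [rng.gauss(shift, 1.0) for _ in range(n_per_class)]
            # Filter score: absolute difference of the two class means.
            scores.append(abs(sum(class_b) / n_per_class - sum(class_a) / n_per_class))
        # Keep the top-scoring feature; a "hit" means the correct one was kept.
        if max(range(n_features), key=scores.__getitem__) == 0:
            hits += 1
    return hits / n_trials
```

With a fixed effect size, the estimate from `selection_confidence(n, n_features=10, effect_size=1.0)` rises as `n` grows, and adding uninformative features (a larger `n_features`) should lower it, which mirrors the interaction between sample size and feature-space dimensionality that the simulations quantify.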
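The abstract reports that the single holdout method overestimated accuracy while nested cross-validation gave an unbiased estimate. The sketch below illustrates one common mechanism behind such optimism, on pure-noise data where the true accuracy is 0.5. It assumes a "leaky" holdout in which the same held-out half is used both to choose the feature and to report accuracy, whereas nested cross-validation picks the feature in an inner loop and scores it only on untouched outer folds. The one-feature midpoint classifier, the 5-fold default, and all function names are hypothetical choices for illustration; the paper used 10-fold schemes and its own models.

```python
import random
from statistics import fmean


def fit_threshold(xs, ys):
    """One-feature classifier: threshold at the midpoint of the two class means."""
    m0 = fmean(x for x, y in zip(xs, ys) if y == 0)
    m1 = fmean(x for x, y in zip(xs, ys) if y == 1)
    return (m0 + m1) / 2, (1 if m1 >= m0 else -1)


def accuracy(model, xs, ys):
    """Fraction of samples whose predicted label matches the true label."""
    thr, sign = model
    preds = [1 if sign * (x - thr) > 0 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def fit_and_score(X, ys, j, train, test):
    """Fit the classifier for feature j on train indices and score it on test indices."""
    model = fit_threshold([X[j][i] for i in train], [ys[i] for i in train])
    return accuracy(model, [X[j][i] for i in test], [ys[i] for i in test])


def make_null_data(n, d, rng):
    """Pure-noise features with balanced labels, so the true accuracy is 0.5."""
    ys = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]
    return X, ys


def leaky_holdout(X, ys, rng):
    """Single holdout where the feature is chosen by its score on the same test half."""
    idx = list(range(len(ys)))
    rng.shuffle(idx)
    train, test = idx[: len(idx) // 2], idx[len(idx) // 2:]
    return max(fit_and_score(X, ys, j, train, test) for j in range(len(X)))


def inner_cv_score(X, ys, j, idx, k):
    """Mean k-fold accuracy of feature j using only the samples listed in idx."""
    return fmean(
        fit_and_score(X, ys, j,
                      [i for p, i in enumerate(idx) if p % k != g],  # inner train
                      idx[g::k])                                      # inner test
        for g in range(k)
    )


def nested_cv(X, ys, k=5):
    """Nested k-fold: inner folds pick the feature, outer folds only score it."""
    outer_accs = []
    for f in range(k):
        outer_test = [i for i in range(len(ys)) if i % k == f]
        outer_train = [i for i in range(len(ys)) if i % k != f]
        # Feature selection never sees the outer test fold.
        best_j = max(range(len(X)),
                     key=lambda j: inner_cv_score(X, ys, j, outer_train, k))
        outer_accs.append(fit_and_score(X, ys, best_j, outer_train, outer_test))
    return fmean(outer_accs)


def compare(n=60, d=20, reps=30, seed=1):
    """Average reported accuracy of both schemes over repeated null datasets."""
    rng = random.Random(seed)
    holdout, nested = [], []
    for _ in range(reps):
        X, ys = make_null_data(n, d, rng)
        holdout.append(leaky_holdout(X, ys, rng))
        nested.append(nested_cv(X, ys))
    return fmean(holdout), fmean(nested)
```

Calling `compare()` returns the two average accuracies. Under these assumptions the leaky holdout average should land well above 0.5 even though no feature carries any signal, while the nested estimate should stay close to 0.5, because the outer test folds play no part in choosing the feature.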