When comparing two classification algorithms, the widely used McNemar's test
is applied to infer whether their error rates differ significantly. However,
the conventional McNemar's test usually has low power because its hold-out
(HO) method relies on a single train-validation split, which typically yields
a high-variance estimate of the error rates. In contrast, a cross-validation
(CV) method repeats the HO procedure multiple times and produces a more stable
estimate, so a CV method has great potential to improve the power of
McNemar's test. Among the various CV methods, block-regularized 5×2 CV (BCV)
has been shown in many previous studies to outperform the other CV methods
when comparing algorithms, because 5×2 BCV produces a high-quality estimator
of the error rate by regularizing the number of overlapping records between
all training sets. In this study, we compress the 10 correlated contingency
tables produced by 5×2 BCV into a single effective contingency table and then
define a 5×2 BCV McNemar's test based on this effective contingency table.
We demonstrate the reasonable type I error and the promising power of the
proposed 5×2 BCV McNemar's test on multiple simulated and real-world
data sets.

Comment: 12 pages, 6 figures, and 5 tables
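For reference, the conventional hold-out McNemar's test that the abstract starts from can be sketched as below. This is a minimal illustration of the standard chi-square form of the test with an optional continuity correction, not the proposed 5×2 BCV test. The function names `contingency_table` and `mcnemar_test` are hypothetical. Applying `contingency_table` to each of the 10 validation folds of a 5×2 CV would give the 10 correlated tables mentioned above; how the paper compresses them into an effective table is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def contingency_table(correct_a: Sequence[bool],
                      correct_b: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count validation records into a 2x2 table (hypothetical helper).

    Returns (n11, n10, n01, n00): both correct, only A correct,
    only B correct, both wrong.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("correctness vectors must have equal length")
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def mcnemar_test(n10: int, n01: int,
                 continuity: bool = True) -> Tuple[float, float]:
    """Conventional McNemar chi-square statistic (1 df) and p-value.

    Only the discordant counts n10 and n01 enter the statistic.
    """
    n = n10 + n01
    if n == 0:
        # No discordant records: no evidence of a difference.
        return 0.0, 1.0
    diff = abs(n10 - n01)
    if continuity:
        # Edwards' continuity correction.
        diff = max(diff - 1, 0)
    stat = diff ** 2 / n
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Usage on a single hold-out split with made-up discordant counts.
    stat, p = mcnemar_test(n10=15, n01=5)
    print(f"statistic={stat:.3f}, p-value={p:.4f}")
```

Because the statistic depends only on the discordant counts from one split, its value can swing considerably between splits. This is the variance problem that motivates a CV-based version of the test.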