Search CORE

5 research outputs found

Block-regularized 5 $\times$ 2 Cross-validated McNemar's Test for Comparing Two Classification Algorithms

Author: Li Jihong
Wang Ruibo
Publication venue
Publication date: 08/04/2023
Field of study

In the task of comparing two classification algorithms, the widely-used McNemar's test aims to infer the presence of a significant difference between the error rates of the two classification algorithms. However, the power of the conventional McNemar's test is usually unpromising because the hold-out (HO) method in the test merely uses a single train-validation split that usually produces a highly varied estimation of the error rates. In contrast, a cross-validation (CV) method repeats the HO method in multiple times and produces a stable estimation. Therefore, a CV method has a great advantage to improve the power of McNemar's test. Among all types of CV methods, a block-regularized 5

\times

2 CV (BCV) has been shown in many previous studies to be superior to the other CV methods in the comparison task of algorithms because the 5

\times

2 BCV can produce a high-quality estimator of the error rate by regularizing the numbers of overlapping records between all training sets. In this study, we compress the 10 correlated contingency tables in the 5

\times

2 BCV to form an effective contingency table. Then, we define a 5

\times

2 BCV McNemar's test on the basis of the effective contingency table. We demonstrate the reasonable type I error and the promising power of the proposed 5

\times

2 BCV McNemar's test on multiple simulated and real-world data sets.Comment: 12 pages, 6 figures, and 5 table

arXiv.org e-Print Archive

Estimating the Confidence Interval of Expected Performance Curve in Biometric Authentication Using Joint Bootstrap

Author: Bengio Samy
Poh Norman
Publication venue: IDIAP
Publication date: 11/02/2010
Field of study

Evaluating biometric authentication performance is a complex task because the performance depends on the user set size, composition and the choice of samples. We propose to reduce the performance dependency of these three factors by deriving appropriate confidence intervals. In this study, we focus on deriving a confidence region based on the recently proposed Expected Performance Curve (EPC). An EPC is different from the conventional DET or ROC curve because an EPC assumes that the test class-conditional (client and impostor) score distributions are unknown and this includes the choice of the decision threshold for various operating points. Instead, an EPC selects thresholds based on the training set and applies them on the test set. The proposed technique is useful, for example, to quote realistic upper and lower bounds of the decision cost function used in the NIST annual speaker evaluation. Our findings, based on the 24 systems submitted to the NIST2005 evaluation, show that the confidence region obtained from our proposed algorithm can correctly predict the performance of an unseen database with two times more users with an average coverage of 95\% (over all the 24 systems). A coverage is the proportion of the unseen EPC covered by the derived confidence interval

Infoscience - École polytechnique fédérale de Lausanne

Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition

Author: Athanassios Katsamanis
George Papandreou
Petros Maragos
Vassilis Pitsikalis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

A Multitask Learning Approach to Document Representation using Unlabeled Data

Author: Bengio Samy
Keller Mikaela
Publication venue: IDIAP
Publication date: 11/02/2010
Field of study

Text categorization is intrinsically a supervised learning task, which aims at relating a given text document to one or more predefined categories. Unfortunately, labeling such databases of documents is a painful task. We present in this paper a method that takes advantage of huge amounts of unlabeled text documents available in digital format, to counter balance the relatively smaller available amount of labeled text documents. A Siamese MLP is trained in a multi-task framework in order to solve two concurrent tasks: using the unlabeled data, we search for a mapping from the documents' bag-of-word representation to a new feature space emphasizing similarities and dissimilarities among documents; simultaneously, this mapping is constrained to also give good text categorization performance over the labeled dataset. Experimental results on Reuters RCV1 suggest that, as expected, performance over the labeled task increases as the amount of unlabeled data increases

Infoscience - École polytechnique fédérale de Lausanne

Benchmarking non-parametric statistical tests

Author: Mikaela Keller
Samy Bengio
Siew Yeung Wong
Publication venue
Publication date: 01/01/2005
Field of study

Although non-parametric tests have already been proposed for that purpose, statistical significance tests for non-standard measures (different from the classification error) are less often used in the literature. This paper is an attempt at empirically verifying how these tests compare with more classical tests, on various conditions. More precisely, using a very large dataset to estimate the whole “population”, we analyzed the behavior of several statistical test, varying the class unbalance, the compared models, the performance measure, and the sample size. The main result is that providing big enough evaluation sets non-parametric tests are relatively reliable in all conditions.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX