On The Smoothness of Cross-Validation-Based Estimators Of Classifier Performance
Many versions of cross-validation (CV) exist in the literature, and each
version has its own variants. Many practitioners use them interchangeably,
without explaining the connections or differences among them. This article
makes three contributions. First, it starts with a mathematical
formalization of these versions and variants, which estimate the error
rate and the Area Under the ROC Curve (AUC) of a classification rule, to show
the connection and difference among them. Second, we prove some of their
properties and prove that many variants are either redundant or "not smooth".
Hence, we suggest abandoning all redundant versions and variants and keeping
only the leave-one-out, the K-fold, and the repeated K-fold. We show that the
last is the only one of the three versions that is "smooth" and hence looks
mathematically like estimating the mean performance of the classification
rules. However, empirically, because of the known phenomenon of "weak correlation",
which we explain mathematically and experimentally, it estimates both
conditional and mean performance with almost the same accuracy. Third, we
conclude the article by suggesting two research points that may answer the
remaining question of whether we can come up with a finalist among the three
estimators: (1) a comparative study is needed that is much more comprehensive
than those available in the literature, which conclude no overall winner, and
that considers a wide range of distributions, datasets, and classifiers,
including complex ones obtained via recent deep learning approaches; (2) we
sketch the path to deriving a rigorous method for estimating the variance of
the only "smooth" version, repeated K-fold CV, rather than the ad hoc methods
available in the literature that ignore the covariance structure among the
folds of CV.
Comment: The paper is currently under review in Pattern Recognition Letters
(PRL)
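As a rough illustration of the three versions the abstract keeps (leave-one-out, K-fold, and repeated K-fold), the following is a minimal sketch of how each could estimate an error rate. The nearest-centroid classifier, the function names, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def nearest_mean_fit(X, y):
    """Toy classifier (hypothetical, for illustration): one centroid per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_mean_predict(model, X):
    """Assign each row of X to the class with the closest centroid."""
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def kfold_error(X, y, k, rng):
    """K-fold CV error rate: every observation is tested exactly once."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    errors = 0
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        model = nearest_mean_fit(X[train_mask], y[train_mask])
        errors += np.sum(nearest_mean_predict(model, X[test_idx]) != y[test_idx])
    return errors / n

def repeated_kfold_error(X, y, k, repeats, rng):
    """Average of K-fold estimates over several random partitions."""
    return float(np.mean([kfold_error(X, y, k, rng) for _ in range(repeats)]))

def loo_error(X, y, rng):
    """Leave-one-out is K-fold with K equal to the sample size."""
    return kfold_error(X, y, len(y), rng)

# Toy two-class Gaussian data (an assumption for the demo).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(3.0, 1.0, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

print("leave-one-out   :", loo_error(X, y, rng))
print("10-fold         :", kfold_error(X, y, 10, rng))
print("repeated 10-fold:", repeated_kfold_error(X, y, 10, 20, rng))
```

In this sketch the single K-fold estimate depends on one random partition, whereas the repeated version averages over many partitions, which is the intuition behind calling it the "smooth" version.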
AUC: Nonparametric Estimators and Their Smoothness
Nonparametric estimation of a statistic, in general, and of the error rate of
a classification rule, in particular, from just one available dataset through
resampling is mathematically well founded in the literature, using several
versions of the bootstrap and the influence function. This article first provides a
concise review of this literature to establish the theoretical framework that
we use to construct, in a single coherent framework, nonparametric estimators
of the AUC (a two-sample statistic) rather than the error rate (a one-sample
statistic). In addition, the smoothness of some of these estimators is
investigated and explained. Our experiments show that the behavior of the
designed AUC estimators confirms the findings of the literature for the
behavior of error rate estimators in many aspects including: the weak
correlation between the bootstrap-based estimators and the true conditional
AUC; and the comparable accuracy of the different versions of the bootstrap
estimators in terms of the RMS, with a slight superiority of the .632+ bootstrap
estimator.
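To make the two-sample nature of the AUC concrete, the following is a minimal sketch of the Mann-Whitney form of the empirical AUC, together with a simple out-of-bag bootstrap average. The linear scoring rule, the function names, and the toy data are assumptions made for this example; this is not the .632+ estimator or any other estimator constructed in the paper.

```python
import numpy as np

def mann_whitney_auc(scores_neg, scores_pos):
    """Empirical AUC: fraction of (negative, positive) pairs ranked correctly,
    counting ties as one half."""
    s0 = np.asarray(scores_neg, dtype=float)[:, None]
    s1 = np.asarray(scores_pos, dtype=float)[None, :]
    return float(np.mean((s1 > s0) + 0.5 * (s1 == s0)))

def oob_bootstrap_auc(X0, X1, n_boot, rng):
    """Average AUC over bootstrap replicates, each tested on the observations
    left out of that replicate. Resampling is done within each class because
    the AUC is a two-sample statistic."""
    aucs = []
    for _ in range(n_boot):
        i0 = rng.integers(0, len(X0), len(X0))
        i1 = rng.integers(0, len(X1), len(X1))
        out0 = np.setdiff1d(np.arange(len(X0)), i0)
        out1 = np.setdiff1d(np.arange(len(X1)), i1)
        if len(out0) == 0 or len(out1) == 0:
            continue  # no left-out observations in one class; skip replicate
        # Hypothetical scoring rule: project onto the difference of class means.
        w = X1[i1].mean(axis=0) - X0[i0].mean(axis=0)
        aucs.append(mann_whitney_auc(X0[out0] @ w, X1[out1] @ w))
    return float(np.mean(aucs))

# Toy two-class Gaussian data (an assumption for the demo).
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, (30, 2))
X1 = rng.normal(2.0, 1.0, (30, 2))
print("out-of-bag bootstrap AUC:", oob_bootstrap_auc(X0, X1, 200, rng))
```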
A Review of Statistical Learning Machines from ATR to DNA Microarrays: design, assessment, and advice for practitioners
Statistical Learning is the process of estimating an unknown probabilistic
input-output relationship of a system using a limited number of observations;
and a statistical learning machine (SLM) is a machine that has learned such a
relationship. While their roots grow deeply in Probability Theory, SLMs are
ubiquitous in the modern world. Automatic Target Recognition (ATR) in military
applications, Computer Aided Diagnosis (CAD) in medical imaging, DNA
microarrays in Genomics, Optical Character Recognition (OCR), Speech
Recognition (SR), spam email filtering, stock market prediction, etc., are a few
examples of applications of SLMs: diverse fields, but one theory.
The field of Statistical Learning can be decomposed into two basic subfields,
Design and Assessment. Three main groups of specialists, namely
statisticians, engineers, and computer scientists (ordered ascendingly by
programming capabilities and descendingly by mathematical rigor), work in
this field, and each takes its elephant bite. The exaggerated rigor of
statisticians' analysis sometimes keeps them from considering new ML
techniques and methods that do not yet have a "complete" mathematical theory. On
the other hand, the immoderate ad hoc simulations of computer scientists sometimes
drive them toward unjustified and immature results. A prudent approach is
needed, one with enough flexibility to use simulation and trial and
error without sacrificing any rigor. If this prudent attitude is necessary for
this field, it is necessary in other fields of Engineering as well.
Comment: This manuscript was composed in 2006 as part of the author's Ph.D.
dissertation
Effect of Sperm Separation Methods on Morphology and Functions of Frozen Buffalo Spermatozoa
This work was planned to compare three methods for selecting active buffalo spermatozoa and to examine the effects of these separation methods on the morphology, viability, and functions of spermatozoa used for IVF purposes. Ten frozen straws per trial (10 trials) were pooled and divided into 4 aliquots: A) The first aliquot was the control, without any separation method. B) The second aliquot was subjected to sperm selection by the density gradient method (percoll:PureSperm) using a 40-80% double density gradient. C) The third aliquot was subjected to swim-up in sp-TALP. D) The fourth aliquot was subjected to washing by centrifugation with sp-TALP. The percentage of motile spermatozoa was higher after Percoll, swim-up, and washing than in the control (86.0, 73.0, and 66.5 vs. 56.5, respectively). The percentage of sperm abnormalities decreased significantly after the Percoll, swim-up, and sperm wash separation methods. Spermatozoa obtained by swim-up and Percoll had the highest percentage of intact membranes. The different separation methods significantly increased the lytic activity of the recovered spermatozoa. The percentage of live spermatozoa with a reacted acrosome increased significantly after both swim-up separation and washing. The percentage of dead spermatozoa with a reacted acrosome decreased significantly after Percoll separation but did not change when the swim-up method was used. Finally, it is concluded that density gradient centrifugation using PureSperm® could be considered the method of choice for selecting frozen-thawed buffalo spermatozoa, presumably with a high potential fertilizing ability.
- …