195 research outputs found
Exploring the Hidden Challenges Associated with the Evaluation of Multi-class Datasets using Multiple Classifiers
The optimization and evaluation of a pattern recognition system requires that different problems, such as multi-class and imbalanced datasets, be addressed. This paper presents the classification of multi-class datasets, which present more challenges when compared to binary-class datasets in machine learning. Furthermore, it argues that evaluating a classification model for multi-class imbalanced datasets in terms of a simple "accuracy rate" can provide misleading results. Other parameters, such as failure avoidance, true identification of positive and negative instances of a class, and class discrimination, are also very important. We hypothesize in this paper that "misclassification of true positive patterns should not necessarily be categorized as false negative while evaluating a classifier for multi-class datasets", a common practice observed in the existing literature. To address these hidden challenges for the generalization of a particular classifier, several evaluation metrics are compared for a multi-class dataset with four classes: three belonging to different neurodegenerative diseases and one to control subjects. Three classifiers, linear discriminant, quadratic discriminant, and Parzen, are selected to demonstrate the results with examples.
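The abstract's central point, that overall accuracy can mask a classifier's failure on minority classes, can be illustrated with a small sketch. The four-class labels and the "always predict control" classifier below are hypothetical, not taken from the paper:

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Return overall accuracy plus recall (true-positive rate) per class."""
    hits, totals = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = {c: hits[c] / totals[c] for c in totals}
    accuracy = sum(hits.values()) / len(y_true)
    return accuracy, recalls

# Imbalanced 4-class example: a degenerate classifier that always predicts
# the majority class "control" scores 70% accuracy yet identifies none of
# the three disease classes.
y_true = ["control"] * 7 + ["disease_A", "disease_B", "disease_C"]
y_pred = ["control"] * 10
acc, rec = per_class_recall(y_true, y_pred)
# acc == 0.7, but rec["disease_A"] == 0.0
```

The per-class recalls expose exactly the failure mode the abstract warns about: a high accuracy rate coexisting with zero discrimination for the minority classes.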
A Comparison Study of Classifier Algorithms for Cross-Person Physical Activity Recognition
Physical activity is widely known to be one of the key elements of a healthy life. The many benefits of physical activity described in the medical literature include weight loss and reductions in the risk factors for chronic diseases. With the recent advances in wearable devices, such as smartwatches or physical activity wristbands, motion tracking sensors are becoming pervasive, which has led to an impressive growth in the amount of physical activity data available and an increasing interest in recognizing which specific activity a user is performing. Moreover, big data and machine learning are now cross-fertilizing each other in an approach called "deep learning", which consists of massive artificial neural networks able to detect complicated patterns from enormous amounts of input data to learn classification models. This work compares various state-of-the-art classification techniques for automatic cross-person activity recognition under different scenarios that vary widely in how much information is available for analysis. We have incorporated deep learning by using Google's TensorFlow framework. The data used in this study were acquired from PAMAP2 (Physical Activity Monitoring in the Ageing Population), a publicly available dataset containing physical activity data. To perform cross-person prediction, we used the leave-one-subject-out (LOSO) cross-validation technique. When working with large training sets, the best classifiers obtain very high average accuracies (e.g., 96% using extra randomized trees). However, when the data volume is drastically reduced (where available data are only 0.001% of the continuous data), deep neural networks performed the best, achieving 60% in overall prediction accuracy. 
We found that even when working with only approximately 22.67% of the full dataset, we can statistically obtain the same results as when working with the full dataset. This project was partially funded by the European Union's CIP (Competitiveness and Innovation Framework Programme) (ICT-PSP-2012) under Grant Agreement No. 325146 (Social Ecosystem for Antiaging, Capacitation and Wellbeing—SEACW project). It is also supported by the Spanish Ministry of Education, Culture and Sport through the FPU (University Faculty Training) fellowship (FPU13/03917).
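The leave-one-subject-out (LOSO) protocol used above can be sketched as a plain splitting routine. The record layout and function name below are illustrative assumptions, not the authors' code:

```python
def leave_one_subject_out(records):
    """Yield (held_out_subject, train, test) splits over (subject_id, sample) pairs.

    Each fold holds out every sample from exactly one subject, so the model
    never sees the test subject during training; this is what makes the
    evaluation "cross-person".
    """
    subjects = sorted({sid for sid, _ in records})
    for held_out in subjects:
        train = [s for sid, s in records if sid != held_out]
        test = [s for sid, s in records if sid == held_out]
        yield held_out, train, test

# Three subjects with two samples each -> three folds,
# each with 4 training samples and 2 test samples.
data = [(1, "a1"), (1, "a2"), (2, "b1"), (2, "b2"), (3, "c1"), (3, "c2")]
folds = list(leave_one_subject_out(data))
```

In practice the same split is available as `LeaveOneGroupOut` in scikit-learn, with subject IDs passed as the `groups` argument.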
The problem with Kappa
It is becoming clear that traditional evaluation measures used in Computational Linguistics (including Error Rates, Accuracy, Recall, Precision and F-measure) are of limited value for unbiased evaluation of systems, and are not meaningful for comparison of algorithms unless both the dataset and algorithm parameters are strictly controlled for skew (Prevalence and Bias). The use of techniques originally designed for other purposes, in particular Receiver Operating Characteristics Area Under Curve, plus variants of Kappa, have been proposed to fill the void. This paper aims to clear up some of the confusion relating to evaluation, by demonstrating that the usefulness of each evaluation method is highly dependent on the assumptions made about the distributions of the dataset and the underlying populations. The behaviour of a number of evaluation measures is compared under common assumptions. Deploying a system in a context which has the opposite skew from its validation set can be expected to approximately negate Fleiss Kappa and halve Cohen Kappa but leave Powers Kappa unchanged. For most performance evaluation purposes, the latter is thus most appropriate, whilst for comparison of behaviour, Matthews Correlation is recommended.
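The chance-correction behaviour discussed above can be made concrete with standard textbook formulas for Cohen's Kappa and the Matthews Correlation on a binary confusion matrix. This is a generic illustration of why such measures are preferred over raw accuracy under skew, not a reproduction of the paper's experiments:

```python
import math

def cohen_kappa(tp, fn, fp, tn):
    """Cohen's Kappa from a binary confusion matrix: (po - pe) / (1 - pe)."""
    n = tp + fn + fp + tn
    po = (tp + tn) / n  # observed agreement (i.e. plain accuracy)
    # expected chance agreement from the marginal label distributions
    pe = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    return (po - pe) / (1 - pe)

def matthews_corr(tp, fn, fp, tn):
    """Matthews Correlation Coefficient from a binary confusion matrix."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Skewed test set: 90 negatives, 10 positives. A baseline that always
# predicts "negative" reaches 90% accuracy, yet both chance-corrected
# measures correctly score it as no better than chance (0.0).
baseline = dict(tp=0, fn=10, fp=0, tn=90)
```

Both measures return 0.0 for the baseline above, which is the behaviour the abstract relies on when arguing that accuracy alone is not a meaningful basis for comparing algorithms under skew.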