Accuracy-Rejection Curves (ARCs) for Comparing Classification Methods with a Reject Option
Abstract Data extracted from microarrays are now considered an important source of knowledge about various diseases. Several studies based on microarray data and the use of receiver operating characteristic (ROC) graphs have compared supervised machine learning approaches. These comparisons are based on classification schemes in which all samples are classified, regardless of the degree of confidence associated with the classification of a particular sample by a given classifier. In the domain of healthcare, it is safer to refrain from classifying a sample if the confidence assigned to the classification is not high enough, rather than classifying all samples even when confidence is low. We describe an approach in which the performance of different classifiers is compared, with the possibility of rejection, based on several reject areas. Using a tradeoff between accuracy and rejection, we propose the use of accuracy-rejection curves (ARCs) and three types of relationship between ARCs for comparing the ARCs of two classifiers. Empirical results based on purely synthetic data, semi-synthetic data (generated from real data obtained from patients) and public microarray data for binary classification problems demonstrate the efficacy of this method.
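The ARC construction described in the abstract can be sketched with a small helper: sweep a confidence threshold, reject samples below it, and record accuracy on the accepted remainder. This is a minimal illustration assuming per-sample confidence scores are available; the function name and inputs are hypothetical, not the paper's implementation.

```python
import numpy as np

def accuracy_rejection_curve(confidences, correct, thresholds=None):
    """Accuracy on accepted samples as the rejection threshold varies.

    confidences: per-sample confidence scores (higher = more certain).
    correct: boolean array, True where the classifier was right.
    Returns (rejection_rates, accuracies), one point per threshold.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    if thresholds is None:
        thresholds = np.unique(confidences)
    rejection_rates, accuracies = [], []
    for t in thresholds:
        accepted = confidences >= t
        if accepted.sum() == 0:
            continue  # everything rejected: accuracy undefined
        rejection_rates.append(1.0 - accepted.mean())
        accuracies.append(correct[accepted].mean())
    return np.array(rejection_rates), np.array(accuracies)
```

Plotting accuracy against rejection rate gives the ARC; a classifier whose curve dominates another's over some rejection range is preferable in that operating region, which is the basis of the comparison the paper proposes.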
Precision and Recall Reject Curves for Classification
For some classification scenarios, it is desirable to use only those classification instances that a trained model associates with a high certainty. To obtain such high-certainty instances, previous work has proposed accuracy-reject curves. Reject curves make it possible to evaluate and compare the performance of different certainty measures over a range of thresholds for accepting or rejecting classifications. However, accuracy may not be the best-suited evaluation metric for all applications; instead, precision or recall may be preferable. This is the case, for example, for data with imbalanced class distributions. We therefore propose reject curves that evaluate precision and recall: the recall-reject curve and the precision-reject curve. Using prototype-based classifiers from learning vector quantization, we first validate the proposed curves on artificial benchmark data against the accuracy-reject curve as a baseline. We then show on imbalanced benchmarks and medical, real-world data that, for these scenarios, the proposed precision- and recall-reject curves yield more accurate insights into classifier performance than accuracy-reject curves. Comment: 11 pages, 3 figures. Updated figure label
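The precision-reject idea can be sketched for the binary case: at each threshold, compute precision for the positive class over the accepted samples only. This is a hypothetical minimal version for illustration, not the authors' implementation.

```python
import numpy as np

def precision_reject_curve(confidences, y_true, y_pred, pos_label=1):
    """Precision on accepted samples as the rejection threshold varies.

    confidences: per-sample certainty scores (higher = more certain).
    y_true, y_pred: true and predicted labels.
    Returns (rejection_rates, precisions), one point per threshold.
    """
    confidences = np.asarray(confidences, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rejection_rates, precisions = [], []
    for t in np.unique(confidences):
        accepted = confidences >= t
        pred_pos = accepted & (y_pred == pos_label)
        if pred_pos.sum() == 0:
            continue  # no accepted positive predictions: precision undefined
        rejection_rates.append(1.0 - accepted.mean())
        precisions.append((y_true[pred_pos] == pos_label).mean())
    return np.array(rejection_rates), np.array(precisions)
```

A recall-reject curve would follow the same pattern with recall computed over the accepted positives; on imbalanced data these curves can reveal threshold effects that the accuracy-reject curve averages away.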
Uncertainty-Based Rejection in Machine Learning: Implications for Model Development and Interpretability
POCI-01-0247-FEDER-033479. Uncertainty is present in every single prediction of Machine Learning (ML) models. Uncertainty Quantification (UQ) is particularly relevant for safety-critical applications. Prior research has focused on developing methods to quantify uncertainty; less attention has been given to how knowledge of uncertainty can be leveraged during model development. This work focuses on applying UQ in practice, closing the gap in its utility in the ML pipeline and giving insights into how UQ can be used to improve model development and its interpretability. We identified three main research questions: (1) How can UQ contribute to choosing the most suitable model for a given classification task? (2) Can UQ be used to combine different models in a principled manner? (3) Can visualization techniques improve UQ's interpretability? These questions are answered by applying several methods to quantify uncertainty in both a simulated dataset and a real-world dataset of Human Activity Recognition (HAR). Our results showed that uncertainty quantification can increase model robustness and interpretability.
IDPS Signature Classification with a Reject Option and the Incorporation of Expert Knowledge
As the importance of intrusion detection and prevention systems (IDPSs)
increases, great costs are incurred to manage the signatures that are generated
by malicious communication pattern files. Experts in network security need to
classify signatures by importance for an IDPS to work. We propose and evaluate
a machine learning signature classification model with a reject option (RO) to
reduce the cost of setting up an IDPS. To train the proposed model, it is
essential to design features that are effective for signature classification.
Experts classify signatures with predefined if-then rules. An if-then rule
returns a label of low, medium, high, or unknown importance based on keyword
matching of the elements in the signature. Therefore, we first design two types
of features, symbolic features (SFs) and keyword features (KFs), which are used
in keyword matching for the if-then rules. Next, we design web information and
message features (WMFs) to capture the properties of signatures that do not
match the if-then rules. The WMFs are extracted as term frequency-inverse
document frequency (TF-IDF) features of the message text in the signatures. The
features are obtained by web scraping from the referenced external attack
identification systems described in the signature. Because failures need to be minimized in the classification of IDPS signatures, as in the medical field, we introduce an RO in our proposed model. The effectiveness of the proposed classification model is evaluated in experiments with two real datasets composed of signatures labeled by experts: a dataset that can be classified with if-then rules and a dataset with elements that do not match any if-then rule. In both cases, the combined SFs and WMFs performed better than the combined SFs and KFs. We also performed a feature analysis. Comment: 9 pages, 5 figures, 3 tables
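The TF-IDF weighting behind the WMFs can be illustrated with a small dependency-free sketch: weight each term by its within-document frequency times the log inverse of its document frequency. The tokenizer and example strings below are hypothetical, not the paper's pipeline.

```python
import math
from collections import Counter

def tfidf(docs):
    """Minimal TF-IDF: term frequency weighted by inverse document frequency."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({term: (count / len(toks)) * math.log(n / df[term])
                        for term, count in tf.items()})
    return vectors
```

A term that occurs in every signature message receives weight zero, while rarer, more discriminative terms are weighted up, which is what makes the message text useful as a feature for signatures the if-then rules miss.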
On the Feature Selection Methods and Reject Option Classifiers for Robust Cancer Prediction
Cancer is the second leading cause of mortality across the globe. Approximately 9.6 million people are estimated to have died of cancer in 2019. Accurate and early prediction of cancer can help healthcare professionals devise timely therapeutic interventions to reduce suffering and the risk of mortality. Generally, a machine learning (ML) based predictive system in healthcare uses data (genetic profiles or clinical parameters) and learning algorithms to predict target values for cancer detection. However, optimizing predictive accuracy is an important endeavor for accurate decision making. Reject Option (RO) classifiers have been used to improve the predictive accuracy of classifiers for complex problems such as cancer. In a gene profile, not all features are important; uninformative ones should be removed. ML offers different techniques, each with its own methodology, for feature selection (FS), and classification results depend on the dataset, each having its own distribution and features. Therefore, both FS methods and ML algorithms with an RO need to be considered for robust classification. The main objective of this study is to optimize three parameters (learning algorithm, FS method and rejection rate) for robust cancer prediction, rather than the two traditional parameters (learning algorithm and rejection rate). Different FS methods (including t-Test, Las Vegas Filter (LVF), Relief, and Information Gain (IG)) and RO classifiers at different rejection thresholds are analyzed to investigate the robust predictability of cancer. Three cancer datasets (Colon cancer, Leukemia and Breast cancer) were reduced using different FS methods, and each was used to analyze the predictability of cancer using different RO classifiers. The results reveal that, for each dataset, the predictive accuracies of RO classifiers differed across FS methods. The findings based on the proposed scheme indicate that ML algorithms, together with their dependence on suitable FS methods, need to be taken into consideration for accurate prediction.
Towards Knowledge Uncertainty Estimation for Open Set Recognition
POCI-01-0247-FEDER-033479. Uncertainty is ubiquitous and arises in every single prediction of Machine Learning models. The ability to estimate and quantify the uncertainty of individual predictions is highly relevant, all the more so in safety-critical applications. Real-world recognition poses multiple challenges, since a model's knowledge of physical phenomena is incomplete, and observations are incomplete by definition. However, Machine Learning algorithms often assume that train and test data distributions are the same and that all testing classes are present during training. A more realistic scenario is Open Set Recognition, where unknown classes can be submitted to an algorithm during testing. In this paper, we propose a Knowledge Uncertainty Estimation (KUE) method to quantify knowledge uncertainty and reject out-of-distribution inputs. Additionally, we quantify and distinguish aleatoric and epistemic uncertainty with classical information-theoretic measures of entropy by means of ensemble techniques. We performed experiments on four datasets with different data modalities and compared our results with distance-based classifiers, SVM-based approaches and ensemble techniques using entropy measures. Overall, KUE was more effective at distinguishing in- and out-of-distribution inputs in most cases and at least comparable in the others. Furthermore, classification with a reject option, based on a proposed strategy for combining different measures of uncertainty, is a demonstrated application of this approach.
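The entropy-based separation of aleatoric and epistemic uncertainty for an ensemble can be sketched with the standard mutual-information decomposition: total predictive entropy minus the expected per-member entropy. This is a minimal illustration of that general technique, not the paper's exact KUE code.

```python
import numpy as np

def uncertainty_decomposition(member_probs):
    """Split predictive entropy into aleatoric and epistemic parts.

    member_probs: (n_members, n_classes) class probabilities from each
    ensemble member for a single input.
    """
    p = np.asarray(member_probs, dtype=float)
    eps = 1e-12                        # avoid log(0)
    mean_p = p.mean(axis=0)            # ensemble-averaged prediction
    total = -np.sum(mean_p * np.log(mean_p + eps))              # predictive entropy
    aleatoric = -np.mean(np.sum(p * np.log(p + eps), axis=1))   # expected entropy
    epistemic = total - aleatoric                               # mutual information
    return total, aleatoric, epistemic
```

Members that agree but are individually unsure yield high aleatoric uncertainty; members that disagree confidently yield high epistemic uncertainty, which is the signal typically used to reject out-of-distribution inputs.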
Focusing on the Big Picture: Insights into a Systems Approach to Deep Learning for Satellite Imagery
Deep learning tasks are often complicated and require a variety of components
working together efficiently to perform well. Due to the often large scale of
these tasks, there is a necessity to iterate quickly in order to attempt a
variety of methods and to find and fix bugs. While participating in IARPA's
Functional Map of the World challenge, we identified challenges along the
entire deep learning pipeline and found various solutions to these challenges.
In this paper, we present the performance, engineering, and deep learning
considerations with processing and modeling data, as well as underlying
infrastructure considerations that support large-scale deep learning tasks. We
also discuss insights and observations with regard to satellite imagery and
deep learning for image classification. Comment: Accepted to IEEE Big Data 201
Consistency of plug-in confidence sets for classification in semi-supervised learning
Confident prediction is highly relevant in machine learning; for example, in applications such as medical diagnosis, a wrong prediction can be fatal. For classification, procedures already exist that allow data to be left unclassified when the confidence in their prediction is weak. This approach is known as classification with reject option. In the present paper, we provide new methodology for this approach. By predicting a new instance via a confidence set, we ensure exact control of the probability of classification. Moreover, we show that this methodology is easily implementable and has attractive theoretical and numerical properties.
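A plug-in confidence set of the kind described can be sketched by thresholding estimated posteriors: keep the smallest set of labels whose cumulative estimated probability reaches 1 - alpha. This is a hypothetical minimal version for intuition, not the paper's exact procedure.

```python
import numpy as np

def plug_in_confidence_set(probs, alpha=0.25):
    """Smallest label set whose estimated posterior mass is >= 1 - alpha."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]        # labels from most to least probable
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, 1.0 - alpha)) + 1
    return sorted(order[:k].tolist())
```

A singleton set behaves like an ordinary prediction, while a larger set plays the role of (partial) rejection, deferring the final choice among its labels to a downstream decision maker.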
How do you feel? Measuring User-Perceived Value for Rejecting Machine Decisions in Hate Speech Detection
Hate speech moderation remains a challenging task for social media platforms.
Human-AI collaborative systems offer the potential to combine the strengths of
humans' reliability and the scalability of machine learning to tackle this
issue effectively. While methods for task handover in human-AI collaboration
exist that consider the costs of incorrect predictions, insufficient attention
has been paid to accurately estimating these costs. In this work, we propose a
value-sensitive rejection mechanism that automatically rejects machine
decisions for human moderation based on users' value perceptions regarding
machine decisions. We conduct a crowdsourced survey study with 160 participants
to evaluate their perception of correct and incorrect machine decisions in the
domain of hate speech detection, as well as occurrences where the system
rejects making a prediction. Here, we introduce Magnitude Estimation, an
unbounded scale, as the preferred method for measuring user (dis)agreement with
machine decisions. Our results show that Magnitude Estimation can provide a
reliable measurement of participants' perception of machine decisions. By
integrating user-perceived value into human-AI collaboration, we further show
that it can guide us in 1) determining when to accept or reject machine
decisions to obtain the optimal total value a model can deliver and 2)
selecting better classification models as compared to the more widely used
target of model accuracy. Comment: To appear at AIES '23. Philippe Lammerts, Philip Lippmann, Yen-Chia Hsu, Fabio Casati, and Jie Yang. 2023. How do you feel? Measuring User-Perceived Value for Rejecting Machine Decisions in Hate Speech Detection. In AAAI/ACM Conference on AI, Ethics, and Society (AIES '23), August 8-10, 2023, Montreal, QC, Canada. ACM, New York, NY, USA. 11 pages