909 research outputs found
Convex Optimization for Binary Classifier Aggregation in Multiclass Problems
Multiclass problems are often decomposed into multiple binary problems that
are solved by individual binary classifiers whose results are integrated into a
final answer. Various methods, including all-pairs (APs), one-versus-all (OVA),
and error-correcting output codes (ECOC), have been studied for decomposing
multiclass problems into binary problems. However, little work has addressed
how to optimally aggregate the binary results to determine a final answer to the
multiclass problem. In this paper we present a convex optimization method for
an optimal aggregation of binary classifiers to estimate class membership
probabilities in multiclass problems. We model the class membership probability
as a softmax function that takes as input a conic combination of discrepancies
induced by individual binary classifiers. With this model, we formulate
the regularized maximum likelihood estimation as a convex optimization problem,
which is solved by the primal-dual interior point method. Connections of our
method to large margin classifiers are presented, showing that the large margin
formulation can be considered as a limiting case of our convex formulation.
Numerical experiments on synthetic and real-world data sets demonstrate that
our method outperforms existing aggregation methods as well as direct methods,
in terms of the classification accuracy and the quality of class membership
probability estimates.
Comment: Appeared in Proceedings of the 2014 SIAM International Conference on
Data Mining (SDM 2014)
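The softmax model described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layout of the `discrepancies` matrix, the sign convention (a smaller discrepancy means a class is more consistent with a classifier's output), and the function name `class_probabilities` are assumptions made for illustration. The weights are required to be nonnegative, which is what makes the combination conic; the abstract's regularized maximum likelihood fit of those weights is not shown.

```python
import math


def class_probabilities(discrepancies, weights):
    """Softmax over negated conic combinations of discrepancies.

    discrepancies[j][k]: hypothetical discrepancy between binary classifier j's
        output and class k (smaller means more consistent). Assumed layout.
    weights[j]: nonnegative weight of classifier j (conic combination).
    """
    if any(w < 0 for w in weights):
        raise ValueError("a conic combination requires nonnegative weights")

    n_classes = len(discrepancies[0])
    # Score of class k: negative weighted sum of its discrepancies.
    scores = [
        -sum(w * row[k] for w, row in zip(weights, discrepancies))
        for k in range(n_classes)
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two binary classifiers, three classes; class 0 has the smallest discrepancies.
probs = class_probabilities([[0.1, 2.0, 2.0], [0.2, 0.2, 3.0]], [1.0, 1.0])
```

In this sketch the class with the smallest weighted discrepancy receives the highest probability, and the probabilities sum to one by construction.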
Radar-based Road User Classification and Novelty Detection with Recurrent Neural Network Ensembles
Radar-based road user classification is an important yet still challenging
task towards autonomous driving applications. The resolution of conventional
automotive radar sensors results in a sparse data representation which is tough
to recover by subsequent signal processing. In this article, classifier
ensembles originating from a one-vs-one binarization paradigm are enriched by
one-vs-all correction classifiers. They are utilized to efficiently classify
individual traffic participants and also identify hidden object classes which
have not been presented to the classifiers during training. For each classifier
of the ensemble an individual feature set is determined from a total set of 98
features. Thereby, the overall classification performance can be improved when
compared to previous methods and, additionally, novel classes can be identified
much more accurately. Furthermore, the proposed structure yields new
insights into the importance of features for the recognition of individual
classes, which is crucial for the development of new algorithms and sensor
requirements.
Comment: 8 pages, 9 figures, accepted paper for 2019 IEEE Intelligent Vehicles
Symposium (IV), Paris, France, June 2019
Uncertainty-Aware Estimation of Population Abundance using Machine Learning
Machine Learning is widely used for mining collections, such as images, sounds, or texts, by classifying their elements into categories. Automatic classification based on supervised learning requires groundtruth datasets for modeling the elements to classify, and for testing the quality of the classification. Because collecting groundtruth is tedious, a method for estimating the potential errors in large datasets based on limited groundtruth is needed.
Doubly Optimized Calibrated Support Vector Machine (DOC-SVM): an algorithm for joint optimization of discrimination and calibration.
Historically, probabilistic models for decision support have focused on discrimination, e.g., minimizing the ranking error of predicted outcomes. Unfortunately, these models ignore another important aspect, calibration, which indicates the magnitude of correctness of model predictions. Using discrimination and calibration simultaneously can be helpful for many clinical decisions. We investigated tradeoffs between these goals, and developed a unified maximum-margin method to handle them jointly. Our approach, called Doubly Optimized Calibrated Support Vector Machine (DOC-SVM), concurrently optimizes two loss functions: the ridge regression loss and the hinge loss. Experiments using three breast cancer gene-expression datasets (i.e., GSE2034, GSE2990, and Chanrion's datasets) showed that our model generated more calibrated outputs when compared to other state-of-the-art models like Support Vector Machine (p=0.03, p=0.13, and p<0.001) and Logistic Regression (p=0.006, p=0.008, and p<0.001). DOC-SVM also demonstrated better discrimination (i.e., higher AUCs) when compared to Support Vector Machine (p=0.38, p=0.29, and p=0.047) and Logistic Regression (p=0.38, p=0.04, and p<0.0001). DOC-SVM produced a model that was better calibrated without sacrificing discrimination, and hence may be helpful in clinical decision making.
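The idea of optimizing a hinge loss and a ridge regression loss together can be sketched as a single objective for a linear model. This is an illustration only, not the DOC-SVM formulation: the abstract does not give the exact objective, so the mixing parameter `alpha`, the penalty strength `l2`, the averaging over samples, and the function name `combined_loss` are all assumptions.

```python
def combined_loss(weights, bias, X, y, alpha=0.5, l2=0.1):
    """Weighted sum of hinge loss and squared error plus an L2 penalty.

    X: list of feature vectors; y: labels in {-1, +1}.
    alpha (hypothetical): trade-off between the hinge term (discrimination)
        and the squared-error term (calibration-oriented).
    l2 (hypothetical): strength of the ridge penalty on the weights.
    """
    hinge = 0.0
    squared = 0.0
    for x, label in zip(X, y):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        hinge += max(0.0, 1.0 - label * score)  # margin violation
        squared += (label - score) ** 2         # regression-style error
    n = len(X)
    penalty = l2 * sum(w * w for w in weights)
    return alpha * hinge / n + (1.0 - alpha) * squared / n + penalty


# Toy data: one feature, two perfectly separated points.
X = [[1.0], [-1.0]]
y = [1, -1]
loss_fitted = combined_loss([1.0], 0.0, X, y)  # both data terms vanish
loss_zero = combined_loss([0.0], 0.0, X, y)    # both data terms equal 1
```

With the fitted weight only the penalty term remains, while the all-zero model pays the full hinge and squared-error cost, which shows how the two terms pull in the same direction on well-separated data.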
A direct ensemble classifier for imbalanced multiclass learning
Researchers have shown that although traditional direct classifier algorithms can be easily applied to multiclass classification, the performance of a single classifier decreases in the presence of imbalanced data in multiclass classification tasks. Thus, ensembles of classifiers have emerged as one of the hot topics in multiclass classification tasks for the imbalance problem in the data mining and machine learning domains. Ensemble learning is an effective technique that has increasingly been adopted to combine multiple learning algorithms to improve overall prediction accuracy and may outperform any single sophisticated classifier. In this paper, an ensemble learner called a Direct Ensemble Classifier for Imbalanced Multiclass Learning (DECIML) that combines simple nearest neighbour and Naive Bayes algorithms is proposed. A combiner method called OR-tree is used to combine the decisions obtained from the ensemble classifiers. The DECIML framework has been tested with several benchmark datasets and shows promising results.
Evaluating classification accuracy for modern learning approaches
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/149333/1/sim8103_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/149333/2/sim8103.pd
Uncertainty-aware estimation of population abundance using machine learning
Machine Learning is widely used for mining collections, such as images, sounds, or texts, by classifying their elements into categories. Automatic classification based on supervised learning requires groundtruth datasets for modeling the elements to classify, and for testing the quality of the classification. Because collecting groundtruth is tedious, a method for estimating the potential errors in large datasets based on limited groundtruth is needed. We propose a method that improves classification quality by using limited groundtruth data to extrapolate the potential errors in larger datasets. It significantly improves the counting of elements per class. We further propose visualization designs for understanding and evaluating the classification uncertainty. They support end-users in considering the impact of potential misclassifications when interpreting the classification output. This work was developed to address the needs of ecologists studying fish population abundance using computer vision, but generalizes to a larger range of applications. Our method is broadly applicable to a variety of Machine Learning technologies, and our visualizations further support their transfer to end-users.
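One simple way to extrapolate errors from limited groundtruth to per-class counts is sketched below. It is not necessarily the method proposed in this work: it assumes that, for each predicted class, the mix of true classes observed in the groundtruth sample also holds in the larger dataset. The function name `corrected_counts` and the confusion-matrix layout are hypothetical.

```python
def corrected_counts(confusion, predicted_counts):
    """Redistribute predicted class counts using a groundtruth confusion matrix.

    confusion[t][p]: number of groundtruth items with true class t that the
        classifier predicted as class p (assumed layout).
    predicted_counts[p]: number of items predicted as class p in the large,
        unlabeled dataset.
    Returns estimated counts per true class.
    """
    n = len(confusion)
    corrected = [0.0] * n
    for p in range(n):
        column_total = sum(confusion[t][p] for t in range(n))
        if column_total == 0:
            # No groundtruth evidence for this predicted class: keep its count.
            corrected[p] += predicted_counts[p]
            continue
        for t in range(n):
            # Share of items predicted as p whose true class is t.
            share = confusion[t][p] / column_total
            corrected[t] += predicted_counts[p] * share
    return corrected


# Groundtruth sample of 20 items, then 2000 unlabeled predictions.
confusion = [[8, 2], [1, 9]]
estimate = corrected_counts(confusion, [900, 1100])
```

The total number of items is preserved by this redistribution; only the split between classes changes.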
Probabilistic multiple kernel learning
The integration of multiple and possibly heterogeneous information sources for an overall decision-making process has been an open and unresolved research direction in computing science since its very beginning. This thesis attempts to address parts of that direction by proposing probabilistic data integration algorithms for multiclass decisions where an observation of interest is assigned to one of many categories based on a plurality of information channels