909 research outputs found

    Convex Optimization for Binary Classifier Aggregation in Multiclass Problems

    Full text link
    Multiclass problems are often decomposed into multiple binary problems that are solved by individual binary classifiers whose results are integrated into a final answer. Various methods, including all-pairs (APs), one-versus-all (OVA), and error correcting output code (ECOC), have been studied, to decompose multiclass problems into binary problems. However, little study has been made to optimally aggregate binary problems to determine a final answer to the multiclass problem. In this paper we present a convex optimization method for an optimal aggregation of binary classifiers to estimate class membership probabilities in multiclass problems. We model the class membership probability as a softmax function which takes a conic combination of discrepancies induced by individual binary classifiers, as an input. With this model, we formulate the regularized maximum likelihood estimation as a convex optimization problem, which is solved by the primal-dual interior point method. Connections of our method to large margin classifiers are presented, showing that the large margin formulation can be considered as a limiting case of our convex formulation. Numerical experiments on synthetic and real-world data sets demonstrate that our method outperforms existing aggregation methods as well as direct methods, in terms of the classification accuracy and the quality of class membership probability estimates.Comment: Appeared in Proceedings of the 2014 SIAM International Conference on Data Mining (SDM 2014

    Radar-based Road User Classification and Novelty Detection with Recurrent Neural Network Ensembles

    Full text link
    Radar-based road user classification is an important yet still challenging task towards autonomous driving applications. The resolution of conventional automotive radar sensors results in a sparse data representation which is tough to recover by subsequent signal processing. In this article, classifier ensembles originating from a one-vs-one binarization paradigm are enriched by one-vs-all correction classifiers. They are utilized to efficiently classify individual traffic participants and also identify hidden object classes which have not been presented to the classifiers during training. For each classifier of the ensemble an individual feature set is determined from a total set of 98 features. Thereby, the overall classification performance can be improved when compared to previous methods and, additionally, novel classes can be identified much more accurately. Furthermore, the proposed structure allows to give new insights in the importance of features for the recognition of individual classes which is crucial for the development of new algorithms and sensor requirements.Comment: 8 pages, 9 figures, accepted paper for 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, June 201

    Uncertainty-Aware Estimation of Population Abundance using Machine Learning

    Get PDF
    Machine Learning is widely used for mining collections, such as images, sounds, or texts, by classifying their elements into categories. Automatic classication based on supervised learning requires groundtruth datasets for modeling the elements to classify, and for testing the quality of the classication. Because collecting groundtruth is tedious, a method for estimating the potential errors in large datasets based on limited groundtruth is ne

    Doubly Optimized Calibrated Support Vector Machine (DOC-SVM): an algorithm for joint optimization of discrimination and calibration.

    Get PDF
    Historically, probabilistic models for decision support have focused on discrimination, e.g., minimizing the ranking error of predicted outcomes. Unfortunately, these models ignore another important aspect, calibration, which indicates the magnitude of correctness of model predictions. Using discrimination and calibration simultaneously can be helpful for many clinical decisions. We investigated tradeoffs between these goals, and developed a unified maximum-margin method to handle them jointly. Our approach called, Doubly Optimized Calibrated Support Vector Machine (DOC-SVM), concurrently optimizes two loss functions: the ridge regression loss and the hinge loss. Experiments using three breast cancer gene-expression datasets (i.e., GSE2034, GSE2990, and Chanrion's datasets) showed that our model generated more calibrated outputs when compared to other state-of-the-art models like Support Vector Machine (p=0.03, p=0.13, and p<0.001) and Logistic Regression (p=0.006, p=0.008, and p<0.001). DOC-SVM also demonstrated better discrimination (i.e., higher AUCs) when compared to Support Vector Machine (p=0.38, p=0.29, and p=0.047) and Logistic Regression (p=0.38, p=0.04, and p<0.0001). DOC-SVM produced a model that was better calibrated without sacrificing discrimination, and hence may be helpful in clinical decision making

    A direct ensemble classifier for imbalanced multiclass learning

    Get PDF
    Researchers have shown that although traditional direct classifier algorithm can be easily applied to multiclass classification, the performance of a single classifier is decreased with the existence of imbalance data in multiclass classification tasks.Thus, ensemble of classifiers has emerged as one of the hot topics in multiclass classification tasks for imbalance problem for data mining and machine learning domain.Ensemble learning is an effective technique that has increasingly been adopted to combine multiple learning algorithms to improve overall prediction accuraciesand may outperform any single sophisticated classifiers.In this paper, an ensemble learner called a Direct Ensemble Classifier for Imbalanced Multiclass Learning (DECIML) that combines simple nearest neighbour and Naive Bayes algorithms is proposed. A combiner method called OR-tree is used to combine the decisions obtained from the ensemble classifiers.The DECIML framework has been tested with several benchmark dataset and shows promising results

    Evaluating classification accuracy for modern learning approaches

    Full text link
    Peer Reviewedhttps://deepblue.lib.umich.edu/bitstream/2027.42/149333/1/sim8103_am.pdfhttps://deepblue.lib.umich.edu/bitstream/2027.42/149333/2/sim8103.pd

    Uncertainty-aware estimation of population abundance using machine learning

    Get PDF
    Machine Learning is widely used for mining collections, such as images, sounds, or texts, by classifying their elements into categories. Automatic classification based on supervised learning requires groundtruth datasets for modeling the elements to classify, and for testing the quality of the classification. Because collecting groundtruth is tedious, a method for estimating the potential errors in large datasets based on limited groundtruth is needed. We propose a method that improves classification quality by using limited groundtruth data to extrapolate the po-tential errors in larger datasets. It significantly improves the counting of elements per class. We further propose visualization designs for understanding and evaluating the classification un-certainty. They support end-users in considering the impact of potential misclassifications for interpreting the classification output. This work was developed to address the needs of ecologists studying fish population abundance using computer vision, but generalizes to a larger range of applications. Our method is largely applicable for a variety of Machine Learning technologies, and our visualizations further support their transfer to end-users

    Probabilistic multiple kernel learning

    Get PDF
    The integration of multiple and possibly heterogeneous information sources for an overall decision-making process has been an open and unresolved research direction in computing science since its very beginning. This thesis attempts to address parts of that direction by proposing probabilistic data integration algorithms for multiclass decisions where an observation of interest is assigned to one of many categories based on a plurality of information channels
    • …
    corecore