7 research outputs found

    Generalized FLIC: Learning with Misclassification for Binary Classifiers

    Get PDF
    This work formally introduces a generalized fuzzy logic and interval clustering (FLIC) technique which, when integrated with existing supervised learning algorithms, improves their performance. FLIC was first integrated with neural networks to improve their performance in drug discovery using high-throughput screening (HTS). This research focuses strictly on binary classification problems and generalizes FLIC so that it can be incorporated into other machine learning algorithms. In most binary classification problems the class boundary is not linear, which poses a major problem when the number of outliers is high, degrading the performance of the supervised learning function. FLIC identifies these misclassifications before the training set is introduced to the learning algorithm, which allows the supervised learner to learn more efficiently because it is aware of them. Although the proposed method performs well on most binary classification problems, it does particularly well on data sets with high class asymmetry. The method has been tested on four well-known data sets, three from the UCI Machine Learning Repository and one from BigML, with three well-known supervised learning techniques: Decision Tree, Logistic Regression, and Naive Bayes. The experimental results show a significant improvement in performance. The paper begins with a formal introduction to the core idea the research is based upon, then discusses related methods that either inspired this research or were referred to in formalizing the techniques. Subsequent sections present the methodology and the algorithm, followed by results and conclusion.
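
    The abstract does not spell out the FLIC procedure itself, but the workflow it describes (flag suspected misclassifications in the training set, then hand the cleaned set to a standard learner) can be sketched. In the minimal sketch below, a k-nearest-neighbor disagreement filter is a hypothetical stand-in for FLIC's fuzzy logic / interval clustering step; the function name, the 0.5 vote threshold, and the filter itself are assumptions for illustration, not the paper's algorithm.

    ```python
    # Hypothetical sketch of the pattern the abstract describes: flag suspected
    # misclassifications first, then train a standard learner on the rest.
    # The k-NN disagreement filter is a stand-in for FLIC's fuzzy logic /
    # interval clustering step, NOT the paper's algorithm.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    def flag_suspect_labels(X, y, k=5):
        """Flag points where most of the k-neighborhood carries the other label.

        Note: each point is among its own nearest neighbors here, which biases
        the vote slightly toward agreement; acceptable for a sketch."""
        knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
        # Fraction of the neighborhood that supports the point's own label.
        support = knn.predict_proba(X)[np.arange(len(y)), y]
        return support < 0.5            # True = suspected misclassification

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 2))
    y_true = (X[:, 0] + X[:, 1] > 0).astype(int)
    flip = rng.random(500) < 0.10       # inject 10% label noise
    y_noisy = np.where(flip, 1 - y_true, y_true)

    suspect = flag_suspect_labels(X, y_noisy)
    clf = DecisionTreeClassifier(random_state=0).fit(X[~suspect], y_noisy[~suspect])
    print(f"flagged {suspect.sum()} suspect points, "
          f"accuracy against clean labels: {clf.score(X, y_true):.3f}")
    ```

    Only the filter-then-train workflow is mirrored here; the actual FLIC scoring would replace flag_suspect_labels, and any of the three classifiers named in the abstract could take the decision tree's place.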

    Semi-Supervised Learning with Explicit Misclassification Modeling

    No full text
    This paper investigates a new approach for training discriminant classifiers when only a small set of labeled data is available together with a large set of unlabeled data. The algorithm optimizes the classification maximum likelihood of a set of labeled and unlabeled data, using a variant of the Classification Expectation Maximization (CEM) algorithm. Its originality is that it makes use of both unlabeled data and a probabilistic misclassification model for these data. The parameters of the label-error model are learned together with the classifier parameters. We demonstrate the effectiveness of the approach on four data sets and show the advantages of this method over a previously developed semi-supervised algorithm which does not consider imperfections in the labeling process.
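
    The abstract gives no update equations, but the loop it describes (hard, classification-style EM over labeled plus unlabeled data, with a label-flip probability learned alongside the classifier) can be sketched under assumptions. In the sketch below, the Gaussian class-conditionals, the parameter names (pi, mu, sd, rho), and the rho update rule are all assumptions for illustration, not the paper's model.

    ```python
    # Hypothetical CEM-style sketch with a label-error model: unlabeled points
    # are hard-assigned by the current classifier, while labeled points also
    # pay a flip probability `rho` that is re-estimated each iteration.
    # 1-D Gaussian class-conditionals stand in for whatever discriminant
    # family the paper actually uses.
    import numpy as np
    from scipy.stats import norm

    def cem_with_label_noise(X_lab, y_lab, X_unl, n_iter=20):
        X = np.concatenate([X_lab, X_unl])
        n_lab = len(X_lab)
        rng = np.random.default_rng(0)
        z = rng.integers(0, 2, size=len(X))   # random initial hard assignments
        z[:n_lab] = y_lab                     # trust observed labels at start
        rho = 0.05                            # initial label-flip probability

        for _ in range(n_iter):
            # M-step: re-fit priors and per-class Gaussians from assignments
            # (assumes both classes stay non-empty on this toy data).
            pi = np.array([np.mean(z == c) for c in (0, 1)])
            mu = np.array([X[z == c].mean() for c in (0, 1)])
            sd = np.array([X[z == c].std() + 1e-6 for c in (0, 1)])

            # E/C-step: log-joint of each point under each class...
            logp = np.log(pi) + norm.logpdf(X[:, None], mu, sd)
            # ...labeled points also pay the label-error model:
            # P(observed | true) = 1 - rho on a match, rho otherwise.
            match = (y_lab[:, None] == np.arange(2))
            logp[:n_lab] += np.where(match, np.log(1 - rho), np.log(rho))
            z = logp.argmax(axis=1)           # hard (classification) step

            # Re-estimate the flip rate from current disagreements.
            rho = np.clip(np.mean(z[:n_lab] != y_lab), 1e-3, 0.5 - 1e-3)

        return mu, sd, pi, rho

    # Toy data: two Gaussian classes, a few flipped labels, mostly unlabeled.
    rng = np.random.default_rng(1)
    X_all = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])
    lab = rng.choice(400, size=40, replace=False)
    y_obs = np.array([0] * 200 + [1] * 200)[lab]
    y_obs[:5] = 1 - y_obs[:5]                 # corrupt 5 observed labels
    mu, sd, pi, rho = cem_with_label_noise(X_all[lab], y_obs,
                                           np.delete(X_all, lab))
    print(f"estimated means {mu.round(2)}, flip rate rho={rho:.3f}")
    ```

    The point of the sketch is the coupling: the same hard assignments that re-fit the classifier also re-estimate rho, so the classifier and the label-error model are learned together, as the abstract states.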
