1,136 research outputs found
A Novel Selective Ensemble Algorithm for Imbalanced Data Classification Based on Exploratory Undersampling
Learning from imbalanced data is one of the emerging challenges in machine learning. Recently, ensemble learning has arisen as an effective solution to class imbalance problems. Combining bagging and boosting with resampling as a preprocessing step, in particular the simple and accurate exploratory undersampling, has become the most popular approach to imbalanced data classification. In this paper, we propose RotEasy, a novel selective ensemble construction method based on exploratory undersampling, which reduces storage requirements and improves computational efficiency through ensemble pruning. Our methodology aims to enhance the diversity between individual classifiers through feature extraction and diversity-regularized ensemble pruning. We made a comprehensive comparison between our method and some state-of-the-art imbalanced learning methods. Experimental results on 20 real-world imbalanced data sets show that RotEasy achieves a significant increase in performance, as confirmed by a nonparametric statistical test and various evaluation criteria.
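As a rough illustration of the undersampling step mentioned in this abstract, the sketch below draws several balanced index subsets by keeping every minority example and sampling an equal number of majority examples. It assumes an EasyEnsemble-style reading of exploratory undersampling with exactly two classes; the function name `balanced_subsets` and its parameters are hypothetical, and the rotation and pruning stages of RotEasy are not shown.

```python
import random
from collections import Counter


def balanced_subsets(labels, n_subsets, seed=0):
    """Return n_subsets index lists, each holding all minority examples
    plus an equally sized random sample of majority examples.

    Hypothetical helper for illustration; assumes a two-class problem.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    minority_label = min(counts, key=counts.get)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    subsets = []
    for _ in range(n_subsets):
        # Sample the majority class without replacement down to minority size.
        sampled = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + sampled))
    return subsets


# Toy data: 90 majority (0) and 10 minority (1) examples.
labels = [0] * 90 + [1] * 10
subsets = balanced_subsets(labels, n_subsets=5)
```

Each subset would then be used to train one base classifier, so the ensemble as a whole sees far more of the majority class than any single member does.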
Exploiting diversity for optimizing margin distribution in ensemble learning
Margin distribution is acknowledged as an important factor for improving the generalization performance of classifiers. In this paper, we propose a novel ensemble learning algorithm named Double Rotation Margin Forest (DRMF) that aims to improve the margin distribution of the combined system over the training set. We utilise random rotation to produce diverse base classifiers, and optimize the margin distribution to exploit the diversity for producing an optimal ensemble. We demonstrate that diverse base classifiers are beneficial in deriving large-margin ensembles, and that therefore our proposed technique will lead to good generalization performance. We examine our method on an extensive set of benchmark classification tasks. The experimental results confirm that DRMF outperforms other classical ensemble algorithms such as Bagging, AdaBoostM1 and Rotation Forest. The success of DRMF is explained from the viewpoints of margin distribution and diversity.
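To make "margin distribution" concrete, the sketch below computes the usual voting margin of an ensemble on each training example: the fraction of votes for the true class minus the largest fraction for any other class. This is the common textbook definition and is only assumed to match the one used for DRMF; the function names are hypothetical.

```python
from collections import Counter


def ensemble_margin(votes, true_label):
    """Voting margin of one example, in [-1, 1].

    (votes for the true class - most votes for any other class) / voters.
    A positive margin means a plurality vote classifies the example correctly.
    """
    counts = Counter(votes)
    correct = counts.get(true_label, 0)
    wrong = max(
        (c for label, c in counts.items() if label != true_label),
        default=0,
    )
    return (correct - wrong) / len(votes)


def margin_distribution(all_votes, true_labels):
    """Margins of every example; all_votes[i] lists the base classifiers' votes."""
    return [ensemble_margin(v, y) for v, y in zip(all_votes, true_labels)]


# Three examples, four base classifiers each (toy votes).
all_votes = [
    ["a", "a", "a", "b"],
    ["a", "b", "b", "b"],
    ["a", "b", "c", "c"],
]
true_labels = ["a", "a", "c"]
margins = margin_distribution(all_votes, true_labels)
```

An ensemble whose margins are concentrated near 1 is both accurate and confident on the training set, which is the sense in which a margin distribution can be "improved".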
Gene set based ensemble methods for cancer classification
Diagnosis of cancer very often depends on conclusions drawn after both clinical and microscopic examinations of tissues to study the manifestation of the disease in order to place tumors in known categories. One factor which determines the categorization of cancer is the tissue from which the tumor originates. Information gathered from clinical exams may be partial or not completely predictive of a specific category of cancer. Further complicating the problem of categorizing various tumors is that the histological classification of the cancer tissue and description of its course of development may be atypical. Gene expression data gleaned from micro-array analysis provides tremendous promise for more accurate cancer diagnosis. One hurdle in the classification of tumors based on gene expression data is that the data space is ultra-dimensional with relatively few points; that is, there are a small number of examples with a large number of genes. A second hurdle is expression bias caused by the correlation of genes. Analysis of subsets of genes, known as gene set analysis, provides a mechanism by which groups of differentially expressed genes can be identified. We propose an ensemble of classifiers whose base classifiers are ℓ1-regularized logistic regression models with restriction of the feature space to biologically relevant genes. Some researchers have already explored the use of ensemble classifiers to classify cancer but the effect of the underlying base classifiers in conjunction with biologically-derived gene sets on cancer classification has not been explored.
Active Collaborative Ensemble Tracking
A discriminative ensemble tracker employs multiple classifiers, each of which casts a vote on all of the obtained samples. The votes are then aggregated in an attempt to localize the target object. Such a method relies on the collective competence and the diversity of the ensemble to approach the target/non-target classification task from different views. However, by updating the whole ensemble using a shared set of samples and their final labels, such diversity is lost or reduced to the diversity provided by the underlying features or the internal classifiers' dynamics. Additionally, the classifiers do not exchange information with each other while striving to serve the collective goal, i.e., better classification. In this study, we propose an active collaborative information exchange scheme for ensemble tracking. This not only orchestrates the different classifiers towards a common goal but also provides an intelligent update mechanism that keeps the classifiers diverse and mitigates the shortcomings of one with the others. The data exchange is optimized with regard to an ensemble uncertainty utility function, and the ensemble is updated via co-training. The evaluations demonstrate promising results for the proposed algorithm in real-world online tracking. Comment: AVSS 2017 Submissio
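The voting scheme described in this abstract can be sketched as follows: each classifier votes +1 (target) or -1 (non-target) on every candidate sample, the votes are summed, and the sample with the highest total is taken as the target location. The abstract does not specify its ensemble uncertainty utility function, so vote entropy is used here purely as a stand-in to show how a disputed sample might be flagged for information exchange; all names are hypothetical.

```python
import math


def aggregate_votes(votes):
    """Sum the votes per sample; votes[k][i] is classifier k's vote on sample i."""
    n_samples = len(votes[0])
    return [sum(v[i] for v in votes) for i in range(n_samples)]


def vote_entropy(votes, i):
    """Binary entropy (in bits) of the share of classifiers voting +1 on sample i.

    Stand-in uncertainty measure, assumed for illustration only.
    """
    p = sum(1 for v in votes if v[i] > 0) / len(votes)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Four classifiers voting on four candidate samples (toy values).
votes = [
    [+1, -1, +1, -1],
    [+1, -1, -1, -1],
    [+1, +1, -1, -1],
    [+1, -1, +1, -1],
]
totals = aggregate_votes(votes)
# Sample with the strongest collective support: the estimated target.
best = max(range(len(totals)), key=totals.__getitem__)
# Sample the ensemble disagrees on most: a candidate for label exchange.
most_uncertain = max(range(len(totals)), key=lambda i: vote_entropy(votes, i))
```

In this toy run the ensemble agrees unanimously on the first sample and splits evenly on the third, which is the kind of sample an active exchange scheme would prioritize.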