    Two-Stage Bagging Pruning for Reducing the Ensemble Size and Improving the Classification Performance

    Ensemble methods, such as the traditional bagging algorithm, can usually improve the performance of a single classifier. However, they typically require large storage space and make relatively time-consuming predictions. Many approaches have been developed to reduce the ensemble size and improve the classification performance by pruning the traditional bagging algorithm. In this article, we propose a two-stage strategy for pruning the traditional bagging algorithm that combines two simple approaches: accuracy-based pruning (AP) and distance-based pruning (DP). These two methods, as well as their two combinations, “AP+DP” and “DP+AP”, as two-stage pruning strategies, were all examined. Compared with the single pruning methods, we found that the two-stage pruning methods can further reduce the ensemble size and improve the classification performance. The “AP+DP” method generally performs better than the “DP+AP” method when using four base classifiers: decision tree, Gaussian naive Bayes, K-nearest neighbor, and logistic regression. Moreover, compared to traditional bagging, the two-stage method “AP+DP” improved the classification accuracy by 0.88%, 4.06%, 1.26%, and 0.96%, respectively, averaged over 28 datasets under the four base classifiers. “AP+DP” also outperformed three other existing algorithms, Brag, Nice, and TB, assessed on 8 common datasets. In summary, the proposed two-stage pruning methods are simple and promising approaches that can both reduce the ensemble size and improve the classification accuracy.
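
    A minimal sketch of the two-stage idea is given below, assuming scikit-learn and one plausible reading of the two criteria: accuracy-based pruning keeps the base classifiers that score best on a held-out validation split, and distance-based pruning then discards members whose predictions are nearly identical to those already kept. The validation split, the cutoff of 20 members, and the diversity threshold are all illustrative; the paper's exact AP and DP definitions may differ.

        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import BaggingClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        # Train a traditional bagging ensemble with a decision tree base classifier.
        X, y = load_breast_cancer(return_X_y=True)
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
        bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
        bag.fit(X_tr, y_tr)

        # Stage 1 -- accuracy-based pruning (AP): keep the 20 members with the
        # highest accuracy on the validation split (cutoff is illustrative).
        preds = np.array([est.predict(X_val) for est in bag.estimators_])
        acc = (preds == y_val).mean(axis=1)
        ap_idx = np.argsort(acc)[::-1][:20]

        # Stage 2 -- distance-based pruning (DP): greedily keep members whose
        # validation predictions differ enough (Hamming distance) from those
        # already kept, so the pruned ensemble stays diverse.
        kept = [ap_idx[0]]
        for i in ap_idx[1:]:
            if min((preds[i] != preds[j]).mean() for j in kept) > 0.05:
                kept.append(i)

        # Majority vote over the pruned "AP+DP" ensemble.
        votes = np.array([bag.estimators_[i].predict(X_val) for i in kept])
        pruned_acc = ((votes.mean(axis=0) > 0.5).astype(int) == y_val).mean()
        print("pruned size:", len(kept), "validation accuracy:", pruned_acc)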

    A multi-objective optimization approach for the synthesis of granular computing-based classification systems in the graph domain

    The synthesis of a pattern recognition system usually aims at the optimization of a given performance index. However, in many real-world scenarios, there are other desired facets to take into account. In this regard, multi-objective optimization acts as the main tool for optimizing different (and possibly conflicting) objective functions in order to seek potential trade-offs among them. In this paper, we propose a three-objective optimization problem for the synthesis of a granular computing-based pattern recognition system in the graph domain. The core pattern recognition engine searches for suitable information granules (i.e., recurrent and/or meaningful subgraphs from the training data) on top of which the graph embedding procedure towards the Euclidean space is performed; in the resulting embedding space, any classification system can be employed. The optimization problem aims at jointly optimizing the performance of the classifier, the number of information granules, and the structural complexity of the classification model. Furthermore, we address the problem of selecting a suitable number of solutions from the resulting Pareto fronts in order to compose an ensemble of classifiers to be tested on previously unseen data. To perform this selection, we employ a multi-criteria decision making routine, analyzing different case studies that differ in how much weight each objective function carries in the ranking process. Results on five open-access datasets of fully labeled graphs show that exploiting the ensemble is effective (especially when the structural complexity of the model plays a minor role in the decision making process) when compared against the baseline solution that solely aims at maximizing performance.
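
    A minimal sketch of the selection step is given below, assuming a made-up score matrix for a handful of candidate models: non-dominated solutions are extracted and then ranked by a weighted sum of normalized objectives, a simple stand-in for the paper's multi-criteria decision making routine. The weights and scores are illustrative only.

        import numpy as np

        # Candidate models scored on three objectives, all to be minimized:
        # classification error, number of information granules, and structural
        # complexity. The numbers are made up for illustration.
        scores = np.array([
            [0.08, 120, 0.90],
            [0.10,  60, 0.55],
            [0.12,  35, 0.40],
            [0.09, 150, 0.95],
            [0.15,  20, 0.30],
        ])

        def pareto_front(points):
            """Return indices of non-dominated points (all objectives minimized)."""
            front = []
            for i, p in enumerate(points):
                dominated = any(np.all(q <= p) and np.any(q < p)
                                for j, q in enumerate(points) if j != i)
                if not dominated:
                    front.append(i)
            return front

        front = pareto_front(scores)

        # Multi-criteria decision making: normalize each objective to [0, 1] and
        # rank the non-dominated solutions by a weighted sum. A small weight on
        # the third column mimics the case where structural complexity plays a
        # minor role in the decision making process.
        weights = np.array([0.7, 0.2, 0.1])
        norm = (scores - scores.min(axis=0)) / (np.ptp(scores, axis=0) + 1e-12)
        ranking = sorted(front, key=lambda i: float(norm[i] @ weights))
        ensemble = ranking[:3]  # top-ranked Pareto solutions form the ensemble
        print("Pareto front:", front, "-> ensemble members:", ensemble)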

    Ensemble machine learning approach for electronic nose signal processing

    Electronic nose (e-nose) systems have been reported to be used in many areas as rapid, low-cost, and non-invasive instruments. In meat production and processing in particular, the e-nose system is a powerful tool for processing volatile compounds as a unique ‘fingerprint’. The ability of the pattern recognition algorithm to analyze e-nose signals is the key to the success of the e-nose system in many applications. On the other hand, ensemble methods have been reported to perform favorably on various data sets. This research proposes an ensemble learning approach for e-nose signal processing, especially in beef quality assessment. Ensemble methods are used not only for the learning algorithms but also for sensor array optimization. For sensor array optimization, three filter-based feature selection algorithms (FSAs), reliefF, chi-square, and Gini index, are combined to build an ensemble FSA. The ensemble FSA is developed to deal with the different or unstable outputs of a single FSA on homogeneous e-nose data sets in beef quality monitoring. Moreover, ensemble learning algorithms are employed to deal with multi-class classification and regression tasks. Random forest and AdaBoost are used, representing bagging and boosting algorithms, respectively. The results are also compared with support vector machine and decision tree as single learners. According to the experimental results, our ensemble approach shows good performance and generalization in e-nose signal processing. The optimized sensor combination based on the filter-based ensemble FSA gives stable results in both classification and regression tasks. Furthermore, AdaBoost, as a boosting algorithm, produces the best predictions even though it uses a smaller number of sensors.
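
    A minimal sketch of the sensor-selection-plus-boosting pipeline is given below, assuming scikit-learn and synthetic data in place of real e-nose measurements. reliefF is not shipped with scikit-learn, so mutual information stands in for it here, and the Gini-index ranking is taken from random forest feature importances; the sensor count, class labels, and number of selected sensors are illustrative.

        import numpy as np
        from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
        from sklearn.feature_selection import chi2, mutual_info_classif
        from sklearn.model_selection import cross_val_score

        # Synthetic stand-in for an e-nose data set: 200 samples from a
        # 10-sensor array with 3 beef-quality classes.
        rng = np.random.default_rng(0)
        X = rng.random((200, 10))
        y = rng.integers(0, 3, size=200)

        # Three filter-based rankings (rank 0 = most relevant sensor).
        chi_rank = np.argsort(np.argsort(-chi2(X, y)[0]))
        mi_rank = np.argsort(np.argsort(-mutual_info_classif(X, y, random_state=0)))
        gini_rank = np.argsort(np.argsort(
            -RandomForestClassifier(random_state=0).fit(X, y).feature_importances_))

        # Ensemble FSA: average the three rankings and keep the top 5 sensors.
        mean_rank = (chi_rank + mi_rank + gini_rank) / 3.0
        selected = np.argsort(mean_rank)[:5]

        # Boosting on the optimized sensor subset.
        clf = AdaBoostClassifier(random_state=0)
        cv_acc = cross_val_score(clf, X[:, selected], y, cv=5).mean()
        print("selected sensors:", selected, "CV accuracy:", round(cv_acc, 3))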