A homogeneous-heterogeneous ensemble of classifiers.
In this study, we introduce an ensemble system that combines a homogeneous ensemble and a heterogeneous ensemble in a single framework. Based on the observation that projected data differs significantly both from the original data and between projections, we construct the homogeneous module by applying random projections to the training data to obtain new training sets. In the heterogeneous module, several learning algorithms are trained on the new training sets to generate the base classifiers. We propose four combining algorithms based on the Sum Rule and the Majority Vote Rule for the proposed ensemble. Experiments on several popular datasets confirm that the proposed ensemble method outperforms a number of well-known benchmark algorithms. The proposed framework is also highly flexible in real-world applications: any technique that enriches the training data can be used in the homogeneous module, and any set of learning algorithms can be used in the heterogeneous module.
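As a minimal illustration of this design, the sketch below builds the homogeneous module with scikit-learn's GaussianRandomProjection and the heterogeneous module with three different learners, combined by the Sum Rule. The projection count, learner pool, and dataset are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of the homogeneous-heterogeneous ensemble idea with scikit-learn.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.random_projection import GaussianRandomProjection
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

learners = [LogisticRegression(max_iter=1000), GaussianNB(), DecisionTreeClassifier()]
n_projections = 3  # illustrative choice

probas = []
for seed in range(n_projections):
    # Homogeneous module: each random projection yields a new training set.
    rp = GaussianRandomProjection(n_components=3, random_state=seed)
    X_tr_p, X_te_p = rp.fit_transform(X_tr), rp.transform(X_te)
    # Heterogeneous module: several learning algorithms on each projected set.
    for clf in learners:
        probas.append(clf.fit(X_tr_p, y_tr).predict_proba(X_te_p))

# Sum Rule: average the class posteriors over all base classifiers.
y_pred = np.mean(probas, axis=0).argmax(axis=1)
print("Sum Rule accuracy:", (y_pred == y_te).mean())
```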
DEFEG: deep ensemble with weighted feature generation.
With the significant breakthroughs of Deep Neural Networks in recent years, multi-layer architectures have influenced other sub-fields of machine learning, including ensemble learning. In 2017, Zhou and Feng introduced a deep random forest called gcForest, which involves several layers of Random Forest-based classifiers. Although gcForest has outperformed several benchmark algorithms on specific datasets in terms of classification accuracy and model complexity, its input features do not guarantee better performance as the data passes through the layer-by-layer architecture. We address this limitation by introducing a deep ensemble model with a novel feature generation module. Unlike gcForest, where the original features are concatenated with the classifiers' outputs to generate the input features of the subsequent layer, we apply weights to the classifiers' outputs before using them as augmented features to grow the deep model. Weighting the generated features adjusts the input data of each layer, leading to better results for the deep model. We encode the weights using variable-length encoding and develop a variable-length Particle Swarm Optimisation method to search for the optimal weights by maximising the classification accuracy on the validation data. Experiments on a number of UCI datasets confirm the benefit of the proposed method compared to several well-known benchmark algorithms.
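The sketch below shows the weighted feature generation for one layer: classifier posteriors are scaled by per-classifier weights and concatenated with the original features, and the weights are tuned on validation accuracy. For brevity, a simple random search stands in for the paper's variable-length PSO, and the scalar-per-classifier weighting, learner pool, and dataset are illustrative assumptions.

```python
# Sketch of weighted feature generation for one deep-ensemble layer.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

classifiers = [RandomForestClassifier(n_estimators=50, random_state=0),
               ExtraTreesClassifier(n_estimators=50, random_state=0)]

def layer_features(X_in, y_in, X_out, weights):
    """Concatenate original features with weighted classifier outputs."""
    blocks = [X_out]
    for w, clf in zip(weights, classifiers):
        clf.fit(X_in, y_in)
        blocks.append(w * clf.predict_proba(X_out))  # weighted augmented features
    return np.hstack(blocks)

# Random search for weights maximising next-layer accuracy on validation data
# (a stand-in for the paper's variable-length Particle Swarm Optimisation).
rng = np.random.default_rng(0)
best_w, best_acc = None, -1.0
for _ in range(10):
    w = rng.uniform(0, 1, size=len(classifiers))
    Z_tr = layer_features(X_tr, y_tr, X_tr, w)
    Z_val = layer_features(X_tr, y_tr, X_val, w)
    acc = LogisticRegression(max_iter=2000).fit(Z_tr, y_tr).score(Z_val, y_val)
    if acc > best_acc:
        best_w, best_acc = w, acc
print("best weights:", best_w, "validation accuracy:", best_acc)
```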
Heterogeneous ensemble selection for evolving data streams.
Ensemble learning has been widely applied to both batch and streaming data classification. In the latter setting, most existing ensemble systems are homogeneous, meaning they are generated from only one type of learning model. In contrast, by combining several different types of learning models, a heterogeneous ensemble system can achieve greater diversity among its members, which helps to improve its performance. Although heterogeneous ensemble systems have achieved many successes in the batch setting, extending them directly to the data stream setting is not trivial. In this study, we propose a novel HEterogeneous Ensemble Selection (HEES) method that dynamically selects an appropriate subset of base classifiers for prediction in the stream setting. We are inspired by the observation that a well-chosen subset of good base classifiers may outperform the whole ensemble. Here, we define a good candidate as one that exhibits not only high predictive performance but also high confidence in its predictions. The selection process is thus divided into two sub-processes: accurate-candidate selection and confident-candidate selection. We define an accurate candidate in the stream context as a base classifier with high accuracy on the current concept, and a confident candidate as one whose confidence score exceeds a certain threshold. In the first sub-process, we employ the prequential accuracy to estimate the performance of a base classifier at a specific time, while in the second, we propose a new measure to quantify predictive confidence and a method to learn the threshold incrementally. The final ensemble is formed as the intersection of the sets of accurate and confident classifiers. Experiments on a wide range of data streams show that the proposed method achieves competitive performance with lower running time in comparison to state-of-the-art online ensemble methods.
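A minimal test-then-train sketch of the two-stage selection appears below: each member keeps a fading prequential accuracy, confidence is measured as the maximum posterior, and prediction uses the intersection of accurate and confident members. The fading factor, the mean-accuracy cutoff, the fixed confidence threshold (the paper learns it incrementally), and the base learners are illustrative assumptions.

```python
# Sketch of HEES-style selection on a stream (test-then-train protocol).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_informative=5, random_state=0)
classes = np.unique(y)
ensemble = [SGDClassifier(loss="log_loss", random_state=s) for s in (0, 1)]
ensemble.append(GaussianNB())

lam, conf_thresh = 0.99, 0.7          # fading factor, fixed confidence threshold
acc = np.full(len(ensemble), 0.5)     # fading prequential accuracy per member
correct, warmup = 0, 50

for t, (x, label) in enumerate(zip(X, y)):
    x = x.reshape(1, -1)
    if t >= warmup:
        probas = np.array([clf.predict_proba(x)[0] for clf in ensemble])
        accurate = acc >= acc.mean()                    # accurate-candidate selection
        confident = probas.max(axis=1) >= conf_thresh   # confident-candidate selection
        chosen = accurate & confident
        if not chosen.any():                            # fall back to the whole ensemble
            chosen[:] = True
        pred = probas[chosen].sum(axis=0).argmax()
        correct += int(pred == label)
        # update each member's fading prequential accuracy
        hits = probas.argmax(axis=1) == label
        acc = lam * acc + (1 - lam) * hits
    for clf in ensemble:                                # then train on the sample
        clf.partial_fit(x, [label], classes=classes)

print("prequential accuracy:", correct / (len(X) - warmup))
```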
Ensemble of deep learning models with surrogate-based optimization for medical image segmentation.
Deep Neural Networks (DNNs) have created a breakthrough in medical image analysis in recent years. Because clinical applications of automated medical analysis must be reliable, robust and accurate, it is necessary to devise effective DNN-based models for medical applications. In this paper, we propose an ensemble framework of DNNs for medical image segmentation, noting that combining multiple models can obtain better results than each constituent one. We introduce an effective strategy for combining individual segmentation models based on swarm intelligence, a family of optimization algorithms inspired by biological processes. The expensive computation time of the optimizer during objective function evaluation is relieved by a surrogate-based method: we train a surrogate on the objective function information of some populations and then use it to predict the objective values of the candidates in subsequent populations. Experiments on a number of public datasets indicate that our framework achieves competitive results within reasonable computation time.
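The sketch below shows the surrogate-assisted loop in its simplest form: early candidates are scored with the true (expensive) objective, then an RBF surrogate pre-screens later candidates so only the most promising one is truly evaluated. The toy quadratic objective stands in for, e.g., one minus the Dice score of a weighted model combination, and random candidate sampling stands in for the swarm optimizer; both are illustrative assumptions.

```python
# Sketch of surrogate-assisted optimization with an RBF surrogate.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
dim = 4  # e.g., one combination weight per segmentation model

def expensive_objective(w):
    # stand-in for running the combined segmentation and computing 1 - Dice
    return float(np.sum((w - 0.3) ** 2))

# Phase 1: evaluate some populations exactly to collect surrogate training data.
archive_x = rng.uniform(0, 1, size=(30, dim))
archive_y = np.array([expensive_objective(w) for w in archive_x])

# Phase 2: fit the surrogate and use it to pre-screen candidate solutions.
surrogate = RBFInterpolator(archive_x, archive_y)
for gen in range(10):
    candidates = rng.uniform(0, 1, size=(50, dim))
    predicted = surrogate(candidates)            # cheap surrogate predictions
    best = candidates[np.argmin(predicted)]      # only the best is run exactly
    archive_x = np.vstack([archive_x, best])
    archive_y = np.append(archive_y, expensive_objective(best))
    surrogate = RBFInterpolator(archive_x, archive_y)  # refit with the new point

print("best weights found:", archive_x[np.argmin(archive_y)])
print("best objective value:", archive_y.min())
```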
Multi-label classification via incremental clustering on an evolving data stream.
With the advancement of storage and processing technology, an enormous amount of data is collected daily in many applications. Nowadays, advanced data analytics are used to mine the collected data for useful information and to make predictions, contributing to the competitive advantage of companies. The increasing data volume, however, has posed many problems for classical batch learning systems, such as the need to retrain the model completely with newly arrived samples or the impracticality of storing and accessing a large volume of data. This has prompted interest in incremental learning, which operates on data streams. In this study, we develop an incremental online multi-label classification (OMLC) method based on a weighted clustering model. The model adapts to changes in the data via a decay mechanism in which each sample's weight dwindles over time, so the clustering model always focuses more on newly arrived samples. In the classification process, only clusters whose weights exceed a threshold (called mature clusters) are employed to assign labels to samples. In our method, not only is the clustering model incrementally maintained with the revealed ground-truth labels of the arrived samples, but the number of predicted labels for a sample is also adjusted based on the Hoeffding inequality and the label cardinality. The experimental results show that our method is competitive with several well-known benchmark algorithms on six performance measures in both the stationary and concept drift settings.
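The sketch below captures the decay-weighted clustering core of such a model: cluster weights fade over time, and only mature clusters assign labels. The decay rate, maturity threshold, cluster radius, and the simple 0.5 label-frequency rule are illustrative assumptions (the paper instead adjusts the number of predicted labels via the Hoeffding inequality and label cardinality).

```python
# Sketch of a decay-weighted incremental clustering model for multi-label streams.
import numpy as np

class DecayedClusters:
    def __init__(self, n_labels, radius=1.0, decay=0.01, mature_weight=2.0):
        self.centers, self.weights, self.label_counts = [], [], []
        self.n_labels, self.radius = n_labels, radius
        self.decay, self.mature_weight = decay, mature_weight

    def update(self, x, labels):
        # every cluster's weight dwindles over time, favouring recent samples
        self.weights = [w * np.exp(-self.decay) for w in self.weights]
        if self.centers:
            d = [np.linalg.norm(x - c) for c in self.centers]
            i = int(np.argmin(d))
            if d[i] <= self.radius:  # absorb the sample into the nearest cluster
                self.weights[i] += 1.0
                self.centers[i] += (x - self.centers[i]) / self.weights[i]
                self.label_counts[i] += labels
                return
        self.centers.append(x.astype(float))  # otherwise start a new cluster
        self.weights.append(1.0)
        self.label_counts.append(labels.astype(float))

    def predict(self, x):
        # only mature clusters (weight above the threshold) may assign labels
        mature = [i for i, w in enumerate(self.weights) if w >= self.mature_weight]
        if not mature:
            return np.zeros(self.n_labels, dtype=int)
        i = min(mature, key=lambda j: np.linalg.norm(x - self.centers[j]))
        freq = self.label_counts[i] / max(self.weights[i], 1e-9)
        return (freq >= 0.5).astype(int)

# toy usage on a random stream with 3 possible labels
rng = np.random.default_rng(0)
model = DecayedClusters(n_labels=3)
for _ in range(200):
    x = rng.normal(size=2)
    labels = (rng.uniform(size=3) < 0.4).astype(int)
    pred = model.predict(x)  # predict first (prequential evaluation) ...
    model.update(x, labels)  # ... then learn from the revealed labels
print("clusters grown:", len(model.centers))
```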
Multi-layer heterogeneous ensemble with classifier and feature selection.
Deep Neural Networks have achieved many successes when applied to visual, text, and speech information in various domains. The crucial reasons behind these successes are the multi-layer architecture and the in-model feature transformation of deep learning models. These design principles have inspired other sub-fields of machine learning, including ensemble learning. In recent years, several deep homogeneous ensemble models have been introduced that use a large number of classifiers in each layer; these models therefore incur a high computational cost at classification time. Moreover, existing deep ensemble models use all classifiers, including unnecessary ones that can reduce the predictive accuracy of the ensemble. In this study, we propose a multi-layer ensemble learning framework called MUlti-Layer heterogeneous Ensemble System (MULES) for the classification problem. The proposed system works with a small number of heterogeneous classifiers to obtain ensemble diversity, and is therefore efficient in resource usage. We also propose an Evolutionary Algorithm-based selection method to select the subset of suitable classifiers and features at each layer to enhance the predictive performance of MULES. The selection method uses the NSGA-II algorithm to optimize two objectives concerning classification accuracy and ensemble diversity. Experiments on 33 datasets confirm that MULES is better than a number of well-known benchmark algorithms.
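The sketch below illustrates the bi-objective selection at one layer: a pair of binary masks chooses classifiers and features, scored by validation accuracy and pairwise disagreement (a common diversity measure). Random sampling plus a non-dominated filter stands in for NSGA-II, and the encoding, learner pool, diversity measure, and dataset are illustrative assumptions.

```python
# Sketch of bi-objective (accuracy, diversity) classifier and feature selection.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
pool = [LogisticRegression(max_iter=2000), GaussianNB(),
        DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

def evaluate(mask_c, mask_f):
    if not mask_c.any() or not mask_f.any():
        return 0.0, 0.0
    preds = np.array([clf.fit(X_tr[:, mask_f], y_tr).predict(X_val[:, mask_f])
                      for clf, keep in zip(pool, mask_c) if keep])
    vote = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
    accuracy = (vote == y_val).mean()
    # diversity: mean pairwise disagreement between the selected classifiers
    pairs = [(a != b).mean() for i, a in enumerate(preds) for b in preds[i + 1:]]
    return accuracy, float(np.mean(pairs)) if pairs else 0.0

rng = np.random.default_rng(0)
scored = []
for _ in range(40):  # random sampling as a stand-in for NSGA-II's search
    mc = rng.uniform(size=len(pool)) < 0.5
    mf = rng.uniform(size=X.shape[1]) < 0.5
    scored.append((evaluate(mc, mf), mc, mf))

# keep the non-dominated (accuracy, diversity) configurations
front = [s for s in scored
         if not any(o[0][0] >= s[0][0] and o[0][1] >= s[0][1] and o[0] != s[0]
                    for o in scored)]
print("Pareto front size:", len(front),
      "best accuracy on front:", max(f[0][0] for f in front))
```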
Evolving interval-based representation for multiple classifier fusion.
Designing an ensemble of classifiers is a popular research topic in machine learning, since an ensemble can give better results than each constituent member. Furthermore, the performance of an ensemble can be improved via selection or adaptation. In the former, an optimal subset of base classifiers, meta-classifier, original features, or meta-data is selected to obtain a better ensemble than using all classifiers and features. In the latter, the base classifiers or the combining algorithms working on their outputs are made to adapt to a particular problem, in the sense that the parameters of these algorithms are trained to be optimal for that problem. In this study, we propose a novel evolving combining algorithm for ensemble systems that follows the adaptation approach. Instead of using a numerical value when computing the representation for each class, we propose an interval-based representation, whose optimal values are found through Particle Swarm Optimization. During classification, a test instance is assigned to the class whose interval-based representation is closest to the base classifiers' prediction. Experiments conducted on a number of popular datasets confirm that the proposed method is better than well-known ensemble systems using Decision Template and Sum Rule as combiners, the L2-loss Linear Support Vector Machine, Multiple Layer Neural Networks, and the ensemble selection methods based on GA-Meta-data, META-DES, and ACO.
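A minimal sketch of the interval-based combiner follows: each class is represented by per-dimension intervals over the meta-data (the base classifiers' predicted probabilities), and a test instance goes to the class whose intervals are closest to its meta-data. Random search stands in for PSO, and the specific distance (distance to the nearest interval endpoint, zero inside) is an illustrative assumption.

```python
# Sketch of interval-based class representations over classifier meta-data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
base = [LogisticRegression(max_iter=1000), GaussianNB()]

# meta-data: out-of-fold posteriors for training, refit posteriors for testing
meta_tr = np.hstack([cross_val_predict(c, X_tr, y_tr, method="predict_proba")
                     for c in base])
meta_te = np.hstack([c.fit(X_tr, y_tr).predict_proba(X_te) for c in base])
classes = np.unique(y_tr)

def interval_distance(meta, lows, highs):
    # zero inside the interval, distance to the nearest endpoint outside it
    return np.maximum(lows - meta, 0).sum(1) + np.maximum(meta - highs, 0).sum(1)

def predict(lows, highs, meta):
    d = np.stack([interval_distance(meta, lows[k], highs[k]) for k in classes])
    return classes[d.argmin(axis=0)]

rng = np.random.default_rng(0)
best, best_acc = None, -1.0
for _ in range(200):  # random search as a stand-in for Particle Swarm Optimization
    a = rng.uniform(0, 1, (len(classes), meta_tr.shape[1]))
    b = rng.uniform(0, 1, (len(classes), meta_tr.shape[1]))
    lows, highs = np.minimum(a, b), np.maximum(a, b)
    acc = (predict(lows, highs, meta_tr) == y_tr).mean()
    if acc > best_acc:
        best, best_acc = (lows, highs), acc

print("test accuracy:", (predict(*best, meta_te) == y_te).mean())
```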
VISTA: a variable length genetic algorithm and LSTM-based surrogate assisted ensemble selection algorithm in multiple layers ensemble system.
We propose a novel ensemble selection method called VISTA for multiple layers ensemble systems (MLES). Our ensemble model consists of multiple layers of ensembles of classifiers (EoC), in which the EoC of each layer is trained on the concatenation of the original training data and the predictions made by the classifiers of the previous layer. The predictions of the EoC in the final layer are aggregated to obtain the final prediction. To enhance the accuracy of the MLES, we use a Variable-Length Genetic Algorithm (VLGA) to search for the optimal configuration of the EoC in each layer. Since the optimisation process is computationally intensive, we use Surrogate-Assisted Evolutionary Algorithms (SAEA) to reduce the training time. Most surrogate models in the literature require a fixed-length input, which limits their applicability when the encoding is of variable length. We therefore propose a Long Short-Term Memory (LSTM)-based surrogate model in which the LSTM transforms the variable-length encoding into a fixed-size representation, which is then used by the surrogate model, a Radial Basis Function (RBF) network, to predict the fitness values in the VLGA. We first conduct experiments comparing two types of LSTM converters; the results suggest that the proposed chunk-based LSTM converter provides better results than the normal LSTM converter. Our experiments on 15 datasets show that VISTA outperforms several benchmark algorithms.
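The sketch below shows the conversion-plus-surrogate pipeline in miniature: a variable-length configuration (one vector per layer) is mapped to a fixed-size vector by an LSTM's final hidden state, and an RBF model over those vectors predicts fitness so candidates can be pre-screened cheaply. The encoding dimensions, the untrained LSTM, and the toy fitness function are illustrative assumptions.

```python
# Sketch of an LSTM converter feeding an RBF surrogate for variable-length encodings.
import numpy as np
import torch
from scipy.interpolate import RBFInterpolator

torch.manual_seed(0)
lstm = torch.nn.LSTM(input_size=5, hidden_size=8, batch_first=True)

def embed(config):
    """Map a variable-length (n_layers, 5) encoding to a fixed 8-d vector."""
    seq = torch.tensor(config, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        _, (h_n, _) = lstm(seq)  # the final hidden state summarises the sequence
    return h_n.squeeze().numpy()

rng = np.random.default_rng(0)

def random_config():
    return rng.uniform(size=(rng.integers(1, 6), 5))  # 1-5 layers, 5 genes each

def true_fitness(config):
    return float(config.mean())  # stand-in for the expensive MLES evaluation

# fit the RBF surrogate on embeddings of already-evaluated configurations
evaluated = [random_config() for _ in range(40)]
Z = np.array([embed(c) for c in evaluated])
f = np.array([true_fitness(c) for c in evaluated])
surrogate = RBFInterpolator(Z, f, kernel="thin_plate_spline")

# pre-screen new variable-length candidates without the expensive evaluation
candidates = [random_config() for _ in range(100)]
pred = surrogate(np.array([embed(c) for c in candidates]))
best = candidates[int(np.argmax(pred))]
print("surrogate-selected candidate has", best.shape[0], "layers")
```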
Deep heterogeneous ensemble.
In recent years, deep neural networks (DNNs) have emerged as a powerful technique in many areas of machine learning. Although DNNs have achieved great breakthroughs in processing images, video, audio and text, they also have limitations, such as requiring a large amount of labeled training data and having a large number of parameters. Ensemble learning, meanwhile, builds a learning model by combining many different classifiers such that the ensemble is better than any single classifier. In this study, we propose a deep ensemble framework called Deep Heterogeneous Ensemble (DHE) for supervised learning tasks. In each layer of our algorithm, the input data is passed through a feature selection method to remove irrelevant features and prevent overfitting. Cross-validation with K learning algorithms is then applied to the selected data to obtain the meta-data and the K base classifiers for the next layer. In this way, each layer outputs the meta-data serving as input data for the next layer, the base classifiers, and the indices of the selected meta-data. A combining algorithm is then applied to the meta-data of the last layer to obtain the final class prediction. Experiments on 30 datasets confirm that the proposed DHE is better than a number of well-known benchmark algorithms.
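A minimal sketch of one such layer appears below: feature selection on the layer's input, then K-fold out-of-fold posteriors from several heterogeneous learners as the meta-data for the next layer, with a simple Sum Rule combiner at the end. The selector, learner pool, layer count, and dataset are illustrative assumptions.

```python
# Sketch of a Deep Heterogeneous Ensemble layer with feature selection.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pool = [LogisticRegression(max_iter=2000), GaussianNB(),
        DecisionTreeClassifier(random_state=0)]

def dhe_layer(Z_tr, Z_te, k_features):
    # feature selection removes irrelevant features from the layer's input
    sel = SelectKBest(f_classif, k=min(k_features, Z_tr.shape[1])).fit(Z_tr, y_tr)
    S_tr, S_te = sel.transform(Z_tr), sel.transform(Z_te)
    # K-fold meta-data: out-of-fold posteriors for training, refit posteriors for test
    meta_tr = np.hstack([cross_val_predict(c, S_tr, y_tr, cv=5, method="predict_proba")
                         for c in pool])
    meta_te = np.hstack([c.fit(S_tr, y_tr).predict_proba(S_te) for c in pool])
    return meta_tr, meta_te

Z_tr, Z_te = X_tr, X_te
for _ in range(2):  # two layers, for illustration
    Z_tr, Z_te = dhe_layer(Z_tr, Z_te, k_features=20)

# simple Sum Rule combiner on the final layer's meta-data
n_classes = len(np.unique(y_tr))
final = Z_te.reshape(len(X_te), len(pool), n_classes).mean(axis=1)
print("DHE sketch test accuracy:", (final.argmax(axis=1) == y_te).mean())
```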
- …