EFSIS: Ensemble Feature Selection Integrating Stability
Ensemble learning, which combines the predictions of multiple
learners, has been widely applied in pattern recognition and has been reported
to be more robust and accurate than the individual learners. This ensemble
logic has recently also been applied to feature selection. There are
basically two strategies for ensemble feature selection, namely data
perturbation and function perturbation. Data perturbation performs feature
selection on data subsets sampled from the original dataset and then selects
the features consistently ranked highly across those data subsets. This has
been found to improve both the stability of the selector and the prediction
accuracy for a classifier. Function perturbation frees the user from having to
decide on the most appropriate selector for any given situation and works by
aggregating multiple selectors. This has been found to maintain or improve
classification performance. Here we propose a framework, EFSIS, combining these
two strategies. Empirical results indicate that EFSIS gives both high
prediction accuracy and stability.
Comment: 20 pages, 3 figures
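The data-perturbation strategy described in this abstract can be sketched in a few lines: a core selector ranks features on each bootstrap sample, and the per-sample ranks are aggregated so that features consistently ranked highly win. The correlation-based scorer and mean-rank aggregation below are illustrative stand-ins, not the EFSIS implementation:

```python
import numpy as np

def rank_features(X, y):
    # Core selector (illustrative): rank features by absolute correlation
    # with the target; rank 1 = best.
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return scores.argsort()[::-1].argsort() + 1

def data_perturbation_ranking(X, y, n_bags=20, seed=0):
    # Run the selector on bootstrap samples and aggregate by mean rank,
    # so features consistently ranked highly across data subsets come out on top.
    rng = np.random.default_rng(seed)
    ranks = [rank_features(X[idx], y[idx])
             for idx in (rng.choice(len(X), size=len(X), replace=True)
                         for _ in range(n_bags))]
    return np.mean(ranks, axis=0)

# Toy data: feature 0 is informative, features 1-3 are noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = X[:, 0] + 0.1 * rng.normal(size=200)
mean_ranks = data_perturbation_ranking(X, y)
print(mean_ranks)   # feature 0 gets the best (lowest) mean rank
```

Function perturbation follows the same aggregation step, but with different core selectors in place of different bootstrap samples.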
A New Terrain Classification Framework Using Proprioceptive Sensors for Mobile Robots
Mobile robots that operate in real-world environments interact with the surroundings to generate complex acoustic and vibration signals, which carry rich information about the terrain. This paper presents a new terrain classification framework that utilizes both the acoustic and vibration signals resulting from the robot-terrain interaction. As an alternative to handcrafted domain-specific feature extraction, a two-stage feature selection method combining the ReliefF and mRMR algorithms was developed to select optimal feature subsets that carry more discriminative information. As different data sources can provide complementary information, a multiclassifier combination method was proposed that considers a priori knowledge and fuses predictions from five data sources: one acoustic data source and four vibration data sources. In this study, four conceptually different classifiers were employed to perform the classification, each with a different number of optimal features. Signals were collected using a tracked robot moving at three different speeds on six different terrains. The new framework successfully improved the classification performance of the different classifiers using the newly developed optimal feature subsets, with the greatest improvement observed when the robot traversed at lower speeds.
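A two-stage filter of the kind described above can be sketched as follows: a relevance score prunes the candidate pool, then a greedy mRMR step trades relevance against redundancy. For brevity, the simple correlation-based relevance below stands in for ReliefF, and the "relevance minus mean redundancy" form is one common mRMR variant; this is a sketch, not the paper's implementation:

```python
import numpy as np

def relevance(X, y):
    # Stage 1 stand-in for ReliefF: absolute correlation with the target.
    return np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])

def mrmr_select(X, y, k):
    # Stage 2: greedy mRMR with the "relevance minus mean redundancy" criterion.
    rel = relevance(X, y)
    pool = list(np.argsort(rel)[::-1])
    selected = [int(pool.pop(0))]            # seed with the most relevant feature
    while len(selected) < k and pool:
        def score(j):
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            return rel[j] - red
        best = max(pool, key=score)
        selected.append(int(best))
        pool.remove(best)
    return selected

# Toy data: features 0 and 1 carry the same signal; feature 2 a complementary one.
rng = np.random.default_rng(1)
z0, z1 = rng.normal(size=(2, 500))
y = z0 + z1
X = np.column_stack([z0, z0 + 0.3 * rng.normal(size=500), z1])
selected = mrmr_select(X, y, k=2)
print(selected)   # the redundant near-copy (feature 1) is left out
```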
Exploiting the ensemble paradigm for stable feature selection: A case study on high-dimensional genomic data
Ensemble classification is a well-established approach that involves fusing the decisions of multiple predictive models. A similar “ensemble logic” has recently been applied to challenging feature selection tasks aimed at identifying the most informative variables (or features) for a given domain of interest. In this work, we discuss the rationale of ensemble feature selection and evaluate the effects and implications of a specific ensemble approach, namely the data perturbation strategy. Basically, it consists of combining multiple selectors that exploit the same core algorithm but are trained on different perturbed versions of the original data. The real potential of this approach, still an object of debate in the feature selection literature, is here investigated in conjunction with different kinds of core selection algorithms (both univariate and multivariate). In particular, we evaluate the extent to which the ensemble implementation improves the overall performance of the selection process, in terms of predictive accuracy and stability (i.e., robustness with respect to changes in the training data). Furthermore, we measure the impact of the ensemble approach on the final selection outcome, i.e. on the composition of the selected feature subsets. The results obtained on ten public genomic benchmarks provide useful insight into both the benefits and the limitations of such an ensemble approach, paving the way to the exploration of new and wider ensemble schemes.
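The stability evaluated in this abstract is typically quantified by averaging the pairwise similarity of the feature subsets selected on different perturbed versions of the data. A minimal sketch using the Jaccard index, one of several similarity measures used in the feature selection literature:

```python
from itertools import combinations

def jaccard(a, b):
    # Similarity of two selected feature subsets: |intersection| / |union|.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def selection_stability(subsets):
    # Mean pairwise Jaccard similarity of the subsets selected on
    # different perturbed versions of the data (1.0 = perfectly stable).
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical output of one selector run on three bootstrap samples:
runs = [[0, 1, 2, 5], [0, 1, 2, 7], [0, 1, 3, 5]]
stab = selection_stability(runs)
print(round(stab, 3))   # 0.511
```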
A Novel Hybrid Feature Selection Algorithm for Hierarchical Classification
Feature selection is a widespread preprocessing step in the data mining field. One of its purposes is to reduce the number of original dataset features to improve a predictive model’s performance. Despite the benefits of feature selection for the classification task, to the best of our knowledge, few studies in the literature address feature selection for the hierarchical classification context. This paper proposes a novel feature selection method based on the general variable neighborhood search metaheuristic, combining a filter and a wrapper step, wherein a global-model hierarchical classifier evaluates feature subsets. We used twelve datasets from the protein and image domains to perform computational experiments validating the effect of the proposed algorithm on classification performance when using two global hierarchical classifiers proposed in the literature. Statistical tests showed that using our method for feature selection led to predictive performances consistently better than or equivalent to those obtained by using all features, with the benefit of reducing the number of features needed, which justifies its efficiency for the hierarchical classification scenario.
Feature Selection for Computer-Aided Polyp Detection using MRMR
In building robust classifiers for computer-aided detection (CAD) of lesions, selection of relevant features is of fundamental importance. Typically one is interested in determining which, of a large number of potentially redundant or noisy features, are most discriminative for classification. Searching all possible subsets of features is computationally impractical. This paper proposes a feature selection scheme combining AdaBoost with the Minimum Redundancy Maximum Relevance (MRMR) criterion to focus on the most discriminative features. A fitness function is designed to determine the optimal number of features in a forward wrapper search. Bagging is applied to reduce the variance of the classifier and make the selection reliable. Experiments demonstrate that by selecting just 11 percent of the total features, the classifier achieves better prediction on independent test data than with the 70 percent of the total features selected by AdaBoost.
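A forward wrapper search driven by a fitness function, as described above, can be sketched as follows. The nearest-centroid classifier and holdout-accuracy fitness are illustrative assumptions standing in for the paper's AdaBoost-plus-bagging setup:

```python
import numpy as np

def fitness(Xt, yt, feats, Xv, yv):
    # Hypothetical fitness: validation accuracy of a nearest-centroid
    # classifier restricted to the candidate feature subset.
    c0 = Xt[yt == 0][:, feats].mean(axis=0)
    c1 = Xt[yt == 1][:, feats].mean(axis=0)
    d0 = ((Xv[:, feats] - c0) ** 2).sum(axis=1)
    d1 = ((Xv[:, feats] - c1) ** 2).sum(axis=1)
    return float(np.mean((d1 < d0) == yv))

def forward_wrapper(Xt, yt, Xv, yv):
    # Greedily add the feature that most improves fitness; the stopping
    # rule fixes the number of selected features automatically.
    remaining = list(range(Xt.shape[1]))
    chosen, best_fit = [], 0.0
    while remaining:
        scores = {j: fitness(Xt, yt, chosen + [j], Xv, yv) for j in remaining}
        j = max(scores, key=scores.get)
        if scores[j] <= best_fit:        # stop once adding a feature stops helping
            break
        chosen.append(j)
        remaining.remove(j)
        best_fit = scores[j]
    return chosen, best_fit

# Toy data: features 0 and 1 are discriminative, the rest are noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 600)
X = rng.normal(size=(600, 6))
X[:, 0] += 2.0 * y
X[:, 1] += 1.5 * y
chosen, best_fit = forward_wrapper(X[:400], y[:400], X[400:], y[400:])
print(chosen, round(best_fit, 2))
```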
Contribution to supervised representation learning: algorithms and applications.
278 p. In this thesis, we focus on supervised learning methods for pattern categorization. In this context, it remains a major challenge to establish efficient relationships between the discriminant properties of the extracted features and the inter-class sparsity structure.
Our first attempt to address this problem was to develop a method called "Robust Discriminant Analysis with Feature Selection and Inter-class Sparsity" (RDA_FSIS). This method performs feature selection and extraction simultaneously. The targeted projection transformation focuses on the most discriminative original features while guaranteeing that the extracted (or transformed) features belonging to the same class share a common sparse structure, which contributes to small intra-class distances.
In a further study of this approach, some improvements were introduced in the optimization criterion and the applied optimization process. We proposed an improved version of the original RDA_FSIS called "Enhanced Discriminant Analysis with Class Sparsity using Gradient Method" (EDA_CS). The improvement is twofold: on the one hand, in the alternating optimization, we update the linear transformation and tune it with the gradient descent method, resulting in a more efficient and less complex solution than the closed form adopted in RDA_FSIS. On the other hand, the method can be used as a fine-tuning technique for many feature extraction methods. The main feature of this approach lies in the fact that it is a gradient-descent-based refinement applied to a closed-form solution. This makes it suitable for combining several extraction methods and can thus improve the performance of the classification process.
In accordance with the above methods, we proposed a hybrid linear feature extraction scheme called "feature extraction using gradient descent with hybrid initialization" (FE_GD_HI). This method, based on a unified criterion, was able to take advantage of several powerful linear discriminant methods. The linear transformation is computed using a gradient descent method. The strength of this approach is that it is generic, in the sense that it allows fine-tuning of the hybrid solution provided by different methods.
Finally, we proposed a new efficient ensemble learning approach that aims to estimate an improved data representation. The proposed method is called "ICS Based Ensemble Learning for Image Classification" (EM_ICS). Instead of using multiple classifiers on the transformed features, we aim to estimate multiple extracted feature subsets, obtained by multiple learned linear embeddings. Multiple feature subsets were used to estimate the transformations, which were ranked using multiple feature selection techniques. The derived extracted feature subsets were concatenated into a single data representation vector with strong discriminative properties. Experiments conducted on various benchmark datasets, ranging from face images, handwritten digit images and object images to text datasets, showed promising results that outperform existing state-of-the-art and competing methods.
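The final ensemble step in this abstract, concatenating several learned linear embeddings into one representation vector, can be sketched as follows. The PCA projection and centroid-difference direction below are crude stand-ins for the learned discriminant embeddings of the thesis:

```python
import numpy as np

def pca_embed(X, k):
    # Unsupervised linear embedding: project onto the top-k principal directions.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def centroid_embed(X, y):
    # Supervised 1-D embedding: project onto the between-class centroid
    # direction (a crude stand-in for a learned discriminant embedding).
    w = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return (X @ (w / np.linalg.norm(w))).reshape(-1, 1)

def ensemble_representation(X, y, k=2):
    # Concatenate the embeddings into a single data representation vector.
    return np.hstack([pca_embed(X, k), centroid_embed(X, y)])

X = np.random.default_rng(0).normal(size=(100, 10))
y = (X[:, 0] > 0).astype(int)
Z = ensemble_representation(X, y, k=2)
print(Z.shape)   # each sample now has 2 PCA + 1 discriminant coordinates
```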
Privileged Learning using Unselected Features
This thesis proposes a novel machine learning paradigm called Learning using Unselected Features (LUFe), which front-loads computation to training time in order to improve classifier performance, without additional cost at deployment. This is achieved by repurposing and combining techniques from feature selection and Learning Using Privileged Information (LUPI). Feature selection is a means of reducing model complexity, which enables deployment on devices with limited computational power, but it can waste additional resources that may be available at training time. LUPI is a paradigm that allows extra information about the training data to be harnessed by the learner, but it requires an additional set of highly informative attributes. In the LUFe setting, feature selection is used to partition datasets into primary and secondary subsets, instead of discarding the features which are unselected. Both subsets are then passed to a LUPI algorithm, enabling the secondary feature set to provide additional guidance at training time only, in place of 'privileged' information. Only the selected features are needed at test time, maintaining low-cost deployment while exploiting train-time resources.
Experimental results on a large number of datasets demonstrate that LUFe facilitates an improvement in classification accuracy over standard feature selection approaches in a majority of cases. This performance boost is consistent across a range of feature selection approaches, and is largest when the SVM+ algorithm is used for implementation. This effect is shown to be partially dependent on the usage of information in the unselected features, as well as resulting from the presence of additional constraints on the function space searched for the model. The enhancement by LUFe is shown to be inversely correlated with the performance of standard feature selection and mediated by a further reduction in model variance, beyond that provided by standard feature selection. Aside from demonstrating the direct practical benefit of LUFe, this work makes the contribution of broadening the scope of applications for the LUPI framework
Investigating ensemble methods for essential gene predictions in bacteria
Essential genes are the genes required for an organism to survive in stable conditions with an abundance of nutrients. The identification of essential genes is important to both our understanding of bacterial organisms and our ability to manipulate them. Many machine learning methods have been proposed for the prediction of essential genes. However, the majority of these studies have a limited focus, i.e. a single optimised classifier and feature set combination to predict genes within the same organism. Therefore, as the models have a narrow scope they cannot be reliably applied to newly sequenced organisms. This ability of a model to generalise to new data can be improved by increasing the dataset and combining results from different classifiers.
The aim of this thesis was to develop an ensemble method to predict essential genes in bacteria. In total 62 commonly used sequence based features and 7 supervised learning classifiers were identified from the literature. Using online databases, 73 studies with high quality laboratory essentiality data were collated for 45 bacterial strains. To build the ensemble base learners, feature selection algorithms were used to generate feature subsets. Analysis of the subsets showed that while particular features were selected more frequently by the algorithms, no features were completely excluded. The performance of each subset with the classifiers was investigated to identify feature sets for the ensemble base learners.
Through studying the performance of the feature sets as part of a majority voting ensemble algorithm, we showed that, under cross-validation, the ensemble approach outperformed the individual classifiers. This was confirmed through validation testing on organisms with no matching genus in the training data.
The results show that it is possible to improve the ability of a classifier to generalise to new organisms through the application of feature selection and ensemble learning.
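The majority-voting aggregation at the core of the ensemble described above can be sketched in a few lines; the three hypothetical base learners stand in for classifiers trained on different feature subsets:

```python
import numpy as np

def majority_vote(predictions):
    # predictions: (n_classifiers, n_samples) array of class labels.
    # For each sample, return the most frequent label across base learners.
    P = np.asarray(predictions)
    return np.array([np.bincount(P[:, i]).argmax() for i in range(P.shape[1])])

# Three hypothetical base learners predicting essentiality (1) for four genes:
preds = [[1, 0, 1, 1],
         [1, 1, 0, 1],
         [0, 0, 1, 1]]
votes = majority_vote(preds)
print(votes)   # [1 0 1 1]
```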
Neuropsychological predictors of conversion from mild cognitive impairment to Alzheimer’s disease: a feature selection ensemble combining stability and predictability
Background
Predicting progression from Mild Cognitive Impairment (MCI) to Alzheimer’s Disease (AD) remains a major open issue in AD-related research. Neuropsychological assessment has proven useful in identifying MCI patients who are likely to convert to dementia. However, the large battery of neuropsychological tests (NPTs) performed in clinical practice and the limited number of training examples pose a challenge to machine learning when learning prognostic models. In this context, it is paramount to pursue approaches that effectively seek reduced sets of relevant features. Subsets of NPTs from which prognostic models can be learnt should not only be good predictors, but also stable, promoting generalizable and explainable models.
Methods
We propose a feature selection (FS) ensemble combining stability and predictability to choose the most relevant NPTs for prognostic prediction in AD. First, we combine the outcome of multiple (filter and embedded) FS methods. Then, we use a wrapper-based approach optimizing both stability and predictability to compute the number of selected features. We use two large prospective studies (ADNI and the Portuguese Cognitive Complaints Cohort, CCC) to evaluate the approach and assess the predictive value of a large number of NPTs.
Results
The best subsets of features include approximately 30 and 20 (from the original 79 and 40) features, for ADNI and CCC data, respectively, yielding stability above 0.89 and 0.95, and AUC above 0.87 and 0.82. Most NPTs learnt using the proposed feature selection ensemble have been identified in the literature as strong predictors of conversion from MCI to AD.
Conclusions
The FS ensemble approach was able to 1) identify subsets of stable and relevant predictors from a consensus of multiple FS methods using baseline NPTs and 2) learn reliable prognostic models of conversion from MCI to AD using these subsets of features. The machine learning models learnt from these features outperformed the models trained without FS and achieved competitive results when compared to commonly used FS algorithms. Furthermore, the selected features are derived from a consensus of methods and are thus more robust, while releasing users from choosing the most appropriate FS method for their classification task.
Funding: PTDC/EEI-SII/1937/2014; SFRH/BD/95846/2013; SFRH/BD/118872/2016
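The two steps of the FS ensemble above — aggregating the outcome of multiple FS methods, then using a wrapper to fix the number of selected features — can be sketched as follows. The mean-rank consensus and the weighted stability/AUC objective are illustrative assumptions, not the paper's exact procedure, and all numbers are hypothetical:

```python
import numpy as np

def aggregate_rankings(rankings):
    # Consensus of multiple FS methods: mean rank per feature
    # (rank 1 = most relevant), a common heterogeneous-ensemble scheme.
    return np.mean(rankings, axis=0)

def choose_subset_size(sizes, stability, auc, alpha=0.5):
    # Hypothetical wrapper step: pick the subset size maximizing a
    # weighted combination of selection stability and predictive AUC.
    obj = alpha * np.asarray(stability) + (1 - alpha) * np.asarray(auc)
    return sizes[int(np.argmax(obj))]

# Three FS methods ranking five NPT features, plus candidate subset sizes
# with their (hypothetical) stability and AUC measurements:
ranks = [[1, 2, 3, 4, 5],
         [2, 1, 3, 5, 4],
         [1, 3, 2, 4, 5]]
consensus = aggregate_rankings(ranks)
k = choose_subset_size([10, 20, 30],
                       stability=[0.80, 0.95, 0.89],
                       auc=[0.78, 0.82, 0.83])
print(consensus, k)
```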