25,727 research outputs found
Particle Swarm Optimisation for Feature Selection in Classification
Classification problems often have a large number of features, but not all of them are useful for classification. Irrelevant and redundant features may even reduce the classification accuracy. Feature selection is a process of selecting a subset of relevant features, which can decrease the dimensionality, shorten the running time, and/or improve the classification accuracy. There are two types of feature selection approaches, i.e. wrapper and filter approaches. Their main difference is that wrappers use a classification algorithm to evaluate the goodness of the features during the feature selection process while filters are independent of any classification algorithm. Feature selection is a difficult task because of feature interactions and the large search space. Existing feature selection methods suffer from different problems, such as stagnation in local optima and high computational cost. Evolutionary computation (EC) techniques are well-known global search algorithms. Particle swarm optimisation (PSO) is an EC technique that is computationally less expensive and can converge faster than other methods. PSO has been successfully applied to many areas, but its potential for feature selection has not been fully investigated.
The overall goal of this thesis is to investigate and improve the capability of PSO for feature selection to select a smaller number of features and achieve similar or better classification performance than using all features.
This thesis investigates the use of PSO for both wrapper and filter, and for both single objective and multi-objective feature selection, and also investigates the differences between wrappers and filters.
This thesis proposes a new PSO based wrapper, single objective feature selection approach by developing new initialisation and updating mechanisms. The results show that by considering the number of features in the initialisation and updating procedures, the new algorithm can improve the classification performance, reduce the number of features and decrease computational time.
This thesis develops the first PSO based wrapper multi-objective feature selection approach, which aims to maximise the classification accuracy and simultaneously minimise the number of features. The results show that the proposed multi-objective algorithm can obtain more and better feature subsets than single objective algorithms, and outperform other well-known EC based multi-objective feature selection algorithms.
This thesis develops a filter, single objective feature selection approach based on PSO and information theory. Two measures are proposed to evaluate the relevance of the selected features based on each pair of features and a group of features, respectively. The results show that PSO and information based algorithms can successfully address feature selection tasks. The group based method achieves higher classification accuracies, but the pair based method is faster and selects smaller feature subsets.
This thesis proposes the first PSO based multi-objective filter feature selection approach using information based measures. This work is also the first work using other two well-known multi-objective EC algorithms in filter feature selection, which are also used to compare the performance of the PSO based approach. The results show that the PSO based multiobjective filter approach can successfully address feature selection problems, outperform single objective filter algorithms and achieve better classification performance than other multi-objective algorithms.
This thesis investigates the difference between wrapper and filter approaches in terms of the classification performance and computational time, and also examines the generality of wrappers. The results show that wrappers generally achieve better or similar classification performance than filters, but do not always need longer computational time than filters. The results also show that wrappers built with simple classification algorithms can be general to other classification algorithms
A lexicographic multi-objective genetic algorithm for multi-label correlation-based feature selection
This paper proposes a new Lexicographic multi-objective Genetic Algorithm for Multi-Label Correlation-based Feature Selection (LexGA-ML-CFS), which is an extension of the previous single-objective Genetic Algorithm for Multi-label Correlation-based Feature Selection (GA-ML-CFS). This extension uses a LexGA as a global search method for generating candidate feature subsets. In our experiments, we compare the results obtained by LexGA-ML-CFS with the results obtained by the original hill climbing-based ML-CFS, the single-objective GA-ML-CFS and a baseline Binary Relevance method, using ML-kNN as the multi-label classifier. The results from our experiments show that LexGA-ML-CFS improved predictive accuracy, by comparison with other methods, in some cases, but in general there was no statistically significant different between the results of LexGA-ML-CFS and other methods
Recommended from our members
Prediction of progression in idiopathic pulmonary fibrosis using CT scans atbaseline: A quantum particle swarm optimization - Random forest approach
Idiopathic pulmonary fibrosis (IPF) is a fatal lung disease characterized by an unpredictable progressive declinein lung function. Natural history of IPF is unknown and the prediction of disease progression at the time ofdiagnosis is notoriously difficult. High resolution computed tomography (HRCT) has been used for the diagnosisof IPF, but not generally for monitoring purpose. The objective of this work is to develop a novel predictivemodel for the radiological progression pattern at voxel-wise level using only baseline HRCT scans. Mainly, thereare two challenges: (a) obtaining a data set of features for region of interest (ROI) on baseline HRCT scans andtheir follow-up status; and (b) simultaneously selecting important features from high-dimensional space, andoptimizing the prediction performance. We resolved the first challenge by implementing a study design andhaving an expert radiologist contour ROIs at baseline scans, depending on its progression status in follow-upvisits. For the second challenge, we integrated the feature selection with prediction by developing an algorithmusing a wrapper method that combines quantum particle swarm optimization to select a small number of featureswith random forest to classify early patterns of progression. We applied our proposed algorithm to analyzeanonymized HRCT images from 50 IPF subjects from a multi-center clinical trial. We showed that it yields aparsimonious model with 81.8% sensitivity, 82.2% specificity and an overall accuracy rate of 82.1% at the ROIlevel. These results are superior to other popular feature selections and classification methods, in that ourmethod produces higher accuracy in prediction of progression and more balanced sensitivity and specificity witha smaller number of selected features. Our work is the first approach to show that it is possible to use onlybaseline HRCT scans to predict progressive ROIs at 6 months to 1year follow-ups using artificial intelligence
- …