434 research outputs found
Hybrid bat-ant colony optimization algorithm for rule-based feature selection in health care
Rule-based classification in health care using artificial intelligence provides solutions to decision-making problems across different domains. An important challenge is providing access to good and fast health facilities. Cervical cancer is one of the most frequent causes of death in women. The diagnostic methods for cervical cancer used in health centers are costly and time-consuming. In this paper, a bat algorithm for feature selection and an ant colony optimization-based classification algorithm were applied to a cervical cancer data set obtained from the repository of the University of California, Irvine, to analyze the disease based on optimal features. The proposed algorithm outperforms other methods in terms of comprehensibility and obtains better results in terms of classification accuracy.
Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation
Feature selection (FS) has become an indispensable task in dealing with
today's highly complex pattern recognition problems with a massive number of
features. In this study, we propose a new wrapper approach for FS based on
binary simultaneous perturbation stochastic approximation (BSPSA). This
pseudo-gradient descent stochastic algorithm starts with an initial feature
vector and moves toward the optimal feature vector via successive iterations.
In each iteration, the current feature vector's individual components are
perturbed simultaneously by random offsets from a qualified probability
distribution. We present computational experiments on datasets with numbers of
features ranging from a few dozen to thousands using three widely used
classifiers as wrappers: nearest neighbor, decision tree, and linear support
vector machine. We compare our methodology against the full set of features as
well as a binary genetic algorithm and sequential FS methods using
cross-validated classification error rate and AUC as the performance criteria.
Our results indicate that features selected by BSPSA compare favorably to
alternative methods in general and BSPSA can yield superior feature sets for
datasets with tens of thousands of features by examining an extremely small
fraction of the solution space. We are not aware of any other wrapper FS
methods that are computationally feasible with good convergence properties for
such large datasets.
Comment: This is the Istanbul Sehir University Technical Report
#SHR-ISE-2016.01. A short version of this report has been accepted for
publication at Pattern Recognition Letters.
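The iteration described in this abstract (perturb all components of the feature vector at once, evaluate, then step along a pseudo-gradient) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ±1 perturbation distribution, the fixed gains `a` and `c`, the 0.5 rounding threshold, and the names `bspsa_select` and `toy_loss` are all hypothetical, and `toy_loss` stands in for the cross-validated classification error of a wrapper classifier.

```python
import random

# Hypothetical toy problem: features 0 and 3 are the only useful ones.
RELEVANT = {0, 3}

def toy_loss(mask):
    """Stand-in for a wrapper's cross-validated error (an assumption)."""
    missed = sum(1 for i in RELEVANT if mask[i] == 0)
    extra = sum(1 for i, m in enumerate(mask) if m == 1 and i not in RELEVANT)
    return missed + 0.1 * extra

def bspsa_select(loss, n_features, iters=200, a=0.05, c=0.05, seed=0):
    """Sketch of a binary SPSA-style search over feature masks."""
    rng = random.Random(seed)
    clip = lambda x: min(1.0, max(0.0, x))
    to_mask = lambda v: [1 if x >= 0.5 else 0 for x in v]

    w = [0.5] * n_features            # continuous feature vector in [0, 1]
    best_mask = to_mask(w)
    best_loss = loss(best_mask)

    for _ in range(iters):
        # Perturb every component simultaneously with a random +/-1 offset.
        delta = [rng.choice((-1, 1)) for _ in range(n_features)]
        m_plus = to_mask([clip(wi + c * d) for wi, d in zip(w, delta)])
        m_minus = to_mask([clip(wi - c * d) for wi, d in zip(w, delta)])
        l_plus, l_minus = loss(m_plus), loss(m_minus)

        # Pseudo-gradient from only two loss evaluations, then a descent step.
        diff = l_plus - l_minus
        w = [clip(wi - a * diff / (2 * c * d)) for wi, d in zip(w, delta)]

        # Keep the best binary mask evaluated so far.
        for mask, value in ((m_plus, l_plus), (m_minus, l_minus)):
            if value < best_loss:
                best_mask, best_loss = mask, value

    return best_mask, best_loss

mask, value = bspsa_select(toy_loss, n_features=6)
print(mask, value)
```

Each iteration costs two loss evaluations regardless of the number of features, which is the property that makes this family of methods attractive when every evaluation means retraining a classifier.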
Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives
The term big data characterizes the massive amounts of data generated by advanced technologies in different domains using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) while having a small number of samples, which are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytic tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem to find a minimal subset of relevant features that maximizes the classification accuracy and reduces the computations. Traditional statistical-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
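The divide-and-conquer idea in this abstract can be illustrated with a small sketch: the feature indices are split into disjoint groups, and each group is optimized in turn while the other groups are held at the best-known setting. Everything named here is an assumption for illustration. In particular, `cc_feature_selection` and `toy_fitness` are hypothetical, plain random search stands in for the evolutionary optimizer of each sub-problem, and the loop runs sequentially instead of on a MapReduce cluster.

```python
import random

# Hypothetical toy problem: features 1, 4 and 7 carry the signal.
INFORMATIVE = {1, 4, 7}

def toy_fitness(mask):
    """Stand-in for accuracy minus a subset-size penalty (an assumption)."""
    hits = sum(1 for i in INFORMATIVE if mask[i] == 1)
    return hits - 0.1 * sum(mask)

def cc_feature_selection(fitness, n_features, n_groups=3, rounds=5,
                         candidates=8, seed=0):
    """Sketch of cooperative co-evolution style decomposition."""
    rng = random.Random(seed)

    # Decompose: deal the shuffled feature indices into disjoint groups.
    indices = list(range(n_features))
    rng.shuffle(indices)
    groups = [indices[g::n_groups] for g in range(n_groups)]

    # The context vector holds the best-known setting of every group.
    context = [1] * n_features
    best = fitness(context)

    for _ in range(rounds):
        for group in groups:
            # Optimize one sub-problem; all other groups stay fixed.
            for _ in range(candidates):
                trial = context[:]
                for i in group:
                    trial[i] = rng.randint(0, 1)
                score = fitness(trial)
                if score > best:
                    context, best = trial, score

    return context, best

mask, score = cc_feature_selection(toy_fitness, n_features=9)
print(mask, score)
```

In a distributed setting, scoring the candidate sub-masks of a group is the natural unit of parallel work, with the merge into the shared context vector acting as the combining step. The sketch keeps both in one process for clarity.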