503 research outputs found
Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives
The term big data characterizes the massive amounts of data generation by the advanced technologies in different domains using 4Vs volume, velocity, variety, and veracity-to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-Time data, contain a large number of features (variables) while having a small number of samples, which are used to measure various real-Time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features, and many domains, including financial, lack the al analytic tools to mine the data for knowledge discovery because of the high-dimensionality. Feature selection is an optimization problem to find a minimal subset of relevant features that maximizes the classification accuracy and reduces the computations. Traditional statistical-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and a divide-And-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-To-use distributed, scalable, and fault-Tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-The-Art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions
Hybrid Feature Selection Approach Based on GRASP for Cancer Microarray Data
Microarray data usually contain a large number of genes, but a small number of samples. Feature subset selection for microarray data aims at reducing the number of genes so that useful information can be extracted from the samples. Reducing the dimension of data sets further helps in improving the computational efficiency of the learning model. In this paper, we propose a modified algorithm based on the tabu search as local search procedures to a Greedy Randomized Adaptive Search Procedure (GRASP) for high dimensional microarray data sets. The proposed Tabu based Greedy Randomized Adaptive Search Procedure algorithm is named as TGRASP. In TGRASP, a new parameter has been introduced named as Tabu Tenure and the existing parameters, NumIter and size have been modified. We observed that different parameter settings affect the quality of the optimum. The second proposed algorithm known as FFGRASP (Firefly Greedy Randomized Adaptive Search Procedure) uses a firefly optimization algorithm in the local search optimzation phase of the greedy randomized adaptive search procedure (GRASP). Firefly algorithm is one of the powerful algorithms for optimization of multimodal applications. Experimental results show that the proposed TGRASP and FFGRASP algorithms are much better than existing algorithm with respect to three performance parameters viz. accuracy, run time, number of a selected subset of features. We have also compared both the approaches with a unified metric (Extended Adjusted Ratio of Ratios) which has shown that TGRASP approach outperforms existing approach for six out of nine cancer microarray datasets and FFGRASP performs better on seven out of nine datasets
On the role of metaheuristic optimization in bioinformatics
Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted more interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, the protein structure prediction, phylogenetic inference, and different string problems. In addition, references to other relevant optimization problems are also given, including those related to medical imaging or gene selection for classification. From the previous analysis, the paper generates insights on research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics
Swarm Intelligence Based Feature Selection for High Dimensional Classification: A Literature Survey
Feature selection is an important and challenging task in machine learning and data mining techniques to avoid the curse of dimensionality and maximize the classification accuracy. Moreover, feature selection helps to reduce computational complexity of learning algorithm, improve prediction performance, better data understanding and reduce data storage space. Swarm intelligence based feature selection approach enables to find an optimal feature subset from an extremely large dimensionality of features for building the most accurate classifier model. There is still a type of researches that is not done yet in data mining. In this paper, the utilization of swarm intelligence algorithms for feature selection process in high dimensional data focusing on medical data classification is form the subject matter. The results shows that swarm intelligence algorithms reviewed based on state-of-the-art literature have a promising capability that can be applied in feature selections techniques. The significance of this work is to present the comparison and various alternatives of swarm algorithms to be applied in feature selections for high dimensional classification
Survey analysis for optimization algorithms applied to electroencephalogram
This paper presents a survey for optimization approaches that analyze and classify Electroencephalogram (EEG) signals. The automatic analysis of EEG presents a significant challenge due to the high-dimensional data volume. Optimization algorithms seek to achieve better accuracy by selecting practical features and reducing unwanted features. Forty-seven reputable research papers are provided in this work, emphasizing the developed and executed techniques divided into seven groups based on the applied optimization algorithm particle swarm optimization (PSO), ant colony optimization (ACO), artificial bee colony (ABC), grey wolf optimizer (GWO), Bat, Firefly, and other optimizer approaches). The main measures to analyze this paper are accuracy, precision, recall, and F1-score assessment. Several datasets have been utilized in the included papers like EEG Bonn University, CHB-MIT, electrocardiography (ECG) dataset, and other datasets. The results have proven that the PSO and GWO algorithms have achieved the highest accuracy rate of around 99% compared with other techniques
A Probabilistic Multi-Objective Artificial Bee Colony Algorithm for Gene Selection
Microarray technology is widely used to report gene expression data. The inclusion of many features and few samples is one of the characteristic features of this platform. In order to define significant genes for a particular disease, the problem of high-dimensionality microarray data should be overcome. The Artificial Bee Colony (ABC) Algorithm is a successful meta-heuristic algorithm that solves optimization problems effectively. In this paper, we propose a hybrid gene selection method for discriminatively selecting genes. We propose a new probabilistic binary Artificial Bee Colony Algorithm, namely PrBABC, that is hybridized with three different filter methods. The proposed method is applied to nine microarray datasets in order to detect distinctive genes for classifying cancer data. Results are compared with other wellknown meta-heuristic algorithms: Binary Differential Evolution Algorithm (BinDE), Binary Particle Swarm Optimization Algorithm (BinPSO), and Genetic Algorithm (GA), as well as with other methods in the literature. Experimental results show that the probabilistic self-adaptive learning strategy integrated into the employed-bee phase can boost classification accuracy with a minimal number of genes
Examining Swarm Intelligence-based Feature Selection for Multi-Label Classification
Multi-label classification addresses the issues that more than one class label assigns to each instance. Many real-world multi-label classification tasks are high-dimensional due to digital technologies, leading to reduced performance of traditional multi-label classifiers. Feature selection is a common and successful approach to tackling this problem by retaining relevant features and eliminating redundant ones to reduce dimensionality. There is several feature selection that is successfully applied in multi-label learning. Most of those features are wrapper methods that employ a multi-label classifier in their processes. They run a classifier in each step, which requires a high computational cost, and thus they suffer from scalability issues. To deal with this issue, filter methods are introduced to evaluate the feature subsets using information-theoretic mechanisms instead of running classifiers. This paper aims to provide a comprehensive review of different methods of feature selection presented for the tasks of multi-label classification. To this end, in this review, we have investigated most of the well-known and state-of-the-art methods. We then provided the main characteristics of the existing multi-label feature selection techniques and compared them analytically
Mutable composite firefly algorithm for gene selection in microarray based cancer classification
Cancer classification is critical due to the strenuous effort required in cancer treatment and the rising cancer mortality rate. Recent trends with high throughput technologies have led to discoveries in terms of biomarkers that successfully contributed to cancerrelated issues. A computational approach for gene selection based on microarray data
analysis has been applied in many cancer classification problems. However, the existing hybrid approaches with metaheuristic optimization algorithms in feature selection (specifically in gene selection) are not generalized enough to efficiently classify most cancer microarray data while maintaining a small set of genes. This leads to the classification accuracy and genes subset size problem. Hence, this study proposed to modify the Firefly Algorithm (FA) along with the Correlation-based Feature Selection (CFS) filter for the gene selection task. An improved FA was proposed to overcome FA slow convergence by generating mutable size solutions for the firefly population. In addition, a composite position update strategy was designed for the mutable size solutions. The proposed strategy was to balance FA exploration and exploitation in order to address the local optima problem. The proposed hybrid algorithm known as CFS-Mutable Composite Firefly Algorithm (CFS-MCFA) was evaluated on cancer microarray data for biomarker selection along with the
deployment of Support Vector Machine (SVM) as the classifier. Evaluation was performed based on two metrics: classification accuracy and size of feature set. The results showed that the CFS-MCFA-SVM algorithm outperforms benchmark methods in terms of classification accuracy and genes subset size. In particular, 100 percent accuracy was achieved on all four datasets and with only a few biomarkers (between one and four). This result indicates that the proposed algorithm is one of the competitive alternatives in feature selection, which later contributes to the analysis of microarray data
- …