    A Hybrid Metaheuristics based technique for Mutation Based Disease Classification

    Due to recent advances in computational biology, DNA microarray technology has become a useful tool for detecting mutations in complex diseases such as cancer. The availability of thousands of microarray datasets makes this an active area of research. Early cancer detection can reduce both mortality and treatment cost. Cancer classification provides a detailed overview of the disease microenvironment for better diagnosis. However, gene microarray datasets suffer from the curse of dimensionality, and classification models are prone to overfitting because of the small sample size and large feature space. To address these issues, the authors propose an Improved Binary Competitive Swarm Optimization Whale Optimization Algorithm (IBCSOWOA) for cancer classification, in which IBCSO further reduces the informative gene subset obtained with minimum redundancy maximum relevance (mRMR) as the filter method. The IBCSOWOA technique is tested with an artificial neural network (ANN) model, and the whale optimization algorithm (WOA) is used to tune the model's parameters. The performance of IBCSOWOA is evaluated on six mutation-based microarray datasets and compared with existing disease prediction methods. The experimental results indicate that the proposed technique outperforms existing nature-inspired methods in terms of optimal feature subset, classification accuracy, and convergence rate. It achieves above 98% accuracy on all six datasets, with the highest accuracy of 99.45% on the Lung cancer dataset.
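    The mRMR filter stage named above can be illustrated with a minimal sketch, assuming expression levels have already been discretized. It uses the greedy "difference" form of mRMR (relevance to the class minus mean redundancy with genes already chosen); mRMR also has a quotient form and continuous variants, and the paper's exact configuration is not given here. The function names are hypothetical, and the later IBCSO and WOA stages are not shown.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_joint = count / n
        mi += p_joint * log(p_joint / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy.

    features: list of columns (one per gene), each a list of discretized
    expression levels, one value per sample. Returns k column indices.
    """
    relevance = [mutual_information(col, labels) for col in features]
    # Start with the single most relevant feature.
    selected = [max(range(len(features)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(features)):
        best_j, best_score = None, float("-inf")
        for j in range(len(features)):
            if j in selected:
                continue
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Toy data: gene 1 duplicates gene 0, so mRMR skips it in favour of gene 2.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
gene0 = [0, 0, 0, 1, 1, 1, 1, 1]
gene1 = list(gene0)
gene2 = [0, 0, 1, 1, 0, 0, 1, 1]
print(mrmr_select([gene0, gene1, gene2], labels, k=2))  # [0, 2]
```

    The duplicated gene is as relevant as the original but fully redundant with it, so its mRMR score drops below that of the weakly related gene 2.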

    Filter-GA Based Approach to Feature Selection for Classification

    This paper presents a new approach to selecting a reduced number of features in databases. Every database has a given number of features, but some of these features can be redundant or even harmful, confusing the classification process. The proposed method applies a filter attribute measure and a binary-coded Genetic Algorithm to select a small subset of features. The importance of these features is judged by applying the K-nearest neighbor (KNN) classification method. The best reduced subset of features, the one with high classification accuracy on the given databases, is adopted. The classification accuracy obtained by the proposed method is compared with results recently reported in publications on twenty-eight databases. The proposed method performs satisfactorily on these databases, achieving higher classification accuracy with a smaller number of features.
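    The wrapper part of this approach (a binary-coded GA whose chromosomes are feature masks, scored by KNN accuracy) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the filter pre-ranking step is omitted, and the leave-one-out 1-NN fitness, the fewest-features tie-break, tournament selection, one-point crossover, elitism, and all parameter values are hypothetical choices.

```python
import random


def loo_knn_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of k-NN using only the features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample.
        neighbours = sorted(
            (sum((X[i][j] - X[t][j]) ** 2 for j in cols), y[t])
            for t in range(len(X))
            if t != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(set(votes), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)


def fitness(X, y, mask):
    # Higher accuracy first; among equal accuracies, fewer features wins.
    return (loo_knn_accuracy(X, y, mask), -sum(mask))


def binary_ga(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Binary-coded GA over feature masks (assumes at least two features)."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        next_pop = [m[:] for m in ranked[:2]]  # elitism: keep the two best masks
        while len(next_pop) < pop_size:
            # Binary tournament selection of each parent.
            a = max(rng.sample(ranked, 2), key=lambda m: fitness(X, y, m))
            b = max(rng.sample(ranked, 2), key=lambda m: fitness(X, y, m))
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda m: fitness(X, y, m))


# Toy data: feature 0 separates the classes; features 1 and 2 are noise.
X = [
    [0.0, 0.5, 0.5], [0.1, 0.1, 0.9], [0.2, 0.8, 0.2], [0.3, 0.3, 0.7],
    [5.0, 0.5, 0.5], [5.1, 0.9, 0.1], [5.2, 0.2, 0.8], [5.3, 0.7, 0.3],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best = binary_ga(X, y)
print(best, loo_knn_accuracy(X, y, best))
```

    Using a tuple as the fitness encodes "high accuracy first, then fewer features" without choosing a weighting between the two objectives.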

    Gene selection for cancer classification with the help of bees

    A Comprehensive Survey on Particle Swarm Optimization Algorithm and Its Applications

    Particle swarm optimization (PSO) is a heuristic global optimization method originally proposed by Kennedy and Eberhart in 1995, and it is now one of the most commonly used optimization techniques. This survey presents a comprehensive investigation of PSO. On one hand, we review advances in PSO, including its modifications (quantum-behaved PSO, bare-bones PSO, chaotic PSO, and fuzzy PSO), population topologies (fully connected, von Neumann, ring, star, random, etc.), hybridization (with genetic algorithm, simulated annealing, Tabu search, artificial immune system, ant colony algorithm, artificial bee colony, differential evolution, harmony search, and biogeography-based optimization), extensions (to multiobjective, constrained, discrete, and binary optimization), theoretical analysis (parameter selection and tuning, and convergence analysis), and parallel implementation (in multicore, multiprocessor, GPU, and cloud computing forms). On the other hand, we survey applications of PSO in the following fields: electrical and electronic engineering, automation control systems, communication theory, operations research, mechanical engineering, fuel and energy, medicine, chemistry, and biology. It is hoped that this survey will be beneficial to researchers studying PSO algorithms.
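    For readers new to PSO, a minimal sketch of the basic continuous algorithm with the fully connected (global-best) topology and an inertia weight may help. The parameter values (w = 0.7, c1 = c2 = 1.5), the position clamping, and the sphere test function are illustrative assumptions, not settings taken from the survey.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best (fully connected topology) PSO with inertia weight w."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive (personal best) + social (global best) terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the position to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # global best is updated immediately
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, dim=5, bounds=(-5.0, 5.0))
print(best_val)
```

    Replacing `gbest` with the best position among each particle's neighbours would give the ring, star, or von Neumann topologies listed above.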

    Binary Multi-Verse Optimization (BMVO) Approaches for Feature Selection

    Multi-Verse Optimization (MVO) is one of the newest meta-heuristic optimization algorithms; it imitates the multi-verse theory in physics and models the interaction among universes. In problem domains such as feature selection, solutions are often restricted to the binary values 0 and 1. Accordingly, this paper proposes binary versions of the MVO algorithm with two prime aims: first, to remove redundant and irrelevant features from the dataset, and second, to achieve better classification accuracy. The proposed binary versions use transformation functions to map the continuous version of the MVO algorithm to its binary versions. In the experiments, 21 diverse datasets are used to compare Binary MVO (BMVO) with binary versions of several existing metaheuristic algorithms. The proposed BMVO approaches outperformed these algorithms in terms of the number of features selected and classification accuracy.
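    The idea of a transformation (transfer) function can be shown with a small sketch. It implements two families widely used for binarizing continuous metaheuristics: an S-shaped sigmoid that sets a bit to 1 with probability S(x), and a V-shaped |tanh| that flips the current bit with probability V(x). Whether BMVO uses exactly these functions is an assumption here, and the function names and input values are hypothetical.

```python
import math
import random


def s_shaped(x):
    """Sigmoid (S-shaped) transfer: probability that the bit is set to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # numerically stable branch for large negative x
    return e / (1.0 + e)


def v_shaped(x):
    """|tanh| (V-shaped) transfer: probability that the current bit is flipped."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    # Each continuous coordinate independently becomes 1 with probability S(x).
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, bits, rng):
    # Each bit is flipped with probability V(x); near-zero coordinates keep their bit.
    return [1 - b if rng.random() < v_shaped(x) else b for x, b in zip(position, bits)]


rng = random.Random(42)
continuous = [2.5, -0.3, 0.0, -4.0, 1.2]  # one continuous MVO universe (hypothetical values)
print(binarize_s(continuous, rng))
print(binarize_v(continuous, [1, 0, 1, 0, 1], rng))
```

    The design difference matters in practice: an S-shaped map forces a bit towards 0 when its coordinate is strongly negative, while a V-shaped map only decides whether to change the existing bit, which tends to preserve more of the current feature subset.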

    Identification of tissue‐specific tumor biomarker using different optimization algorithms

    Background: Identifying differentially expressed genes, i.e., genes whose transcript abundance differs across biological or physiological conditions, was long a challenging task. However, the inception of transcriptome sequencing (RNA-seq) technology revolutionized the simultaneous measurement of transcript abundance for thousands of genes. Objective: In this paper, such next-generation sequencing (NGS) data are used to identify biomarker signatures for several of the most common cancer types (bladder, colon, kidney, brain, liver, lung, prostate, skin, and thyroid). Methods: The problem is framed as a comparison of optimization algorithms for selecting the set of genes that yields the highest accuracy on a two-class classification task between healthy and tumor samples. Artificial Bee Colony (ABC), Ant Colony Optimization, Differential Evolution, and Particle Swarm Optimization are chosen as the optimization algorithms. A standard statistical method, DESeq2, is used to select differentially expressed genes before they are fed to the optimization algorithms. Healthy and tumor samples are classified with a support vector machine. Results: Cancer-specific validation yields remarkably good accuracy. The highest classification accuracy, 99.10%, is achieved by the ABC algorithm on the Brain lower grade glioma data. This validation is well supported by a statistical test, gene ontology enrichment analysis, and KEGG pathway enrichment analysis for each cancer biomarker signature. Conclusion: The current study identified robust genes as biomarker signatures, and these biomarkers might help to accurately identify tumors of unknown origin.

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains, described by the 4Vs (volume, velocity, variety, and veracity), which indicate the amount of data that can only be processed via computationally intensive analysis, the speed of its creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) but a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytic tools to mine such data for knowledge discovery because of their high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes classification accuracy and reduces computation. Traditional statistics-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use, distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
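    The divide-and-conquer idea behind cooperative co-evolution can be sketched minimally: split the variables into groups, then improve each group in turn while the others stay fixed in a shared context vector, so every candidate is evaluated as a complete solution. This is a simplified illustration under assumptions: it uses static contiguous grouping and a (1+1)-style Gaussian mutation as a stand-in for the full subpopulation EA that cooperative co-evolution normally runs per group, all names and parameters are hypothetical, and the MapReduce parallelization is not shown.

```python
import random


def cooperative_coevolution(f, dim, group_size, cycles=50, steps=20,
                            sigma=0.5, seed=0):
    """Minimize f over dim variables with static-grouping cooperative co-evolution.

    Variables are split into contiguous groups (sub-problems). Each group is
    improved in turn by a (1+1)-style Gaussian mutation while all other
    variables stay fixed at the shared context vector.
    """
    rng = random.Random(seed)
    context = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    best = f(context)
    groups = [list(range(s, min(s + group_size, dim)))
              for s in range(0, dim, group_size)]
    for _ in range(cycles):
        for group in groups:  # round-robin over sub-problems
            for _ in range(steps):
                candidate = context[:]
                for j in group:  # mutate only this sub-problem's variables
                    candidate[j] += rng.gauss(0.0, sigma)
                val = f(candidate)  # evaluated in the context of the full solution
                if val < best:
                    context, best = candidate, val
        sigma *= 0.9  # shrink the mutation step after each cycle
    return context, best


def sphere(x):
    return sum(v * v for v in x)


ctx, val = cooperative_coevolution(sphere, dim=20, group_size=5)
print(val)
```

    In a distributed setting, the fitness evaluations inside each sub-problem are the natural unit to parallelize, which is where a framework such as MapReduce would come in.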

    Evolutionary Computation and QSAR Research

    The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning, and artificial intelligence. QSAR modeling relies on three main steps: codification of molecular structure into molecular descriptors, selection of variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, it explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, selection methods for high-dimensional data in QSAR, methods for building QSAR models, current evolutionary feature selection methods and their applications in QSAR, and future trends in joint or multi-task feature selection methods.
    Funding: Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia. Consellería de Economía e Industria, 10SIN105004P.