1,260 research outputs found

    Elephant Search with Deep Learning for Microarray Data Analysis

    Despite extensive research on microarray gene expression data analysis, analyzing large and complex gene expression data effectively and efficiently remains challenging. Feature (gene) selection is of paramount importance for understanding the differences in biological and non-biological variation between samples. To address this problem, a novel elephant search (ES) based optimization is proposed to select the best gene expressions from the large volume of microarray data. Further, a machine learning method is envisioned to leverage such high-dimensional and complex microarray data, extracting hidden patterns to make meaningful predictions and accurate classifications. In particular, stochastic gradient descent based deep learning (DL) with a softmax activation function is used on the reduced features (genes) to better classify samples according to their gene expression levels. The experiments are carried out on nine of the most popular cancer microarray gene selection datasets, obtained from the UCI machine learning repository. The empirical results obtained by the proposed elephant search based deep learning (ESDL) approach are compared with the most recent published article to assess its suitability for future bioinformatics research. Comment: 12 pages, 5 Tabl
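
The abstract above mentions stochastic gradient descent with a softmax activation. As a minimal sketch of that general idea only (a single linear softmax layer rather than the paper's deep network; the names `softmax`, `cross_entropy`, and `sgd_step` and the learning rate are assumptions made for illustration), one SGD update on one sample could look like this:

```python
import math

def softmax(z):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, x):
    """One score per class: the dot product of each weight row with the input."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cross_entropy(W, x, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits(W, x))[label])

def sgd_step(W, x, label, lr=0.1):
    """Single-sample SGD update; the gradient for class k is (p_k - y_k) * x."""
    p = softmax(logits(W, x))
    return [
        [w - lr * (p[k] - (1.0 if k == label else 0.0)) * xi
         for w, xi in zip(row, x)]
        for k, row in enumerate(W)
    ]

# Hypothetical sample: two selected gene-expression values, three classes.
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]
W_new = sgd_step(W, x, label=2)
```

Comparing `cross_entropy(W, x, 2)` with `cross_entropy(W_new, x, 2)` shows the loss on this sample dropping after the update.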

    Memetic micro-genetic algorithms for cancer data classification

    Fast and precise medical diagnosis of human cancer is crucial for treatment decisions. Gene selection consists of identifying a set of informative genes from microarray data to allow high predictive accuracy in human cancer classification. This task is a combinatorial search problem, and optimisation methods can be applied to solve it. In this paper, two memetic micro-genetic algorithms (MμV1 and MμV2) with different hybridisation approaches are proposed for feature selection of cancer microarray data. Seven gene expression datasets are used for experimentation. A comparison with stochastic state-of-the-art optimisation techniques shows that problem-dependent local search methods combined with micro-genetic algorithms improve feature selection of cancer microarray data.
    Affiliation: Rojas, Matias Gabriel. Universidad Nacional de Lujan. Centro de Investigacion Docencia y Extension En Tecnologias de la Informacion y Las Comunicaciones.; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina
    Affiliation: Olivera, Ana Carolina. Universidad Nacional de Cuyo. Facultad de Ingeniería; Argentina. Universidad Nacional de Lujan. Centro de Investigacion Docencia y Extension En Tecnologias de la Informacion y Las Comunicaciones.; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina
    Affiliation: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Vidal, Pablo Javier. Universidad Nacional de Cuyo. Facultad de Ingeniería; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentin

    PLS dimension reduction for classification of microarray data

    PLS dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, PLS is compared with some of the best state-of-the-art classification methods. In addition, a simple procedure to choose the number of components is suggested. The connection between PLS dimension reduction and gene selection is examined, and a property of the first PLS component for binary classification is proven. PLS can also be used as a visualization tool for high-dimensional data in the classification framework. The whole study is based on 9 real microarray cancer data sets.
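
To make the link between the first PLS component and binary classification more concrete, here is a small sketch of one standard fact, which is not necessarily the property proven in the paper: with a 0/1 response, the first PLS weight vector (proportional to the covariance between each centered feature and the centered response) points in the same direction as the difference of the two class means. The function names and the toy data are assumptions made for illustration.

```python
import math

def _unit(v):
    """Scale a vector to unit length (assumes it is not the zero vector)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def first_pls_weights(X, y):
    """First PLS weight vector: normalized X_centered^T y_centered."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [
        sum((row[j] - x_mean[j]) * (yi - y_mean) for row, yi in zip(X, y))
        for j in range(p)
    ]
    return _unit(w)

def class_mean_direction(X, y):
    """Normalized difference between the class-1 and class-0 mean vectors."""
    p = len(X[0])
    ones = [row for row, yi in zip(X, y) if yi == 1]
    zeros = [row for row, yi in zip(X, y) if yi == 0]
    diff = [
        sum(r[j] for r in ones) / len(ones) - sum(r[j] for r in zeros) / len(zeros)
        for j in range(p)
    ]
    return _unit(diff)

# Hypothetical toy data: four samples, two genes, binary labels.
X = [[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [5.0, 7.0]]
y = [0, 0, 1, 1]
w = first_pls_weights(X, y)
d = class_mean_direction(X, y)
```

On this toy data `w` and `d` agree up to floating-point error, which is why genes with large first-component weights are the ones whose class means differ most.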

    Momentum Backpropagation Optimization for Cancer Detection Based on DNA Microarray Data

    Early detection of cancer can increase the success of treatment. Recent research shows that cancer can be detected through DNA microarrays, because a person with cancer exhibits changes in the expression values of certain genes. In previous studies, the Genetic Algorithm as a feature selection method combined with the Momentum Backpropagation algorithm as a classification method gave fairly high classification performance, but Momentum Backpropagation still converges slowly because its learning rate is static, so training needs more time to converge. Therefore, this research optimizes the Momentum Backpropagation algorithm by adding an adaptive learning rate scheme. The proposed scheme is shown to reduce the number of epochs needed in training from 390 to 76 compared with the Momentum Backpropagation algorithm. It achieves an accuracy of 90.51% on Colon Tumor data and 100% on Leukemia, Lung Cancer, and Ovarian Cancer data.
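
The abstract does not say which adaptive learning rate scheme is used. As a hedged illustration of the general idea, the sketch below applies a simple "bold driver" style rule (an assumption, not the paper's method) to momentum gradient descent on a toy quadratic loss: the rate grows slightly after every epoch that does not raise the loss, and is halved, with the step rejected and the momentum reset, when the loss goes up. All names and constants are hypothetical.

```python
def loss(w, curv):
    """Toy quadratic loss 0.5 * sum(a_i * w_i^2) standing in for a network's error."""
    return 0.5 * sum(a * x * x for a, x in zip(curv, w))

def grad(w, curv):
    """Gradient of the toy loss."""
    return [a * x for a, x in zip(curv, w)]

def train(curv, w0, lr=0.01, momentum=0.9, adaptive=False,
          grow=1.05, shrink=0.5, tol=1e-8, max_epochs=5000):
    """Momentum gradient descent; returns (epochs_used, final_loss)."""
    w = list(w0)
    v = [0.0] * len(w)
    current = loss(w, curv)
    for epoch in range(1, max_epochs + 1):
        g = grad(w, curv)
        new_v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        new_w = [wi + vi for wi, vi in zip(w, new_v)]
        new_loss = loss(new_w, curv)
        if adaptive and new_loss > current:
            # Loss went up: reject the step, shrink the rate, reset momentum.
            lr *= shrink
            v = [0.0] * len(w)
            continue
        w, v, current = new_w, new_v, new_loss
        if adaptive:
            lr *= grow  # Loss did not increase: cautiously raise the rate.
        if current < tol:
            return epoch, current
    return max_epochs, current

# Hypothetical problem: two weights with different curvature.
curv, start = [1.0, 10.0], [1.0, 1.0]
static_result = train(curv, start, adaptive=False)
adaptive_result = train(curv, start, adaptive=True)
```

Printing `static_result` and `adaptive_result` lets the two epoch counts be compared on this toy problem; the numbers say nothing about the 390 and 76 epochs reported in the abstract.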

    En-PaFlower: An Ensemble Approach using PSO and Flower Pollination Algorithm for Cancer Diagnosis

    Machine learning is now used across many sectors and provides consistently precise predictions. A machine learning system learns effectively because the training dataset contains examples of previously completed tasks. Researchers have shown that, after learning how to process the necessary data, machine learning algorithms can carry out the whole task autonomously. In recent years, cancer has become a major cause of the worldwide increase in mortality. Early detection of cancer improves the chance of a complete recovery, and Machine Learning (ML) plays a significant role here. Microarray datasets for cancer diagnosis and prognosis are available alongside biopsy datasets. Microarray data are important for diagnosing and classifying cancers, but they are also very large, and analysing a large number of datasets can be challenging. As a result, feature selection is crucial, and machine learning provides classification techniques. These algorithms choose the relevant features that help build a more precise classification model. This makes accurate disease classification easier, which in turn aids disease prevention. This work aims to synthesize existing knowledge on cancer diagnosis using machine learning techniques into a compact report. It also proposes an ensemble-based machine learning model, En-PaFlower, that uses Particle Swarm Optimization (PSO) as the feature selection algorithm and the Flower Pollination Algorithm (FPA) as the optimization algorithm, combined by majority voting. Finally, the performance of the proposed algorithm is evaluated on three different types of cancer datasets, with accuracy, precision, recall, specificity, and F1 score among the evaluation metrics. The empirical analysis shows that the proposed methodology achieves a highest accuracy of 95.65%.

    Partial Least Squares: A Versatile Tool for the Analysis of High-Dimensional Genomic Data

    Partial Least Squares (PLS) is a highly efficient statistical regression technique that is well suited for the analysis of high-dimensional genomic data. In this paper we review the theory and applications of PLS from both methodological and biological points of view. Focusing on microarray expression data, we provide a systematic comparison of the PLS approaches currently employed, and discuss problems as different as tumor classification, identification of relevant genes, survival analysis and modeling of gene networks.

    Comparison of feature selection and classification for MALDI-MS data

    INTRODUCTION: In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods are more accurate when classifying data, some publicly available peak detection algorithms for Matrix-Assisted Laser Desorption Ionization Mass Spectrometry (MALDI-MS) data were recently compared; however, the issue of different feature selection methods and different classification models as they relate to classification performance has not been addressed. With the application of intelligent computing, much progress has been made in the development of feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare the methods of feature selection and different learning classifiers when applied to MALDI-MS data and to provide a subsequent reference for the analysis of MS proteomics data. RESULTS: We compared a well-known method of feature selection, Support Vector Machine Recursive Feature Elimination (SVMRFE), and a recently developed method, Gradient based Leave-one-out Gene Selection (GLGS), that effectively performs microarray data analysis. We also compared several learning classifiers, including the K-Nearest Neighbor Classifier (KNNC), Naïve Bayes Classifier (NBC), Nearest Mean Scaled Classifier (NMSC), the uncorrelated normal based quadratic Bayes Classifier (recorded as UDC), Support Vector Machines (SVMs), and a distance metric learning method for the Large Margin Nearest Neighbor classifier (LMNN) based on Mahalanobis distance. To compare them, we conducted a comprehensive experimental study using three types of MALDI-MS data. CONCLUSION: Regarding feature selection, SVMRFE outperformed GLGS in classification. As for the learning classifiers, when classification models derived from the best training were compared, SVMs performed the best with respect to the expected testing accuracy. However, the distance metric learning LMNN outperformed SVMs and other classifiers on evaluating the best testing. In such cases, the optimum classification model based on LMNN is worth investigating for future study.

    Weighted Heuristic Ensemble of Filters

    Feature selection has become increasingly important in data mining in recent years due to the rapid increase in the dimensionality of big data. However, the reliability and consistency of feature selection methods (filters) vary considerably across datasets, and no single filter performs consistently well under various conditions. Feature selection ensembles have therefore been investigated recently to provide more reliable and effective results than any individual filter, but all existing feature selection ensembles treat the feature selection methods equally regardless of their performance. In this paper, we present a novel framework that applies a weighted feature selection ensemble by proposing a systematic way of assigning different weights to the feature selection methods (filters). We also investigate how to determine the appropriate weight for each filter in an ensemble. Experiments on ten benchmark datasets show that, although theoretically and intuitively adding more weight to ‘good filters’ should lead to better results, in reality the outcome is very uncertain. The assumption was found to be correct for some examples in our experiments. In other situations, however, filters that had been assumed to perform well performed badly, leading to even worse results. Therefore, adding weights to filters might not achieve much in terms of accuracy, while increasing complexity and time consumption and clearly decreasing stability.
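
The abstract does not spell out how the filter outputs are combined. A minimal sketch of one plausible aggregation rule, assumed here purely for illustration, is a weighted mean of per-filter ranks: each filter ranks the features by its own score, the ranks are averaged using the filter weights, and the features with the lowest combined rank are kept. The function names, filter names, and scores are hypothetical.

```python
def ranks(scores):
    """Rank features by descending score (rank 1 is best); ties break by name."""
    order = sorted(scores, key=lambda f: (-scores[f], f))
    return {f: i + 1 for i, f in enumerate(order)}

def weighted_ensemble(filter_scores, weights, k):
    """Select the k features with the lowest weighted mean rank across filters."""
    total = sum(weights[name] for name in filter_scores)
    per_filter = {name: ranks(s) for name, s in filter_scores.items()}
    features = list(next(iter(filter_scores.values())))
    combined = {
        f: sum(weights[name] * per_filter[name][f] for name in per_filter) / total
        for f in features
    }
    return sorted(features, key=lambda f: (combined[f], f))[:k]

# Hypothetical relevance scores from two filters that disagree.
filter_scores = {
    "filter_a": {"g1": 0.9, "g2": 0.5, "g3": 0.1},
    "filter_b": {"g1": 0.2, "g2": 0.3, "g3": 0.8},
}
trust_a = weighted_ensemble(filter_scores, {"filter_a": 3, "filter_b": 1}, k=2)
trust_b = weighted_ensemble(filter_scores, {"filter_a": 1, "filter_b": 3}, k=2)
# trust_a == ["g1", "g2"], trust_b == ["g3", "g2"]
```

The two calls show how strongly the selected subset depends on the weights, which is consistent with the abstract's caution that weighting a filter assumed to be good can make results worse when that assumption fails.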