
748 research outputs found

A hybrid LDA and genetic algorithm for gene selection and classification of microarray data

Author: B. Duval
E.B. Huerta
J.K. Hao
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010

In supervised classification of microarray data, gene selection aims to identify a small subset of informative genes from the initial data in order to obtain high predictive accuracy. This paper introduces a new embedded approach to this difficult task in which a genetic algorithm (GA) is combined with Fisher's linear discriminant analysis (LDA). The major characteristic of this LDA-based GA is that the GA uses not only an LDA classifier in its fitness function, but also LDA's discriminant coefficients in its dedicated crossover and mutation operators. Computational experiments on seven public datasets show that, under an unbiased experimental protocol, the proposed algorithm reaches high prediction accuracies with a small number of selected genes.
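
The wrapper idea in the abstract above (a GA whose fitness comes from a classifier evaluated on the selected genes) can be sketched as follows. This is only an illustration under stated assumptions: a nearest-centroid classifier with leave-one-out evaluation stands in for the LDA classifier, and generic uniform crossover and bit-flip mutation stand in for the paper's LDA-coefficient-guided operators, which the abstract does not specify. All function names and the `penalty` parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Mask = List[int]  # 1 = gene selected, 0 = gene dropped


def loo_centroid_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], mask: Mask) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on the selected columns.

    Assumption: this simple classifier stands in for the LDA classifier.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for held in range(len(X)):
        # Per-class sums and counts, computed without the held-out sample.
        sums, counts = {}, {}
        for i, (row, label) in enumerate(zip(X, y)):
            if i == held:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for c, j in enumerate(cols):
                acc[c] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def dist(label: int) -> float:
            # Squared Euclidean distance from the held-out sample to a class centroid.
            return sum((X[held][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))

        correct += min(sums, key=dist) == y[held]
    return correct / len(X)


def fitness(X, y, mask: Mask, penalty: float = 0.01) -> float:
    """Accuracy minus a small cost per selected gene (hypothetical weighting)."""
    return loo_centroid_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size: int = 20, generations: int = 30,
              p_mut: float = 0.1, seed: int = 0) -> Tuple[Mask, List[float]]:
    """Return the best mask found and the best fitness seen in each generation."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        history.append(fitness(X, y, scored[0]))
        children = [scored[0][:]]  # elitism: the current best always survives
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            a = max(rng.sample(scored, 2), key=lambda m: fitness(X, y, m))
            b = max(rng.sample(scored, 2), key=lambda m: fitness(X, y, m))
            child = [rng.choice(pair) for pair in zip(a, b)]               # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=lambda m: fitness(X, y, m))
    history.append(fitness(X, y, best))
    return best, history
```

Because the elite individual is copied unchanged into each new population, the recorded best fitness never decreases from one generation to the next. An unbiased protocol of the kind the abstract mentions would also keep a separate test set outside this selection loop.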

Crossref

Okina

Hal-Diderot

Effect of Feature Selection on Gene Expression Datasets Classification Accuracy

Author: Lazaar Mohamed
Omara Hicham
Tabii Youness
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/10/2018

Feature selection attracts researchers who work in machine learning and data mining. It consists of selecting the variables that have the greatest impact on dataset classification and discarding the rest. This dimensionality reduction allows classifiers to be faster and more accurate. This paper examines the effect of feature selection on the accuracy of classifiers widely used in the literature. The classifiers are compared on three real datasets that are pre-processed with feature selection methods. An improvement of more than 9% in classification accuracy is observed, and k-means appears to be the classifier most sensitive to feature selection.

IAES journal

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Identification of an Efficient Gene Expression Panel for Glioblastoma Classification.

Author: Coppola Giovanni
Crisman Thomas J
Gao Fuying
Kawaguchi Riki
Kornblum Harley I
Laks Dan R
Zelaya Ivette
Zhao Yining
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition

Author: Chen Yun-Mei
Tian Li
Wu Wen-Ming
Xu Shuang
Yang Li-Jun
Yang Xiao-Hui
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Sparse representation based classification (SRC) methods have achieved remarkable results. SRC, however, still suffers from requiring enough training samples, insufficient use of test samples, and instability of representation. In this paper, a stable inverse projection representation based classification (IPRC) is presented to tackle these problems by effectively using test samples. An IPR is first proposed, and its feasibility and stability are analyzed. A classification criterion named category contribution rate is constructed to match the IPR and complete classification. Moreover, a statistical measure is introduced to quantify the stability of representation-based classification methods. Based on the IPRC technique, a robust tumor recognition framework is presented by interpreting microarray gene expression data, where a two-stage hybrid gene selection method is introduced to select informative genes. Finally, a functional analysis of candidate pathogenicity-related genes is given. Extensive experiments on six public tumor microarray gene expression datasets demonstrate that the proposed technique is competitive with state-of-the-art methods.
Comment: 14 pages, 19 figures, 10 tables

arXiv.org e-Print Archive

Crossref

Effective Prostate Cancer Detection using Enhanced Particle Swarm Optimization Algorithm with Random Forest on the Microarray Data

Author: Hulipalled Vishwanath
Metipatil Prabhuraj
Prakashrao Kaulgud Sanjeev
Somanagouda Patil Siddanagouda
Publication venue: Faculty of Electrical Engineering, J.J. Strossmayer University of Osijek
Publication date: 01/01/2023

Prostate Cancer (PC) is the leading cause of mortality among males; therefore, an effective system is required for identifying sensitive biomarkers for early recognition. The objective of the research is to find potential biomarkers for characterizing the different types of PC. In this article, the PC-related genes are acquired from the Gene Expression Omnibus (GEO) database. Gene selection is then performed using an enhanced Particle Swarm Optimization (PSO) algorithm to select the active genes related to PC. In the enhanced PSO algorithm, the interval Newton approach is included to keep the search space adaptive by varying the swarm diversity, which helps the local search. The selected active genes are fed to a random forest classifier for the classification of PC (high-risk and low-risk). In the experimental investigation, the proposed model achieved an overall classification accuracy of 96.71%, which is better than traditional models such as naïve Bayes, support vector machine, and neural network.

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Molecular Signature as Optima of Multi-Objective Function with Applications to Prediction in Oncogenomics

Author: Aligerová Zuzana
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2015

This thesis consists of a theoretical introduction followed by practical work on the topic "Molecular signature as optima of multi-objective function with applications to prediction in oncogenomics". The opening chapters cover cancer, mainly breast cancer and its subtype, triple-negative breast cancer. A literature review of optimization methods follows, focusing on metaheuristic methods for multi-objective optimization and on machine learning. One part addresses oncogenomics and the principles of microarrays, as well as statistical methods, with emphasis on the calculation of the p-value and the bimodality index. The practical part describes the research carried out and its conclusions, which lead to the next steps of the research. The selected methods were implemented in Matlab and R, with additional use of the programming languages Java and Python.
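
For readers unfamiliar with the bimodality index mentioned in this abstract, the following minimal sketch computes it from the parameters of a two-component normal mixture with a shared standard deviation, using the usual definition BI = sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma. It assumes the mixture parameters have already been estimated elsewhere; the function and parameter names are hypothetical, and the thesis's own implementation is not described in the abstract.

```python
import math


def bimodality_index(mu1: float, mu2: float, sigma: float, pi: float) -> float:
    """Bimodality index of a two-component normal mixture.

    mu1, mu2: component means; sigma: shared standard deviation;
    pi: proportion of samples in the first component (0 < pi < 1).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    delta = abs(mu1 - mu2) / sigma  # standardized distance between the modes
    return math.sqrt(pi * (1.0 - pi)) * delta
```

The index grows when the two modes are far apart relative to the spread and when the two groups are balanced, so a gene with a large value is a candidate for splitting samples into two expression groups.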

Digital library of Brno University of Technology

National Repository of Grey Literature


Evolutionary computation-based feature selection for finding a stable set of features in high-dimensional data

Author: Salesi Mousaabadi S
Publication venue
Publication date: 01/09/2019

Evolutionary Computation (EC) algorithms have proved to work well for feature selection because they are powerful search techniques and can produce multiple good solutions. However, they suffer from some limitations in real-world applications. Firstly, ECs require high computation time because they evaluate many solutions at each iteration. Secondly, a classifier is usually used as their fitness function, which causes the selected subset to perform well only with the classifier used (i.e. classifier bias). Lastly, ECs, as stochastic search methods, return a different final subset in different runs, which poses a problem for finding a stable set of features (i.e. the stability issue). To address the computation time and classifier-bias limitations, this thesis proposes a new two-stage selection approach called filter/filter, in which two filter feature selection algorithms are combined. In the first stage, a ranking algorithm forms a reduced dataset by selecting the most informative features from the original dataset. In the second stage, the reduced dataset is fed to a novel EC algorithm to select the final feature subset. This new EC algorithm is a Tabu search hybridised with an Asexual Genetic Algorithm, called TAGA. TAGA benefits from new search components and a solution representation that can effectively reduce computation time. To select a classifier-unbiased final subset, a statistical criterion is used as the fitness function, which evaluates the subset independently of any classifier. Experiments show that the proposed filter/filter approach requires an acceptable computation time and selects more classifier-unbiased features than state-of-the-art methods. To find a stable set of features, a novel Generalisation Power Index (GPI) is proposed to analyse the generalisation power of the final subsets of an EC over several runs. Generalisation power refers to the performance capability of a subset over a wide range of classifiers.
Computational results confirm that GPI is able to find a stable set of features that achieves near-optimal accuracy when used to train various classifiers. To examine the suitability of the proposed methods for real-world applications, the filter/filter approach and GPI are integrated to select a stable set of features for the METABRIC breast cancer subtype classification problem. Experimental results show that this integration not only addresses the limitations of ECs for a real-world biomedical feature selection problem but also performs better than alternative methods.

Nottingham Trent Institutional Repository (IRep)

PMP-SVM: A Hybrid Approach for effective Cancer Diagnosis using Feature Selection and Optimization

Author: Pinakshi Panda et al.
Publication venue: Auricle Global Society of Education and Research
Publication date: 07/11/2023

Cancer is becoming a prominent factor in the rising death rate worldwide because of late diagnosis. Machine Learning (ML) plays a vital role in providing computer-aided diagnosis models for the early diagnosis of cancer, and microarray data have their own place in the diagnosis process. Microarray data contain the genetic information of patients, with a large number of dimensions (genes) and a small number of samples (patients). If the microarray data are passed directly to an ML model for classification without reducing the dimension, the small sample size problem results. The size of the microarray data therefore needs to be reduced, using either a dimensionality reduction technique or a feature selection technique, to increase the model's performance. This work proposes a hybrid model for cancer diagnosis using Principal Component Analysis (PCA), Maximum Relevance Minimum Redundancy (MRMR), Particle Swarm Optimization (PSO) and Support Vector Machine (SVM). PCA and MRMR are used for feature selection, and PSO is applied to obtain the optimized feature set. Finally, SVM is applied as the classification model. The proposed model is evaluated on multiple cancer microarray datasets to measure its performance in terms of accuracy, precision, recall, and F1 score. The results show that the proposed model performs better than existing state-of-the-art models.
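
The MRMR step named in this abstract can be illustrated with a minimal greedy sketch. It assumes discretized features and the difference form of the criterion (relevance to the label minus mean redundancy with the features already chosen); the abstract does not say which variant the authors use, and the function names here are hypothetical. The PCA, PSO, and SVM stages are not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((count_a[x] / n) * (count_b[v] / n)))
        for (x, v), c in count_ab.items()
    )


def mrmr(features: List[Sequence[int]], y: Sequence[int], k: int) -> List[int]:
    """Greedily pick k feature indices; `features` holds one sequence per feature."""
    relevance = [mutual_information(f, y) for f in features]
    selected: List[int] = []
    remaining = list(range(len(features)))

    def score(j: int) -> float:
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[s])
                         for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With features `[0, 0, 1, 1]`, an exact copy of it, and `[0, 1, 0, 1]`, and a label equal to the logical AND of the first and third, the second pick skips the copy because its redundancy with the first pick cancels its relevance.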

International Journal on Recent and Innovation Trends in Computing and Communication

Protein fold recognition using genetic algorithm optimized voting scheme and profile bigram

Author: Dehzangi Abdollah
Imoto S.
Lal Sunil P.
Raicar Gaurav
Saini Harsh
Sharma Alokanand
Publication venue: JSW
Publication date: 01/01/2016

In biology, identifying the tertiary structure of a protein helps determine its functions. A step towards tertiary structure identification is predicting a protein's fold. Computational methods have been applied to determine a protein's fold by assembling information from its structural, physicochemical and/or evolutionary properties. It has been shown that evolutionary information helps improve prediction accuracy. In this study, a scheme is proposed that uses a genetic algorithm (GA) to optimize a weighted voting scheme to improve protein fold recognition. This scheme incorporates k-separated bigram transition probabilities for feature extraction, which are based on the Position Specific Scoring Matrix (PSSM). A set of SVM classifiers is used for initial classification, whereupon their predictions are consolidated using the optimized weighted voting scheme. This scheme has been demonstrated on the Ding and Dubchak (DD), Extended Ding and Dubchak (EDD) and Taguchi and Gromiha (TG) benchmark datasets.
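
The two building blocks named in this abstract can be sketched as follows. The sketch assumes one common formulation of k-separated bigrams, B[i][j] = sum over positions t of P[t][i] * P[t+k][j], where P is a PSSM already converted to per-position probabilities; the abstract itself does not give the formula. The voting function only applies a given set of weights: in the paper those weights are found by the GA, which is not shown here. Function names are hypothetical.

```python
from typing import Dict, List, Sequence


def k_separated_bigram(pssm: Sequence[Sequence[float]], k: int = 1) -> List[List[float]]:
    """Return B with B[i][j] = sum_t pssm[t][i] * pssm[t + k][j].

    pssm has one row per sequence position and one column per amino acid
    (20 columns for a real protein; any width works here).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n_cols = len(pssm[0]) if pssm else 0
    features = [[0.0] * n_cols for _ in range(n_cols)]
    for t in range(len(pssm) - k):
        row_a, row_b = pssm[t], pssm[t + k]
        for i in range(n_cols):
            for j in range(n_cols):
                features[i][j] += row_a[i] * row_b[j]
    return features


def weighted_vote(predictions: Sequence[str], weights: Sequence[float]) -> str:
    """Combine one predicted label per classifier using per-classifier weights."""
    totals: Dict[str, float] = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)
```

For a protein, flattening the 20 x 20 matrix gives a fixed-length 400-dimensional feature vector regardless of sequence length, which is what makes the representation convenient as SVM input.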

University of the South Pacific Electronic Research Repository