5,670 research outputs found

Algorithms Implemented for Cancer Gene Searching and Classifications

Author: Al-Rajab Murad
Lu Joan

Understanding gene expression is an important factor in cancer diagnosis. One goal of this understanding is to implement cancer gene search and classification methods. However, cancer gene search and classification is challenging because no single exact algorithm can be applied on its own across the various cancer cells. This paper surveys the most common top-ranked algorithms implemented for cancer gene search and classification, and how they are implemented to achieve better performance. The paper distinguishes algorithms implemented for bio-image analysis of cancer cells from algorithms based on DNA array data. The main purpose of this paper is to outline a road map of the most current algorithms implemented for cancer gene search and classification.

University of Huddersfield Repository

Elephant Search with Deep Learning for Microarray Data Analysis

Author: Panda Mrutyunjaya
Publication date: 12/07/2017

Although there is a plethora of research on microarray gene expression data analysis, it remains challenging for researchers to analyze large and complex gene expression data effectively and efficiently. Feature (gene) selection is of paramount importance for understanding the differences in biological and non-biological variation between samples. To address this problem, a novel elephant search (ES) based optimization is proposed to select the best gene expressions from the large volume of microarray data. Further, a promising machine learning method is envisioned to leverage such high-dimensional and complex microarray datasets, extracting the hidden patterns inside them to make meaningful predictions and the most accurate classification. In particular, stochastic gradient descent based deep learning (DL) with a softmax activation function is used on the reduced features (genes) for better classification of different samples according to their gene expression levels. The experiments are carried out on the nine most popular cancer microarray gene selection datasets, obtained from the UCI Machine Learning Repository. The empirical results obtained by the proposed elephant search based deep learning (ESDL) approach are compared with the most recent published article for its suitability in future bioinformatics research.
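To make the classification step in this abstract concrete, here is a minimal sketch of stochastic gradient descent on a softmax output layer. It is not the ESDL method: it assumes a single linear layer rather than a deep network, omits the elephant search feature selection entirely, and the function names, learning rate, and two-sample toy dataset are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(weights, x, target, lr=0.5):
    """One SGD update of a linear softmax layer on a single sample.

    weights[c][j] is the weight from input feature j to class c.
    For cross-entropy loss, the gradient with respect to the logits
    is (probabilities - one_hot(target)).
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    probs = softmax(logits)
    for c, row in enumerate(weights):
        error = probs[c] - (1.0 if c == target else 0.0)
        for j, v in enumerate(x):
            row[j] -= lr * error * v
    return probs


def predict(weights, x):
    """Return the index of the class with the largest logit."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical toy data: two "reduced features", two sample classes.
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
weights = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(50):
    for x, target in samples:
        sgd_step(weights, x, target)
```

In a real pipeline the inputs would be the expression levels of the genes retained by the feature selection step, and the linear layer would be the last layer of a deeper network.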

arXiv.org e-Print Archive

Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis

Author: Al-Rajab Murad
Lu Joan
Qiang Xu
Publication venue: 'Elsevier BV'
Publication date: 01/07/2017

Background and Objectives: This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for cancer tissues to be expertly identified and classified in a rapid and timely manner, both to assure fast detection of the disease and to expedite the drug discovery process. Methods: In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. Results: It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection method (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifiers, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). Conclusions: It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society.

University of Huddersfield Repository

Huddersfield Research Portal

Supervised learning-based tagSNP selection for genome-wide disease classifications

Author: Chen Zhongxue
Huang Xudong
Liu Qingzhong
Sung Andrew H
Yang Jack
Yang Mary Qu
Publication venue: BioMed Central
Publication date: 25/07/2007

The article was originally published by BMC Genomics. doi:10.1186/1471-2164-9-S1-S6

Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is to find an optimal subset of SNPs with predicting power for disease status. To find that subset while reducing study burden in terms of time and costs, one can potentially reconcile information redundancy from associations between SNP markers.

Acknowledgments: Research support received from ICASA (Institute for Complex Additive Systems Analysis, a division of New Mexico Tech) and the Radiology Department of Brigham and Women's Hospital (BWH) is gratefully acknowledged. The authors highly appreciate Dr. Liang at SUNY-Buffalo for her invaluable help and insightful discussion during this study and Ms. Kim Lawson at BWH Radiology Department for her manuscript editing and very constructive comments.

Keywords: Supervised Recursive Feature Addition; Support Vector based Recursive Feature Addition; complex disease; genetics; disease prediction

Crossref

Scholarly Works @ SHSU (Sam Houston State University)

Springer - Publisher Connector

PubMed Central

Supervised Learning-Based tagSNP Selection for Genome-Wide Disease Classifications

Author: Chen Zhongxue
Huang Xudong
Liu Qingzhong
Sung Andrew H
Yang Jack
Yang Mary Qu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/04/2011

Background: Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is to find an optimal subset of SNPs with predicting power for disease status. To find that subset while reducing study burden in terms of time and costs, one can potentially reconcile information redundancy from associations between SNP markers. Results: We have developed a feature selection method named Supervised Recursive Feature Addition (SRFA). This method combines supervised learning and statistical measures for the chosen candidate features/SNPs to reconcile the redundancy information and, in doing so, improve the classification performance in association studies. Additionally, we have proposed a Support Vector based Recursive Feature Addition (SVRFA) scheme in SNP-disease association analysis. Conclusions: We have proposed using SRFA with different statistical learning classifiers and SVRFA for both SNP selection and disease classification and then applying them to two complex disease data sets. In general, our approaches outperform the well-known feature selection method of Support Vector Machine Recursive Feature Elimination and logic regression-based SNP selection for disease classification in genetic association studies. Our study further indicates that both genetic and environmental variables should be taken into account when doing disease predictions and classifications for the most complex human diseases that have gene-environment interactions
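The idea of adding features one at a time while penalizing redundancy can be sketched as follows. This is an illustration under stated assumptions, not the authors' SRFA or SVRFA: the nearest-centroid classifier, the use of training accuracy as the supervised score, the absolute Pearson correlation as the redundancy measure, and every function name are hypothetical choices made for this example.

```python
def accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier on `features`."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def predict(row):
        return min(
            classes,
            key=lambda c: sum(
                (row[f] - m) ** 2 for f, m in zip(features, centroids[c])
            ),
        )

    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def abs_pearson(a, b):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return abs(cov) / (va * vb) ** 0.5 if va and vb else 0.0


def recursive_feature_addition(X, y, k):
    """Greedily add k features: best accuracy first, least redundant on ties."""
    def column(f):
        return [row[f] for row in X]

    selected = []
    while len(selected) < k:
        candidates = [f for f in range(len(X[0])) if f not in selected]

        def rank(f):
            score = accuracy(X, y, selected + [f])
            redundancy = max(
                (abs_pearson(column(f), column(s)) for s in selected),
                default=0.0,
            )
            return (score, -redundancy)

        selected.append(max(candidates, key=rank))
    return selected


# Hypothetical toy data: feature 1 duplicates feature 0; feature 2 is
# uncorrelated with feature 0.
X = [[0.0, 0.0, 1.0], [0.1, 0.1, 0.0], [1.0, 1.0, 1.0], [0.9, 0.9, 0.0]]
y = [0, 0, 1, 1]
chosen = recursive_feature_addition(X, y, 2)
```

On this toy data the duplicate feature is passed over in the second round because it is perfectly correlated with the feature already chosen, which is the kind of redundancy the abstract describes reconciling.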

Harvard University - DASH

Partition Decoupling for Multi-gene Analysis of Gene Expression Profiling Data

Author: Braun Rosemary
Leibon Gregory
Pauls Scott
Rockmore Daniel
Publication date: 01/01/2011

We present the extension and application of a new unsupervised statistical learning technique, the Partition Decoupling Method (PDM), to gene expression data. Because it has the ability to reveal non-linear and non-convex geometries present in the data, the PDM is an improvement over typical gene expression analysis algorithms, permitting a multi-gene analysis that can reveal phenotypic differences even when the individual genes do not exhibit differential expression. Here, we apply the PDM to publicly available gene expression data sets, and demonstrate that we are able to identify cell types and treatments with higher accuracy than is obtained through other approaches. By applying it in a pathway-by-pathway fashion, we demonstrate how the PDM may be used to find sets of mechanistically related genes that discriminate phenotypes.

arXiv.org e-Print Archive

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

The Marker State Space (MSS) Method for Classifying Clinical Samples

Author: A Porter
A Prat
AA Alizadeh
BB Haab
Brian B. Haab
Brian P. Fallon
Bryan Curnutte
C Lombardi
Christopher J. Langmead
CY Wang
D Bergsma
EF Cook
G Kloppel
H Zhang
J Hoggatt
JA Koziol
JA Ludwig
JC Manimala
K Bouwman
K Maupin
K Partyka
Katie Partyka
Kevin A. Maupin
KS Goonetilleke
L Breiman
M Lukes
MH Gail
NB La Thangue
R Etzioni
R Orchekowski
Randall E. Brand
RO Dillman
S Baek
S Chen
S Dudoit
S Hakomori
S Varambally
Sunguk Choi
T Yue
TA Alonzo
Waibhav Tembe
William C. S. Cho
YM Wu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

The development of accurate clinical biomarkers has been challenging in part due to the diversity between patients and diseases. One approach to account for the diversity is to use multiple markers to classify patients, based on the concept that each individual marker contributes information from its respective subclass of patients. Here we present a new strategy for developing biomarker panels that accounts for completely distinct patient subclasses. Marker State Space (MSS) defines "marker states" based on all possible patterns of high and low values among a panel of markers. Each marker state is defined as either a case state or a control state, and a sample is classified as case or control based on the state it occupies. MSS was used to define multi-marker panels that were robust in cross validation and training-set/test-set analyses and that yielded similar classification accuracy to several other classification algorithms. A three-marker panel for discriminating pancreatic cancer patients from control subjects revealed subclasses of patients based on distinct marker states. MSS provides a straightforward approach for modeling highly divergent subclasses of patients, which may be adaptable for diverse applications. © 2013 Fallon et al
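The marker-state idea described in this abstract can be sketched in a few lines. The abstract does not say how a state is assigned to case or control, so the majority vote over training samples used here is an assumption, as are the thresholds, the default label for unseen states, the function names, and the toy data.

```python
from collections import Counter, defaultdict
from itertools import product


def marker_state(values, thresholds):
    """Tuple of 0/1 flags: 1 where a marker value exceeds its threshold."""
    return tuple(int(v > t) for v, t in zip(values, thresholds))


def fit_state_labels(samples, labels, thresholds, default="control"):
    """Label every possible high/low pattern by majority vote.

    States with no training samples fall back to `default`
    (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for values, label in zip(samples, labels):
        counts[marker_state(values, thresholds)][label] += 1
    state_labels = {}
    for state in product((0, 1), repeat=len(thresholds)):
        if counts[state]:
            state_labels[state] = counts[state].most_common(1)[0][0]
        else:
            state_labels[state] = default
    return state_labels


def classify(values, thresholds, state_labels):
    """Classify a sample by the label of the marker state it occupies."""
    return state_labels[marker_state(values, thresholds)]


# Hypothetical three-marker panel with two distinct case subclasses.
thresholds = [1.0, 1.0, 1.0]
train = [(2.0, 0.5, 0.5), (0.5, 2.0, 0.5), (0.5, 0.5, 0.5), (0.2, 0.1, 0.9)]
train_labels = ["case", "case", "control", "control"]
state_labels = fit_state_labels(train, train_labels, thresholds)
```

A three-marker panel has 2^3 = 8 possible states. In the toy data, the two case samples occupy different states, which mirrors the abstract's point that distinct patient subclasses can each be captured by their own marker pattern.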

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

FigShare