2,465 research outputs found

Kernel-based distance metric learning for microarray data classification

Author: Chen Xue-wen
Xiong Huilin
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The most fundamental task using gene expression data in clinical oncology is to classify tissue samples according to their gene expression levels. Compared with traditional pattern classification, gene expression-based data classification is typically characterized by high dimensionality and small sample size, which make the task quite challenging. RESULTS: In this paper, we present a modified K-nearest-neighbor (KNN) scheme, which is based on learning an adaptive distance metric in the data space, for cancer classification using microarray data. The distance metric, derived from the procedure of a data-dependent kernel optimization, can substantially increase the class separability of the data and, consequently, lead to a significant improvement in the performance of the KNN classifier. Intensive experiments show that the performance of the proposed kernel-based KNN scheme is competitive with that of some sophisticated classifiers such as support vector machines (SVMs) and the uncorrelated linear discriminant analysis (ULDA) in classifying the gene expression data. CONCLUSION: A novel distance metric is developed and incorporated into the KNN scheme for cancer classification. This metric can substantially increase the class separability of the data in the feature space and, hence, lead to a significant improvement in the performance of the KNN classifier.
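As a rough illustration of the idea in the abstract above (KNN driven by a kernel-derived distance), here is a minimal sketch. It uses the standard kernel-induced distance d(x, y)^2 = k(x, x) - 2k(x, y) + k(y, y) with a plain Gaussian kernel. The data-dependent kernel optimization described in the abstract is not reproduced here; `rbf_kernel`, `kernel_distance`, `knn_predict`, the `gamma` value, and the toy data are all hypothetical names and values chosen for illustration.

```python
import math
from collections import Counter

def rbf_kernel(x, y, gamma=0.5):
    """Plain Gaussian (RBF) kernel; a stand-in, not the optimized kernel from the paper."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance induced by a kernel: d(x, y)^2 = k(x, x) - 2 k(x, y) + k(y, y)."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors

def knn_predict(train_x, train_y, query, k=3, distance=kernel_distance):
    """Majority vote among the k training samples closest to `query`."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy two-gene expression profiles (hypothetical values), two classes.
train_x = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
train_y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

print(knn_predict(train_x, train_y, [0.15, 0.25]))
print(knn_predict(train_x, train_y, [2.1, 2.0]))
```

Because the distance is passed in as a parameter, a learned or optimized kernel could be swapped in without changing the voting logic.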

Springer - Publisher Connector

Directory of Open Access Journals

KU ScholarWorks

PubMed Central

Selection of biologically relevant genes with a wrapper stochastic algorithm

Author: Besse Philippe
Gadat Sébastien
Gonçalves Olivier
Lê Cao Kim-Anh
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2007

We investigate an important issue of a meta-algorithm for selecting variables in the framework of microarray data. This wrapper method starts from any classification algorithm and weights each variable (i.e. gene) relative to its efficiency for classification. An optimization procedure is then inferred which exhibits important genes for the studied biological process. Theory and application with the SVM classifier were presented in Gadat and Younes (2007), and we extend this method with CART. The classification error rates are computed on three well-known public databases (Leukemia, Colon and Prostate) and compared with those from other wrapper methods (RFE, l0-norm SVM, Random Forests). This allows the assessment of the statistical relevance of the proposed algorithm. Furthermore, a biological interpretation with the Ingenuity Pathway Analysis software outputs clearly shows that the gene selections from the different wrapper methods raise very relevant biological information, compared to a classical filter gene selection with a t-test.

Scientific Publications of the University of Toulouse II Le Mirail

HAL Clermont Université

HAL-INSA Toulouse

Evolutionary approaches for feature selection in biological data

Author: Dang Vinh Q.
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2014

Data mining techniques have been used widely in many areas such as business, science, engineering and medicine. The techniques allow a vast amount of data to be explored in order to extract useful information from the data. One of the foci in the health area is finding interesting biomarkers from biomedical data. High-throughput data generated from microarrays and mass spectrometry of biological samples are high dimensional and small in sample size. Examples include DNA microarray datasets with up to 500,000 genes and mass spectrometry data with 300,000 m/z values. While the availability of such datasets can aid in the development of techniques/drugs to improve diagnosis and treatment of diseases, a major challenge involves their analysis to extract useful and meaningful information. The aims of this project are: 1) to investigate and develop feature selection algorithms that incorporate various evolutionary strategies, 2) to use the developed algorithms to find the “most relevant” biomarkers contained in biological datasets, and 3) to evaluate the goodness of extracted feature subsets for relevance (examined in terms of existing biomedical domain knowledge and of classification accuracy obtained using different classifiers). The project aims to generate good predictive models for classifying diseased samples from control.

Research Online @ ECU

Supervised clustering of genes

Author: Bühlmann Peter
Dettling Marcel
Publication venue: BioMed Central
Publication date: 01/01/2002

BACKGROUND: We focus on microarray data where experiments monitor gene expression in different tissues and where each experiment is equipped with an additional response variable such as a cancer type. Although the number of measured genes is in the thousands, it is assumed that only a few marker components of gene subsets determine the type of a tissue. Here we present a new method for finding such groups of genes by directly incorporating the response variables into the grouping process, yielding a supervised clustering algorithm for genes. RESULTS: An empirical study on eight publicly available microarray datasets shows that our algorithm identifies gene clusters with excellent predictive potential, often superior to classification with state-of-the-art methods based on single genes. Permutation tests and bootstrapping provide evidence that the output is reasonably stable and more than a noise artifact. CONCLUSIONS: In contrast to other methods such as hierarchical clustering, our algorithm identifies several gene clusters whose expression levels clearly distinguish the different tissue types. The identification of such gene clusters is potentially useful for medical diagnostics and may at the same time reveal insights into functional genomics

Repository for Publications and Research Data

PubMed Central

ZHAW digitalcollection

Role of Artificial Intelligence in Radiogenomics for Cancers in the Era of Precision Medicine

Author: Bhattacharya P.
Das S.
Fouda M. M.
Gupta N.
Jena B.
Kalra M.
Nath T.
Pareek G.
Paul S.
Saba L.
Sarmah D.
Saxena S.
Suri J. S.
Publication date: 01/01/2022

Radiogenomics, a combination of “Radiomics” and “Genomics,” using Artificial Intelligence (AI) has recently emerged as the state-of-the-art science in precision medicine, especially in oncology care. Radiogenomics syndicates large-scale quantifiable data extracted from radiological medical images enveloped with personalized genomic phenotypes. It fabricates a prediction model through various AI methods to stratify the risk of patients, monitor therapeutic approaches, and assess clinical outcomes. It has recently shown tremendous achievements in prognosis, treatment planning, survival prediction, heterogeneity analysis, reoccurrence, and progression-free survival for human cancer study. Although AI has shown immense performance in oncology care in various clinical aspects, it has several challenges and limitations. The proposed review provides an overview of radiogenomics with the viewpoints on the role of AI in terms of its promises for computational as well as oncological aspects and offers achievements and opportunities in the era of precision medicine. The review also presents various recommendations to diminish these obstacles.

Archivio istituzionale della ricerca - Università di Cagliari

Gene selection and classification of microarray data using random forest

Author: Alvarez de Andrés Sara
Díaz-Uriarte Ramón
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Selection of relevant genes for sample classification is a common task in most gene expression studies, where researchers try to identify the smallest possible set of genes that can still achieve good predictive performance (for instance, for future use with diagnostic purposes in clinical practice). Many gene selection approaches use univariate (gene-by-gene) rankings of gene relevance and arbitrary thresholds to select the number of genes, can only be applied to two-class problems, and use gene selection ranking criteria unrelated to the classification algorithm. In contrast, random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of observations and in problems involving more than two classes, and returns measures of variable importance. Thus, it is important to understand the performance of random forest with microarray data and its possible use for gene selection. RESULTS: We investigate the use of random forest for classification of microarray data (including multi-class problems) and propose a new method of gene selection in classification problems based on random forest. Using simulated and nine microarray data sets we show that random forest has comparable performance to other classification methods, including DLDA, KNN, and SVM, and that the new gene selection procedure yields very small sets of genes (often smaller than alternative methods) while preserving predictive accuracy. CONCLUSION: Because of its performance and features, random forest and gene selection using random forest should probably become part of the "standard tool-box" of methods for class prediction and gene selection with microarray data

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Biblos-e Archivo

Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers

Author: Adamec Jiri
Biegert Greyson
Helikar Tomáš
Mohammed Akram
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 21/09/2017

Machine learning techniques for cancer prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. Recent “OMICS” studies which include a variety of cancer and normal tissue samples along with machine learning approaches have the potential to further accelerate such discovery. To demonstrate this potential, 2,175 gene expression samples from nine tissue types were obtained to identify gene sets whose expression is characteristic of each cancer class. Using random forests classification and ten-fold cross-validation, we developed nine single-tissue classifiers, two multi-tissue cancer-versus-normal classifiers, and one multi-tissue normal classifier. Given a sample of a specified tissue type, the single-tissue models classified samples as cancer or normal with a testing accuracy between 85.29% and 100%. Given a sample of non-specific tissue type, the multi-tissue bi-class model classified the sample as cancer versus normal with a testing accuracy of 97.89%. Given a sample of non-specific tissue type, the multi-tissue multiclass model classified the sample as cancer versus normal and as a specific tissue type with a testing accuracy of 97.43%. Given a normal sample of any of the nine tissue types, the multi-tissue normal model classified the sample as a particular tissue type with a testing accuracy of 97.35%. The machine learning classifiers developed in this study identify potential cancer biomarkers with sensitivity and specificity that exceed those of existing biomarkers and pointed to pathways that are critical to tissue-specific tumor development. This study demonstrates the feasibility of predicting the tissue origin of carcinoma in the context of multiple cancer classes.

DigitalCommons@University of Nebraska

Very Important Pool (VIP) genes – an application for microarray-based molecular signatures

Author: A Ben-Dor
A Bhattacharjee
A Butte
A Rosenwald
AK Jain
AL Bluma
B Liu
C Ambroise
C Ding
C Lai
D Singh
DG Beer
DJ Lockhart
EJ Yeoh
EK Tang
GJ Gordon
H Hackl
HH Zhang
Hong Fang
Huixiao Hong
IM Gana Dresen
InfoMetrix
J Dopazoa
J Gould
J Quackenbush
JG Zhang
JJ Chen
KE Lee
L Brehelin
L Breiman
L Ein-Dor
L Li
L Shi
L Wang
Leming Shi
LF Wessels
LJ van 't Veer
M Dettling
M Schena
MA Shipp
R Diaz-Uriarte
R Simon
Roger Perkins
S Dudoit
S Michiels
S Mukherjee
S Wold
SE Jarvis
SJ Raudys
SL Pomeroy
U Alon
U Lutz
VN Vapnik
W Jiang
Weida Tong
WJ Fu
X Chen
Y Peng
Y Wang
Z Su
Zhenqiang Su
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Advances in DNA microarray technology portend that microarray-derived molecular signatures will eventually be used in clinical environments and personalized medicine. Derivation of biomarkers is a large step beyond hypothesis generation and imposes considerably more stringency for accuracy in identifying informative gene subsets to differentiate phenotypes. The inherent nature of microarray data, with fewer samples and replicates compared to the large number of genes, requires identifying informative genes prior to classifier construction. However, improving the ability to identify differentiating genes remains a challenge in bioinformatics. RESULTS: A new hybrid gene selection approach was investigated and tested with nine publicly available microarray datasets. The new method identifies a Very Important Pool (VIP) of genes from the broad patterns of gene expression data. The method uses a bagging sampling principle, where the re-sampled arrays are used to identify the most informative genes. Frequency of selection is used in a repetitive process to identify the VIP genes. The putative informative genes are selected using two methods, t-statistic and discriminatory analysis. In the t-statistic method, the informative genes are identified based on p-values. In the discriminatory analysis, disjoint Principal Component Analyses (PCAs) are conducted for each class of samples, and genes with high discrimination power (DP) are identified. The VIP gene selection approach was compared with the p-value ranking approach. The genes identified by the VIP method but not by the p-value ranking approach are also related to the disease investigated. More importantly, these genes are part of the pathways derived from the common genes shared by both the VIP and p-value ranking methods. Moreover, the binary classifiers built from these genes are statistically equivalent to those built from the top 50 p-value ranked genes in distinguishing different types of samples.
CONCLUSION: The VIP gene selection approach could identify additional subsets of informative genes that would not always be selected by the p-value ranking method. These genes are likely to be additional true positives since they are a part of pathways identified by the p-value ranking method and expected to be related to the relevant biology. Therefore, these additional genes derived from the VIP method potentially provide valuable biological insights.
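The bagging and selection-frequency steps described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: genes are ranked by the absolute Welch t-statistic instead of by p-values, resampling is done within each class, and the `top_k`, `n_rounds`, and `min_freq` settings, the function names, and the toy data are hypothetical and do not come from the paper. The discriminatory-analysis (PCA) branch is not covered.

```python
import math
import random
from collections import Counter

def welch_t(a, b):
    """Welch's t-statistic for two groups of values (each group needs >= 2 values)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    if se == 0:  # degenerate resample: no spread in either group
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se

def vip_genes(samples, labels, top_k=2, n_rounds=200, min_freq=0.8, seed=0):
    """Return indices of genes selected in at least `min_freq` of the resampling rounds.

    Assumes exactly two classes; rows of `samples` are arrays, columns are genes.
    """
    rng = random.Random(seed)
    n_genes = len(samples[0])
    idx_by_class = {c: [i for i, l in enumerate(labels) if l == c] for c in set(labels)}
    c0, c1 = sorted(idx_by_class)
    counts = Counter()
    for _ in range(n_rounds):
        # Resample arrays with replacement inside each class (an assumption of this sketch).
        boot0 = [rng.choice(idx_by_class[c0]) for _ in idx_by_class[c0]]
        boot1 = [rng.choice(idx_by_class[c1]) for _ in idx_by_class[c1]]
        scores = []
        for g in range(n_genes):
            t = welch_t([samples[i][g] for i in boot0], [samples[i][g] for i in boot1])
            scores.append((abs(t), g))
        scores.sort(reverse=True)
        counts.update(g for _, g in scores[:top_k])  # genes selected in this round
    return sorted(g for g, c in counts.items() if c / n_rounds >= min_freq)

# Toy data (hypothetical): genes 0 and 2 differ between classes, genes 1 and 3 do not.
samples = [
    [1.0, 1.0, 5.0, 2.0], [1.2, 1.2, 5.3, 2.1], [0.8, 0.8, 4.8, 1.9],
    [1.1, 0.9, 5.1, 2.0], [0.9, 1.1, 4.9, 2.2], [1.0, 1.0, 5.2, 1.8],
    [5.0, 1.1, 1.0, 2.1], [5.2, 0.9, 1.1, 1.9], [4.8, 1.0, 0.9, 2.0],
    [5.1, 1.2, 1.2, 1.8], [4.9, 0.8, 0.8, 2.2], [5.0, 1.0, 1.0, 2.0],
]
labels = ["A"] * 6 + ["B"] * 6

print(vip_genes(samples, labels))
```

Counting how often a gene survives across resamples, and not trusting a single ranking, is what makes the pool less sensitive to the particular set of arrays in a small study.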

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central