Search CORE

2,061 research outputs found

Rank discriminants for predicting phenotypes from RNA expression

Author: Afsari Bahman
Braga-Neto Ulisses M.
Geman Donald
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2014

Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes and predicting clinical outcomes. Still, clinical applications remain scarce. One reason is that the complexity of the decision rules that emerge from standard statistical learning impedes biological understanding, in particular, any mechanistic interpretation. Here we explore decision rules for binary classification utilizing only the ordering of expression among several genes; the basic building blocks are then two-gene expression comparisons. The simplest example, just one comparison, is the TSP classifier, which has appeared in a variety of cancer-related discovery studies. Decision rules based on multiple comparisons can better accommodate class heterogeneity, and thereby increase accuracy, and might provide a link with biological mechanism. We consider a general framework ("rank-in-context") for designing discriminant functions, including a data-driven selection of the number and identity of the genes in the support ("context"). We then specialize to two examples: voting among several pairs and comparing the median expression in two groups of genes. Comprehensive experiments assess accuracy relative to other, more complex, methods, and reinforce earlier observations that simple classifiers are competitive. Comment: Published at http://dx.doi.org/10.1214/14-AOAS738 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
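The two rank-based rules named in this abstract (voting among several gene pairs, and comparing median expression in two gene groups) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary representation of a sample, the gene identifiers, and the convention that "first gene lower than second" votes for class 1 are all assumptions made for illustration, and the data-driven selection of pairs and groups is not shown.

```python
from statistics import median


def tsp_vote(expr, pairs):
    """Majority vote over two-gene comparisons (hypothetical helper).

    expr  -- dict mapping gene name to expression value for one sample
    pairs -- list of (gene_i, gene_j); a pair votes for class 1 when
             expr[gene_i] < expr[gene_j] (assumed convention)
    """
    votes = sum(1 for i, j in pairs if expr[i] < expr[j])
    # Class 1 needs a strict majority of the pairs.
    return 1 if 2 * votes > len(pairs) else 0


def median_compare(expr, group_a, group_b):
    """Predict class 1 when the median expression of group_a is below
    the median expression of group_b (assumed convention)."""
    med_a = median(expr[g] for g in group_a)
    med_b = median(expr[g] for g in group_b)
    return 1 if med_a < med_b else 0


# Made-up sample with hypothetical gene names.
sample = {"g1": 2.0, "g2": 5.0, "g3": 7.5, "g4": 1.0, "g5": 3.0, "g6": 4.0}
pairs = [("g1", "g2"), ("g3", "g4"), ("g5", "g6")]

print(tsp_vote(sample, pairs))
print(median_compare(sample, ["g1", "g4", "g5"], ["g2", "g3", "g6"]))
```

With a single pair, `tsp_vote` reduces to one two-gene comparison, which is the form the abstract attributes to the TSP classifier. Both rules depend only on the ordering of values within a sample, so any monotone rescaling of that sample leaves the prediction unchanged.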

arXiv.org e-Print Archive

CiteSeerX

Texas A&M Repository

An entropy-based improved k-top scoring pairs (TSP) method for classifying human cancers

Author: Blanzieri Enrico
Liang Yanchun
Wang Shuqin
Zhou Chunbao
Publication venue: 'African Journals Online (AJOL)'
Publication date: 19/01/2016

Classification and prediction of different cancers based on gene-expression profiles are important for cancer diagnosis, cancer treatment and medication discovery. However, most of the data in a gene expression profile do not contribute to cancer classification and prediction. Hence, it is important to find the key genes that are relevant. An entropy-based improved k-top scoring pairs (TSP) (Ik-TSP) method is presented in this study for the classification and prediction of human cancers based on gene-expression data. We compared Ik-TSP classifiers with 5 different machine learning methods and the k-TSP method based on 3 different feature selection methods on 9 binary class gene expression datasets and 10 multi-class gene expression datasets involving human cancers. Experimental results showed that the Ik-TSP method had higher accuracy. The experimental results also showed that the proposed method can effectively find genes that are important for distinguishing different cancers and cancer subtypes. Key words: Cancer classification, gene expression, k-TSP, information entropy, gene selection

AJOL - African Journals Online

CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules

Author: Bertolazzi Paola
Cestarelli Valerio
Felici Giovanni
Fiscon Giulia
Weitschek Emanuel
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2015

Methods for extracting knowledge from Next Generation Sequencing data are currently in high demand. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). In contrast, our goal is to extract more knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class

PubMed Central

Archivio della ricerca- Università di Roma La Sapienza

Gene set based ensemble methods for cancer classification

Author: Duncan William Evans
Publication venue: LSU Digital Commons
Publication date: 01/01/2013

Diagnosis of cancer very often depends on conclusions drawn after both clinical and microscopic examinations of tissues to study the manifestation of the disease in order to place tumors in known categories. One factor which determines the categorization of cancer is the tissue from which the tumor originates. Information gathered from clinical exams may be partial or not completely predictive of a specific category of cancer. Further complicating the problem of categorizing various tumors is that the histological classification of the cancer tissue and description of its course of development may be atypical. Gene expression data gleaned from micro-array analysis provides tremendous promise for more accurate cancer diagnosis. One hurdle in the classification of tumors based on gene expression data is that the data space is ultra-dimensional with relatively few points; that is, there are a small number of examples with a large number of genes. A second hurdle is expression bias caused by the correlation of genes. Analysis of subsets of genes, known as gene set analysis, provides a mechanism by which groups of differentially expressed genes can be identified. We propose an ensemble of classifiers whose base classifiers are ℓ1-regularized logistic regression models with restriction of the feature space to biologically relevant genes. Some researchers have already explored the use of ensemble classifiers to classify cancer but the effect of the underlying base classifiers in conjunction with biologically-derived gene sets on cancer classification has not been explored

Louisiana State University

Large-scale integration of cancer microarray data identifies a robust common cancer signature

Author: A Bhattacharjee
A Cromer
AC Tan
AI Su
BJ Quade
CA Iacobuzio-Donahue
CD Logsdon
CF Basil
D Geman
D Talantov
DG Beer
DH Gutmann
Donald Geman
DR Rhodes
DS Rickman
E Dehan
E Segal
F Zhan
GJ Gordon
HF Frierson Jr.
I Yanai
J Luo
JB Welsh
JM Lancaster
JPT Higgins
L Dyrskjot
L Liotta
L Xu
Lei Xu
LL Hsiao
M Bittner
M Lenburg
MA Watson
ND Price
P Pavlidis
PJ Hoffman
R Shai
Raimond L Winslow
RC Bast Jr.
RS Stearman
S Michiels
S Ramaswamy
S Wachi
S Welle
SL Pomeroy
SM Dhanasekaran
SS Yoon
T Barrett
T Yagi
TJ Giordano
TR Golub
X Chen
X Yang
Y Hippo
Y Huang
YP Yu
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: There is a continuing need to develop molecular diagnostic tools which complement histopathologic examination to increase the accuracy of cancer diagnosis. DNA microarrays provide a means for measuring gene expression signatures which can then be used as components of genomic-based diagnostic tests to determine the presence of cancer. Results: In this study, we collect and integrate ~1500 microarray gene expression profiles from 26 published cancer data sets across 21 major human cancer types. We then apply a statistical method, referred to as the Top-Scoring Pair of Groups (TSPG) classifier, and a repeated random sampling strategy to the integrated training data sets and identify a common cancer signature consisting of 46 genes. These 46 genes are naturally divided into two distinct groups; those in one group are typically expressed less than those in the other group for cancer tissues. Given a new expression profile, the classifier discriminates cancer from normal tissues by ranking the expression values of the 46 genes in the cancer signature and comparing the average ranks of the two groups. This signature is then validated by applying this decision rule to independent test data. Conclusion: By combining the TSPG method and repeated random sampling, a robust common cancer signature has been identified from large-scale microarray data integration. Upon further validation, this signature may be useful as a robust and objective diagnostic test for cancer.
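The decision rule described in this abstract (rank the signature genes within one profile, then compare the average ranks of the two groups) might be sketched as follows. This is an illustrative sketch only: the function name, the gene names, the toy four-gene signature, and the tie handling (ties keep the input order) are assumptions, not details taken from the paper.

```python
def tspg_classify(expr, low_group, high_group):
    """Compare average within-sample ranks of two gene groups.

    expr       -- dict mapping gene name to expression value for one sample
    low_group  -- genes assumed to be expressed lower in cancer tissue
    high_group -- genes assumed to be expressed higher in cancer tissue
    Returns "cancer" when low_group has the smaller average rank.
    """
    signature = list(low_group) + list(high_group)
    # Rank 1 is the lowest expression value among the signature genes.
    # sorted() is stable, so tied values keep their input order (assumption).
    ordered = sorted(signature, key=lambda g: expr[g])
    rank = {gene: r for r, gene in enumerate(ordered, start=1)}
    mean_low = sum(rank[g] for g in low_group) / len(low_group)
    mean_high = sum(rank[g] for g in high_group) / len(high_group)
    return "cancer" if mean_low < mean_high else "normal"


# Toy four-gene signature with made-up values (the abstract reports 46 genes).
tumor_like = {"a": 1.0, "b": 2.0, "c": 5.0, "d": 6.0}
normal_like = {"a": 5.0, "b": 6.0, "c": 1.0, "d": 2.0}

print(tspg_classify(tumor_like, ["a", "b"], ["c", "d"]))
print(tspg_classify(normal_like, ["a", "b"], ["c", "d"]))
```

Because only ranks within a single profile are used, the rule needs no cross-sample normalization, which may be one reason such rules suit data integrated from many studies.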

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

Author: Benso Alfredo
Di Carlo Stefano
Politano Gianfranco Michele Maria
Publication venue: IEEE Computer Society
Publication date: 01/01/2011

Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, is unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Multivariate classification of gene expression microarray data

Author: Botella Pérez Cristina
Publication venue: 'Universitat Rovira I Virgili'
Publication date: 01/01/2010

Gene expression obtained from microarray analysis is used in many cases to classify cells. In this thesis, a probabilistic version of the Discriminant Partial Least Squares method (p-DPLS) is used to classify samples from their gene expression. p-DPLS is based on Bayes' rule for the posterior probability. Such classifiers are forced to always classify. To overcome this limitation, a reject option has been implemented. This option makes it possible to reject samples with a high risk of misclassification (that is, ambiguous samples and outliers). The reject option combines criteria based on the x-residuals, the leverage, and the predicted values. In addition, a variable selection method is developed to choose the most relevant genes, since most of the genes analyzed with a microarray are irrelevant for the particular classification purpose and can confuse the classifier. Finally, DPLS is extended to multi-class classification by combining PLS with linear discriminant analysis
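A reject option of the kind described in this abstract can be sketched as a wrapper around any classifier that reports posterior probabilities. The abstract does not give the thesis's actual criteria or cut-offs, so the function name, the three threshold parameters, and their default values below are purely hypothetical.

```python
def classify_with_reject(posterior, x_residual, leverage, *,
                         min_posterior=0.9, max_residual=3.0,
                         max_leverage=0.5):
    """Return a class label, or "reject" for risky samples.

    posterior  -- dict mapping class label to posterior probability
    x_residual -- residual of the sample in x-space (assumed precomputed)
    leverage   -- leverage of the sample (assumed precomputed)
    The three thresholds are illustrative assumptions, not published values.
    """
    # Outliers: the sample lies far from the training data.
    if x_residual > max_residual or leverage > max_leverage:
        return "reject"
    # Ambiguous samples: no class is sufficiently probable.
    best = max(posterior, key=posterior.get)
    if posterior[best] < min_posterior:
        return "reject"
    return best


print(classify_with_reject({"tumor": 0.97, "normal": 0.03}, 1.2, 0.1))
print(classify_with_reject({"tumor": 0.55, "normal": 0.45}, 1.2, 0.1))
print(classify_with_reject({"tumor": 0.97, "normal": 0.03}, 4.0, 0.1))
```

Checking the outlier criteria first means a confident posterior cannot rescue a sample that the model has no basis for judging.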

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Repositori Institucional URV

Identifying Tightly Regulated and Variably Expressed Networks by Differential Rank Conservation (DIRAC)

Author: A Subramanian
AC Tan
C Auffray
CS Moreno
D Geman
D Nam
Doheon Lee
Donald Geman
DW Parsons
E Lee
G Kroemer
H Land
HJ Tagnon
HY Chuang
IK Mellinghoff
JA Trapani
James A. Eddy
JT Leek
K Shimada
L Hood
Leroy Hood
M Karin
M Raponi
MA Kuriakose
MM Ryan
Nathan D. Price
ND Price
PP Hsu
R McLendon
RJ Shaw
RR Weichselbaum
S Jones
SA Armstrong
SW Lowe
T Joachims
TR Golub
UR Chandran
VN Vapnik
XJ Ma
YP Yu
Z Yao
Publication venue: Public Library of Science
Publication date: 01/05/2010

A powerful way to separate signal from noise in biology is to convert the molecular data from individual genes or proteins into an analysis of comparative biological network behaviors. One of the limitations of previous network analyses is that they do not take into account the combinatorial nature of gene interactions within the network. We report here a new technique, Differential Rank Conservation (DIRAC), which permits one to assess these combinatorial interactions to quantify various biological pathways or networks in a comparative sense, and to determine how they change in different individuals experiencing the same disease process. This approach is based on the relative expression values of participating genes—i.e., the ordering of expression within network profiles. DIRAC provides quantitative measures of how network rankings differ either among networks for a selected phenotype or among phenotypes for a selected network. We examined disease phenotypes including cancer subtypes and neurological disorders and identified networks that are tightly regulated, as defined by high conservation of transcript ordering. Interestingly, we observed a strong trend to looser network regulation in more malignant phenotypes and later stages of disease. At a sample level, DIRAC can detect a change in ranking between phenotypes for any selected network. Variably expressed networks represent statistically robust differences between disease states and serve as signatures for accurate molecular classification, validating the information about expression patterns captured by DIRAC. Importantly, DIRAC can be applied not only to transcriptomic data, but to any ordinal data type
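One way to quantify "conservation of transcript ordering" within a network is to compare each sample's pairwise gene orderings against the majority ordering across samples of the same phenotype. The sketch below follows that idea; the function names, the gene names, and the exact scoring are assumptions for illustration and are not confirmed by the abstract.

```python
from itertools import combinations


def pairwise_orderings(profile, genes):
    """For every gene pair (a, b), record whether a is expressed below b."""
    return [profile[a] < profile[b] for a, b in combinations(genes, 2)]


def rank_template(profiles, genes):
    """Majority ordering of each gene pair across the given profiles."""
    orderings = [pairwise_orderings(p, genes) for p in profiles]
    n = len(profiles)
    return [2 * sum(column) > n for column in zip(*orderings)]


def matching_score(profile, genes, template):
    """Fraction of gene pairs whose ordering agrees with the template."""
    observed = pairwise_orderings(profile, genes)
    return sum(o == t for o, t in zip(observed, template)) / len(template)


def conservation_index(profiles, genes):
    """Mean matching score over profiles; 1.0 means identical orderings."""
    template = rank_template(profiles, genes)
    scores = [matching_score(p, genes, template) for p in profiles]
    return sum(scores) / len(scores)


# Hypothetical three-gene network measured in three samples.
genes = ["x", "y", "z"]
profiles = [
    {"x": 1.0, "y": 2.0, "z": 3.0},
    {"x": 1.0, "y": 2.0, "z": 3.0},
    {"x": 3.0, "y": 2.0, "z": 1.0},  # fully reversed ordering
]

print(conservation_index(profiles, genes))
```

Under this sketch, a tightly regulated network would score close to 1.0, and a looser one would score lower. Since only orderings enter the computation, the same functions apply to any ordinal data, in line with the abstract's closing remark.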

Crossref

Directory of Open Access Journals

PubMed Central

Discriminating Origin Tissues of Tumor Cell Lines by Methylation Signatures and Dys-Methylated Rules

Author: Cai Yu Dong
Chen Lei
Feng Kaiyan
Hu Bin
Huang Tao
Li Jianhao
Niu Zhibin
Zeng Tao
Zhang Shiqi
Zhang Yu Hang
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2020

Copenhagen University Research Information System