Rank discriminants for predicting phenotypes from RNA expression
Statistical methods for analyzing large-scale biomolecular data are
commonplace in computational biology. A notable example is phenotype prediction
from gene expression data, for instance, detecting human cancers,
differentiating subtypes and predicting clinical outcomes. Still, clinical
applications remain scarce. One reason is that the complexity of the decision
rules that emerge from standard statistical learning impedes biological
understanding, in particular, any mechanistic interpretation. Here we explore
decision rules for binary classification utilizing only the ordering of
expression among several genes; the basic building blocks are then two-gene
expression comparisons. The simplest example, just one comparison, is the TSP
classifier, which has appeared in a variety of cancer-related discovery
studies. Decision rules based on multiple comparisons can better accommodate
class heterogeneity, and thereby increase accuracy, and might provide a link
with biological mechanism. We consider a general framework ("rank-in-context")
for designing discriminant functions, including a data-driven selection of the
number and identity of the genes in the support ("context"). We then specialize
to two examples: voting among several pairs and comparing the median expression
in two groups of genes. Comprehensive experiments assess accuracy relative to
other, more complex, methods, and reinforce earlier observations that simple
classifiers are competitive.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS738 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
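The rank-based decision rules described in this abstract (a single two-gene comparison, voting among several pairs, and comparing the median expression of two gene groups) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the pair-scoring criterion (difference in the frequency of the ordering between the two classes), and the toy data are assumptions made for illustration only.

```python
from itertools import combinations
from statistics import median

def fit_tsp(X, y):
    """Pick one gene pair whose ordering differs most between classes.

    X is a list of samples (lists of expression values), y a list of 0/1
    labels. The score |P(x_i < x_j | y=1) - P(x_i < x_j | y=0)| is an
    assumed, commonly used criterion, not taken from the abstract.
    """
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    best_score, best_pair = -1.0, None
    for i, j in combinations(range(len(X[0])), 2):
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        score = abs(p1 - p0)
        if score > best_score:
            # Orient the pair so that x[a] < x[b] votes for class 1.
            best_score = score
            best_pair = (i, j) if p1 > p0 else (j, i)
    return best_pair

def predict_tsp(pair, x):
    """Single comparison: class 1 if gene a is expressed below gene b."""
    a, b = pair
    return 1 if x[a] < x[b] else 0

def predict_vote(pairs, x):
    """Majority vote among several oriented pairs (ties go to class 0)."""
    votes = sum(predict_tsp(p, x) for p in pairs)
    return 1 if 2 * votes > len(pairs) else 0

def predict_median(group_a, group_b, x):
    """Class 1 if the median expression of group_a is below that of group_b."""
    med_a = median(x[g] for g in group_a)
    med_b = median(x[g] for g in group_b)
    return 1 if med_a < med_b else 0

# Toy data (hypothetical): 4 samples, 3 genes.
X = [[1, 5, 3], [2, 6, 1], [7, 3, 4], [8, 2, 5]]
y = [1, 1, 0, 0]
pair = fit_tsp(X, y)
```

All three rules depend only on the ordering of expression values within a sample, so they are unchanged by any monotone per-sample transformation of the data.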
Inferring causal molecular networks: empirical assessment through a community-based effort.
It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
Inferring causal molecular networks: empirical assessment through a community-based effort
Inferring molecular networks is a central challenge in computational biology. However, it has remained unclear whether causal, rather than merely correlational, relationships can be effectively inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results constitute the most comprehensive assessment of causal network inference in a mammalian setting carried out to date and suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess the causal validity of inferred molecular networks.
The ordering of expression among a few genes can provide simple cancer biomarkers and signal BRCA1 mutations
Background: A major challenge in computational biology is to extract knowledge about the genetic nature of disease from high-throughput data. However, an important obstacle to both biological understanding and clinical applications is the "black box" nature of the decision rules provided by most machine learning approaches, which usually involve many genes combined in a highly complex fashion. Achieving biologically relevant results argues for a different strategy. A promising alternative is to base prediction entirely upon the relative expression ordering of a small number of genes. Results: We present a three-gene version of "relative expression analysis" (RXA), a rigorous and systematic comparison with earlier approaches in a variety of cancer studies, a clinically relevant application to predicting germline BRCA1 mutations in breast cancer and a cross-study validation for predicting ER status. In the BRCA1 study, RXA yields high accuracy with a simple decision rule: in tumors carrying mutations, the expression of a "reference gene" falls between the expression of two differentially expressed genes, PPP1CB and RNF14. An analysis of the protein-protein interactions among the triplet of genes and BRCA1 suggests that the classifier has a biological foundation. Conclusion: RXA has the potential to identify genomic "marker interactions" with plausible biological interpretation and direct clinical applicability. It provides a general framework for understanding the roles of the genes involved in decision rules, as illustrated for the difficult and clinically relevant problem of identifying BRCA1 mutation carriers.
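The three-gene rule stated in this abstract (a reference gene whose expression falls between that of PPP1CB and RNF14) might be sketched as follows. This is an illustration under assumptions, not the published classifier: the abstract does not name the reference gene or say which of the two flanking genes is higher, so the sketch accepts either direction, and the function name and expression values are hypothetical.

```python
def reference_between(ref, ppp1cb, rnf14):
    """Return True if the reference gene's expression lies strictly between
    the expression of the two flanking genes, in either direction.

    Accepting both directions is an assumption; the abstract does not
    specify which of PPP1CB or RNF14 is the upper bound.
    """
    low, high = sorted((ppp1cb, rnf14))
    return low < ref < high

# Hypothetical expression values for two tumors.
predicted_carrier = reference_between(ref=5.0, ppp1cb=3.0, rnf14=8.0)
predicted_noncarrier = reference_between(ref=9.0, ppp1cb=3.0, rnf14=8.0)
```

Because the rule uses only the ordering of three values within one sample, it does not depend on how the expression data are scaled or normalized, provided the transformation preserves order.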
A Novel Functional Splice Variant of AKT3 Defined by Analysis of Alternative Splice Expression in HPV-Positive Oropharyngeal Cancers
The incidence of HPV-related oropharyngeal squamous cell carcinoma (OPSCC) has increased more than 200% in the past 20 years. Recent genetic sequencing efforts have elucidated relevant genes in head and neck cancer, but HPV-related tumors have consistently shown few DNA mutations. In this study, we sought to analyze alternative splicing events (ASE) that could alter gene function independent of mutations. To identify ASE unique to HPV-related tumors, RNA sequencing was performed on 46 HPV-positive OPSCC and 25 normal tissue samples. A novel algorithm using outlier statistics on RNA-sequencing junction expression identified 109 splicing events, which were confirmed in a validation set from The Cancer Genome Atlas. Because the most common type of splicing event identified was an alternative start site (39%), MBD-seq genome-wide CpG methylation data were analyzed for methylation alterations at promoter regions. ASE in six genes showed significant negative correlation between promoter methylation and expression of an alternative transcriptional start site, including AKT3. The novel AKT3 transcriptional variant and methylation changes were confirmed using qRT-PCR and qMSP methods. In vitro silencing of the novel AKT3 variant resulted in significant growth inhibition of multiple head and neck cell lines, an effect not observed with wild-type AKT3 knockdown. Analysis of ASE in HPV-related OPSCC identified multiple alterations likely involved in carcinogenesis, including a novel, functionally active transcriptional variant of AKT3. Our data indicate that ASEs represent a significant mechanism of oncogenesis with untapped potential for understanding complex genetic changes that result in the development of cancer. Cancer Res; 77(19); 5248-58. ©2017 AACR.