19 research outputs found
Emergence of pathway-level composite biomarkers from converging gene set signals of heterogeneous transcriptomic responses
Recent precision medicine initiatives have led to the expectation of improved clinical decision-making anchored in genomic data science. However, over the last decade, only a handful of new single-gene product biomarkers have been translated to clinical practice (FDA approved), in spite of considerable discovery efforts and a plethora of transcriptomes available in the Gene Expression Omnibus. With this modest outcome of current approaches in mind, we developed a pilot simulation study to demonstrate the untapped benefits of developing disease detection methods for cases where the true signal lies at the pathway level, even if the pathway's gene expression alterations may be heterogeneous across patients. In other words, we relaxed the cross-patient homogeneity assumption from the transcript level (cohort assumptions of deregulated gene expression) to the pathway level (assumptions of deregulated pathway expression). Furthermore, we have expanded previous single-subject (SS) methods into cohort analyses to illustrate the benefit of accounting for an individual's variability in cohort scenarios. We compare SS and cohort-based (CB) techniques under 54 distinct scenarios, each with 1,000 simulations, to demonstrate that the emergence of a pathway-level signal occurs through the summative effect of its altered gene expression, heterogeneous across patients. Studied variables include pathway gene set size, fraction of expressed genes responsive within the gene set, fraction of responsive expressed genes up- vs. down-regulated, and cohort size. We demonstrated that our SS approach was uniquely suited to detect signals in heterogeneous populations in which individuals have varying levels of baseline risk that are simultaneously confounded by patient-specific "genome-by-environment" interactions (GxE). Area under the precision-recall curve of the SS approach far surpassed that of the CB approach (1st quartile, median, 3rd quartile: SS = 0.94, 0.96, 0.99; CB = 0.50, 0.52, 0.65). We conclude that single-subject pathway detection methods are uniquely suited for consistently detecting pathway dysregulation by the inclusion of a patient's individual variability.
University of Arizona Health Sciences CB2, the BIOS Institute; NIH [U01AI122275, HL132532, CA023074, 1UG3OD023171, 1R01AG053589-01A1, 1S10RR029030]
Open access journal
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at [email protected]
Developing a 'personalome' for precision medicine: emerging methods that compute interpretable effect sizes from single-subject transcriptomes
The development of computational methods capable of analyzing -omics data at the individual level is critical for the success of precision medicine. Although unprecedented opportunities now exist to gather data on an individual's -omics profile ('personalome'), interpreting and extracting meaningful information from single-subject -omics remain underdeveloped, particularly for quantitative non-sequence measurements, including complete transcriptome or proteome expression and metabolite abundance. Conventional bioinformatics approaches have largely been designed for making population-level inferences about 'average' disease processes; thus, they may not adequately capture and describe individual variability. Novel approaches intended to exploit a variety of -omics data are required for identifying individualized signals for meaningful interpretation. In this review, intended for biomedical researchers, computational biologists and bioinformaticians, we survey emerging computational and translational informatics methods capable of constructing a single subject's 'personalome' for predicting clinical outcomes or therapeutic responses, with an emphasis on methods that provide interpretable readouts. Key points: (i) the single-subject analytics of the transcriptome shows the greatest development to date, and (ii) the methods were all validated in simulations, cross-validations or independent retrospective data sets. This survey uncovers a growing field that offers numerous opportunities for the development of novel validation methods and opens the door for future studies focusing on the interpretation of comprehensive 'personalomes' through the integration of multiple -omics, providing valuable insights into individual patient outcomes and treatments.
National Institutes of Health (NIH)/Office of the Director Precision Medicine Initiative [1UG3OD023171-01]; Precision Medicine Initiative of the Center for Biomedical Informatics and Biostatistics of the University of Arizona Health Sciences; NIH/National Heart, Lung, and Blood Institute [HL126609-01, HL132523, U01 HL125208]; NIH/National Cancer Institute [P30CA023074, 1R01CA190696-01]; NIH/National Institute of Allergy and Infectious Diseases [U01AI122275-01]
Open access article
A Single-Subject Method to Detect Pathways Enriched With Alternatively Spliced Genes
RNA-Sequencing data offers an opportunity to enable precision medicine, but most methods rely on gene expression alone. To date, no methodology exists to identify and interpret alternative splicing patterns within pathways for an individual patient. This study develops methodology and conducts computational experiments to test the hypothesis that pathway aggregation of subject-specific alternatively spliced genes (ASGs) can inform upon disease mechanisms and predict survival. We propose the N-of-1-pathways Alternatively Spliced (N1PAS) method that takes an individual patient's paired-sample RNA-Seq isoform expression data (e.g., tumor vs. non-tumor, before-treatment vs. during-therapy) and pathway annotations as inputs. N1PAS quantifies the degree of alternative splicing via Hellinger distances followed by two-stage clustering to determine pathway enrichment. We provide a clinically relevant "odds ratio" along with statistical significance to quantify pathway enrichment. We validate our method in clinical samples and find that our method selects relevant pathways (p < 0.05 in 4/6 data sets). Extensive Monte Carlo studies show N1PAS powerfully detects pathway enrichment of ASGs while adequately controlling false discovery rates. Importantly, our studies also unveil highly heterogeneous single-subject alternative splicing patterns that cohort-based approaches overlook. Finally, we apply our patient-specific results to predict cancer survival (FDR < 20%) while providing diagnostics in pursuit of translating transcriptome data into clinically actionable information. Software available at https://github.com/grizant/n1pas/tree/master
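The Hellinger distance mentioned in the abstract above can be illustrated with a minimal sketch. This is the textbook distance between two discrete distributions, applied here to made-up isoform proportions for one gene in two paired samples; it is not the N1PAS implementation, and the function names and example counts are hypothetical.

```python
import math


def to_proportions(counts):
    """Convert non-negative isoform expression values into proportions summing to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("expression values must have a positive total")
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with no overlapping support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical isoform expression for one gene (three isoforms) in paired samples.
normal = to_proportions([80.0, 15.0, 5.0])
tumor = to_proportions([20.0, 30.0, 50.0])
print(round(hellinger(normal, tumor), 3))
```

A larger distance means a larger shift in relative isoform usage between the two samples. How such per-gene distances are then clustered and aggregated into pathways is described only in the paper itself.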
Simulating High-Dimensional Multivariate Data using the bigsimr R Package
It is critical to accurately simulate data when employing Monte Carlo
techniques and evaluating statistical methodology. Measurements are often
correlated and high dimensional in this era of big data, such as data obtained
in high-throughput biomedical experiments. Due to the computational complexity
and a lack of user-friendly software available to simulate these massive
multivariate constructions, researchers resort to simulation designs that posit
independence or perform arbitrary data transformations. To close this gap, we
developed the Bigsimr Julia package with R and Python interfaces. This paper
focuses on the R interface. These packages empower high-dimensional random
vector simulation with arbitrary marginal distributions and dependency via a
Pearson, Spearman, or Kendall correlation matrix. bigsimr contains
high-performance features, including multi-core and
graphical-processing-unit-accelerated algorithms to estimate correlation and
compute the nearest correlation matrix. Monte Carlo studies quantify the
accuracy and scalability of our approach, up to . We describe example
workflows and apply to a high-dimensional data set -- RNA-sequencing data
obtained from breast cancer tumor samples.
Comment: 22 pages, 10 figures,
https://cran.r-project.org/web/packages/bigsimr/index.htm
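One common way to simulate correlated variables with arbitrary marginal distributions is a Gaussian copula: draw correlated standard normals, map them to uniforms with the normal CDF, then apply each marginal's inverse CDF. The two-variable sketch below uses only the Python standard library and assumes that general technique. It does not use or reproduce the bigsimr API, and every function name in it is hypothetical.

```python
import math
import random
from statistics import NormalDist


def exponential_inv_cdf(rate):
    """Inverse CDF of an exponential distribution with the given rate."""
    return lambda u: -math.log1p(-u) / rate


def uniform_inv_cdf(low, high):
    """Inverse CDF of a uniform distribution on [low, high]."""
    return lambda u: low + (high - low) * u


def correlated_pairs(n, rho, inv_cdf_x, inv_cdf_y, seed=0):
    """Draw n pairs whose dependence comes from a bivariate Gaussian copula.

    rho is the correlation of the latent normals, not the Pearson
    correlation of the transformed outputs.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    std_normal = NormalDist()
    eps = 1e-12  # keep uniforms away from 0 and 1 so inverse CDFs stay finite
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # 2x2 Cholesky step: z2 has unit variance and correlation rho with z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        u1 = min(max(std_normal.cdf(z1), eps), 1.0 - eps)
        u2 = min(max(std_normal.cdf(z2), eps), 1.0 - eps)
        pairs.append((inv_cdf_x(u1), inv_cdf_y(u2)))
    return pairs


sample = correlated_pairs(
    1000, 0.7, exponential_inv_cdf(2.0), uniform_inv_cdf(0.0, 10.0), seed=42
)
print(sample[:3])
```

Because both inverse CDFs are monotone, rank-based dependence between the latent normals carries over to the outputs. That is one reason rank correlations such as Spearman or Kendall are convenient targets in this kind of design.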
Testing for differentially expressed genetic pathways with single-subject N-of-1 data in the presence of inter-gene correlation.
Modern precision medicine increasingly relies on molecular data analytics, wherein development of interpretable single-subject ("N-of-1") signals is a challenging goal. A previously developed global framework, N-of-1-pathways, employs single-subject gene expression data to identify differentially expressed gene set pathways in an individual patient. Unfortunately, the limited amount of data within the single-subject, N-of-1 setting makes construction of suitable statistical inferences for identifying differentially expressed gene set pathways difficult, especially when non-trivial inter-gene correlation is present. We propose a method that exploits external information on gene expression correlations to cluster positively co-expressed genes within pathways, then assesses differential expression across the clusters within a pathway. A simulation study illustrates that the cluster-based approach exhibits satisfactory false-positive error control and reasonable power to detect differentially expressed gene set pathways. An example with a single N-of-1 patient's triple negative breast cancer data illustrates use of the methodology.
U.S. National Science Foundation [1228509]; U.S. National Institutes of Health [R03ES027394]
A Single-Subject Method to Detect Pathways Enriched With Alternatively Spliced Genes
Participants in White et al.'s (1) study performed a semantic categorization task while viewing pairs of words presented simultaneously to the right and left of fixation. On each trial, participants viewed two briefly displayed words (nouns), one displayed to the left of fixation, and the other to the right of fixation, and categorized one of the words as either living or nonliving. In a "focal cue" condition, participants performed the task on either the left or the right word, according to a precue. In a "distributed cue" condition, participants paid attention to both words and subsequently reported the semantic category of one of the words, but without knowing which in advance. The authors, in a previous behavioral study (2), used a similar task to show that even highly skilled readers are able to recognize only one word at a time. In the current study, participants performed the task during fMRI scanning, which measures blood oxygen level-dependent (BOLD) signals with millimeter-level spatial resolution.
Statistical Comparison and Assessment of Four Fire Emissions Inventories for 2013 and a Large Wildfire in the Western United States
Wildland fires produce smoke plumes that impact air quality and human health. To understand the effects of wildland fire smoke on humans, the amount and composition of the smoke plume must be quantified. Using a fire emissions inventory is one way to determine the emissions rate and composition of smoke plumes from individual fires. There are multiple fire emissions inventories, and each uses a different method to estimate emissions. This paper presents a comparison of four emissions inventories and their products: Fire INventory from NCAR (FINN version 1.5), Global Fire Emissions Database (GFED version 4s), Missoula Fire Labs Emissions Inventory (MFLEI (250 m) and MFLEI (10 km) products), and Wildland Fire Emissions Inventory System (WFEIS (MODIS) and WFEIS (MTBS) products). The outputs from these inventories are compared directly. Because there are no validation datasets for fire emissions, the outlying points from the Bayesian models developed for each inventory were compared with visible images and fire radiative power (FRP) data from satellite remote sensing. This comparison provides a framework to check fire emissions inventory data against additional data by providing a set of days to investigate closely. Results indicate that FINN and GFED likely underestimate emissions, while the MFLEI products likely overestimate emissions. No fire emissions inventory matched the temporal distribution of emissions from an external FRP dataset. A discussion of the differences impacting the emissions estimates from the four fire emissions inventories is provided, including a qualitative comparison of the methods and inputs used by each inventory and the associated strengths and limitations
Adjusting statistical benchmark risk analysis to account for non-spatial autocorrelation, with application to natural hazard risk assessment
We develop and study a quantitative, interdisciplinary strategy for conducting statistical risk analyses within the "benchmark risk" paradigm of contemporary risk assessment when potential autocorrelation exists among sample units. We use the methodology to explore information on vulnerability to natural hazards across 3108 counties in the conterminous 48 US states, applying a place-based resilience index to an existing knowledgebase of hazardous incidents and related human casualties. An extension of a centered autologistic regression model is applied to relate local, county-level vulnerability to hazardous outcomes. Adjustments for autocorrelation embedded in the resiliency information are applied via a novel, non-spatial neighborhood structure. Statistical risk-benchmarking techniques are then incorporated into the modeling framework, wherein levels of high and low vulnerability to hazards are identified. © 2021 Informa UK Limited, trading as Taylor & Francis Group.
National Institute of Environmental Health Sciences
12 month embargo; first published online 1 April 2021
Dynamic changes of RNA-sequencing expression for precision medicine: N-of-1-pathways Mahalanobis distance within pathways of single subjects predicts breast cancer survival
Poster exhibited at GPSC Student Showcase, February 24th, 2016, University of Arizona.
Motivation: The conventional approach to personalized medicine relies on molecular data analytics across multiple patients. The path to precision medicine lies with molecular data analytics that can discover interpretable single-subject signals (N-of-1). We developed a global framework, N-of-1-pathways, for a mechanistic-anchored approach to single-subject gene expression data analysis. We previously employed a metric that could prioritize the statistical significance of a deregulated pathway in single subjects; however, it lacked quantitative interpretability (e.g., the equivalent of a gene expression fold-change). Results: In this study, we extend our previous approach with the application of statistical Mahalanobis distance (MD) to quantify personal pathway-level deregulation. We demonstrate that this approach, N-of-1-pathways Paired Samples MD (N-OF-1-PATHWAYS-MD), detects deregulated pathways (empirical simulations), while not inflating the false-positive rate, using a study with biological replicates. Finally, we establish that N-OF-1-PATHWAYS-MD scores are biologically significant, clinically relevant, and predictive of breast cancer survival (P<0.05, n=80 invasive carcinoma; TCGA RNA-sequences). Conclusion: N-of-1-pathways MD provides a practical approach towards precision medicine. The method generates the magnitude and the biological significance of personal deregulated pathway results derived solely from the patient's transcriptome. These pathways offer opportunities for deriving clinically actionable decisions that have the potential to complement the clinical interpretability of personal polymorphisms obtained from DNA acquired or inherited polymorphisms and mutations. In addition, it offers an opportunity for applicability to diseases in which DNA changes may not be relevant, and thus expands the "interpretable 'omics" of single subjects (e.g., personalome).
This item is part of the GPSC Student Showcase collection. For more information about the Student Showcase, please email the GPSC (Graduate and Professional Student Council) at [email protected]
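For readers unfamiliar with the Mahalanobis distance named in the poster abstract above, the following is a minimal two-dimensional sketch of the generic statistic: the distance of a point from a mean, scaled by a covariance matrix. It is an illustration under stated assumptions, not the N-OF-1-PATHWAYS-MD procedure; the function names and the example (baseline, case) expression pairs are hypothetical.

```python
import math


def mean_and_cov_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to estimate a covariance")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2D point from mean under covariance cov."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Quadratic form d^T S^{-1} d using the closed-form inverse of a 2x2 matrix.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(quad)


# Hypothetical (baseline, case) log-expression pairs for genes in one pathway.
genes = [(5.1, 5.3), (6.0, 6.4), (4.2, 4.1), (7.3, 7.9), (5.8, 5.6)]
centre, covariance = mean_and_cov_2d(genes)
print(round(mahalanobis_2d((6.0, 8.0), centre, covariance), 3))
```

Unlike a Euclidean distance, this statistic accounts for the spread and correlation of the two coordinates, so a deviation along a direction of low variance counts for more.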