11 research outputs found

    Finding human genetic variation in whole genome expression data with applications for “missing” heritability: The GWCoGAPS algorithm, the PatternMarkers statistic, and the ProjectoR package

    Starting from a single fertilized egg, the compendium of human cells is generated via stochastic perturbations of earlier generations. Concurrently, canalization of developmental pathways limits the type and degree of variation to ensure viability; thus, it is unsurprising that deviations early in life have been linked to late-manifesting diseases. Human pluripotent stem cells (hPSCs) are a highly robust and uniquely human experimental system in which to model the sources and consequences of this variability. Further, variation in hPSCs’ transcriptomes has been directly linked to both genomic background and biases in differentiation efficiency. Taking advantage of this link between genomic background and developmental phenotypes, we developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole genome Bayesian non-negative matrix factorization (NMF), to find conserved transcriptional signatures representative of the functional effect of human genetic variation. Time course RNA-seq data obtained from three human embryonic stem cell (ESC) lines and three human induced pluripotent stem cell (iPSC) lines in three different experimental conditions were analyzed. GWCoGAPS distinguished shared developmental trajectories from unique transcriptional signatures of each of the cell lines. Further analysis of these “identity” signatures found they were predictive of lineage biases during neuronal differentiation. Additionally, lineage biases were consistent with early differences in morphogenetic phenotypes within monolayer culture, thus linking transcriptional genomic signatures to stable, quantifiable cellular features. To test whether the cell line signatures were genome specific, we next developed the projectoR algorithm to assess a given signature's robustness in independent data sets. 
By using the identity signatures as inputs to projectoR, we were able to identify samples from the same donor genome in datasets from multiple tissues and across technical platforms, including RNA-seq results from post-mortem brain, microarrayed embryoid bodies, and publicly available datasets. The identification of signatures that define the functional rather than physical background of an individual’s genome has the potential to profoundly influence our view of human variation and disease.
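The factorization idea described in this abstract can be illustrated with a minimal sketch. The code below is not the GWCoGAPS algorithm: it swaps the Bayesian sampler and the parallel gene sets for plain multiplicative-update NMF (Lee and Seung), and the function name `nmf`, the toy matrix, and every parameter value are assumptions made for illustration only.

```python
import numpy as np

def nmf(data, n_patterns, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative genes x samples matrix as A @ P.

    A (genes x patterns) holds gene weights; P (patterns x samples)
    holds pattern activity per sample. Uses Lee-Seung multiplicative
    updates for the Frobenius norm, not a Bayesian sampler.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = data.shape
    A = rng.random((n_genes, n_patterns)) + eps
    P = rng.random((n_patterns, n_samples)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so A and P
        # stay non-negative throughout.
        P *= (A.T @ data) / (A.T @ A @ P + eps)
        A *= (data @ P.T) / (A @ P @ P.T + eps)
    return A, P

# Hypothetical toy data: 50 genes x 12 samples built from 2 patterns.
rng = np.random.default_rng(1)
data = rng.random((50, 2)) @ rng.random((2, 12))

A, P = nmf(data, n_patterns=2)
rel_error = np.linalg.norm(data - A @ P) / np.linalg.norm(data)
print(f"relative reconstruction error: {rel_error:.4f}")
```

In this framing, rows of `P` play the role of shared trajectories or cell-line "identity" signatures, and columns of `A` give the genes that carry each signature.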

    Universal prediction of cell-cycle position using transfer learning.

    Background: The cell cycle is a highly conserved, continuous process which controls faithful replication and division of cells. Single-cell technologies have enabled increasingly precise measurements of the cell cycle both as a biological process of interest and as a possible confounding factor. Despite its importance and conservation, there is no universally applicable approach to infer position in the cell cycle at high resolution from single-cell RNA-seq data. Results: Here, we present tricycle, an R/Bioconductor package, to address this challenge by leveraging key features of the biology of the cell cycle, the mathematical properties of principal component analysis of periodic functions, and the use of transfer learning. We estimate a cell-cycle embedding using a fixed reference dataset and project new data into this reference embedding, an approach that overcomes key limitations of learning a dataset-dependent embedding. Tricycle then predicts a cell-specific position in the cell cycle based on the data projection. The accuracy of tricycle compares favorably to gold-standard experimental assays, which generally require specialized measurements in specifically constructed in vitro systems. Using internal controls which are available for any dataset, we show that tricycle predictions generalize to datasets with multiple cell types, across tissues, species, and even sequencing assays. Conclusions: Tricycle generalizes across datasets and is highly scalable and applicable to atlas-level single-cell RNA-seq data. 
Keywords: Cell cycle; Single-cell RNA-sequencing; Transfer learning.
Funding: Chan Zuckerberg Initiative DAF; Silicon Valley Community Foundation; United States Department of Health & Human Services, National Institutes of Health (NIH) - USA, NIH National Institute of General Medical Sciences (NIGMS) (appeared in source as: National Institute of General Medical Sciences of the National Institutes of Health); National Science Foundation (NSF); National Institute of Agin; Maryland Stem Cell Research Foundation; Kavli Neurodiscovery Institute; Johns Hopkins Provost Award Program; BRAIN Initiative; United States Department of Health & Human Services, National Institutes of Health (NIH) - USA, NIH National Institute of Neurological Disorders & Stroke (NINDS) (appeared in source as: National Institute of Neurological Disorder
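The projection step described in this abstract (fix an embedding learned on a reference, project new data into it, read the cell-cycle position off the projection) can be sketched as follows. This is a hedged illustration, not the tricycle package: the loadings are synthetic cosine and sine weights rather than tricycle's reference, the helper names `project_to_reference` and `cell_cycle_position` are hypothetical, and centering each gene across cells before projecting is an assumption.

```python
import numpy as np

def project_to_reference(expr, loadings):
    """Project a cells x genes matrix onto fixed genes x 2 loadings."""
    # Assumption: each gene is centered across the cells of the new dataset.
    centered = expr - expr.mean(axis=0, keepdims=True)
    return centered @ loadings

def cell_cycle_position(embedding):
    """Angle of each cell in the 2-D embedding, in radians on [0, 2*pi)."""
    return np.arctan2(embedding[:, 1], embedding[:, 0]) % (2 * np.pi)

# Synthetic reference: each gene peaks at its own phase of the cycle.
n_genes, n_cells = 40, 200
gene_phase = np.linspace(0, 2 * np.pi, n_genes, endpoint=False)
loadings = np.column_stack([np.cos(gene_phase), np.sin(gene_phase)])

# Synthetic "new" dataset: noiseless periodic expression at known positions.
true_position = np.linspace(0, 2 * np.pi, n_cells, endpoint=False)
expr = np.cos(true_position[:, None] - gene_phase[None, :])

estimated = cell_cycle_position(project_to_reference(expr, loadings))
# Compare on the circle so that 0 and 2*pi count as the same position.
circular_error = np.abs(np.angle(np.exp(1j * (estimated - true_position))))
print(f"largest circular error (radians): {circular_error.max():.2e}")
```

Because the loadings never change, any new dataset lands in the same coordinate system, which is the property the abstract contrasts with a dataset-dependent embedding.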

    Early changes in immune cell subsets with corticosteroids in patients with solid tumors: implications for COVID-19 management

    Background: The risk–benefit calculation for corticosteroid administration in the management of COVID-19 is complex and urgently requires data to inform the decision. The neutrophil-to-lymphocyte ratio (NLR) is a marker of systemic inflammation associated with poor prognosis in both COVID-19 and cancer. Investigating NLR as an inflammatory marker and lymphocyte levels as a critical component of antiviral immunity may inform the dilemma of reducing toxic hyperinflammation while still maintaining effective antiviral responses. Methods: We performed a retrospective analysis of NLR, absolute neutrophil counts (ANCs) and absolute lymphocyte counts (ALCs) in patients with cancer enrolled in immunotherapy trials who received moderate-dose to high-dose corticosteroids. We compared paired presteroid and available poststeroid initiation values daily during week 1 and again on day 14 using the Wilcoxon signed-rank test. Associated immune subsets by flow cytometry were included where available. Results: Patients (n=48) with a variety of solid tumors received prednisone, methylprednisolone, or dexamethasone alone or in combination in doses ranging from 20 to 190 mg/24 hours (prednisone equivalent). The median NLR prior to steroid administration was elevated at 5.0 (range: 0.9–61.2). The corresponding median ANC was 5.1 K/µL (range: 2.03–22.31 K/µL) and ALC was 1.03 K/µL (0.15–2.57 K/µL). One day after steroid administration, there was a significant transient drop in median ALC to 0.54 K/µL (p=0.0243), driving an increase in NLR (median 10.8, p=0.0306). Relative lymphopenia persisted through day 14 but was no longer statistically significant. ANC increased steadily over time, becoming significant at day 4 (median: 7.31 K/µL, p=0.0171) and remaining significantly elevated through day 14. NLR was consistently elevated after steroid initiation, significantly at days 1, 7 (median: 8.2, p=0.0272), and 14 (median: 15.0, p=0.0018). 
Flow cytometry data from 11 patients showed significant decreases in activated CD4 cells and effector memory CD8 cells. Conclusions: The early drop in ALC with persistent lymphopenia as well as the prolonged ANC elevation seen in response to corticosteroid administration are similar to trends associated with increased mortality in several coronavirus studies, including the current SARS-CoV-2 pandemic. The affected subsets are essential for effective antiviral immunity. This may have implications for glucocorticoid therapy for COVID-19.
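The two calculations named in this abstract, NLR as ANC divided by ALC and a paired Wilcoxon signed-rank comparison, can be sketched in a few lines. The paired lymphocyte counts below are invented for illustration and are not patient data from the study; the function names are hypothetical, and the exact test shown assumes no tied or zero differences.

```python
from itertools import product

def neutrophil_lymphocyte_ratio(anc, alc):
    """NLR = absolute neutrophil count / absolute lymphocyte count."""
    return anc / alc

def wilcoxon_signed_rank_exact(before, after):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Assumes no zero differences and no tied absolute differences.
    Returns (W, p) where W is the smaller of the signed rank sums.
    """
    diffs = [a - b for b, a in zip(before, after)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Under the null, every sign pattern is equally likely: enumerate them.
    n = len(diffs)
    at_most_w = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) <= w
    )
    return w, min(1.0, 2 * at_most_w / 2 ** n)

# Ratio of the reported baseline medians (ANC 5.1, ALC 1.03 K/uL). A ratio
# of medians need not equal the reported median of per-patient ratios.
nlr_from_medians = neutrophil_lymphocyte_ratio(5.1, 1.03)

# Hypothetical paired ALC values (K/uL) before and one day after steroids.
alc_pre = [1.03, 0.80, 1.50, 2.10, 0.60, 1.20, 0.95, 1.75]
alc_post = [0.54, 0.55, 0.90, 1.30, 0.50, 0.75, 0.65, 0.85]
w_stat, p_value = wilcoxon_signed_rank_exact(alc_pre, alc_post)
print(f"NLR from medians: {nlr_from_medians:.2f}, W={w_stat}, p={p_value:.4f}")
```

For real analyses with ties, zero differences, or larger samples, a maintained implementation such as `scipy.stats.wilcoxon` is the more appropriate choice.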

    Integrated immunological analysis of a successful conversion of locally advanced hepatocellular carcinoma to resectability with neoadjuvant therapy

    Hepatocellular carcinoma (HCC) is the fourth leading cause of cancer death worldwide, with a minority of patients being diagnosed early enough for curative-intent interventions. We report the first use of preoperative cabozantinib plus nivolumab to successfully downstage what presented as unresectable HCC as part of an ongoing phase 1b study. Preoperative treatment with cabozantinib and nivolumab led to a >99% reduction in alpha-fetoprotein, a 37.3% radiographic reduction by RECIST 1.1, and a near-complete pathologic response (80% to 100% necrosis). An integrated immunological analysis was performed on the post-treatment surgical tumor sample and matched pre-treatment and post-treatment peripheral blood samples with high-dimensional imaging and cytometry techniques. Bayesian non-negative matrix factorization (CoGAPS, Coordinated Gene Activity in Pattern Sets) and self-organizing map (FlowSOM) algorithms were used to distinguish changes in functional markers across cellular neighborhoods in the single cell data sets. Brisk immunological infiltration into the tumor microenvironment was observed in non-random, organized cellular neighborhoods. Systemically, combination therapy led to marked promotion of effector cytotoxic T cells and effector memory helper T cells. Natural killer cells also increased with therapy. The patient remains without disease recurrence and with a normal alpha-fetoprotein approximately 2 years from presentation. Our study provides proof-of-concept that borderline resectable or locally advanced HCC warrants consideration of downstaging with effective neoadjuvant systemic therapy for subsequent curative resection.

    Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species

    Analysis of gene expression in single cells allows for decomposition of cellular states as low-dimensional latent spaces. However, the interpretation and validation of these spaces remains a challenge. Here, we present scCoGAPS, which defines latent spaces from a source single-cell RNA-sequencing (scRNA-seq) dataset, and projectR, which evaluates these latent spaces in independent target datasets via transfer learning. Applying these tools to scRNA-seq data from developing mouse retina reveals intrinsic relationships across biological contexts and assays while avoiding batch effects and other technical features. We compare the dimensions learned in this source dataset to adult mouse retina, a time course of human retinal development, select scRNA-seq datasets from developing brain, chromatin accessibility data, and a murine cell-type atlas to identify shared biological features. These tools lay the groundwork for exploratory analysis of scRNA-seq data via latent space representations, enabling a shift in how we compare and identify cells beyond reliance on marker genes or ensemble molecular identity.
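The transfer step described in this abstract, evaluating latent dimensions learned on a source dataset in an independent target dataset, can be sketched as a regression of the target data on the fixed source loadings. This is an illustrative sketch under stated assumptions and not the projectR API: the function `project_patterns`, the gene names, and the use of ordinary least squares on the shared genes are all assumptions.

```python
import numpy as np

def project_patterns(loadings, loading_genes, target, target_genes):
    """Estimate pattern weights for target samples from fixed gene loadings.

    loadings: genes x patterns matrix learned on the source dataset.
    target:   genes x samples matrix from an independent dataset.
    Only genes present in both datasets are used.
    """
    loading_row = {gene: i for i, gene in enumerate(loading_genes)}
    target_row = {gene: i for i, gene in enumerate(target_genes)}
    shared = [gene for gene in loading_genes if gene in target_row]
    A = loadings[[loading_row[g] for g in shared], :]
    D = target[[target_row[g] for g in shared], :]
    # Least-squares solution of A @ weights ~= D, one column per sample.
    weights, *_ = np.linalg.lstsq(A, D, rcond=None)
    return weights, shared

# Hypothetical source loadings for six genes and two patterns.
rng = np.random.default_rng(0)
source_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
loadings = rng.random((6, 2))

# Hypothetical target: different gene order, g3 missing, g9 not in source.
target_genes = ["g5", "g2", "g9", "g1", "g4", "g6"]
true_weights = rng.random((2, 4))
target = np.vstack([
    loadings[source_genes.index(g)] @ true_weights
    if g in source_genes else rng.random(4)
    for g in target_genes
])

weights, shared = project_patterns(loadings, source_genes, target, target_genes)
print("genes used:", shared)
print("max abs error:", np.abs(weights - true_weights).max())
```

Matching on shared gene identifiers is what lets the same loadings be reused across platforms or species, provided the identifiers have been mapped to a common namespace first.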

    CancerInSilico: An R/Bioconductor package for combining mathematical and statistical modeling to simulate time course bulk and single cell gene expression data in cancer.

    Bioinformatics techniques to analyze time course bulk and single cell omics data are advancing. The absence of a known ground truth of the dynamics of molecular changes challenges benchmarking their performance on real data. Realistic simulated time-course datasets are essential to assess the performance of time course bioinformatics algorithms. We develop an R/Bioconductor package, CancerInSilico, to simulate bulk and single cell transcriptional data from a known ground truth obtained from mathematical models of cellular systems. This package contains a general R infrastructure for running cell-based models and simulating gene expression data based on the model states. We show how to use this package to simulate a gene expression data set and consequently benchmark analysis methods on this data set with a known ground truth. The package is freely available via Bioconductor: http://bioconductor.org/packages/CancerInSilico/

    Integrated time course omics analysis distinguishes immediate therapeutic response from acquired resistance.

    BACKGROUND: Targeted therapies act specifically by blocking the activity of proteins that are encoded by genes critical for tumorigenesis. However, most cancers acquire resistance and long-term disease remission is rarely observed. Understanding the time course of molecular changes responsible for the development of acquired resistance could enable optimization of patients' treatment options. Clinically, acquired therapeutic resistance can only be studied at a single time point in resistant tumors. METHODS: To determine the dynamics of these molecular changes, we obtained high-throughput omics data (RNA-sequencing and DNA methylation) weekly during the development of cetuximab resistance in a head and neck cancer in vitro model. The CoGAPS unsupervised algorithm was used to determine the dynamics of the molecular changes associated with resistance during the time course of resistance development. RESULTS: CoGAPS was used to quantify the evolving transcriptional and epigenetic changes. Applying a PatternMarker statistic to the results from CoGAPS enabled novel heatmap-based visualization of the dynamics in these time course omics data. We demonstrate that transcriptional changes result from immediate therapeutic response or resistance, whereas epigenetic alterations only occur with resistance. Integrated analysis demonstrates delayed onset of changes in DNA methylation relative to transcription, suggesting that resistance is stabilized epigenetically. CONCLUSIONS: Genes with epigenetic alterations associated with resistance that have concordant expression changes are hypothesized to stabilize the resistant phenotype. These genes include FGFR1, which was previously associated with resistance to EGFR inhibitors. Thus, integrated omics analysis distinguishes the timing of molecular drivers of resistance. 
This understanding of the time course progression of molecular changes in acquired resistance is important for the development of alternative treatment strategies that would introduce appropriate selection of new drugs to treat cancer before the resistant phenotype develops.
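The marker-selection step mentioned in this abstract can be sketched as follows. The abstract does not define the PatternMarker statistic, so the version below is an assumption rather than the authors' implementation: each gene's weights are scaled by the gene's maximum, the gene is scored by its Euclidean distance to the unit vector of each pattern, and it is assigned to the nearest pattern. The function name `pattern_markers` and the toy amplitude matrix are hypothetical.

```python
import numpy as np

def pattern_markers(amplitude):
    """Assign each gene to one pattern and score how exclusive it is.

    amplitude: genes x patterns non-negative matrix from a factorization,
    with at least one positive entry per gene (assumption).
    Returns (assigned_pattern, score); lower scores mean the gene is
    closer to being used by a single pattern only.
    """
    scaled = amplitude / amplitude.max(axis=1, keepdims=True)
    unit_vectors = np.eye(amplitude.shape[1])
    # distances[g, k] = Euclidean distance from gene g to unit vector k.
    distances = np.linalg.norm(
        scaled[:, None, :] - unit_vectors[None, :, :], axis=2
    )
    return distances.argmin(axis=1), distances.min(axis=1)

# Hypothetical amplitude matrix: 3 genes x 2 patterns.
amplitude = np.array([
    [5.0, 0.1],   # almost exclusive to pattern 0
    [0.2, 3.0],   # almost exclusive to pattern 1
    [2.0, 2.0],   # shared equally, so a poor marker
])
assigned, score = pattern_markers(amplitude)
print("assigned pattern per gene:", assigned)
print("marker score per gene:", np.round(score, 3))
```

Ordering genes by assigned pattern and then by score gives the block structure that a heatmap of time course data can display.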