14 research outputs found

    Manifold Interpolating Optimal-Transport Flows for Trajectory Inference

    We present a method called Manifold Interpolating Optimal-Transport Flow (MIOFlow) that learns stochastic, continuous population dynamics from static snapshot samples taken at sporadic timepoints. MIOFlow combines dynamic models, manifold learning, and optimal transport by training neural ordinary differential equations (Neural ODE) to interpolate between static population snapshots as penalized by optimal transport with manifold ground distance. Further, we ensure that the flow follows the geometry by operating in the latent space of an autoencoder that we call a geodesic autoencoder (GAE). In GAE the latent space distance between points is regularized to match a novel multiscale geodesic distance on the data manifold that we define. We show that this method is superior to normalizing flows, Schrödinger bridges and other generative models that are designed to flow from noise to data in terms of interpolating between populations. Theoretically, we link these trajectories with dynamic optimal transport. We evaluate our method on simulated data with bifurcations and merges, as well as scRNA-seq data from embryoid body differentiation, and acute myeloid leukemia treatment.
    Comment: Presented at NeurIPS 2022, 24 pages, 7 tables, 14 figures

    Coarse Graining of Data via Inhomogeneous Diffusion Condensation

    Big data often has emergent structure that exists at multiple levels of abstraction, which are useful for characterizing complex interactions and dynamics of the observations. Here, we consider multiple levels of abstraction via a multiresolution geometry of data points at different granularities. To construct this geometry we define a time-inhomogeneous diffusion process that effectively condenses data points together to uncover nested groupings at larger and larger granularities. This inhomogeneous process creates a deep cascade of intrinsic low pass filters on the data affinity graph that are applied in sequence to gradually eliminate local variability while adjusting the learned data geometry to increasingly coarser resolutions. We provide visualizations to exhibit our method as a continuously-hierarchical clustering with directions of eliminated variation highlighted at each step. The utility of our algorithm is demonstrated via neuronal data condensation, where the constructed multiresolution data geometry uncovers the organization, grouping, and connectivity between neurons.
    Comment: 14 pages, 7 figures
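
The condensation idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the fixed bandwidth `epsilon`, the step count `n_steps`, and the function name `diffusion_condensation` are all assumptions made for illustration. At each step, the affinity graph is rebuilt from the current point positions (which is what makes the process time-inhomogeneous) and every point is replaced by a weighted average of its neighbors, acting as a low-pass filter.

```python
import numpy as np

def diffusion_condensation(points, epsilon=1.0, n_steps=20):
    """Illustrative sketch only: repeatedly smooth points with a diffusion
    operator that is recomputed from the current positions at every step."""
    x = np.asarray(points, dtype=float).copy()
    history = [x.copy()]
    for _ in range(n_steps):
        # Pairwise squared Euclidean distances between current positions.
        sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
        # Gaussian affinity (assumed kernel choice), bandwidth epsilon.
        kernel = np.exp(-sq_dist / epsilon**2)
        # Row-normalize to obtain a Markov (diffusion) operator.
        diffusion_op = kernel / kernel.sum(axis=1, keepdims=True)
        # One low-pass filtering step: each point moves to a weighted
        # average of all points, dominated by its near neighbors.
        x = diffusion_op @ x
        history.append(x.copy())
    return history

if __name__ == "__main__":
    # Two small, well-separated groups of points.
    pts = [[0, 0], [0.2, 0], [0, 0.2], [10, 10], [10.2, 10], [10, 10.2]]
    final = diffusion_condensation(pts)[-1]
    print(np.round(final, 3))
```

In this toy setting, points within each group collapse toward a common location while the two groups remain apart; tracking when points merge across steps is what would yield the nested groupings the abstract refers to.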

    linc-mipep and linc-wrb encode micropeptides that regulate chromatin accessibility in vertebrate-specific neural cells

    Thousands of long intergenic non-coding RNAs (lincRNAs) are transcribed throughout the vertebrate genome. A subset of lincRNAs enriched in developing brains have recently been found to contain cryptic open-reading frames and are speculated to encode micropeptides. However, systematic identification and functional assessment of these transcripts have been hindered by technical challenges caused by their small size. Here, we show that two putative lincRNAs (linc-mipep, also called lnc-rps25, and linc-wrb) encode micropeptides with homology to the vertebrate-specific chromatin architectural protein, Hmgn1, and demonstrate that they are required for development of vertebrate-specific brain cell types. Specifically, we show that NMDA receptor-mediated pathways are dysregulated in zebrafish lacking these micropeptides and that their loss preferentially alters the gene regulatory networks that establish cerebellar cells and oligodendrocytes, evolutionarily newer cell types that develop postnatally in humans. These findings reveal a key missing link in the evolution of vertebrate brain cell development and illustrate a genetic basis for how some neural cell types are more susceptible to chromatin disruptions, with implications for neurodevelopmental disorders and disease.

    Genome-Wide Association Study and Gene Expression Analysis Identifies CD84 as a Predictor of Response to Etanercept Therapy in Rheumatoid Arthritis

    Anti-tumor necrosis factor alpha (anti-TNF) biologic therapy is a widely used treatment for rheumatoid arthritis (RA). It is unknown why some RA patients fail to respond adequately to anti-TNF therapy, which limits the development of clinical biomarkers to predict response or new drugs to target refractory cases. To understand the biological basis of response to anti-TNF therapy, we conducted a genome-wide association study (GWAS) meta-analysis of more than 2 million common variants in 2,706 RA patients from 13 different collections. Patients were treated with one of three anti-TNF medications: etanercept (n = 733), infliximab (n = 894), or adalimumab (n = 1,071). We identified a SNP (rs6427528) at the 1q23 locus that was associated with change in disease activity score (ΔDAS) in the etanercept subset of patients (P = 8×10⁻⁸), but not in the infliximab or adalimumab subsets (P > 0.05). The SNP is predicted to disrupt transcription factor binding site motifs in the 3′ UTR of an immune-related gene, CD84, and the allele associated with better response to etanercept was associated with higher CD84 gene expression in peripheral blood mononuclear cells (P = 1×10⁻¹¹ in 228 non-RA patients and P = 0.004 in 132 RA patients). Consistent with the genetic findings, higher CD84 gene expression correlated with lower cross-sectional DAS (P = 0.02, n = 210) and showed a non-significant trend for better ΔDAS in a subset of RA patients with gene expression data (n = 31, etanercept-treated). A small, multi-ethnic replication showed a non-significant trend towards an association among etanercept-treated RA patients of Portuguese ancestry (n = 139, P = 0.4), but no association among patients of Japanese ancestry (n = 151, P = 0.8). Our study demonstrates that an allele associated with response to etanercept therapy is also associated with CD84 gene expression, and further that CD84 expression correlates with disease activity. These findings support a model in which CD84 genotypes and/or expression may serve as a useful biomarker for response to etanercept treatment in RA patients of European ancestry. © 2013 Cui et al.

    Studying Disease Dynamics with Manifold Learning

    In an effort to computationally predict causal genetic networks and derive biological pathways empirically, biomedical scientists have begun to design increasingly complicated single cell experiments. A single study can now sequence millions of cells from hundreds of patient samples across different chronological timepoints or stages of disease progression with multiple measurement modalities. None of these additional complexities, however, are adequately addressed by the current state-of-the-art computational tools. Current machine learning techniques ignore these valuable sources of information by either downsampling the number of cells to a computationally tractable number or discarding experimental information, like stage of disease, modality or timepoint, in order to perform associative analyses. In this thesis, I will describe five manifold learning algorithms that will address each of these shortcomings in an attempt to move single cell machine learning research towards identifying causal mechanisms that underlie disease pathogenesis. I first describe a general framework called diffusion condensation, which uses a cascade of diffusion filters to learn hierarchy from a high dimensional dataset. Next, I describe Cellular Analysis of Topology and Condensation Homology, an extension of diffusion condensation that applies a cascade of manifold-intrinsic diffusion filters to single cells to learn cellular clusters across granularities, identify pathogenic populations and perform rapid differential gene expression analysis. With this approach, I identified an IL1B signaling axis between microglia and astrocytes which we show drives disease progression in age-related macular degeneration. I further extend the diffusion condensation framework to visualize cellular hierarchy by integrating diffusion condensation with the theory of potential distances in Multiscale PHATE. By analyzing 54 million cells from 163 patients infected with SARS-CoV-2, Multiscale PHATE identified cell types and cellular subsets directly predictive of patient mortality. In an effort to integrate information from multimodal data, I present integrated diffusion, a novel framework for integrating multimodal single cell data and performing downstream analysis tasks like visualization and data denoising for identifying epigenetic-genetic interactions and networks. Finally, I present TrajectoryNet, a novel trajectory inference tool that creates continuous trajectories from timelapsed single cell measurements. I leverage this approach to identify the transcriptional program responsible for driving metastasis in an in vitro model of mesenchymal-to-epithelial transition. Using TrajectoryNet, I identified ESRRA as a genetic switch that promotes differentiation to epithelial cell state and metastasis. Together these approaches integrate experimental information with complex single cell datasets to infer biological mechanisms driving disease pathogenesis, helping move the computational biology field away from associative research and towards causality.

    Common risk alleles for inflammatory diseases are targets of recent positive selection

    Genome-wide association studies (GWASs) have identified hundreds of loci harboring genetic variation influencing inflammatory-disease susceptibility in humans. It has been hypothesized that present day inflammatory diseases may have arisen, in part, due to pleiotropic effects of host resistance to pathogens over the course of human history, with significant selective pressures acting to increase host resistance to pathogens. The extent to which genetic factors underlying inflammatory-disease susceptibility have been influenced by selective processes can now be quantified more comprehensively than previously possible. To understand the evolutionary forces that have shaped inflammatory-disease susceptibility and to elucidate functional pathways affected by selection, we performed a systems-based analysis to integrate (1) published GWASs for inflammatory diseases, (2) a genome-wide scan for signatures of positive selection in a population of European ancestry, (3) functional genomics data comprised of protein-protein interaction networks, and (4) a genome-wide expression quantitative trait locus (eQTL) mapping study in peripheral blood mononuclear cells (PBMCs). We demonstrate that loci for inflammatory-disease susceptibility are enriched for genomic signatures of recent positive natural selection, with selected loci forming a highly interconnected protein-protein interaction network. Further, we identify 21 loci for inflammatory-disease susceptibility that display signatures of recent positive selection, of which 13 also show evidence of cis-regulatory effects on genes within the associated locus. Thus, our integrated analyses highlight a set of susceptibility loci that might subserve a shared molecular function and has experienced selective pressure over the course of human history; today, these loci play a key role in influencing susceptibility to multiple different inflammatory diseases, in part through alterations of gene expression in immune cells.

    Immune cells and their inflammatory mediators modify β cells and cause checkpoint inhibitor-induced diabetes.

    Checkpoint inhibitors (CPIs) targeting programmed death 1 (PD-1)/programmed death ligand 1 (PD-L1) and cytotoxic T lymphocyte antigen 4 (CTLA-4) have revolutionized cancer treatment but can trigger autoimmune complications, including CPI-induced diabetes mellitus (CPI-DM), which occurs preferentially with PD-1 blockade. We found evidence of pancreatic inflammation in patients with CPI-DM, with shrinkage of pancreases, increased pancreatic enzymes, and, in a case from a patient who died with CPI-DM, peri-islet lymphocytic infiltration. In the NOD mouse model, anti-PD-L1 but not anti-CTLA-4 induced diabetes rapidly. RNA sequencing revealed that cytolytic IFN-γ+CD8+ T cells infiltrated islets with anti-PD-L1. Changes in β cells were predominantly driven by IFN-γ and TNF-α and included induction of a potentially novel β cell population with transcriptional changes suggesting dedifferentiation. IFN-γ increased checkpoint ligand expression and activated apoptosis pathways in human β cells in vitro. Treatment with anti-IFN-γ and anti-TNF-α prevented CPI-DM in anti-PD-L1-treated NOD mice. CPIs targeting the PD-1/PD-L1 pathway resulted in transcriptional changes in β cells and immune infiltrates that may lead to the development of diabetes. Inhibition of inflammatory cytokines can prevent CPI-DM, suggesting a strategy for clinical application to prevent this complication.

    Single-cell analysis reveals inflammatory interactions driving macular degeneration

    Due to commonalities in pathophysiology, age-related macular degeneration (AMD) represents a uniquely accessible model to investigate therapies for neurodegenerative diseases, leading us to examine whether pathways of disease progression are shared across neurodegenerative conditions. Here we use single-nucleus RNA sequencing to profile lesions from 11 postmortem human retinas with age-related macular degeneration and 6 control retinas with no history of retinal disease. We create a machine-learning pipeline based on recent advances in data geometry and topology and identify activated glial populations enriched in the early phase of disease. Examining single-cell data from Alzheimer’s disease and progressive multiple sclerosis with our pipeline, we find a similar glial activation profile enriched in the early phase of these neurodegenerative diseases. In late-stage age-related macular degeneration, we identify a microglia-to-astrocyte signaling axis mediated by interleukin-1β which drives angiogenesis characteristic of disease pathogenesis. We validated this mechanism using in vitro and in vivo assays in mouse, identifying a possible new therapeutic target for AMD and possibly other neurodegenerative conditions. Thus, due to shared glial states, the retina provides a potential system for investigating therapeutic approaches in neurodegenerative diseases.