47 research outputs found

    Weighted graph matching approaches to structure comparison and alignment and their application to biological problems

    In pattern recognition and machine learning, comparing and contrasting are the most fundamental operations: from similarities we derive common rules encoded in the systems, while from differences we infer what makes each system unique. The biological sciences are not an exception to these operations and, in fact, rely heavily on their use. More recently, the emergence of high-throughput measurement technologies has highlighted the need for novel approaches capable of enhancing our ability to understand complex relationships in these data sets. Often, these relationships can be best represented using graphs (or networks), where nodes are biochemical components such as genes, RNAs, proteins or metabolites, and edges indicate the type (and often quality) of relationship. Comparison of relationships is generally performed by aligning the networks of interest. For example, for protein-protein interaction (PPI) networks, the goal of network alignment is to find mappings between nodes (proteins); such mappings are useful for identifying signaling pathways or protein complexes and for annotating genes of unknown function from subnetworks conserved across multiple species. Phylogenetic trees are also graph structures that describe evolutionary relationships among groups of organisms and their hypothetical ancestors. As a large body of previous work has shown, comparison of trees also opens the possibility of supporting or building new evolutionary hypotheses: for example, the detection of host-parasite symbiosis, gene coevolution as a signal of physical interactions among genes, or nonstandard events such as horizontal gene transfer. The goal of this thesis is to develop and implement a flexible set of algorithms and methodologies that can be used for the alignment of trees and/or networks having various sizes and properties.
We first define a new relaxed model of graph isomorphism in which the shortest path lengths are preserved between corresponding pairs of nodes within each graph. Then, based on Google's PageRank model, we present a new tree matching approach, phyloAligner, which resolves several weaknesses of previous approaches. We further generalize this tree matching algorithm to a broader flexible framework, MCS-Finder, as a scalable and error-tolerant approximation for identifying the maximum common substructure between weighted graphs or distance matrices of different sizes. For phylogenetic trees with weighted edges and strictly-labeled nodes, the multidimensional scaling-based method xCEED can effectively evaluate structural similarity and identify which regions are congruent or incongruent. These methods successfully detected coevolutionary signals as well as nonstandard evolutionary events such as horizontal gene transfer, and recovered interaction specificity between multigene families.
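The relaxed isomorphism criterion described in this abstract (shortest path lengths preserved between corresponding node pairs) can be illustrated with a small check. This is only a sketch of the criterion itself, not of phyloAligner or MCS-Finder; the function names, the tolerance parameter `tol`, and the two toy trees are assumptions made for illustration.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency dict {node: {neighbor: weight}} from (u, v, w) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def shortest_paths(graph, source):
    """Dijkstra's algorithm: shortest path length from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def preserves_distances(g1, g2, mapping, tol=1e-9):
    """True if every pair of mapped nodes has the same shortest path length in both graphs."""
    inf = float("inf")
    for a in mapping:
        d1 = shortest_paths(g1, a)
        d2 = shortest_paths(g2, mapping[a])
        for b in mapping:
            x = d1.get(b, inf)
            y = d2.get(mapping[b], inf)
            if x == y:
                continue  # also covers the case where both pairs are disconnected
            if abs(x - y) > tol:
                return False
    return True


# Hypothetical toy trees: different internal topology, identical leaf-to-leaf distances.
tree1 = build_graph([("A", "X", 1.0), ("B", "X", 2.0), ("C", "X", 3.0)])
tree2 = build_graph([("a", "y", 1.0), ("b", "y", 2.0), ("y", "z", 1.0), ("z", "c", 2.0)])

good = preserves_distances(tree1, tree2, {"A": "a", "B": "b", "C": "c"})
bad = preserves_distances(tree1, tree2, {"A": "b", "B": "a", "C": "c"})
```

Only the mapped nodes (here, the leaves) are compared, so two graphs of different sizes can still satisfy the criterion, which is the sense in which the model is relaxed relative to strict isomorphism.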

    A Bayesian mixture model for the analysis of allelic expression in single cells.

    Allele-specific expression (ASE) at single-cell resolution is a critical tool for understanding the stochastic and dynamic features of gene expression. However, low read coverage and high biological variability present challenges for analyzing ASE. We demonstrate that discarding multi-mapping reads leads to higher variability in estimates of allelic proportions and an increased frequency of sampling zeros, and can produce spurious findings of dynamic and monoallelic gene expression. Here, we report a method for ASE analysis from single-cell RNA-Seq data that accurately classifies allelic expression states and improves estimation of allelic proportions by pooling information across cells. We further demonstrate that combining information across cells using a hierarchical mixture model reduces sampling variability without sacrificing cell-to-cell heterogeneity. We applied our approach to re-evaluate the statistical independence of allelic bursting and to track changes in the allele-specific expression patterns of cells sampled over a developmental time course.
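The effect of pooling information across cells can be illustrated with a much simpler stand-in than the hierarchical mixture model the abstract describes: a single Beta prior shared by all cells, updated with each cell's read counts. The function names and the prior pseudo-counts (`prior_a`, `prior_b`) below are hypothetical, and this sketch does not classify allelic expression states.

```python
def raw_proportion(allele_reads, total_reads):
    """Naive per-cell estimate: fraction of reads from one allele (undefined with no reads)."""
    if total_reads == 0:
        return None
    return allele_reads / total_reads


def pooled_proportion(allele_reads, total_reads, prior_a=5.0, prior_b=5.0):
    """Posterior mean under a Beta(prior_a, prior_b) prior with a binomial likelihood.

    The prior stands in for information shared across cells (an assumption of this
    sketch); low-coverage cells are pulled toward the prior mean, high-coverage
    cells stay close to their own data.
    """
    return (allele_reads + prior_a) / (total_reads + prior_a + prior_b)


# A low-coverage cell with a sampling zero versus a high-coverage cell with a true zero.
low_raw = raw_proportion(0, 2)
low_pooled = pooled_proportion(0, 2)
high_raw = raw_proportion(0, 200)
high_pooled = pooled_proportion(0, 200)
```

With 0 of 2 reads, the raw estimate is exactly zero and would look monoallelic, while the pooled estimate stays near the prior mean. With 0 of 200 reads, both estimates remain close to zero, so well-supported cell-to-cell differences are retained.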

    Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics.

    BACKGROUND: Single-cell RNA sequencing is a powerful tool for characterizing cellular heterogeneity in gene expression. However, high variability and a large number of zero counts present challenges for analysis and interpretation. There is substantial controversy over the origins and proper treatment of zeros and no consensus on whether zero-inflated count distributions are necessary or even useful. While some studies assume the existence of zero inflation due to technical artifacts and attempt to impute the missing information, other recent studies argue that there is no zero inflation in scRNA-seq data. RESULTS: We apply a Bayesian model selection approach to unambiguously demonstrate zero inflation in multiple biologically realistic scRNA-seq datasets. We show that the primary causes of zero inflation are not technical but rather biological in nature. We also demonstrate that parameter estimates from the zero-inflated negative binomial distribution are an unreliable indicator of zero inflation. CONCLUSIONS: Despite the existence of zero inflation in scRNA-seq counts, we recommend the generalized linear model with negative binomial count distribution, not zero-inflated, as a suitable reference model for scRNA-seq analysis.
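To make the term "zero inflation" concrete, the sketch below compares the probability of a zero count under a negative binomial (NB) distribution with that under a zero-inflated negative binomial (ZINB). It uses the standard mean/size parameterization (variance = mu + mu^2 / size); the function names and the example parameter values are assumptions, and this is not the Bayesian model selection procedure used in the paper.

```python
import math


def nb_zero_prob(mu, size):
    """P(count = 0) under a negative binomial with mean mu and size (inverse dispersion)."""
    return (size / (size + mu)) ** size


def zinb_zero_prob(mu, size, pi):
    """P(count = 0) under a ZINB: extra zeros with probability pi, otherwise NB."""
    return pi + (1.0 - pi) * nb_zero_prob(mu, size)


# Hypothetical gene: mean of 2 counts per cell, strong overdispersion (size = 0.5).
p_nb = nb_zero_prob(2.0, 0.5)
p_zinb = zinb_zero_prob(2.0, 0.5, 0.3)
```

An overdispersed NB already assigns substantial probability to zero without any inflation term, which is one reason a high fraction of observed zeros does not by itself establish zero inflation.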

    Natural genetic variation determines microglia heterogeneity in wild-derived mouse models of Alzheimer's disease.

    Genetic and genome-wide association studies suggest a central role for microglia in Alzheimer's disease (AD). However, single-cell RNA sequencing (scRNA-seq) of microglia in mice, a key preclinical model, has shown mixed results regarding translatability to human studies. To address this, scRNA-seq of microglia from C57BL/6J (B6) and wild-derived strains (WSB/EiJ, CAST/EiJ, and PWK/PhJ) with and without APP/PS1 demonstrates that genetic diversity significantly alters features and dynamics of microglia in baseline neuroimmune functions and in response to amyloidosis. Results show significant variation in the abundance of microglial subtypes or states, including numbers of previously identified disease-associated and interferon-responding microglia, across the strains. For each subtype, significant differences in the expression of many genes are observed in wild-derived strains relative to B6, including 19 genes previously associated with human AD, such as Apoe, Trem2, and Sorl1. This resource is critical in the development of appropriately targeted therapeutics for AD and other neurological diseases.

    SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty.

    Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at various levels of resolution, including gene, isoform, and groups of isoforms aggregated by transcription start site. The aggregation strategies strengthen the signal for transcripts with high uncertainty. The SEESAW suite of methods is shown to have higher power than other allelic imbalance methods when there is isoform-level allelic imbalance. We also introduce a new test for detecting imbalance that varies across a covariate, such as time.

    Comparison of phylogenetic trees through alignment of embedded evolutionary distances

    BACKGROUND: The understanding of evolutionary relationships is a fundamental aspect of modern biology, with the phylogenetic tree being a primary tool for describing these associations. However, comparison of trees for the purpose of assessing similarity and the quantification of various biological processes remains a significant challenge. RESULTS: We describe a novel approach for the comparison of phylogenetic distance information based on the alignment of representative high-dimensional embeddings (xCEED: Comparison of Embedded Evolutionary Distances). The xCEED methodology, which utilizes multidimensional scaling and Procrustes-related superimposition approaches, provides the ability to measure the global similarity between trees as well as incongruities between them. We demonstrate the application of this approach to the prediction of coevolving protein interactions and demonstrate its improved performance over the mirrortree, tol-mirrortree, phylogenetic vector projection, and partial correlation approaches. Furthermore, we show its applicability to both the detection of horizontal gene transfer events as well as its potential use in the prediction of interaction specificity between a pair of multigene families. CONCLUSIONS: These approaches provide additional tools for the study of phylogenetic trees and associated evolutionary processes. Source code is available at http://gomezlab.bme.unc.edu/tools.

    churchill-lab/gbrs: v0.1.5

    Stable version

    Publisher Correction: Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics

    An amendment to this paper has been published and can be accessed via the original article. http://deepblue.lib.umich.edu/bitstream/2027.42/173858/1/13059_2020_Article_2182.pd