A Bayesian mixture model for the analysis of allelic expression in single cells.
Allele-specific expression (ASE) at single-cell resolution is a critical tool for understanding the stochastic and dynamic features of gene expression. However, low read coverage and high biological variability present challenges for analyzing ASE. We demonstrate that discarding multi-mapping reads leads to higher variability in estimates of allelic proportions and an increased frequency of sampling zeros, and can lead to spurious findings of dynamic and monoallelic gene expression. Here, we report a method for ASE analysis from single-cell RNA-Seq data that accurately classifies allelic expression states and improves estimation of allelic proportions by pooling information across cells. We further demonstrate that combining information across cells using a hierarchical mixture model reduces sampling variability without sacrificing cell-to-cell heterogeneity. We applied our approach to re-evaluate the statistical independence of allelic bursting and track changes in the allele-specific expression patterns of cells sampled over a developmental time course.
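The idea of pooling information across cells can be illustrated with a minimal sketch. This is not the hierarchical mixture model described in the abstract; it is a much simpler beta-binomial shrinkage, in which each cell's maternal-allele proportion is pulled toward the proportion pooled over all cells. The function name, the `prior_strength` parameter, and the read counts are hypothetical and chosen only for illustration.

```python
def pooled_allelic_proportions(maternal, total, prior_strength=10.0):
    """Shrink per-cell maternal-allele proportions toward the pooled proportion.

    maternal[i] and total[i] are maternal-allele and total allelic read counts
    for cell i. prior_strength (an assumed tuning value) acts like a number of
    pseudo-reads contributed by the pooled estimate.
    """
    pooled = sum(maternal) / sum(total)
    # Beta(a, b) prior centered on the pooled proportion.
    a = prior_strength * pooled
    b = prior_strength * (1.0 - pooled)
    # Posterior mean of a beta-binomial model for each cell.
    return [(a + k) / (a + b + n) for k, n in zip(maternal, total)]


# Hypothetical counts for four cells; the first cell has a sampling zero.
maternal = [0, 3, 12, 1]
total = [2, 6, 20, 2]

raw = [k / n for k, n in zip(maternal, total)]
shrunk = pooled_allelic_proportions(maternal, total)

for r, s in zip(raw, shrunk):
    print(f"raw={r:.3f}  shrunk={s:.3f}")
```

In this toy example the low-coverage cell with zero maternal reads is no longer estimated as strictly monoallelic, while the well-covered cell moves only slightly, which is the qualitative behavior the abstract attributes to pooling.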
Trappc9 deficiency causes parent-of-origin dependent microcephaly and obesity
Some imprinted genes exhibit parental origin specific expression bias rather than being transcribed exclusively from one copy. The physiological relevance of this remains poorly understood. In an analysis of brain-specific allele-biased expression, we identified that Trappc9, a cellular trafficking factor, was expressed predominantly (~70%) from the maternally inherited allele. Loss-of-function mutations in human TRAPPC9 cause a rare neurodevelopmental syndrome characterized by microcephaly and obesity. By studying Trappc9 null mice we discovered that homozygous mutant mice showed a reduction in brain size, exploratory activity and social memory, as well as a marked increase in body weight. A role for Trappc9 in energy balance was further supported by increased ad libitum food intake in a child with TRAPPC9 deficiency. Strikingly, heterozygous mice lacking the maternal allele (70% reduced expression) had pathology similar to homozygous mutants, whereas mice lacking the paternal allele (30% reduction) were phenotypically normal. Taken together, we conclude that Trappc9 deficient mice recapitulate key pathological features of TRAPPC9 mutations in humans and identify a role for Trappc9 and its imprinting in controlling brain development and metabolism.
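The dosage arithmetic behind the heterozygous comparison can be written out explicitly. The sketch below assumes the approximate 70/30 maternal/paternal split quoted in the abstract and assumes no compensation from the remaining allele; the function and constant names are hypothetical.

```python
# Approximate allelic contributions to total brain expression, in percent,
# taken from the ~70% maternal bias reported in the abstract.
MATERNAL_SHARE = 70
PATERNAL_SHARE = 30


def remaining_expression(maternal_intact, paternal_intact):
    """Percent of wild-type expression left, assuming no compensation."""
    maternal = MATERNAL_SHARE if maternal_intact else 0
    paternal = PATERNAL_SHARE if paternal_intact else 0
    return maternal + paternal


genotypes = {
    "wild type": (True, True),
    "maternal allele deleted": (False, True),
    "paternal allele deleted": (True, False),
    "homozygous null": (False, False),
}

for name, (mat, pat) in genotypes.items():
    print(f"{name}: {remaining_expression(mat, pat)}% of wild-type expression")
```

Under these assumptions, deleting the maternal allele leaves about 30% of normal expression (a 70% reduction), whereas deleting the paternal allele leaves about 70% (a 30% reduction), matching the figures given in the abstract.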
Variation in histone configurations correlates with gene expression across nine inbred strains of mice.
The diversity outbred (DO) mice and their inbred founders are widely used models of human disease. However, although the genetic diversity of these mice has been well documented, their epigenetic diversity has not. Epigenetic modifications, such as histone modifications and DNA methylation, are important regulators of gene expression, and as such are a critical mechanistic link between genotype and phenotype. Therefore, creating a map of epigenetic modifications in the DO mice and their founders is an important step toward understanding mechanisms of gene regulation and the link to disease in this widely used resource. To this end, we performed a strain survey of epigenetic modifications in hepatocytes of the DO founders. We surveyed four histone modifications (H3K4me1, H3K4me3, H3K27me3, and H3K27ac), and DNA methylation. We used ChromHMM to identify 14 chromatin states, each of which represented a distinct combination of the four histone modifications. We found that the epigenetic landscape was highly variable across the DO founders and was associated with variation in gene expression across strains. We found that epigenetic state imputed into a population of DO mice recapitulated the association with gene expression seen in the founders, suggesting that both histone modifications and DNA methylation are highly heritable mechanisms of gene expression regulation. We illustrate how DO gene expression can be aligned with inbred epigenetic states to identify putative cis-regulatory regions. Finally, we provide a data resource that documents strain-specific variation in chromatin state and DNA methylation in hepatocytes across nine widely used strains of laboratory mice.
The Genome of C57BL/6J Eve, the Mother of the Laboratory Mouse Genome Reference Strain.
Isogenic laboratory mouse strains enhance reproducibility because individual animals are genetically identical. For the most widely used isogenic strain, C57BL/6, there exists a wealth of genetic, phenotypic, and genomic data, including a high-quality reference genome (GRCm38.p6). Now 20 years after the first release of the mouse reference genome, C57BL/6J mice are at least 26 inbreeding generations removed from GRCm38 and the strain is now maintained with periodic reintroduction of cryorecovered mice derived from a single breeder pair, aptly named Adam and Eve. To provide an update to the mouse reference genome that more accurately represents the genome of today's C57BL/6J mice, we took advantage of long read, short read, and optical mapping technologies to generate a de novo assembly of the C57BL/6J Eve genome (B6Eve). Using these data, we have addressed recurring variants observed in previous mouse genomic studies. We have also identified structural variations, closed gaps in the mouse reference assembly, and revealed previously unannotated coding sequences. This B6Eve assembly explains discrepant observations that have been associated with GRCm38-based analyses, and will inform a reference genome that is more representative of the C57BL/6J mice that are in use today.
Individual Gene Cluster Statistics in Noisy Maps
Identification of homologous chromosomal regions is important for understanding evolutionary processes that shape genome evolution, such as genome rearrangements and large scale duplication events. If these chromosomal regions have diverged significantly, statistical tests to determine whether observed similarities in gene content are due to history or chance are imperative. Currently available methods are typically designed for genomic data and are appropriate for whole genome analyses. Statistical methods for estimating significance when a single pair of regions is under consideration are needed. We present a new statistical method, based on generating functions, for estimating the significance of orthologous gene clusters under the null hypothesis of random gene order. Our statistic is suitable for noisy comparative maps, in which a one-to-one homology mapping cannot be established. It is also designed for testing the significance of an individual gene cluster in isolation, in situations where whole genome data is not available. We implement our statistic in Mathematica and demonstrate its utility by applying it to the MHC homologous regions in human and fly.
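To make the null hypothesis of random gene order concrete, the following sketch computes a tail probability under the simplest possible model. It is not the generating-function statistic described in the abstract and it ignores noisy, one-to-many homology maps. It assumes a genome of `n_genes` genes, of which `n_homologs` have a homolog in the reference region, and asks how likely it is that a fixed window of `window` genes contains at least `k_observed` of them by chance. All names and numbers are hypothetical.

```python
from math import comb


def cluster_p_value(n_genes, n_homologs, window, k_observed):
    """Hypergeometric tail: P(at least k_observed homologs fall in the window).

    Under random gene order, the genes in a fixed window are treated as a
    uniform random sample of size `window` drawn without replacement.
    """
    upper = min(n_homologs, window)
    total = comb(n_genes, window)
    hits = sum(
        comb(n_homologs, k) * comb(n_genes - n_homologs, window - k)
        for k in range(k_observed, upper + 1)
    )
    return hits / total


# Hypothetical example: 5 of 8 homologs found within a 20-gene window
# in a genome of 1000 genes.
p = cluster_p_value(n_genes=1000, n_homologs=8, window=20, k_observed=5)
print(f"p = {p:.3e}")
```

A small probability indicates that the observed shared gene content is unlikely under random gene order in this simplified model; a test of the kind the abstract describes would additionally account for ambiguity in the homology mapping.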
TWO PLUS TWO DOES NOT EQUAL THREE: STATISTICAL TESTS FOR MULTIPLE GENOME COMPARISON
Gene clusters that span three or more chromosomal regions are of increasing importance, yet statistical tests to validate such clusters are in their infancy. Current approaches either conduct several pairwise comparisons, or consider only the number of genes that occur in all the regions. In this paper, we provide statistical tests for clusters spanning exactly three regions based on genome models of typical comparative genomics problems, including analysis of conserved linkage within multiple species and identification of large-scale duplications. Our tests are the first to combine evidence from genes shared among all three regions and genes shared between pairs of regions. We show that our tests of clusters spanning three regions are more sensitive than existing approaches and can thus be used to identify more diverged homologous regions.
Gene cluster statistics with gene families.
Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such "gene clusters" is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test. 
We present additional simulation results indicating the best choice of parameter values for data analysis in genomes of various sizes and illustrate the utility of our methods by applying them to gene clusters recently reported in the literature. Mathematical code to compute cluster probabilities using our methods is available as supplementary material.
The Protean Programmable Network Architecture: Design and Initial Experience
This paper presents Protean, a programmable network architecture for the future Internet. Protean is an event-driven network architecture that allows service providers, applications, and even individual flows to customize the network services, while at the same time providing efficient data paths for flows that use default services. We believe that the Protean approach for injecting and customizing services in the network is both flexible and reasonably scalable. A key feature of Protean is the support for state management. A service that is invoked at one node has the ability to access and update non-local state, and the management of distributed network state is achieved by a core-based self-configuring infrastructure in Protean. Our initial experience has shown that it is fairly easy to write even moderately complex services in Protean because of the ability to manipulate non-local state at a switch. We present a simple case study of a Protean switch that illustrates the f..