The Escherichia coli transcriptome mostly consists of independently regulated modules
Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we apply unsupervised machine learning to a diverse compendium of over 250 high-quality Escherichia coli RNA-seq datasets to identify 92 statistically independent signals that modulate the expression of specific gene sets. We show that 61 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals is validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provides: a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations; a guide to gene and regulator function discovery; and a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation describes the composition of a model prokaryotic transcriptome.
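The "signal summation" idea can be illustrated with a toy reconstruction: each gene's expression in a condition is modeled as the sum, over signals, of the gene's weight in that signal times the signal's activity in that condition. The sketch below shows only this summation step, not the unsupervised decomposition itself. The names (`reconstruct`, `gene_weights`, `signal_activity`) and all numbers are hypothetical illustrations, not taken from the study.

```python
def reconstruct(gene_weights, signal_activity):
    """Return expression[g][c] = sum over signals k of
    gene_weights[g][k] * signal_activity[k][c]."""
    n_signals = len(signal_activity)
    n_conditions = len(signal_activity[0])
    expression = []
    for weights in gene_weights:
        if len(weights) != n_signals:
            raise ValueError("each gene needs one weight per signal")
        row = [
            sum(weights[k] * signal_activity[k][c] for k in range(n_signals))
            for c in range(n_conditions)
        ]
        expression.append(row)
    return expression


# Hypothetical toy data: 3 genes, 2 signals, 3 conditions.
gene_weights = [
    [1.0, 0.0],  # gene 0 responds only to signal 0
    [0.0, 2.0],  # gene 1 responds only to signal 1
    [0.5, 0.5],  # gene 2 responds to both signals
]
signal_activity = [
    [0.0, 1.0, 2.0],  # activity of signal 0 in each condition
    [3.0, 0.0, 1.0],  # activity of signal 1 in each condition
]

expression = reconstruct(gene_weights, signal_activity)
for gene_index, row in enumerate(expression):
    print(gene_index, row)
```

In this toy setting, gene 2 carries contributions from both signals in every condition, which is the kind of overlap a module-based decomposition would need to separate.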
Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis
Background: Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single-channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately, many of these studies are underpowered, and it is desirable to improve power by combining data from more than one study; we sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis. Results: Using Affymetrix and Illumina data from the microarray quality control project, from our own clinical samples, and from additional publicly available datasets, we evaluated several approaches to directly integrate intensity-level expression data from the two platforms. After mapping probe sequences to Ensembl genes, we demonstrate that ComBat and cross-platform normalisation (XPN) significantly outperform mean-centering and distance-weighted discrimination (DWD) in terms of minimising inter-platform variance. In particular, we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially reducing legitimate biological differences from integrated datasets. Conclusion: Normalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.
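Of the integration methods compared above, mean-centering is the simplest to show in a few lines: within each platform, subtract each gene's mean across that platform's samples, then pool the centered samples. The sketch below assumes log-scale intensities already mapped to shared gene identifiers. The function name `mean_center`, the gene labels, and the values are hypothetical. This is the baseline that the abstract reports as outperformed by ComBat and XPN, not an implementation of either of those methods.

```python
from statistics import mean


def mean_center(samples):
    """Center each gene on its mean across one platform's samples.

    samples: list of dicts mapping gene id -> log intensity, all measured
    on the same platform and sharing the same gene ids.
    """
    genes = list(samples[0].keys())
    gene_means = {g: mean(s[g] for s in samples) for g in genes}
    return [{g: s[g] - gene_means[g] for g in genes} for s in samples]


# Hypothetical log intensities; the two platforms differ by a constant offset.
affymetrix = [
    {"GENE_A": 8.0, "GENE_B": 10.0},
    {"GENE_A": 10.0, "GENE_B": 12.0},
]
illumina = [
    {"GENE_A": 5.0, "GENE_B": 7.0},
    {"GENE_A": 7.0, "GENE_B": 9.0},
]

# Center per platform first, then pool the samples for joint analysis.
combined = mean_center(affymetrix) + mean_center(illumina)
for sample in combined:
    print(sample)
```

Per-gene centering removes a constant platform offset like the one in this toy data, but it cannot correct platform differences in scale or distribution shape, which may be one reason more elaborate methods do better.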
Genome-wide enhancer maps link risk variants to disease genes
Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease (1). Many of the underlying causal variants may affect enhancers (2,3), but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types (4). Here we apply this ABC model to create enhancer-gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions.
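The abstract does not spell out how the ABC score is computed. A minimal sketch, assuming the commonly described form of the score (an element's activity multiplied by its contact frequency with the gene's promoter, divided by the sum of that product over all candidate elements near the gene), might look like the following. The function name `abc_scores`, the element names, and the numbers are hypothetical, and the published pipeline involves further steps (how activity and contact are estimated, which elements count as candidates) that are not shown here.

```python
def abc_scores(elements):
    """Compute activity-by-contact scores for candidate elements of one gene.

    elements: dict mapping element name -> (activity, contact), where
    activity is a measure of enhancer activity and contact is the contact
    frequency between the element and the gene's promoter.
    """
    products = {name: a * c for name, (a, c) in elements.items()}
    total = sum(products.values())
    if total == 0:
        # No active, contacting element: every score is zero.
        return {name: 0.0 for name in products}
    return {name: p / total for name, p in products.items()}


# Hypothetical candidate elements near a single gene.
candidates = {
    "enh1": (4.0, 0.5),  # high activity, moderate contact
    "enh2": (1.0, 1.0),  # low activity, high contact
    "enh3": (2.0, 0.5),  # moderate activity, moderate contact
}

scores = abc_scores(candidates)
for name, score in scores.items():
    print(name, score)
```

Because the scores are normalized per gene, they sum to one across that gene's candidate elements, so each score can be read as an element's estimated share of the regulatory input to the gene.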
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome
Additional file 3. This file contains all supplementary tables relating to lncRNA identification via the conservation of synteny. Table S3. lncRNAs inferred in one species by the genomic alignment of a transcript assembled with the RNA-seq libraries from a related species. Table S12. Presence of intergenic lncRNAs both in sheep and cattle, in regions of conserved synteny. Table S13. Presence of intergenic lncRNAs both in sheep and goat, in regions of conserved synteny. Table S14. Presence of intergenic lncRNAs both in cattle and goat, in regions of conserved synteny. Table S15. Presence of intergenic lncRNAs both in sheep and humans, in regions of conserved synteny. Table S16. Presence of intergenic lncRNAs both in goat and humans, in regions of conserved synteny. Table S17. Presence of intergenic lncRNAs both in cattle and humans, in regions of conserved synteny. Table S18. High-confidence lncRNA pairs, those conserved across species both sequentially and positionally.
Long non-coding RNAs: spatial amplifiers that control nuclear structure and gene expression
Over the past decade, it has become clear that mammalian genomes encode thousands of long non-coding RNAs (lncRNAs), many of which are now implicated in diverse biological processes. Recent work studying the molecular mechanisms of several key examples, including Xist, which orchestrates X chromosome inactivation, has provided new insights into how lncRNAs can control cellular functions by acting in the nucleus. Here we discuss emerging mechanistic insights into how lncRNAs can regulate gene expression by coordinating regulatory proteins, localizing to target loci and shaping three-dimensional (3D) nuclear organization. We explore these principles to highlight biological challenges in gene regulation, in which lncRNAs are well-suited to perform roles that cannot be carried out by DNA elements or protein regulators alone, such as acting as spatial amplifiers of regulatory signals in the nucleus.
m^6A RNA methylation promotes XIST-mediated transcriptional repression
The long non-coding RNA X-inactive specific transcript (XIST) mediates the transcriptional silencing of genes on the X chromosome. Here we show that, in human cells, XIST is highly methylated with at least 78 N^6-methyladenosine (m^6A) residues, a reversible base modification of unknown function in long non-coding RNAs. We show that m^6A formation in XIST, as well as in cellular mRNAs, is mediated by RNA-binding motif protein 15 (RBM15) and its paralogue RBM15B, which bind the m^6A-methylation complex and recruit it to specific sites in RNA. This results in the methylation of adenosine nucleotides in adjacent m^6A consensus motifs. Furthermore, we show that knockdown of RBM15 and RBM15B, or knockdown of methyltransferase like 3 (METTL3), an m^6A methyltransferase, impairs XIST-mediated gene silencing. A systematic comparison of m^6A-binding proteins shows that YTH domain containing 1 (YTHDC1) preferentially recognizes m^6A residues on XIST and is required for XIST function. Additionally, artificial tethering of YTHDC1 to XIST rescues XIST-mediated silencing upon loss of m^6A. These data reveal a pathway of m^6A formation and recognition required for XIST-mediated transcriptional repression.
Xist localization and function: new insights from multiple levels
In female mammals, one of the two X chromosomes in each cell is transcriptionally silenced in order to achieve dosage compensation between the genders in a process called X chromosome inactivation. The master regulator of this process is the long non-coding RNA Xist. During X-inactivation, Xist accumulates in cis on the future inactive X chromosome, triggering a cascade of events that provoke the stable silencing of the entire chromosome, with relatively few genes remaining active. How Xist spreads, what its binding sites are, how it recruits silencing factors and how it induces a specific topological and nuclear organization of the chromatin all remain largely unanswered questions. Recent studies have improved our understanding of Xist localization and the proteins with which it interacts, allowing a reappraisal of ideas about Xist function. We discuss recent advances in our knowledge of Xist-mediated silencing, focusing on Xist spreading, the nuclear organization of the inactive X chromosome, recruitment of the polycomb complex and the role of the nuclear matrix in the process of X chromosome inactivation.
Chromatin loop anchors are associated with genome instability in cancer and recombination hotspots in the germline
Background: Chromatin loops form a basic unit of interphase nuclear organization, with chromatin loop anchor points providing contacts between regulatory regions and promoters. However, the mutational landscape at these anchor points remains under-studied. Here, we describe the unusual patterns of somatic mutations and germline variation associated with loop anchor points and explore the underlying features influencing these patterns. Results: Analyses of whole genome sequencing datasets reveal that anchor points are strongly depleted for single nucleotide variants (SNVs) in tumours. Despite low SNV rates in their genomic neighbourhood, anchor points emerge as sites of evolutionary innovation, showing enrichment for structural variant (SV) breakpoints and a peak of SNVs at focal CTCF sites within the anchor points. Both CTCF-bound and non-CTCF anchor points harbour an excess of SV breakpoints in multiple tumour types and are prone to double-strand breaks in cell lines. Common fragile sites, which are hotspots for genome instability, also show elevated numbers of intersecting loop anchor points. Recurrently disrupted anchor points are enriched for genes with functions in cell cycle transitions and regions associated with predisposition to cancer. We also discover a novel class of CTCF-bound anchor points which overlap meiotic recombination hotspots and are enriched for the core PRDM9 binding motif, suggesting that the anchor points have been foci for diversity generated during recent human evolution. Conclusions: We suggest that the unusual chromatin environment at loop anchor points underlies the elevated rates of variation observed, marking them as sites of regulatory importance but also genomic fragility.