307 research outputs found
Transcriptome content and dynamics at single-nucleotide resolution
Massively parallel short-tag 'RNA sequencing' is giving new insights into the complexity of the eukaryotic transcriptome
The uniqueome: a mappability resource for short-tag sequencing
Summary: Quantification applications of short-tag sequencing data (such as CNVseq and RNAseq) depend on knowing the uniqueness of specific genomic regions at a given threshold of error. Here, we present the 'uniqueome', a genomic resource for understanding the uniquely mappable proportion of genomic sequences. Pre-computed data are available for human, mouse, fly and worm genomes in both color-space and nucleotide-space, and we demonstrate the utility of this resource as applied to the quantification of RNAseq data
Integrated Genome Analysis Suggests that Most Conserved Non-Coding Sequences are Regulatory Factor Binding Sites
More than 98% of a typical vertebrate genome does not code for proteins. Although non-coding regions are sprinkled with short (<200 bp) islands of evolutionarily conserved sequences, the function of most of these unannotated conserved islands remains unknown. One possibility is that unannotated conserved islands could encode non-coding RNAs (ncRNAs); alternatively, unannotated conserved islands could serve as promoter-distal regulatory factor binding sites (RFBSs) like enhancers. Here we assess these possibilities by comparing unannotated conserved islands in the human and mouse genomes to transcribed regions and to RFBSs, relying on a detailed case study of one human and one mouse cell type. We define transcribed regions by applying a novel transcript-calling algorithm to RNA-Seq data obtained from total cellular RNA, and we define RFBSs using ChIP-Seq and DNAse-hypersensitivity assays. We find that unannotated conserved islands are four times more likely to coincide with RFBSs than with unannotated ncRNAs. Thousands of conserved RFBSs can be categorized as insulators based on the presence of CTCF or as enhancers based on the presence of p300/CBP and H3K4me1. While many unannotated conserved RFBSs are transcriptionally active to some extent, the transcripts produced tend to be unspliced, non-polyadenylated and expressed at levels 10 to 100-fold lower than annotated coding or ncRNAs. Extending these findings across multiple cell types and tissues, we propose that most conserved non-coding genomic DNA in vertebrate genomes corresponds to promoter-distal regulatory elements
Dynamic transcription programs during ES cell differentiation towards mesoderm in serum versus serum-free BMP4 culture
Expression profiling of embryonic stem (ES) cell differentiation in the presence of serum has been performed previously. It remains unclear if transcriptional activation is dependent on complex growth factor mixtures in serum or whether this process is intrinsic to ES cells once the stem cell program has been inactivated. The aims of this study were to determine the transcriptional programs associated with the stem cell state and to characterize mesoderm differentiation between serum and serum-free culture
Exome-wide association study of pancreatic cancer risk
We conducted a case-control exome-wide association study to discover germline variants in coding regions that affect risk for pancreatic cancer, combining data from 5 studies. We analyzed exome and genome sequencing data from 437 patients with pancreatic cancer (cases) and 1922 individuals not known to have cancer (controls). In the primary analysis, BRCA2 had the strongest enrichment for rare inactivating variants (17/437 cases vs 3/1922 controls) (P=3.27×10^-6; exome-wide statistical significance threshold P<2.5×10^-6). Cases had more rare inactivating variants in DNA repair genes than controls, even after excluding 13 genes known to predispose to pancreatic cancer (adjusted odds ratio, 1.35, P=.045). At the suggestive threshold (P<.001), 6 genes were enriched for rare damaging variants (UHMK1, AP1G2, DNTA, CHST6, FGFR3, and EPHA1) and 7 genes had associations with pancreatic cancer risk, based on the sequence-kernel association test. We confirmed variants in BRCA2 as the most common high-penetrant genetic factor associated with pancreatic cancer and we also identified candidate pancreatic cancer genes. Large collaborations and novel approaches are needed to overcome the genetic heterogeneity of pancreatic cancer predisposition
X-MATE: a flexible system for mapping short read data
Summary: Accurate and complete mapping of short-read sequencing to a reference genome greatly enhances the discovery of biological results and improves statistical predictions. We recently presented RNA-MATE, a pipeline for the recursive mapping of RNA-Seq datasets. With the rapid increase in genome re-sequencing projects, progression of available mapping software and the evolution of file formats, we now present X-MATE, an updated version of RNA-MATE, capable of mapping both RNA-Seq and DNA datasets and with improved performance, output file formats, configuration files, and flexibility in core mapping software
Identification of Anchor Genes during Kidney Development Defines Ontological Relationships, Molecular Subcompartments and Regulatory Pathways
The development of the mammalian kidney is well conserved from mouse to man. Despite considerable temporal and spatial data on gene expression in mammalian kidney development, primarily in rodent species, there is a paucity of genes whose expression is absolutely specific to a given anatomical compartment and/or developmental stage, defined here as 'anchor' genes. We previously generated an atlas of gene expression in the developing mouse kidney using microarray analysis of anatomical compartments collected via laser capture microdissection. Here, this data is further analysed to identify anchor genes via stringent bioinformatic filtering followed by high resolution section in situ hybridisation performed on 200 transcripts selected as specific to one of 11 anatomical compartments within the midgestation mouse kidney. A total of 37 anchor genes were identified across 6 compartments with the early proximal tubule being the compartment richest in anchor genes. Analysis of minimal and evolutionarily conserved promoter regions of this set of 25 anchor genes identified enrichment of transcription factor binding sites for Hnf4a and Hnf1b, RbpJ (Notch signalling), PPARγ:RxRA and COUP-TF family transcription factors. This was reinforced by GO analyses which also identified these anchor genes as targets in processes including epithelial proliferation and proximal tubular function. As well as defining anchor genes, this large scale validation of gene expression identified a further 92 compartment-enriched genes able to subcompartmentalise key processes during murine renal organogenesis spatially or ontologically. This included a cohort of 13 ureteric epithelial genes revealing previously unappreciated compartmentalisation of the collecting duct system and a series of early tubule genes suggesting that segmentation into proximal tubule, loop of Henle and distal tubule does not occur until the onset of glomerular vascularisation.
Overall, this study serves to illuminate previously ill-defined stages of patterning and will enable further refinement of the lineage relationships within mammalian kidney development
The miR-17-5p microRNA is a key regulator of the G1/S phase cell cycle transition
Novel targets of the oncogenic miR-17-92 cluster have been identified and the mechanism of regulation of proliferation at the G1/S phase cell cycle transition via the miR-17-5p microRNA has been elucidated
Phasevarion Mediated Epigenetic Gene Regulation in Helicobacter pylori
Many host-adapted bacterial pathogens contain DNA methyltransferases (mod genes) that are subject to phase-variable expression (high-frequency reversible ON/OFF switching of gene expression). In Haemophilus influenzae and pathogenic Neisseria, the random switching of the modA gene, associated with a phase-variable type III restriction modification (R-M) system, controls expression of a phase-variable regulon of genes (a 'phasevarion'), via differential methylation of the genome in the modA ON and OFF states. Phase-variable type III R-M systems are also found in Helicobacter pylori, suggesting that phasevarions may also exist in this key human pathogen. Phylogenetic studies on the phase-variable type III modH gene revealed that there are 17 distinct alleles in H. pylori, which differ only in their DNA recognition domain. One of the most commonly found alleles was modH5 (16% of isolates). Microarray analysis comparing the wild-type P12modH5 ON strain to a P12ΔmodH5 mutant revealed that six genes were either up- or down-regulated, and some were virulence-associated. These included flaA, which encodes a flagella protein important in motility and hopG, an outer membrane protein essential for colonization and associated with gastric cancer. This study provides the first evidence of this epigenetic mechanism of gene expression in H. pylori. Characterisation of H. pylori modH phasevarions to define stable immunological targets will be essential for vaccine development and may also contribute to understanding H. pylori pathogenesis