61 research outputs found

    Characterization of Transcription Start Sites of Putative Non-coding RNAs by Multifaceted Use of Massively Paralleled Sequencer

    On the basis of integrated transcriptome analysis, we show that not all transcriptional start site clusters (TSCs) in the intergenic regions (iTSCs) have the same properties; thus, it is possible to discriminate the iTSCs that are likely to have biological relevance from the other noise-level iTSCs. We used a total of 251 933 381 short-read sequence tags generated from various types of transcriptome analyses in order to characterize 6039 iTSCs, which have significant expression levels. We analyzed and found that 23% of these iTSCs were located in the proximal regions of the RefSeq genes. These RefSeq-linked iTSCs showed expression patterns similar to those of the neighboring RefSeq genes, had widely fluctuating transcription start sites and lacked ordered nucleosome positioning. These iTSCs seemed not to form independent transcriptional units, simply representing the by-products of the neighboring RefSeq genes, in spite of their significant expression levels. Similar features were also observed for the TSCs located in the antisense regions of the RefSeq genes. Furthermore, for the remaining iTSCs that were not associated with any RefSeq genes, we demonstrate that integrative interpretation of the transcriptome data provides essential information to specify their biological functions in the hypoxic responses of the cells.

    Identification of Novel Genes Selectively Expressed in the Follicle-Associated Epithelium from the Meta-Analysis of Transcriptomics Data from Multiple Mouse Cell and Tissue Populations

    The follicle-associated epithelium (FAE) overlying the Peyer’s patches and the microfold cells (M cells) within it are important sites of antigen transcytosis across the intestinal epithelium. Using a meta-analysis approach, we identified a transcriptional signature that distinguished the FAE from a large collection of mouse cells and tissues. A co-expressed cluster of 21 FAE-specific genes was identified, and the analysis of the transcription factor binding site motifs in their promoter regions indicated that these genes shared an underlying transcriptional programme. This cluster contained known FAE- (Anxa10, Ccl20, Psg18 and Ubd) and M-cell-specific (Gp2) genes, suggesting that the others were novel FAE-specific genes. Some of these novel candidate genes were expressed highly by the FAE and M cells (Calcb, Ces3b, Clca2 and Gjb2), and others only by the FAE (Ascl2, Cftr, Fgf15, Gpr133, Kcna1, Kcnj15, Mycl1, Pgap1 and Rps6kl). We also identified a subset of novel FAE-related genes that were induced in the intestinal epithelium after receptor activator of nuclear factor (NF)-κB ligand stimulation. These included Mfge8, which was specific to FAE enterocytes. This study provides new insight into the FAE transcriptome. Further characterization of the candidate genes identified here will aid the identification of novel regulators of cell function in the FAE.

    Long homopurine•homopyrimidine sequences are characteristic of genes expressed in brain and the pseudoautosomal region

    Homo(purine•pyrimidine) sequences (R•Y tracts) with mirror repeat symmetries form stable triplexes that block replication and transcription and promote genetic rearrangements. A systematic search was conducted to map the location of the longest R•Y tracts in the human genome in order to assess their potential function(s). The 814 R•Y tracts with ≥250 uninterrupted base pairs were preferentially clustered in the pseudoautosomal region of the sex chromosomes and located in the introns of 228 annotated genes whose protein products were associated with functions at the cell membrane. These genes were highly expressed in the brain and particularly in genes associated with susceptibility to mental disorders, such as schizophrenia. The set of 1957 genes harboring the 2886 R•Y tracts with ≥100 uninterrupted base pairs was additionally enriched in proteins associated with phosphorylation, signal transduction, development and morphogenesis. Comparisons of the ≥250 bp R•Y tracts in the mouse and chimpanzee genomes indicated that these sequences have mutated faster than the surrounding regions and are longer in humans than in chimpanzees. These results support a role for long R•Y tracts in promoting recombination and genome diversity during evolution through destabilization of chromosomal DNA, thereby inducing repair and mutation.
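    The abstract does not say how the systematic search was implemented. As a rough illustration only, the sketch below scans one strand for uninterrupted purine (A/G) or pyrimidine (C/T) runs at or above a length threshold. The function name `find_ry_tracts`, the regular-expression approach, and the coordinate convention are assumptions made for this example, and the sketch does not test for the mirror repeat symmetry the abstract mentions.

```python
import re

# Purines are A/G; pyrimidines are C/T. A homopurine run on one strand pairs
# with a homopyrimidine run on the other, so either kind counts as a tract.
# Any other character (e.g. N) interrupts a run.
_TRACT = re.compile(r"[AG]+|[CT]+")


def find_ry_tracts(seq, min_len=250):
    """Return (start, end, kind) for each uninterrupted run >= min_len.

    Hypothetical helper: coordinates are 0-based and end-exclusive;
    kind is "R" for a purine run and "Y" for a pyrimidine run.
    """
    tracts = []
    for match in _TRACT.finditer(seq.upper()):
        if match.end() - match.start() >= min_len:
            kind = "R" if match.group()[0] in "AG" else "Y"
            tracts.append((match.start(), match.end(), kind))
    return tracts
```

    Calling `find_ry_tracts(sequence, min_len=250)` or `find_ry_tracts(sequence, min_len=100)` would apply the two length thresholds quoted in the abstract to a single sequence string.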

    lincRNAs act in the circuitry controlling pluripotency and differentiation

    Although thousands of large intergenic non-coding RNAs (lincRNAs) have been identified in mammals, few have been functionally characterized, leading to debate about their biological role. To address this, we performed loss-of-function studies on most lincRNAs expressed in mouse embryonic stem (ES) cells and characterized the effects on gene expression. Here we show that knockdown of lincRNAs has major consequences on gene expression patterns, comparable to knockdown of well-known ES cell regulators. Notably, lincRNAs primarily affect gene expression in trans. Knockdown of dozens of lincRNAs causes either exit from the pluripotent state or upregulation of lineage commitment programs. We integrate lincRNAs into the molecular circuitry of ES cells and show that lincRNA genes are regulated by key transcription factors and that lincRNA transcripts bind to multiple chromatin regulatory proteins to affect shared gene expression programs. Together, the results demonstrate that lincRNAs have key roles in the circuitry controlling ES cell state. Broad Institute; Harvard University; National Human Genome Research Institute (U.S.); Merkin Family Foundation for Stem Cell Research

    Gene and genon concept: coding versus regulation: A conceptual and information-theoretic analysis of genetic storage and expression in the light of modern molecular biology

    We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon.

    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

    Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.