58 research outputs found

    Genome-wide investigation of in vivo EGR-1 binding sites in monocytic differentiation

    A genome-wide analysis of EGR-1 binding sites reveals co-localization with CpG islands and histone H3 lysine 9 binding. SP-1 binding occupancies near EGR-1 binding sites are dramatically altered.

    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

    Using Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium produced one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of these TSSs could not be assigned to a specific gene and initiate in unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping with long-read MinION sequencing. We train sequence-based deep learning models that predict the CAGE signal at STRs with high accuracy. These models reveal that the sequences surrounding STRs are important not only for distinguishing STR classes but also for predicting the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high levels of transcription initiation, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and add a new layer of complexity to STR polymorphism.

    The genome sequence of Streptomyces rochei 7434AN4, which carries a linear chromosome and three characteristic linear plasmids

    Streptomyces rochei 7434AN4 produces two structurally unrelated polyketide antibiotics, lankacidin and lankamycin, and carries three linear plasmids, pSLA2-L (211 kb), -M (113 kb), and -S (18 kb), whose nucleotide sequences were previously reported. The complete nucleotide sequence of the S. rochei chromosome has now been determined using long-read PacBio RS-II sequencing together with short-read Illumina Genome Analyzer IIx sequencing and Roche 454 pyrosequencing. The assembled sequence revealed an 8,364,802-bp linear chromosome with a high G + C content of 71.7% and 7,568 protein-coding ORFs. Including the three linear plasmids, the total genome size of S. rochei 7434AN4 was thus confirmed to be 8,706,406 bp. Consistent with our previous study, a tap-tpg gene pair, which is essential for maintaining the linear topology of Streptomyces genomes, was not found on the chromosome. Remarkably, the S. rochei chromosome contains seven ribosomal RNA (rrn) operons (16S-23S-5S), whereas Streptomyces species generally contain six. Analysis with the 2ndFind and antiSMASH platforms indicated that the chromosome harbors at least 35 secondary metabolite biosynthetic gene clusters, including those for the 28-membered polyene macrolide pentamycin and the azoxyalkene compound KA57-A.
    This work was supported by Grants-in-Aid for Scientific Research on Innovative Areas (23108515, 25108718 and 17H05446 to K.A.) from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT), Grants-in-Aid for Scientific Research (B) (16H04917 to K.A.) from the Japan Society for the Promotion of Science (JSPS), and the Sasakawa Scientific Research Grant from the Japan Science Society to Y.N. This work was partly supported by a JSPS A3 Foresight Program. A.A.F. and R.M. were supported by the Indonesia Endowment Fund for Education (LPDP). Sequencing analysis using an Illumina GAIIx sequencer was supported by a Grant-in-Aid for Scientific Research on Innovative Areas (22108010 to J.I.) from MEXT.
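
    As a quick consistency check on the sizes quoted for S. rochei 7434AN4, the chromosome length plus the three plasmid sizes should reproduce the reported total genome size. The sketch below assumes the plasmid sizes given in kb are rounded to the nearest kilobase, so their sum is only approximate; subtracting the chromosome from the reported total gives the combined plasmid length implied by the figures. The constant and function names are illustrative only and do not come from the paper.

```python
# Figures quoted in the abstract; plasmid sizes are the rounded kb values given there.
CHROMOSOME_BP = 8_364_802
PLASMIDS_KB = {"pSLA2-L": 211, "pSLA2-M": 113, "pSLA2-S": 18}
REPORTED_TOTAL_BP = 8_706_406


def approximate_total_bp(chromosome_bp: int, plasmids_kb: dict) -> int:
    """Chromosome length plus plasmid lengths converted from kb to bp."""
    return chromosome_bp + sum(kb * 1_000 for kb in plasmids_kb.values())


def implied_plasmid_bp(total_bp: int, chromosome_bp: int) -> int:
    """Combined plasmid length implied by the reported total and chromosome size."""
    return total_bp - chromosome_bp


approx = approximate_total_bp(CHROMOSOME_BP, PLASMIDS_KB)
print(approx)                                                # 8706802
print(approx - REPORTED_TOTAL_BP)                            # 396
print(implied_plasmid_bp(REPORTED_TOTAL_BP, CHROMOSOME_BP))  # 341604
```

    The 396-bp gap is well within the maximum of 1,500 bp that rounding three values to the nearest kilobase could introduce, so the quoted sizes are mutually consistent.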

    Gene Organization in Rice Revealed by Full-Length cDNA Mapping and Gene Expression Analysis through Microarray

    Rice (Oryza sativa L.) is a model organism for the functional genomics of monocotyledonous plants because its genome is considerably smaller than those of other monocots. Although highly accurate genome sequences of indica and japonica rice are available, additional resources such as full-length complementary DNA (FL-cDNA) sequences are indispensable for comprehensive analyses of gene structure and function. We cross-referenced 28.5K individual loci, defined by mapping 578K FL-cDNA clones to the rice genome, with the 56K loci predicted in the TIGR genome assembly. Based on annotation status and the presence of corresponding cDNA clones, genes were classified into 23K annotated expressed (AE) genes, 33K annotated non-expressed (ANE) genes, and 5.5K non-annotated expressed (NAE) genes. We developed a 60-mer oligonucleotide array to analyze gene expression from each locus. Analysis of gene structures and expression levels revealed that the general features of gene structure and expression of NAE and ANE genes differed considerably from those of AE genes. The results also suggested that the cloning efficiency of rice FL-cDNA is associated with the transcriptional activity of the corresponding locus, although other factors may also contribute. Comparison of FL-cDNA coverage among gene families suggested that FL-cDNAs from genes encoding rice- or eukaryote-specific domains, and from genes involved in regulatory functions, were difficult to produce in bacterial cells. Collectively, these results indicate that rice genes can be divided into distinct groups based on transcriptional activity and gene structure, and that FL-cDNA clone coverage is biased by the incompatibility of certain eukaryotic genes with bacterial hosts.
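
    The AE/ANE/NAE scheme in the rice study is a two-way split on annotation status (predicted in the TIGR assembly) and expression evidence (a mapped FL-cDNA clone). The minimal Python sketch below, with hypothetical function and variable names, encodes that rule and checks that the rounded class sizes add up to the two locus totals quoted in the abstract. It assumes each locus is simply flagged as annotated or not and as cDNA-supported or not; the paper's actual cross-referencing of loci is more involved.

```python
from math import isclose
from typing import Optional


def classify_locus(annotated: bool, has_fl_cdna: bool) -> Optional[str]:
    """Assign a locus to the AE / ANE / NAE classes (hypothetical helper)."""
    if annotated and has_fl_cdna:
        return "AE"   # annotated expressed
    if annotated:
        return "ANE"  # annotated non-expressed
    if has_fl_cdna:
        return "NAE"  # non-annotated expressed
    return None       # neither predicted nor cDNA-supported: not counted


# Class sizes reported in the abstract, in thousands of loci (rounded).
counts_k = {"AE": 23.0, "ANE": 33.0, "NAE": 5.5}

# Annotated classes should add up to the TIGR-predicted loci (~56K) ...
predicted_k = counts_k["AE"] + counts_k["ANE"]
# ... and cDNA-supported classes to the FL-cDNA-defined loci (~28.5K).
cdna_k = counts_k["AE"] + counts_k["NAE"]

print(predicted_k, cdna_k)  # 56.0 28.5
```

    For example, classify_locus(True, False) returns "ANE". Because the reported class sizes are rounded, the sums only show that the scheme is internally consistent; they are not exact locus counts.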

    Direct Metagenomic Detection of Viral Pathogens in Nasal and Fecal Specimens Using an Unbiased High-Throughput Sequencing Approach

    Following the severe acute respiratory syndrome (SARS) epidemic of 2003 and renewed attention to potential avian influenza pandemics, new surveillance systems are needed for the earlier detection of emerging infectious diseases. We applied a “next-generation” parallel sequencing platform for viral detection in nasopharyngeal and fecal samples collected during seasonal influenza virus (Flu) infections and norovirus outbreaks from 2005 to 2007 in Osaka, Japan. Random RT-PCR was performed to amplify RNA extracted from 0.1–0.25 ml of nasopharyngeal aspirates (N = 3) and fecal specimens (N = 5), and more than 10 µg of cDNA was synthesized. Unbiased high-throughput sequencing of these 8 samples yielded 15,298–32,335 (average 24,738) reads in a single 7.5 h run. In nasopharyngeal samples, whole-genome analysis was not possible because most reads (>90%) were derived from the host genome; nevertheless, 20–460 Flu reads were detected, which was sufficient for subtype identification. In fecal samples, bacteria and host cells were removed by centrifugation, yielding 484–15,260 norovirus reads that covered 78–98% of the viral genome, except for one specimen in which norovirus was below the detection limit of RT-PCR. These results suggest that our unbiased high-throughput sequencing approach is useful for directly detecting pathogenic viruses without prior genetic information. Although its cost and limited technological availability make it unlikely that this system will soon become the diagnostic standard worldwide, it could be useful for the earlier discovery of novel emerging viruses and bioterrorism agents, which are difficult to detect with conventional procedures.

    Functional annotation of human long noncoding RNAs via molecular phenotyping

    Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in mammalian genomes, yet their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Cap Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype measured by CAGE recapitulated the observed cellular phenotypes while providing additional insights into the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for ZNF213-AS1 and lnc-KHDC3L-2.
    Peer reviewed