6 research outputs found

    RNAcentral: A vision for an international database of RNA sequences

    During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor

    Localization analysis of cardiac-expressed human genes (Lokalisationsanalyse kardial exprimierter Gene des Menschen)

    This thesis presents a more detailed investigation of the chromosomal co-localization and co-expression of heart-expressed genes. The data for this work come from a genome-wide microarray study [17] from 2003; previously published studies were carried out using data from public EST or SAGE databases. This work is the first approach to analyzing localization and expression on the basis of cardiovascular microarray expression data. In various analyses, the regulation of genes by specific transcription factors and the functional relatedness of clustered genes were examined. Clusters of co-expressed genes are known in prokaryotes and have also been observed in eukaryotes such as Caenorhabditis elegans and Drosophila melanogaster. Clusters of co-expressed genes have also been observed in the human genome. While some studies assume that the observed clusters consist of housekeeping genes, others postulate a tissue-specific occurrence of clustered genes. Expression was measured in a genome-wide approach in which heart tissue samples from 55 patients with different phenotypes were examined. Expression levels were determined using cDNA microarray technology (Human Unigene Set II arrays). The chromosomal localization of the genes was determined by sequence comparison based on the Ensembl data set. In total, the expression of 16,488 different genes was analyzed and a data set of 3,148 genes generally expressed in the heart was compiled. Potentially regulating transcription factors and biological functions were identified using the CORG, TRANSFAC, and Gene Ontology data sets. Cardiac-expressed genes were observed to be frequently located in clusters of two and three genes. The expression level of genes located in clusters corresponded to the expression behavior of non-clustered, cardiac-expressed genes. Analysis of the regulation of clustered genes showed that they possess binding sites for shared transcription factors significantly more often than non-clustered genes. It was not observed that neighboring genes are generally involved in different biological processes than non-neighboring genes. A significantly more frequent similarity of function among clustered genes was not observed in the data set. However, given the currently small number of GO-annotated genes, a relationship cannot be entirely ruled out

    Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome

    Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient to screen gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail sampling of low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq

    Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome

    Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient to screen gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail sampling of low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in 11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq. This work was funded by National Human Genome Research Institute (NHGRI)/National Institutes of Health (NIH) grants to GENCODE (U54 HG004555) and Cold Spring Harbor Laboratories (U54 HG004557) subgroups of the ENCODE project. We acknowledge grants from the Swiss National Science Foundation to A.R., from the Spanish Ministry of Science (RD07/0067/0012, BIO2006-03380, and CSD2007-00050) to R.G., and from the Wellcome Trust (WT077198/Z/05/Z) to J.H., A.F., J.M.G., F.K., and T.J.H.

    The Ensembl gene annotation system

    The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail