fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences

Author: Asai Kiyoshi
Kimura Yuki
Kin Taishin
Kojima Aya
Komori Takashi
Okida Hiroaki
Ono Yukiteru
Terai Goro
Yamada Kouichirou
Yoshinari Yasuhiko
Publication venue: Oxford University Press
Publication date: 01/01/2006

There is an abundance of transcripts that code for no particular protein and remain functionally uncharacterized. Some of these transcripts may have novel functions, while others may be junk transcripts. Unfortunately, experimentally validating such transcripts to find functional non-coding RNA candidates is very costly. Our primary interest is therefore to computationally mine candidate functional transcripts from a pool of uncharacterized transcripts. We introduce fRNAdb, a novel database service that hosts a large collection of non-coding transcripts, including annotated and non-annotated sequences from the H-inv database, NONCODE and RNAdb. A set of computational analyses has been performed on the included sequences. These analyses include RNA secondary structure motif discovery, EST support evaluation, cis-regulatory element search and protein homology search, among others. fRNAdb provides an efficient interface that helps users filter transcripts according to their own criteria to identify functional RNA candidates. fRNAdb is available a
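Filtering transcripts under user-defined criteria can be illustrated with a small sketch. The record fields below (`est_support`, `has_structure_motif`, `protein_homology_hit`) and the example values are hypothetical stand-ins for the kinds of analyses the abstract lists; they are not fRNAdb's actual schema or interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Transcript:
    # Hypothetical annotation fields, loosely modelled on the analyses
    # named in the abstract (EST support, structure motifs, homology search).
    transcript_id: str
    est_support: int            # number of supporting ESTs (assumed field)
    has_structure_motif: bool   # a secondary-structure motif was found (assumed)
    protein_homology_hit: bool  # a protein homology search hit was found (assumed)


Criterion = Callable[[Transcript], bool]


def filter_candidates(records: Iterable[Transcript], criteria: List[Criterion]) -> List[Transcript]:
    """Keep only the records that satisfy every user-supplied criterion."""
    return [r for r in records if all(c(r) for c in criteria)]


records = [
    Transcript("tx1", est_support=5, has_structure_motif=True, protein_homology_hit=False),
    Transcript("tx2", est_support=0, has_structure_motif=True, protein_homology_hit=False),
    Transcript("tx3", est_support=3, has_structure_motif=True, protein_homology_hit=True),
]

# Example criteria: expressed, structured, and unlikely to encode a protein.
criteria = [
    lambda r: r.est_support >= 1,
    lambda r: r.has_structure_motif,
    lambda r: not r.protein_homology_hit,
]

candidates = filter_candidates(records, criteria)
print([r.transcript_id for r in candidates])  # ['tx1']
```

Representing each criterion as a plain predicate keeps the filter open-ended: a user adds or drops a condition without changing `filter_candidates`.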

CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine

Author: Ge Gao
Lei Kong
Liping Wei
Shu-Qi Zhao
Xiao-Qiao Liu
Yong Zhang
Zhi-Qiang Ye
Publication venue: Oxford University Press
Publication date: 01/01/2007

Recent transcriptome studies have revealed that a large number of transcripts in mammals and other organisms do not encode proteins but instead function as noncoding RNAs (ncRNAs). As millions of transcripts are generated every year by large-scale cDNA and EST sequencing projects, automatic methods are needed to distinguish protein-coding RNAs from noncoding RNAs accurately and quickly. We developed a support vector machine-based classifier, named Coding Potential Calculator (CPC), that assesses the protein-coding potential of a transcript based on six biologically meaningful sequence features. Tenfold cross-validation on the training dataset and further testing on several large datasets showed that CPC can discriminate coding from noncoding transcripts with high accuracy. Furthermore, CPC runs an order of magnitude faster than a previous state-of-the-art tool and is more accurate. We developed a user-friendly web-based interface for CPC at http://cpc.cbi.pku.edu.cn. In addition to predicting the coding potential of the input transcripts, the CPC web server graphically displays detailed sequence features and additional annotations of the transcript that may facilitate further investigation by users.
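The abstract does not list the six features. As an illustration only, the sketch below computes one simple ORF-based property, the fraction of a transcript covered by its longest ATG-initiated open reading frame on the forward strand, which is the kind of signal coding-potential classifiers draw on. It is a simplified stand-in, not CPC's own feature extraction, and the SVM step is omitted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nt of the longest ATG-initiated ORF (stop codon included)
    across the three forward reading frames. ORFs lacking a stop are ignored."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def orf_coverage(seq: str) -> float:
    """Fraction of the transcript covered by its longest forward-strand ORF."""
    return longest_orf_length(seq) / len(seq) if seq else 0.0


print(orf_coverage("ATGAAACCCGGGTTTTAA"))  # 1.0: the whole sequence is one ORF
print(orf_coverage("GCGCGCTAAGCGCGC"))     # 0.0: no start codon at all
```

A real classifier would combine several such features (and, in CPC's case, a trained SVM) rather than thresholding a single number.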

RNAcentral: A vision for an international database of RNA sequences

Author: Agrawal Shipra
Bateman Alex
Birney Ewan
Bruford Elspeth A
Bujnicki Janusz M
Cochrane Guy
Cole James R
Dinger Marcel E
Enright Anton J
Gardner Paul P
Gautheret Daniel
Griffiths-Jones Sam
Harrow Jen
Herrero Javier
Holmes Ian H
Huang Hsien-Da
Kelly Krystyna A
Kersey Paul
Kozomara Ana
Lowe Todd M
Marz Manja
Moxon Simon
Pruitt Kim D
Samuelsson Tore
Stadler Peter F
Vilella Albert J
Vogel Jan-Hinnerk
Williams Kelly P
Wright Mathew W
Zwieb Christian
Publication venue: Cold Spring Harbor Laboratory
Publication date: 23/09/2011

During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. The experimental characterization of these RNA components has also grown substantially. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database comparable to UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource, which we term RNAcentral, that will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network and contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, giving researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor.

Massively Parallel Sequencing of Human Urinary Exosome/Microvesicle RNA Reveals a Predominance of Non-Coding RNA

Author: Adiconis Xian
Bond Daniel T.
Brown Dennis
Levin Joshua Z.
Miranda Kevin C.
Nusbaum Chad
Russ Carsten
Russo Leileata M.
Sivachenko Andrey
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

Intact RNA from exosomes/microvesicles (collectively referred to as microvesicles) has sparked much interest as a potential biomarker for the non-invasive analysis of disease. Here we use the Illumina Genome Analyzer to determine the comprehensive array of nucleic acid reads present in urinary microvesicles. Extraneous nucleic acids were digested using RNase and DNase treatment, and the inner nucleic acid cargo of the microvesicles was analyzed with and without DNase digestion to examine both the DNA and RNA sequences they contain. Results revealed that a substantial proportion (∼87%) of reads aligned to ribosomal RNA. Of the non-ribosomal RNA sequences, ∼60% aligned to non-coding RNA and repeat sequences, including LINE, SINE, satellite repeats and RNA repeats (tRNA, snRNA, scRNA and srpRNA). The remaining ∼40% of non-ribosomal RNA reads aligned to protein-coding genes and splice sites, encompassing approximately 13,500 of the 21,892 known protein-coding genes of the human genome. Analysis of protein-coding genes specific to the renal and genitourinary tract revealed that complete segments of the renal nephron and collecting duct, as well as genes indicative of the bladder and prostate, could be identified. This study reveals that the entire genitourinary system may be mapped using microvesicle transcript analysis and that the majority of non-ribosomal RNA sequences contained in microvesicles are potentially functional non-coding RNAs, which play an emerging role in cell regulation.
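The nested percentages in this abstract are easier to compare once they are expressed as fractions of all reads. The sketch below only re-derives proportions from the approximate figures quoted above (∼87%, ∼60%, ∼40%, 13,500 of 21,892 genes); it adds no new data.

```python
# Approximate figures quoted in the abstract.
rrna_fraction = 0.87        # share of all reads aligned to ribosomal RNA
ncrna_of_non_rrna = 0.60    # share of non-rRNA reads: non-coding RNA and repeats
coding_of_non_rrna = 0.40   # share of non-rRNA reads: protein-coding genes
genes_detected = 13_500
genes_known = 21_892

non_rrna = 1 - rrna_fraction                  # ~0.13 of all reads
ncrna_total = non_rrna * ncrna_of_non_rrna    # ~0.078 of all reads
coding_total = non_rrna * coding_of_non_rrna  # ~0.052 of all reads
gene_coverage = genes_detected / genes_known  # ~0.617 of known coding genes

print(f"non-rRNA reads:        {non_rrna:.1%}")       # 13.0%
print(f"ncRNA/repeat reads:    {ncrna_total:.1%}")    # 7.8%
print(f"protein-coding reads:  {coding_total:.1%}")   # 5.2%
print(f"coding genes detected: {gene_coverage:.1%}")  # 61.7%
```

In other words, only about one read in eight falls outside rRNA, and roughly 5% of all reads map to protein-coding genes, yet those reads still touch about 62% of the known protein-coding genes.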

Transcript Annotation in FANTOM3: Mouse Gene Catalog Based on Physical cDNAs

Author: Aturaliya Rajith N
Batalov Serge
Beisel Kirk W
Bult Carol J
Carninci Piero
Engström Pär G
Fletcher Colin F
Forrest Alistair R. R
Frith Martin
Furuno Masaaki
Gough Julian
Hayashizaki Yoshihide
Hill David
Hume David A
Itoh Masayoshi
Kai Chikatoshi
Kanamori-Katayama Mutsumi
Kasukawa Takeya
Katayama Shintaro
Katoh Masaru
Kawai Jun
Kawashima Tsugumi
Lenhard Boris
Maeda Norihiro
Oyama Rieko
Quackenbush John
Ravasi Timothy
Ring Brian Z
Shibata Kazuhiro
Sugiura Koji
Takenaka Yoichi
Teasdale Rohan D
Wells Christine A
Zhu Yunxia
Publication venue: Public Library of Science
Publication date: 01/01/2006

The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that many cDNAs still remained to be collected and identified. To pursue a complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs have continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs, and we developed a Web-based annotation interface that simplifies the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed by manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Of these 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing, to our knowledge, the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome and could be applied to the transcriptomes of other species.
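The counts in this abstract can be cross-checked with simple arithmetic: the FANTOM2 set plus the newly isolated FANTOM3 cDNAs gives the overall total, and the coding and non-coding figures can be expressed as shares of it. The abstract does not break down the remaining transcripts, so the two shares are left as reported and are not expected to sum to 100%.

```python
fantom2_cdnas = 60_770
fantom3_new_cdnas = 42_031
total_annotated = 102_801
protein_coding = 56_722
distinct_noncoding = 34_030

# FANTOM2 clones plus newly isolated FANTOM3 clones account for the full set.
assert fantom2_cdnas + fantom3_new_cdnas == total_annotated

coding_share = protein_coding / total_annotated
noncoding_share = distinct_noncoding / total_annotated
print(f"protein coding:      {coding_share:.1%}")     # 55.2%
print(f"distinct non-coding: {noncoding_share:.1%}")  # 33.1%
```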

Comparative analysis of genome tiling array data reveals many novel primate-specific functional RNAs in human

Author: Gerstein Mark
Pang Andy Wing Chun
Zhang Zhaolei
Publication venue: BioMed Central
Publication date: 01/02/2007

BACKGROUND: Widespread transcriptional activity in the human genome was recently observed in high-resolution tiling array experiments, which revealed many novel transcripts outside the boundaries of known protein or RNA genes. Termed "TARs" (Transcriptionally Active Regions), these novel transcribed regions represent "dark matter" in the genome, and their origin and functionality need to be explained. Many of these transcripts are thought to code for novel proteins or non-protein-coding RNAs. We have applied an integrated bioinformatics approach to investigate the properties of these TARs, including cross-species conservation and the ability to form stable secondary structures. The goal of this study is to identify a list of candidate sequences that are likely to code for functional non-protein-coding RNAs. We are particularly interested in discovering functional RNA candidates that are primate-specific, i.e., those that have homologs in rhesus but not in the mouse or dog genomes. RESULTS: Using sequence conservation and the probability of forming stable secondary structures, we have identified ~300 possible candidates for primate-specific noncoding RNAs. We are currently sequencing the orthologous regions of these candidate sequences in several other primate species. We will then be able to apply a "phylogenetic shadowing" approach to analyze the functionality of these ncRNA candidates. CONCLUSION: The existence of potential primate-specific functional transcripts demonstrates the limitation of previous genome comparison studies, which put too much emphasis on conservation between human and rodents. It also argues for the necessity of sequencing additional primate species to gain a better and more comprehensive understanding of the human genome.
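The selection logic described here (a homolog in rhesus, none in mouse or dog, and a good chance of folding into a stable secondary structure) can be written as a simple filter. The field names, the structure-probability score and the 0.9 cutoff are hypothetical placeholders; the abstract does not give the study's actual thresholds or scoring method.

```python
from typing import List, NamedTuple


class TAR(NamedTuple):
    name: str
    homolog_in: frozenset   # species with a detected homolog (hypothetical field)
    structure_prob: float   # probability of a stable secondary structure (hypothetical score)


def primate_specific_candidates(tars: List[TAR], min_structure_prob: float = 0.9) -> List[str]:
    """Return TARs with a rhesus homolog, no mouse or dog homolog,
    and a structure score at or above the (assumed) cutoff."""
    out = []
    for t in tars:
        primate_only = "rhesus" in t.homolog_in and not ({"mouse", "dog"} & t.homolog_in)
        if primate_only and t.structure_prob >= min_structure_prob:
            out.append(t.name)
    return out


tars = [
    TAR("TAR_a", frozenset({"rhesus"}), 0.95),
    TAR("TAR_b", frozenset({"rhesus", "mouse"}), 0.97),  # conserved to rodent: excluded
    TAR("TAR_c", frozenset({"rhesus", "dog"}), 0.40),    # conserved to dog: excluded
    TAR("TAR_d", frozenset(), 0.99),                     # no rhesus homolog: excluded
]
print(primate_specific_candidates(tars))  # ['TAR_a']
```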

Clusters of internally primed transcripts reveal novel long noncoding RNAs

Author: Bill Pavan
Carol Bult
Chikatoshi Kai
Harukazu Suzuki
John Hancock
John S Mattick
Judith Blake
Jun Kawai
Ken C Pang
Lisa Stubbs
Martin C Frith
Masaaki Furuno
Noriko Ninomiya
Piero Carninci
Shiro Fukuda
Yoshihide Hayashizaki
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2006

Non-protein-coding RNAs (ncRNAs) are increasingly being recognized as having important regulatory roles. Although much recent attention has focused on tiny 22- to 25-nucleotide microRNAs, several functional ncRNAs are orders of magnitude larger in size. Examples of such macro ncRNAs include Xist and Air, which in mouse are 18 and 108 kilobases (Kb), respectively. We surveyed the 102,801 FANTOM3 mouse cDNA clones and found that Air and Xist were present not as single, full-length transcripts but as a cluster of multiple, shorter cDNAs, which were unspliced, had little coding potential, and were most likely primed from internal adenine-rich regions within longer parental transcripts. We therefore conducted a genome-wide search for regional clusters of such cDNAs to find novel macro ncRNA candidates. Sixty-six regions were identified, each of which mapped outside known protein-coding loci and had a mean length of 92 Kb. We detected several known long ncRNAs within these regions, supporting the basic rationale of our approach. In silico analysis showed that many regions had evidence of imprinting and/or antisense transcription. These regions were significantly associated with microRNAs and transcripts from the central nervous system. We selected eight novel regions for experimental validation by northern blot and RT-PCR and found that the majority represent previously unrecognized noncoding transcripts that are at least 10 Kb in size and predominantly localized in the nucleus. Taken together, the data not only identify multiple new ncRNAs but also suggest the existence of many more macro ncRNAs like Xist and Air.
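The genome-wide search for regional clusters of short, unspliced cDNAs can be approximated by sorting cDNA alignments by position and merging neighbours separated by no more than a maximum gap. The `max_gap` and `min_members` values, and the use of a single gap rule, are assumptions for illustration; the abstract does not describe the study's exact clustering criteria, and it assumes the input has already been restricted to unspliced, low-coding-potential cDNAs.

```python
from typing import List, NamedTuple, Tuple


class Alignment(NamedTuple):
    chrom: str
    start: int  # 0-based start of the aligned cDNA
    end: int    # exclusive end


def find_clusters(alns: List[Alignment], max_gap: int = 10_000,
                  min_members: int = 3) -> List[Tuple[str, int, int, int]]:
    """Merge alignments on the same chromosome whose gap to the growing
    cluster is at most max_gap (assumed value); report (chrom, start, end, n)
    for clusters with at least min_members cDNAs (assumed value)."""
    clusters = []
    current = None  # [chrom, start, end, count] of the cluster being built
    for a in sorted(alns):  # NamedTuples sort by chrom, then start, then end
        if current and a.chrom == current[0] and a.start - current[2] <= max_gap:
            current[2] = max(current[2], a.end)
            current[3] += 1
        else:
            if current and current[3] >= min_members:
                clusters.append(tuple(current))
            current = [a.chrom, a.start, a.end, 1]
    if current and current[3] >= min_members:
        clusters.append(tuple(current))
    return clusters


alns = [
    Alignment("chr17", 1_000, 3_000),
    Alignment("chr17", 8_000, 9_500),
    Alignment("chr17", 15_000, 16_000),
    Alignment("chr17", 90_000, 91_000),  # too far away: starts a new cluster
    Alignment("chr2", 500, 1_500),
]
print(find_clusters(alns))  # [('chr17', 1000, 16000, 3)]
```

Each reported span would only be a candidate region; in the study, regions were further required to lie outside known protein-coding loci, and selected ones were validated experimentally.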
