Search CORE

13 research outputs found

RNAcentral 2021: secondary structure integration, improved sequence search and new member databases

Author: Barshir R
Bateman A
Bouchard-Bourelle P
Bruford E
Cannone JJ
Chan PP
dos Santos G
Finn RD
Fishilevich S
Frankish A
Fromm B
Gorodkin J
Griffiths-Jones S
Gutell RR
Hatzigeorgiou AG
Hoksza D
Kalvari I
Karagkouni D
Karlowski WM
Kay S
Kramarz B
Lovering RC
Lowe TM
Lui LM
Ma L
Mani P
Marygold S
Mestdagh P
Mudge JM
Nawrocki EP
Panni S
Peterson KJ
Petrov A
Petrov AS
Porras P
Ramachandran S
Ribas CE
Scott M
Seal R
Seemann SE
Sweeney BA
Szymanski M
Volders P-J
Weinberg Z
Weng S
Zhang Z
Publication venue: Oxford University Press
Publication date: 08/01/2021

RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world’s largest RNA 2D structure database. The 2D diagrams are displayed using R2DT, a new 2D structure visualization method that uses consistent, reproducible and recognizable layouts for related RNAs. The sequence similarity search has been updated with a faster interface featuring facets for filtering search results by RNA type, organism, source database or any keyword. This sequence search tool is available as a reusable web component, and has been integrated into several RNAcentral member databases, including Rfam, miRBase and snoDB. To allow for a more fine-grained assignment of RNA types and subtypes, all RNAcentral sequences have been annotated with Sequence Ontology terms. The RNAcentral database continues to grow and provide a central data resource for the RNA community. RNAcentral is freely available at https://rnacentral.org

UCL Discovery

OrthoSelect: a protocol for selecting orthologous groups in phylogenomics

Author: A Dress
A Subramanian
AG Hatzigeorgiou
AJ Enright
AR Mushegian
AR Subramanian
B Misof
B Morgenstern
Burkhard Morgenstern
C Dessimoz
C Lottaz
C Notredame
C Zmasek
CB Do
CW Dunn
Dirk Erpenbeck
E Birney
E Sonnhammer
EV Koonin
F Chen
F Delsuc
F Schreiber
Fabian Schreiber
Gert Wörheide
H Gee
H Philippe
J Castresana
J Ruan
J Wasmuth
J Wiens
JA Eisen
JE Stajich
JGB Changhui Yan
K Dolinski
K Katoh
Kerstin Pick
KP O'Brien
L Duret
L Li
M Schmollinger
O Poirot
R Chenna
R Durbin
R Edgar
R Tatusov
RC Edgar
S Altschul
SJ Bourlat
SR Eddy
T Gentzsch
T Tatusova
WM Fitch
Y Fukunishi
Y Zhou
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Phylogenetic studies using expressed sequence tags (ESTs) are becoming a standard approach to answer evolutionary questions. Such studies are usually based on large sets of newly generated, unannotated, and error-prone EST sequences from different species. A first crucial step in EST-based phylogeny reconstruction is to identify groups of orthologous sequences. From these data sets, appropriate target genes are selected, and redundant sequences are eliminated to obtain suitable sequence sets as input data for tree-reconstruction software. Generating such data sets manually can be very time-consuming. Thus, software tools are needed that carry out these steps automatically. Results: We developed a flexible and user-friendly software pipeline, running on desktop machines or computer clusters, that constructs data sets for phylogenomic analyses. It automatically searches assembled EST sequences against databases of orthologous groups (OG), assigns ESTs to these predefined OGs, translates the sequences into proteins, eliminates redundant sequences assigned to the same OG, creates multiple sequence alignments of identified orthologous sequences and offers the possibility to further process this alignment in a last step by excluding potentially homoplastic sites and selecting sufficiently conserved parts. Our software pipeline can be used as it is, but it can also be adapted by integrating additional external programs. This makes the pipeline useful for non-bioinformaticians as well as for bioinformatics experts. The software pipeline is especially designed for ESTs, but it can also handle protein sequences. Conclusion: OrthoSelect is a tool that produces orthologous gene alignments from assembled ESTs. Our tests show that OrthoSelect detects orthologs in EST libraries with high accuracy. In the absence of a gold standard for orthology prediction, we compared predictions by OrthoSelect to a manually created and published phylogenomic data set. Our tool was not only able to rebuild the data set with a specificity of 98%, but it detected four percent more orthologous sequences. Furthermore, the results OrthoSelect produces are in absolute agreement with the results of other programs, but our tool offers a significant speedup and additional functionality, e.g. handling of ESTs, computing sequence alignments, and refining them. To our knowledge, there is currently no fully automated and freely available tool for this purpose. Thus, OrthoSelect is a valuable tool for researchers in the field of phylogenomics who deal with large quantities of EST sequences. OrthoSelect is written in Perl and runs on Linux/Mac OS X.
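
The redundancy-elimination step named in the abstract above can be illustrated with a small sketch. This is not OrthoSelect's code (the tool itself is written in Perl), and the abstract does not state how the retained sequence is chosen; the rule used here (keep the longest sequence per orthologous group and species) and the name `remove_redundant` are assumptions made purely for illustration.

```python
def remove_redundant(assignments):
    """Keep one sequence per (OG, species) pair: the longest one.

    `assignments` is an iterable of (og_id, species, sequence) tuples.
    The "keep the longest" rule is an illustrative assumption, not the
    criterion documented for OrthoSelect.
    """
    best = {}
    for og_id, species, seq in assignments:
        key = (og_id, species)
        # Replace the stored sequence only if the new one is strictly longer.
        if key not in best or len(seq) > len(best[key]):
            best[key] = seq
    return best


# Hypothetical toy input: two hits for speciesA fall into the same OG.
hits = [
    ("OG1", "speciesA", "MKT"),
    ("OG1", "speciesA", "MKTAYI"),
    ("OG1", "speciesB", "MKTA"),
    ("OG2", "speciesA", "MLS"),
]
selected = remove_redundant(hits)
```

Keying the dictionary on the (OG, species) pair leaves at most one sequence per species in each group, which is the shape of input that alignment and tree-reconstruction steps typically expect.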

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Access LMU

TESTLoc: protein subcellular localization prediction from EST data

Author: A Chacinska
A Kumar
A Pierleoni
A Reinhardt
AG Hatzigeorgiou
BF Lang
C Guda
C Iseli
CS Yu
D Sarda
Gertraud Burger
H Bannai
H Shatkay
HM Yuan
HN Lin
HW Platta
I Small
J Assfalg
J Li
J Liu
J Parkinson
JD Wasmuth
K Baerenfaller
KC Chou
KJ Park
L Barbe
LB Koski
M Boden
MG Claros
MS Boguski
MS Scott
O Emanuelsson
P Rice
R Casadio
R Kaundal
R Lascaris
R Nair
RE Fan
S Briesemeister
S Hua
SF Altschul
T Blum
TM Devlin
W Li
WK Huh
Y Huang
Y Lee
Yao-Qing Shen
YQ Shen
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes. Results: We developed a new predictor, TESTLoc, suited for subcellular localization prediction of proteins based on their partial sequence conceptually translated from ESTs (EST-peptides). A Support Vector Machine (SVM) is used as the computational method, and EST-peptides are represented by different features such as amino acid composition and physicochemical properties. When TESTLoc was applied to the most challenging test case (plant data), it yielded high accuracy (~85%). Conclusions: TESTLoc is a localization prediction tool tailored for EST data. It provides a variety of models for the users to choose from, and is available for download at http://megasun.bch.umontreal.ca/~shenyq/TESTLoc/TESTLoc.html
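
The amino acid composition feature mentioned in the abstract above can be sketched as follows. This is a generic illustration, not TESTLoc's implementation: the function name `aa_composition`, the fixed residue ordering, and the choice to ignore non-standard characters are all assumptions. The resulting 20-element vector is the kind of input that could be passed to an SVM classifier.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed (alphabetical) order so that
# every peptide maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(peptide: str) -> list[float]:
    """Return the fraction of each standard amino acid in a peptide.

    Characters outside the 20 standard residues (e.g. 'X' or '*') are
    ignored; this handling is an assumption made for the sketch.
    """
    counts = Counter(ch for ch in peptide.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard input: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical short EST-peptide used only as an example input.
features = aa_composition("MKTAYIAKQR")
```

Because the vector holds fractions rather than raw counts, partial sequences of different lengths remain comparable, which matters for EST-derived peptides.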

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Differentiating Protein-Coding and Noncoding RNA: Challenges and Ambiguities

The assumption that RNA can be readily classified into either protein-coding or non-protein–coding categories has pervaded biology for close to 50 years. Until recently, discrimination between these two categories was relatively straightforward: most transcripts were clearly identifiable as protein-coding messenger RNAs (mRNAs), and readily distinguished from the small number of well-characterized non-protein–coding RNAs (ncRNAs), such as transfer, ribosomal, and spliceosomal RNAs. Recent genome-wide studies have revealed the existence of thousands of noncoding transcripts, whose function and significance are unclear. The discovery of this hidden transcriptome and the implicit challenge it presents to our understanding of the expression and regulation of genetic information has made the need to distinguish between mRNAs and ncRNAs both more pressing and more complicated. In this Review, we consider the diverse strategies employed to discriminate between protein-coding and noncoding transcripts and the fundamental difficulties that are inherent in what may superficially appear to be a simple problem. Misannotations can also run in both directions: some ncRNAs may actually encode peptides, and some of those currently thought to do so may not. Moreover, recent studies have shown that some RNAs can function both as mRNAs and intrinsically as functional ncRNAs, which may be a relatively widespread phenomenon. We conclude that it is difficult to annotate an RNA unequivocally as protein-coding or noncoding, with overlapping protein-coding and noncoding transcripts further confounding this distinction. In addition, the finding that some transcripts can function both intrinsically at the RNA level and to encode proteins suggests a false dichotomy between mRNAs and ncRNAs. Therefore, the functionality of any transcript at the RNA level should not be discounted

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace

RNAcentral: a comprehensive database of non-coding RNA sequences

Author: Basu S
Bateman A
Berardini TZ
Bruford EA
Bujnicki JM
Cannone JJ
Chai B
Chan PP
Chen R
Cherry JM
Clark M
Cochrane G
Cole JR
Dinger ME
Engel SR
Fey P
Finn RD
Frankish A
Gray KA
Griffiths-Jones S
Gutell RR
Hatzigeorgiou AG
Howe KL
Huala E
Kalvari I
Karlowski WM
Kay SJE
Kenmochi N
Kersey PJ
Kozomara A
Lau BY
Lowe TM
Ma L
Machnicka MA
McDonald D
Mestdagh P
Paraskevopoulou MD
Petrov AI
Puetz J
Quek XC
Stadler PF
Szymanski M
Vlachos IS
Volders PJ
Williams KP
Wood V
Wower J
Yoshihama M
Zhang Z
Zhao Y
Zhu W
Zwieb CW
Publication venue: Nucleic Acids Research
Publication date: 28/10/2016

RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. The website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality. All RNAcentral data is provided for free and is available for browsing, bulk downloads, and programmatic access at http://rnacentral.org/. Funding: Biotechnology and Biological Sciences Research Council (BBSRC) [BB/J019232/1].

UNSWorks

Apollo (Cambridge)

University of Melbourne Institutional Repository

TARCLOUD: A cloud-based platform to support miRNA target prediction

Author: Alexakis M
Dalamagas T
Hatzigeorgiou AG
Maragkakis M
Sellis T
Vergoulis T
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012