Accelerated probabilistic inference of RNA structure evolution

Author: Ian Holmes
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. RESULTS: We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. CONCLUSION: A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/02/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART.
Comment: 34 pages, 3 figures, glossary of XRate model terminology

arXiv.org e-Print Archive

Crossref

PubMed Central

FigShare

Potential conservation of circadian clock proteins in the phylum Nematoda as revealed by bioinformatic searches

Author: A Claridge-Chang
A Golden
A Romanowski
A Sancar
A Sidow
A Ward
AL Gotter
AM Aguinaldo
AM van der Linden
Andrés Romanowski
B LeBoeuf
BD Aronson
C Benna
C Trent
CH Ko
CL Baker
CR Gissendanner
D Banerjee
D Weinshenker
Diego Andrés Golombek
DS Fay
E Engelen
E Meelkop
E Munoz
E Petrillo
E Quevillon
EM Schwarz
ET Kipreos
F Sandrelli
G Dong
GC Monsalve
GJ Hendriks
GM Leclerc
H Hao
H Jia
H Jiang
H Kageyama
H Qin
HF Gu
HG McWatters
HR Ueda
I Ebersberger
J Hatzold
J Liu
J Yan
JA Powell-Coffman
JD Plautz
JM Tennessen
JN Andersen
JS O'Neill
JW Barnes
K Hasegawa
K Tamura
K Tomioka
K Unsal-Kacmaz
L Dreier
L Temmerman
LS Johnson
M Ishiura
M Jeon
M Kostrouchova
M Miskei
M Olmedo
M Ukai-Tadenuma
María Eugenia Goya
Matías Javier Garavaglia
MF Ceriani
ML Migliori
N Mehta
N Ooe
P Erdelyi
Pablo Daniel Ghiringhelli
PD Wes
PE Hardin
PT Cohen
Q Yuan
RC Chan
RD Finn
RJ Kelly
RJ McFarlane
RS Edgar
S Arur
SA Brown
SE Sanchez
SG Rhee
SH Simonetta
SJ Romney
SK Hanks
SL Edwards
SR Eddy
T Fiedler
T Janssen
T Takumi
TK Darlington
Urs Albrecht
W Sudhaus
X Wang
X Yang
Y Kumaki
Y Shemesh
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

Although several circadian rhythms have been described in C. elegans, its molecular clock remains elusive. In this work we employed a novel bioinformatic approach, applying probabilistic methodologies, to search the proteomes of C. elegans and other members of the phylum Nematoda for circadian clock proteins of several of the best-studied circadian model organisms from different taxa (Mus musculus, Drosophila melanogaster, Neurospora crassa, Arabidopsis thaliana and Synechococcus elongatus). With this approach we found that the Nematoda contain proteins most closely related to the core and accessory proteins of the insect and mammalian clocks, a finding that provides new insights into the nematode clock and the evolution of the circadian system.

Affiliation: Romanowski, Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina
Affiliation: Garavaglia, Matías Javier. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Ing.genética y Biolog.molecular y Celular. Area Virus de Insectos; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Goya, María Eugenia. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Ghiringhelli, Pablo Daniel. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Ing.genética y Biolog.molecular y Celular. Area Virus de Insectos; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Golombek, Diego Andres. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina

CiteSeerX

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Directory of Open Access Journals

PubMed Central

FigShare

Systematic identification of gene families for use as markers for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups

Author: Jonathan A. Eisen
Guillaume Jospin
Dongying Wu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

With genomic and metagenomic sequence data sets accumulating at an astonishing rate, there are many reasons to constrain the data analyses. One approach to such constrained analyses is to focus on select subsets of gene families that are particularly well suited for the tasks at hand. Such gene families have generally been referred to as marker genes. We are particularly interested in identifying and using such marker genes for phylogenetic and phylogeny-driven ecological studies of microbes and their communities. We therefore refer to these as PhyEco (for phylogenetic and phylogenetic ecology) markers. The dual use of these PhyEco markers means that we needed to develop and apply a set of somewhat novel criteria for identifying the best candidates for such markers. The criteria we focused on were universality across the taxa of interest, the ability to produce robust phylogenetic trees that reflect as closely as possible the evolution of the species from which the genes come, and low variation in copy number across taxa. We describe here an automated protocol for identifying potential PhyEco markers from a set of complete genome sequences. The protocol combines rapid searching, clustering and phylogenetic tree building algorithms to generate protein families that meet the criteria listed above. We report here the identification of PhyEco markers for different taxonomic levels, including 40 for all bacteria and archaea, 114 for all bacteria, and many more for some of the individual phyla of bacteria. This new list of PhyEco markers should allow much more detailed automated phylogenetic and phylogenetic ecology analyses of these groups than was previously possible.
Comment: 24 pages, 3 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

FigShare