Accelerated probabilistic inference of RNA structure evolution
BACKGROUND: Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. RESULTS: We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. CONCLUSION: A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License.
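To make the idea of constraining a dynamic-programming search concrete, here is a minimal sketch. It is not Stemloc's method and does not use a Pair SCFG: it swaps in plain global sequence alignment, and it restricts the recursion to cells accepted by a caller-supplied predicate. The names `constrained_alignment_score`, `band`, and the scoring values are illustrative assumptions, not taken from the paper.

```python
NEG_INF = float("-inf")


def band(width):
    """Hypothetical constraint: allow only cells within `width` of the main diagonal."""
    return lambda i, j: abs(i - j) <= width


def constrained_alignment_score(x, y, allowed, match=1, mismatch=-1, gap=-1):
    """Global alignment score of x and y using only cells where allowed(i, j) is true.

    Cells rejected by the constraint keep a score of -inf, so no alignment
    path can pass through them. Returns -inf if the constraint excludes
    every complete alignment.
    """
    n, m = len(x), len(y)
    score = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    # For clarity this sketch still visits every cell; an implementation
    # aiming at real savings would enumerate only the allowed cells.
    for i in range(n + 1):
        for j in range(m + 1):
            if (i, j) == (0, 0) or not allowed(i, j):
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                sub = match if x[i - 1] == y[j - 1] else mismatch
                best = max(best, score[i - 1][j - 1] + sub)
            if i > 0:
                best = max(best, score[i - 1][j] + gap)
            if j > 0:
                best = max(best, score[i][j - 1] + gap)
            score[i][j] = best
    return score[n][m]


if __name__ == "__main__":
    print(constrained_alignment_score("GATTACA", "GATACA", band(1)))  # 5
```

In the setting the abstract describes, the predicate would come from a cheaper first pass over the sequences rather than from a fixed band, and the same pruning would apply to the much larger alignment-and-folding recursion.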
Developing and applying heterogeneous phylogenetic models with XRate
Modeling sequence evolution on phylogenetic trees is a useful technique in
computational biology. Especially powerful are models which take account of the
heterogeneous nature of sequence evolution according to the "grammar" of the
encoded gene features. However, beyond a modest level of model complexity,
manual coding of models becomes prohibitively labor-intensive. We demonstrate,
via a set of case studies, the new built-in model-prototyping capabilities of
XRate (macros and Scheme extensions). These features allow rapid implementation
of phylogenetic models which would have previously been far more
labor-intensive. XRate's new capabilities for lineage-specific models,
ancestral sequence reconstruction, and improved annotation output are also
discussed. XRate's flexible model-specification capabilities and computational
efficiency make it well-suited to developing and prototyping phylogenetic
grammar models. XRate is available as part of the DART software package:
http://biowiki.org/DART .
Comment: 34 pages, 3 figures, glossary of XRate model terminology
Potential conservation of circadian clock proteins in the phylum Nematoda as revealed by bioinformatic searches
Although several circadian rhythms have been described in C. elegans, its molecular clock remains elusive. In this work we employed a novel bioinformatic approach, applying probabilistic methodologies, to search for circadian clock proteins of several of the best studied circadian model organisms of different taxa (Mus musculus, Drosophila melanogaster, Neurospora crassa, Arabidopsis thaliana and Synechococcus elongatus) in the proteomes of C. elegans and other members of the phylum Nematoda. With this approach we found that the Nematoda contain proteins most related to the core and accessory proteins of the insect and mammalian clocks, which provide new insights into the nematode clock and the evolution of the circadian system.
Affiliation: Romanowski, Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina
Affiliation: Garavaglia, Matías Javier. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Ing.genética y Biolog.molecular y Celular. Area Virus de Insectos; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Goya, María Eugenia. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Ghiringhelli, Pablo Daniel. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Ing.genética y Biolog.molecular y Celular. Area Virus de Insectos; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Golombek, Diego Andres. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Systematic identification of gene families for use as markers for phylogenetic and phylogeny- driven ecological studies of bacteria and archaea and their major subgroups
Given the astonishing rate at which genomic and metagenomic sequence data sets
are accumulating, there are many reasons to constrain the data analyses. One
approach to such constrained analyses is to focus on select subsets of gene
families that are particularly well suited for the tasks at hand. Such gene
families have generally been referred to as marker genes. We are particularly
interested in identifying and using such marker genes for phylogenetic and
phylogeny-driven ecological studies of microbes and their communities. We
therefore refer to these as PhyEco (for phylogenetic and phylogenetic ecology)
markers. The dual use of these PhyEco markers means that we needed to develop
and apply a set of somewhat novel criteria for identification of the best
candidates for such markers. The criteria we focused on included universality
across the taxa of interest, ability to be used to produce robust phylogenetic
trees that reflect as much as possible the evolution of the species from which
the genes come, and low variation in copy number across taxa. We describe here
an automated protocol for identifying potential PhyEco markers from a set of
complete genome sequences. The protocol combines rapid searching, clustering
and phylogenetic tree building algorithms to generate protein families that
meet the criteria listed above. We report here the identification of PhyEco
markers for different taxonomic levels including 40 for all bacteria and
archaea, 114 for all bacteria, and many more for some of the individual phyla
of bacteria. This new list of PhyEco markers should allow much more detailed
automated phylogenetic and phylogenetic ecology analyses of these groups than
possible previously.
Comment: 24 pages, 3 figures
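Two of the criteria named in this abstract, universality across the taxa of interest and low variation in copy number, can be illustrated with a small filter. This is only a sketch under stated assumptions and not the authors' protocol: the function name `select_markers`, the input layout (family name mapped to per-genome copy counts), and both thresholds are hypothetical, and the third criterion (robust phylogenetic trees) is not covered here.

```python
from statistics import pstdev


def select_markers(copy_counts, genomes, min_universality=0.95, max_copy_sd=0.5):
    """Return the gene families that pass two illustrative screening criteria.

    copy_counts: dict mapping family name -> dict of genome name -> copy count
                 (a genome missing from the inner dict counts as 0 copies).
    genomes:     list of all genome names under consideration.
    Both thresholds are assumptions chosen for the example.
    """
    selected = []
    for family, counts in copy_counts.items():
        per_genome = [counts.get(g, 0) for g in genomes]
        # Universality: fraction of genomes carrying at least one copy.
        universality = sum(1 for c in per_genome if c > 0) / len(genomes)
        # Copy-number variation: population standard deviation of the counts.
        copy_sd = pstdev(per_genome)
        if universality >= min_universality and copy_sd <= max_copy_sd:
            selected.append(family)
    return sorted(selected)


if __name__ == "__main__":
    genomes = ["g1", "g2", "g3", "g4"]
    copy_counts = {
        "famA": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},   # universal, single copy
        "famB": {"g1": 5, "g3": 12, "g4": 1},           # absent from g2
        "famC": {"g1": 1, "g2": 7, "g3": 2, "g4": 4},   # universal, variable copies
    }
    print(select_markers(copy_counts, genomes))
```

In this toy input only `famA` passes both checks: `famB` is missing from one genome and `famC` varies widely in copy number.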