Search CORE

691 research outputs found

Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

Author: Dambach Johannes
Kück Patrick
Meusemann Karen
Misof Bernhard
Thormann Birthe
von Reumont Björn M
Wägele Johann W
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS) which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold which choice is left arbitrary, the ALISCORE algorithm is free of <it>a priori </it>rating of parameter space and therefore more objective. Results ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the most decrease in conflict. Conclusions Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood based models of sequence evolution which opens the possibility of further improvements.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems

Author: González-Domínguez Jorge
Liu Yongchao
Schmidt Bertil
Touriño Juan
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016
Field of study

This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of recordJorge González-Domínguez, Yongchao Liu, Juan Touriño, Bertil Schmidt; MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems, Bioinformatics, Volume 32, Issue 24, 15 December 2016, Pages 3826–3828, https://doi.org/10.1093/bioinformatics/btw558is available online at: https://doi.org/10.1093/bioinformatics/btw558[Abstracts] MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

XRate: a fast prototyping, training and annotation tool for phylo-grammars

Author: Bendaña Yuri R
Bradley Robert K
Chao Sharon
Goldman Nick
Holmes Ian
Klosterman Peter S
Kosiol Carolin
Uzilov Andrew V
Publication venue: BioMed Central
Publication date: 01/01/2006
Field of study

BACKGROUND: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists. RESULTS: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures. CONCLUSION: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

Author: A Krogh
A Marchler-Bauer
A Milosavljević
A Pertsemlidis
AA Schäffer
AY Mitrophanov
BJ Webb
Burkhard Rost
C Barrett
C Webber
D Drasdo
D Metzler
D Siegmund
DJC MacKay
EJ Gumbel
EP Nawrocki
ET Jaynes
I Letunic
J Park
JD Storey
JF Lawless
JS Liu
K Karplus
K Karplus
K Sjölander
M Madera
MG Kann
MQ Zhang
MS Waterman
N Chia
P Bucher
R Bundschuh
R Durbin
R Mott
R Mott
R Mott
R Olsen
RC Edgar
RD Finn
S Johnson
S Karlin
S Karlin
S Miyazawa
Sean R. Eddy
SF Altschul
SF Altschul
SF Altschul
SF Altschul
SF Altschul
SF Altschul
SR Eddy
SR Eddy
TF Smith
WR Pearson
Y-K Yu
Y-K Yu
Y-K Yu
Y-K Yu
Publication venue: Public Library of Science
Publication date: 01/05/2008
Field of study

Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Modeling the Evolution of Regulatory Elements by Simultaneous Detection and Alignment with Phylogenetic Pair HMMs

Author: A Loytynoja
A Siepel
A Siepel
A Viterbi
AL Halpern
AM Moses
AP Boyle
B Langmead
D Stanojevic
DA Pollard
DL Gumucio
DS Hirschberg
G Wray
GP Wagner
I Holmes
J Felsenstein
J Hawkins
JC Bryne
JD Thompson
JL Thorne
K Wong
MS Halfon
MZ Ludwig
MZ Ludwig
N Saitou
PR Ray
R Durbin
R Satija
R Siddharthan
RC Edgar
RK Bradley
RW Lusk
Uwe Ohler
W Huang
WH Majoros
WH Majoros
William H. Majoros
WJ Kent
WJL Quesne
Wyeth W. Wasserman
X He
Publication venue: Public Library of Science
Publication date: 01/01/2010
Field of study

The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

DukeSpace

MDC Repository