

Computational Molecular Coevolution

Author: Dickson Russell J
Publication venue: Scholarship@Western
Publication date: 13/12/2013

A major goal in computational biochemistry is to obtain three-dimensional structure information from protein sequence. Coevolution represents a biological mechanism through which structural information can be obtained from a family of protein sequences. Evolutionary relationships within a family of protein sequences are revealed through sequence alignment. Statistical analyses of these sequence alignments reveal positions in the protein family that covary, and thus appear to be dependent on one another throughout the evolution of the protein family. These covarying positions are inferred to be coevolving via one of two biological mechanisms, both of which imply that coevolution is facilitated by inter-residue contact. Thus, high-quality multiple sequence alignments and robust coevolution-inferring statistics can produce structural information from sequence alone. This work characterizes the relationship between coevolution statistics and sequence alignments and highlights the implicit assumptions and caveats associated with coevolutionary inference. An investigation of sequence alignment quality and coevolutionary-inference methods revealed that such methods are very sensitive to the systematic misalignments discovered in public databases. However, repairing the misalignments in such alignments restores the predictive power of coevolution statistics. To overcome the sensitivity to misalignments, two novel coevolution-inferring statistics were developed that show increased contact prediction accuracy, especially in alignments that contain misalignments. These new statistics were developed into a suite of coevolution tools, the MIpToolset. Because systematic misalignments produce a distinctive pattern when analyzed by coevolution-inferring statistics, a new method for detecting systematic misalignments was created to exploit this phenomenon.
This new method, called "local covariation", was used to analyze publicly available multiple sequence alignment databases. Local covariation detected putative misalignments in a database designed to benchmark sequence alignment software accuracy. Local covariation was incorporated into a new software tool, LoCo, which displays regions of potential misalignment during alignment editing and assists in their correction. This work represents advances in multiple sequence alignment creation and coevolutionary inference.
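
The abstract above does not give formulas for its covariation statistics. As a rough illustration of what "covarying positions" means, the sketch below computes plain mutual information between two alignment columns, a common baseline statistic; it is not the thesis's own method. The toy alignment and the function names are hypothetical, and gaps, sequence weighting, and background corrections are all ignored.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 1 change together,
# column 2 is fully conserved and therefore carries no covariation signal.
toy_alignment = ["AKG", "AKG", "DEG", "DEG"]

covarying = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
conserved = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
```

In this toy case the first pair of columns scores one bit and the pair involving the conserved column scores zero, which is the kind of contrast a covariation statistic is meant to expose.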

Scholarship@Western

The art of sequence alignment

Author: Regl Alois
Publication date: 12/02/2014

Sequence similarity: why are we interested in it, and how do we define it? Scoring metrics: what is important in similarity? Scoring with DNA: integrating biological knowledge. Scoring with proteins: integrating biological knowledge. PAMs and BLOSUMs: the marriage of statistics and biology. Gaps: how strongly do they count? Constant, affine and concave gap penalties. The dynamic programming trick. Global versus local. Heuristics to get some speed. Can we trust our alignment? PSI-BLAST & Co: highlights and traps. Multiple sequence alignment: finding and scoring them. The third dimension. From alignments to trees. Beyond alignments. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec

Repositorio Institucional Universidad de Málaga

Island method for estimating the statistical significance of profile-profile alignment scores

Author: A Dembo
A Gambin
A Poleksic
AG Murzin
Aleksandar Poleksic
D Fischer
D Przybylski
DA Debe
E Lindahl
EJ Gumbel
G Yona
H Pang
J Heringa
J Moult
J Söding
JF Collins
JF Lawless
K Ginalski
L Holm
L Rychlewski
M Frenkel-Morgenstern
MS Waterman
O Bastien
R Mott
R Olsen
RI Sadreyev
S Karlin
SF Altschul
SR Eddy
T Hulsen
TF Smith
WR Pearson
YK Yu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In the last decade, a significant improvement in detecting remote similarity between protein sequences has been made by utilizing alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many experiments suggest that the distribution of local profile-profile alignment scores is of the Gumbel form. However, estimating distribution parameters by random simulations turns out to be computationally very expensive. Results: We demonstrate that the background distribution of profile-profile alignment scores depends heavily on the profiles' composition, and thus the distribution parameters must be estimated independently for each pair of profiles of interest. We also show that accurate estimates of statistical parameters can be obtained using the "island statistics" for profile-profile alignments. Conclusion: The island statistics can be generalized to profile-profile alignments to provide an efficient method for alignment score normalization. Since multiple island scores can be extracted from a single comparison of two profiles, the island method has a clear speed advantage over the direct shuffling method for comparable accuracy in parameter estimates.
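
To make the "Gumbel form" concrete, here is a minimal sketch of how a raw alignment score would be turned into a P-value once the two Gumbel parameters are known. It uses the standard Gumbel tail, P(S >= x) = 1 - exp(-exp(-lambda (x - mu))). The parameter values are hypothetical placeholders, and the island method that the paper uses to estimate them is not implemented here.

```python
from math import exp, expm1


def gumbel_pvalue(score, lam, mu):
    """Upper-tail probability P(S >= score) for a Gumbel distribution.

    lam is the decay (inverse scale) parameter and mu the location parameter.
    -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    """
    return -expm1(-exp(-lam * (score - mu)))


# Hypothetical parameters for one pair of profiles (assumed, not from the paper).
LAMBDA = 0.3
MU = 20.0

p_moderate = gumbel_pvalue(30.0, LAMBDA, MU)
p_strong = gumbel_pvalue(40.0, LAMBDA, MU)
```

Because the abstract reports that the background distribution depends on profile composition, `LAMBDA` and `MU` would have to be re-estimated for each pair of profiles rather than fixed globally as in this sketch.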

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Northern Iowa

Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties: Extended Version

Author: Sahinalp S. Cenk
Salari Raheleh
Schönhuth Alexander
Publication date: 11/06/2010

Although computationally aligning sequences is a crucial step in the vast majority of comparative genomics studies, our understanding of alignment biases still needs to be improved. To infer true structural or homologous regions, computational alignments need further evaluation. It has been shown that the accuracy of aligned positions can drop substantially, in particular around gaps. Here we focus on re-evaluation of score-based alignments with affine gap penalty costs. We exploit their relationships with pair hidden Markov models and develop efficient algorithms by which to identify gaps which are significant in terms of length and multiplicity. We evaluate our statistics with respect to the well-established structural alignments from SABmark and find that indel reliability substantially increases with their significance, in particular in worst-case twilight zone alignments. This indicates that our statistics can reliably complement other methods, which mostly focus on the reliability of match positions. Comment: 17 pages, 7 figures
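
The link between affine gap penalties and pair hidden Markov models mentioned above can be sketched as follows: an affine cost grows linearly with gap length, and so does the negative log-probability of a gap whose length is geometrically distributed, as in a simple pair HMM. The functions and parameter names below are hypothetical, the "open covers the first position" convention is only one of several in use, and the paper's significance statistics are not reproduced.

```python
from math import log


def affine_gap_cost(length, gap_open, gap_extend):
    """Affine penalty: gap_open for the first position, gap_extend for each further one."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + (length - 1) * gap_extend


def geometric_gap_log_prob(length, open_prob, extend_prob):
    """Log-probability of opening a gap that has exactly `length` positions.

    Assumes a simple pair HMM: enter the gap state with probability open_prob,
    stay with probability extend_prob, leave with probability 1 - extend_prob.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return log(open_prob) + (length - 1) * log(extend_prob) + log(1 - extend_prob)


# Each extra gap position adds a constant to both quantities:
# gap_extend to the cost, log(extend_prob) to the log-probability.
cost_step = affine_gap_cost(4, 10, 1) - affine_gap_cost(3, 10, 1)
log_prob_step = geometric_gap_log_prob(4, 0.05, 0.4) - geometric_gap_log_prob(3, 0.05, 0.4)
```

The constant per-position step in both functions is what allows an affine-penalty alignment to be reinterpreted probabilistically.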

arXiv.org e-Print Archive

CWI's Institutional Repository

Polarization alignments of radio quasars in JVAS/CLASS surveys

Author: Hutsemékers Damien
Pelgrims Vincent
Publication venue: Oxford University Press (OUP)
Publication date: 11/03/2015

We test the hypothesis that the polarization vectors of flat-spectrum radio sources (FSRS) in the JVAS/CLASS 8.4-GHz surveys are randomly oriented on the sky. The sample with robust polarization measurements is made of 4155 objects and redshift information is known for 1531 of them. We performed two statistical analyses: one in two dimensions and the other in three dimensions when distance is available. We find significant large-scale alignments of polarization vectors for samples containing only quasars (QSO) among the varieties of FSRS's. While these correlations prove difficult to explain either by a physical effect or by biases in the dataset, the fact that the QSO's which have significantly aligned polarization vectors are found in regions of the sky where optical polarization alignments were previously found is striking. Comment: 13 pages, 9 figures, submitted to MNRA

arXiv.org e-Print Archive

Open Repository and Bibliography - Liège

CLEVER: Clique-Enumerating Variant Finder

Author: Bauer Markus
Canzar Stefan
Costa Ivan
Klau Gunnar
Marschall Tobias
Schliep Alexander
Schönhuth Alexander
Publication date: 01/01/2012

Next-generation sequencing techniques have facilitated a large-scale analysis of human genetic variation. Despite the advances in sequencing speeds, the computational discovery of structural variants is not yet standard. It is likely that many variants have remained undiscovered in most sequenced individuals. Here we present a novel internal segment size based approach, which organizes all reads, including concordant ones, into a read alignment graph where max-cliques represent maximal contradiction-free groups of alignments. A specifically engineered algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions (indels). For the first time in the literature, we compare a large range of state-of-the-art approaches using simulated Illumina reads from a fully annotated genome and present various relevant performance statistics. We achieve superior performance rates, in particular on indels of sizes 20-100, which have been exposed as a current major challenge in the SV discovery literature and where prior insert size based approaches have limitations. In that size range, we outperform even split read aligners. We also achieve good results on real data, where we make a substantial number of correct predictions as the only tool, which complement the predictions of split-read aligners. CLEVER is open source (GPL) and available from http://clever-sv.googlecode.com. Comment: 30 pages, 8 figures
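
The abstract says CLEVER uses a specifically engineered algorithm to enumerate max-cliques and does not describe it. Purely to illustrate what enumerating maximal cliques in a read alignment graph involves, the sketch below uses the generic textbook Bron-Kerbosch recursion (without pivoting) as a stand-in. The graph, the read names, and the function name are hypothetical.

```python
def maximal_cliques(adjacency):
    """Yield every maximal clique of an undirected graph as a frozenset.

    adjacency maps each node to the set of its neighbours.
    This is the basic Bron-Kerbosch recursion, not CLEVER's own algorithm.
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            # Nothing can extend the clique, so it is maximal.
            yield frozenset(clique)
            return
        for v in list(candidates):
            yield from expand(
                clique | {v},
                candidates & adjacency[v],
                excluded & adjacency[v],
            )
            # v is fully explored: move it from candidates to excluded.
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), set(adjacency), set())


# Hypothetical graph: an edge means two read alignments do not contradict each other.
compatible = {
    "r1": {"r2", "r3"},
    "r2": {"r1", "r3"},
    "r3": {"r1", "r2", "r4"},
    "r4": {"r3"},
}

cliques = set(maximal_cliques(compatible))
```

Each resulting clique is a maximal group of mutually compatible alignments; in the approach described above, such groups are the units that are then evaluated statistically for indel support.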

arXiv.org e-Print Archive

CiteSeerX

VU Research Portal

CWI's Institutional Repository

Publikationsserver der RWTH Aachen University

Publications at Bielefeld University

Evolutionary Inference via the Poisson Indel Process

Author: Alexandre Bouchard-Côté
Buiculescu
Cox
Dreyer
Hein
Huelsenbeck
Michael I. Jordan
Miklós
Nelesen
Roshan
Saitou
Searls
Wheeler
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 18/01/2013

We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classical evolutionary process, the TKF91 model, is a continuous-time Markov chain model comprising insertion, deletion and substitution events. Unfortunately, this model gives rise to an intractable computational problem: the computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a new stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The new model is closely related to the TKF91 model, differing only in its treatment of insertions, but the new model has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared to separate inference of phylogenies and alignments. Comment: 33 pages, 6 figures

arXiv.org e-Print Archive

Crossref