
    Summation test for gap penalties and strong law of the local alignment score

    A summation test is proposed to determine admissible types of gap penalties for logarithmic growth of the local alignment score. We also define a converging sequence of log moment generating functions that provide the constants associated with the large deviation rate and logarithmic strong law of the local alignment score and the asymptotic number of matches in the optimal local alignment. Comment: Published at http://dx.doi.org/10.1214/105051605000000061 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Comparative Analysis of Cyclic Sequences: Viroids and other Small Circular RNAs

    The analysis of small circular sequences requires specialized tools. The differences between linear and circular sequences can be neglected for long molecules such as bacterial genomes, since in practice all analysis is performed in sequence windows. This is not true for viroids and related sequences, which are usually only a few hundred base pairs long. In this contribution we present basic algorithms and corresponding software for circular RNAs. In particular, we discuss the problem of pairwise and multiple cyclic sequence alignments with affine gap costs, and an extension of a recent approach to circular RNA folding to the computation of consensus structures
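
    To make the cyclic alignment problem with affine gap costs concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a naive strategy: score every rotation of one sequence against the other with a standard three-state (Gotoh-style) global alignment and keep the best. The function names and the scoring parameters (`match`, `mismatch`, `gap_open`, `gap_extend`) are illustrative assumptions.

```python
NEG = float("-inf")

def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Global alignment score; a gap of length k costs gap_open + (k-1)*gap_extend."""
    n, m = len(a), len(b)
    # M: last column pairs two letters; X: letter of a over a gap; Y: gap over letter of b.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])

def cyclic_score(a, b, **params):
    """Best score over all rotations of a; returns (score, rotation offset)."""
    best_score, best_rot = NEG, 0
    for r in range(max(1, len(a))):
        rotated = a[r:] + a[:r]  # treat a as circular, cut open at position r
        score = affine_global_score(rotated, b, **params)
        if score > best_score:
            best_score, best_rot = score, r
    return best_score, best_rot
```

    Trying every rotation multiplies the cost of one pairwise alignment by the sequence length, which is tolerable for sequences of a few hundred bases but is only meant to illustrate the problem statement.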

    Alignments of mitochondrial genome arrangements: Applications to metazoan phylogeny

    Mitochondrial genomes provide a valuable dataset for phylogenetic studies, in particular of metazoan phylogeny because of the extensive taxon sample that is available. Beyond the traditional sequence-based analysis it is possible to extract phylogenetic information from the gene order. Here we present a novel approach utilizing these data based on cyclic list alignments of the gene orders. A progressive alignment approach is used to combine pairwise list alignments into a multiple alignment of gene orders. Parsimony methods are used to reconstruct phylogenetic trees, ancestral gene orders, and consensus patterns in a straightforward approach. We apply this method to study the phylogeny of protostomes based exclusively on mitochondrial genome arrangements. We, furthermore, demonstrate that our approach is also applicable to the much larger genomes of chloroplasts

    Progressive Multiple Sequence Alignments from Triplets

    Motivation: The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps, since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Idea: Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by stepwise replacement of triples by pairs instead of combining pairs into singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the “once a gap, always a gap” problem of progressive alignment procedures. Results: The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments in general exceeds that of clustalw or other multiple alignment tools, even though our software does not include heuristics for context-dependent (mis)match scores
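
    The exact dynamic programming over sequence triples mentioned in this abstract can be illustrated with a small sketch. It is not the aln3nn implementation: it assumes a sum-of-pairs score with a linear gap penalty and plain strings rather than profiles, and the names `sp_column` and `align3_score` are hypothetical.

```python
from itertools import product

def sp_column(x, y, z, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of one alignment column; None stands for a gap."""
    score = 0
    for p, q in ((x, y), (x, z), (y, z)):
        if p is None and q is None:
            continue  # a gap paired with a gap contributes nothing
        if p is None or q is None:
            score += gap
        else:
            score += match if p == q else mismatch
    return score

def align3_score(a, b, c):
    """Optimal sum-of-pairs score of a three-way global alignment."""
    la, lb, lc = len(a), len(b), len(c)
    neg = float("-inf")
    D = [[[neg] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    D[0][0][0] = 0
    # Each column consumes a letter from at least one sequence: 7 possible moves.
    moves = [mv for mv in product((0, 1), repeat=3) if any(mv)]
    for i in range(la + 1):
        for j in range(lb + 1):
            for k in range(lc + 1):
                if i == j == k == 0:
                    continue
                best = neg
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    x = a[i - 1] if di else None
                    y = b[j - 1] if dj else None
                    z = c[k - 1] if dk else None
                    best = max(best, D[pi][pj][pk] + sp_column(x, y, z))
                D[i][j][k] = best
    return D[la][lb][lc]
```

    The table has one cell per triple of prefix lengths, so time and memory grow with the product of the three sequence lengths.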

    A Sequence Alignment Algorithm with an Arbitrary Gap Penalty Function


    Unveiling the Molecular Mechanisms Regulating the Activation of the ErbB Family Receptors at Atomic Resolution through Molecular Modeling and Simulations

    The EGFR/ErbB/HER family of kinases contains four homologous receptor tyrosine kinases that are important regulatory elements in key signaling pathways. To elucidate the atomistic mechanisms of dimerization-dependent activation in the ErbB family, we have performed molecular dynamics simulations of the intracellular kinase domains of all four members of the family: those with known kinase activity, namely EGFR, ErbB2 (HER2) and ErbB4 (HER4), as well as ErbB3 (HER3), an assumed pseudokinase. The simulations cover different molecular contexts: monomer vs. dimer, wildtype vs. mutant. Using bioinformatics and fluctuation analyses of the molecular dynamics trajectories, we relate sequence similarities to correspondence of specific bond-interaction networks and collective dynamical modes. We find that in the active conformation of the ErbB kinases (except ErbB3), key subdomain motions are coordinated through conserved hydrophilic interactions: activating bond-networks consisting of hydrogen bonds and salt bridges. The inactive conformations also demonstrate conserved bonding patterns (albeit less extensive) that sequester key residues and disrupt the activating bond network. Both conformational states have distinct hydrophobic advantages through context-specific hydrophobic interactions. The inactive ErbB3 kinase domain also shows coordinated motions similar to the active conformations, in line with recent evidence that ErbB3 is a weakly active kinase, though the coordination seems to arise from hydrophobic interactions rather than hydrophilic ones. We show that the functional (activating) asymmetric kinase dimer interface forces a corresponding change in the hydrophobic and hydrophilic interactions that characterize the inactivating interaction network, resulting in motion of the αC-helix through allostery. Several of the clinically identified activating kinase mutations of EGFR act in a similar fashion to disrupt the inactivating interaction network.
    Our molecular dynamics study reveals that the asymmetric dimer interface helps progress the ErbB family through the activation pathway using both hydrophilic and hydrophobic interactions. There is a fundamental difference in the sequence of events in EGFR activation compared with that described for the Src kinase Hck

    Studying Evolutionary Change: Transdisciplinary Advances in Understanding and Measuring Evolution

    Evolutionary processes can be found in almost any historical, i.e. evolving, system that erroneously copies from the past. Well-studied examples originate not only in evolutionary biology but also in historical linguistics. Yet an approach that would bind together studies of such evolving systems is still elusive. This thesis attempts to narrow this gap to some extent. An evolving system can be described using characters that identify its changing features. While the problem of a proper choice of characters is beyond the scope of this thesis and remains in the hands of experts, we concern ourselves with some theoretical as well as data-driven approaches. Having a well-chosen set of characters describing a system of different entities such as homologous genes, i.e. genes of the same origin in different species, we can build a phylogenetic tree. Consider the special case of gene clusters containing paralogous genes, i.e. genes of the same origin within a species, usually located close together, such as the well-known HOX cluster. These are formed by stepwise duplication of their members, often involving unequal crossing over that forms hybrid genes. Gene conversion and possibly other mechanisms of concerted evolution further obfuscate phylogenetic relationships. Hence, it is very difficult or even impossible to disentangle the detailed history of gene duplications in gene clusters. The expansion of gene clusters by unequal crossing over, as proposed by Walter Gehring, leads to distinctive patterns of genetic distances. We show that this special class of distances still helps to extract phylogenetic information from the data. Disregarding genome rearrangements, we find that the shortest Hamiltonian path then coincides with the ordering of paralogous genes in a cluster.
    This observation can be used to detect ancient genomic rearrangements of gene clusters and to distinguish gene clusters whose evolution was dominated by unequal crossing over within genes from those that expanded through other mechanisms. While the evolution of DNA or protein sequences is well studied and can be formally described, we find that this does not hold for other systems such as language evolution. This is due to a lack of detectable mechanisms that drive the evolutionary processes in other fields. Hence, it is hard to quantify distances between entities, e.g. languages, and therefore the characters describing them. Starting out with distortions of distances, we first see that poor choices of the distance measure can lead to incorrect phylogenies. Given that phylogenetic inference requires additive metrics, we can infer the correct phylogeny from a distance matrix D if there is a monotonic, subadditive function ζ such that ζ^−1(D) is additive. We compute the metric-preserving transformation ζ as the solution of an optimization problem. This result shows that the problem of phylogeny reconstruction is well defined even if a detailed mechanistic model of the evolutionary process is missing. Yet, this does not hinder studies of language evolution using automated tools. As the number of large digital corpora available has increased, so have the possibilities to study them automatically. The obvious parallels between historical linguistics and phylogenetics have led to many studies adapting bioinformatics tools to linguistic purposes. Here, we use jAlign to calculate bigram alignments, i.e. alignments computed by an algorithm that takes the adjacency of letters into account. Its performance is tested in different cognate recognition tasks. One major obstacle with pairwise alignments is their systematic errors, such as the underestimation and misplacement of gaps.
    Applying multiple sequence alignments instead of a pairwise algorithm implicitly includes more evolutionary information and thus can overcome the problem of correct gap placement. They can be seen as a generalization of the string-to-string edit problem to more than two strings. With the steady increase in computational power, exact dynamic programming solutions have become feasible in practice also for 3- and 4-way alignments. For the pairwise (2-way) case, there is a clear distinction between local and global alignments. As more sequences are considered, this distinction, which can in fact be made independently for both ends of each sequence, gives rise to a rich set of partially local alignment problems. So far these have remained largely unexplored. Thus, a general formal framework that gives rise to a classification of partially local alignment problems is introduced. It leads to a generic scheme that guides the principled design of exact dynamic programming solutions for particular partially local alignment problems
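
    The additivity requirement on ζ^−1(D) stated in this abstract can be checked with the classical four-point condition: a metric is additive if, for every four entities, the two largest of the three pairwise distance sums are equal. The sketch below is an illustration under assumptions, not the optimization procedure of the thesis. The example tree distances `T`, the choice ζ = square root, and the helper names are all hypothetical.

```python
import math
from itertools import combinations

def is_additive(D, tol=1e-9):
    """Four-point condition: the two largest of the three sums must coincide."""
    for w, x, y, z in combinations(range(len(D)), 4):
        sums = sorted((D[w][x] + D[y][z],
                       D[w][y] + D[x][z],
                       D[w][z] + D[x][y]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

def transform(D, f):
    """Apply f to every entry of the distance matrix D."""
    return [[f(v) for v in row] for row in D]

# Hypothetical path lengths on a four-leaf tree ((a,b),(c,d)):
# leaf edges 1, 2, 3, 4 and an internal edge of length 1.
T = [[0, 3, 5, 6],
     [3, 0, 6, 7],
     [5, 6, 0, 7],
     [6, 7, 7, 0]]

# Assumed distortion: the observed distances are zeta(T) with zeta = sqrt,
# which is monotonic and subadditive on non-negative values.
D = transform(T, math.sqrt)

# Applying the inverse of zeta (squaring) undoes the distortion.
restored = transform(D, lambda v: v * v)
```

    Here `is_additive(T)` and `is_additive(restored)` hold while `is_additive(D)` does not, which shows on a toy case how a distorted distance measure hides an otherwise tree-like signal.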