    Pairwise alignment incorporating dipeptide covariation

    Motivation: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies. Results: Our analysis indicates that local correlations between substitutions are not strong on average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.

    Multiple sequence alignment based on set covers

    We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches.

    Optimally fast incremental Manhattan plane embedding and planar tight span construction

    We describe a data structure, a rectangular complex, that can be used to represent hyperconvex metric spaces that have the same topology (although not necessarily the same distance function) as subsets of the plane. We show how to use this data structure to construct the tight span of a metric space given as an n x n distance matrix, when the tight span is homeomorphic to a subset of the plane, in time O(n^2), and to add a single point to a planar tight span in time O(n). As an application of this construction, we show how to test whether a given finite metric space embeds isometrically into the Manhattan plane in time O(n^2), and add a single point to the space and re-test whether it has such an embedding in time O(n). Comment: 39 pages, 15 figures.

    Higher accuracy protein Multiple Sequence Alignment by Stochastic Algorithm

    Multiple Sequence Alignment gives insight into evolutionary, structural and functional relationships among proteins. Here, a novel Protein Alignment by Stochastic Algorithm (PASA) is developed. Evolutionary operators of a genetic algorithm, namely mutation and selection, are utilized in combining the output of two of the most important sequence alignment programs and then developing an optimized new algorithm. Efficiency of protein alignments is evaluated in terms of the Total Column score, which is equal to the number of correctly aligned columns between a test alignment and the reference alignment divided by the total number of columns in the reference alignment. The PASA optimizer achieves, on average, significantly better alignment than the well-known individual bioinformatics tools. This PASA is statistically the most accurate protein alignment method today. It can have potential applications in drug discovery processes in the biotechnology industry.
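    The Total Column score defined in this abstract can be illustrated with a minimal Python sketch. This is not the PASA authors' code. It assumes each alignment is a list of equal-length gapped strings, that `-` marks a gap, and that a reference column counts as correct only when the test alignment contains a column pairing exactly the same residue positions. The function names `column_signatures` and `total_column_score` are hypothetical.

```python
def column_signatures(alignment, gap="-"):
    """Describe each column by the residue index it holds in every sequence.

    A gap is recorded as None, so two columns are equal only if they
    align exactly the same residues (assumed definition of "correct").
    """
    counters = [0] * len(alignment)  # next residue index per sequence
    signatures = []
    for col in range(len(alignment[0])):
        signature = []
        for row, sequence in enumerate(alignment):
            if sequence[col] == gap:
                signature.append(None)
            else:
                signature.append(counters[row])
                counters[row] += 1
        signatures.append(tuple(signature))
    return signatures


def total_column_score(test, reference, gap="-"):
    """Correctly aligned reference columns divided by all reference columns."""
    test_columns = set(column_signatures(test, gap))
    reference_columns = column_signatures(reference, gap)
    correct = sum(1 for column in reference_columns if column in test_columns)
    return correct / len(reference_columns)


if __name__ == "__main__":
    reference = ["AC-GT", "ACTGT"]
    test = ["ACG-T", "ACTGT"]
    print(total_column_score(test, reference))
```

    In the toy example, three of the five reference columns reappear in the test alignment, so the score is 3/5.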

    The Binary Space Partitioning-Tree Process

    The Mondrian process represents an elegant and powerful approach for space partition modelling. However, as it restricts the partitions to be axis-aligned, its modelling flexibility is limited. In this work, we propose a self-consistent Binary Space Partitioning (BSP)-Tree process to generalize the Mondrian process. The BSP-Tree process is an almost surely right continuous Markov jump process that allows uniformly distributed oblique cuts in a two-dimensional convex polygon. The BSP-Tree process can also be extended using a non-uniform probability measure to generate direction-differentiated cuts. The process is also self-consistent, maintaining distributional invariance under a restricted subdomain. We use Conditional-Sequential Monte Carlo for inference using the tree structure as the high-dimensional variable. The BSP-Tree process's performance on synthetic data partitioning and relational modelling demonstrates clear inferential improvements over the standard Mondrian process and other related methods.

    Quantum Hamiltonian reduction of W-algebras and category O

    W-algebras are a class of non-commutative algebras related to the classical universal enveloping algebras. They can be defined as a subquotient of U(g) related to a choice of nilpotent element e and compatible nilpotent subalgebra m. The definition is a quantum analogue of the classical construction of Hamiltonian reduction. We define a quantum version of Hamiltonian reduction by stages and use it to construct intermediate reductions between different W-algebras U(g,e) in type A. This allows us to express the W-algebra U(g,e') as a subquotient of U(g,e) for nilpotent elements e' covering e. It also produces a collection of (U(g,e),U(g,e'))-bimodules analogous to the generalised Gel'fand-Graev modules used in the classical definition of the W-algebra; these can be used to obtain adjoint functors between the corresponding module categories. The category of modules over a W-algebra has a full subcategory defined in a parallel fashion to that of the Bernstein-Gel'fand-Gel'fand (BGG) category O; this version of category O(e) for W-algebras is equivalent to an infinitesimal block of O by an argument of Miličić and Soergel. We therefore construct analogues of the translation functors between the different blocks of O, in this case being functors between the categories O(e) for different W-algebras U(g,e). This follows an argument of Losev, and realises the category O(e') as equivalent to a full subcategory of the category O(e) where e' is greater than e in the refinement ordering. Comment: University of Toronto PhD thesis, defended July 2014, 57 pages.

    A Two-Phase Dynamic Programming Algorithm Tool for DNA Sequences

    Sequence alignment has to do with the arrangement of DNA, RNA, and protein sequences to identify areas of similarity. Technically, it involves the arrangement of the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. If two sequences in an alignment share a common ancestor, mismatches can be interpreted as mutations, and gaps as insertions. Such information becomes of great use in vital areas such as the study of diseases, genomics and generally in the biological sciences. Thus, sequence alignment presents not just an exciting field of study, but a field of great importance to mankind. In this light, we extensively studied about seventy (70) existing sequence alignment tools available to us. Most of these tools are not user-friendly and cannot be used by biologists. The few tools that attempted both local and global algorithms are not freely available. We therefore implemented a sequence alignment tool (CU-Aligner) in an understandable, user-friendly and portable way, with click-of-a-button simplicity. It uses the Needleman-Wunsch and Smith-Waterman algorithms for global and local alignments, respectively, and focuses primarily on DNA sequences. Our aligner is implemented in the Java language in both application and applet mode and has been efficient on all Windows operating systems.
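    The difference between the global and local algorithms named in this abstract can be sketched in a few lines of Python. This is not the CU-Aligner implementation, which the abstract says is written in Java. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the function names are assumptions chosen for illustration, and only the optimal score is computed, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score with a linear gap penalty."""
    # Row 0: aligning a prefix of b against nothing costs one gap per base.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal local alignment score: cells are floored at zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)  # a local alignment may end anywhere
        prev = curr
    return best


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
    print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```

    The two recurrences differ only in the zero floor and in where the result is read: the global score comes from the final cell, while the local score is the maximum over all cells.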