5 research outputs found

    Simultaneous alignment and folding of protein sequences

    Accurate comparative analysis tools for low-homology proteins remain a difficult challenge in computational biology, especially for sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm’s complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise sequence alignment tools in the most difficult low sequence homology case and improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families. partiFold-Align is available at http://partiFold.csail.mit.edu

    Fast RNA Structure Alignment for Crossing Input Structures

    The complexity of pairwise RNA structure alignment depends on the structural restrictions assumed for both the input structures and the computed consensus structure. For arbitrarily crossing input and consensus structures, the problem is NP-hard. For non-crossing consensus structures, Jiang et al.’s algorithm [1] computes the alignment in O(n²m²) time, where n and m denote the lengths of the two input sequences. If the input structures are also non-crossing, the problem corresponds to tree editing, which can be solved in O(m²n(1 + log n)) time [2]. We present a new algorithm that solves the problem for d-crossing structures in O(dm²n log n) time, where d is a parameter that is one for non-crossing structures, bounded by n for crossing structures, and much smaller than n on many practical examples. Crossing input structures allow for applications where the input is not a fixed structure but is given as base-pair probability matrices.
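
    The distinction between crossing and non-crossing structures in the abstract above can be illustrated with a minimal Python sketch. It assumes a structure is given as a list of base pairs (i, j) over sequence positions, and uses the standard definition that two pairs (i, j) and (k, l) cross when i < k < j < l. The function names and example structures are hypothetical, and the sketch does not compute the paper's parameter d, which the abstract does not define.

```python
from itertools import combinations


def pairs_cross(p, q):
    """Return True if base pairs p and q cross, i.e. i < k < j < l."""
    # Normalise each pair so that i < j, then order the two pairs by start position.
    (i, j), (k, l) = sorted((tuple(sorted(p)), tuple(sorted(q))))
    return i < k < j < l


def is_non_crossing(base_pairs):
    """Return True if no two base pairs in the structure cross (a nested structure)."""
    return not any(pairs_cross(p, q) for p, q in combinations(base_pairs, 2))


# Hypothetical example structures (positions are illustrative only).
nested = [(1, 10), (2, 9), (3, 8)]   # fully nested, tree-like
pseudoknot = [(1, 6), (4, 10)]       # the second pair starts inside the first and ends outside it

nested_ok = is_non_crossing(nested)
pseudoknot_ok = is_non_crossing(pseudoknot)
```

    The pairwise check is quadratic in the number of base pairs, which is enough for illustration; it only separates the non-crossing case (where tree editing applies) from the crossing case.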