
    Solving Globally-Optimal Threading Problems in "Polynomial-Time"

    Computational protein threading is a powerful technique for recognizing native-like folds of a protein sequence from a protein fold database. In this paper, we present an improved algorithm (over our previous work) for solving the globally-optimal threading problem, and illustrate how the computational complexity and the fold recognition accuracy of the algorithm change as the cutoff distance for pairwise interactions changes. For a given fold of m residues and M core secondary structures (or simply cores) and a protein sequence of n residues, the algorithm is guaranteed to find a sequence-fold alignment (threading) that is globally optimal, measured collectively by (1) the singleton match fitness, (2) the pairwise interaction preference, and (3) the alignment gap penalties, in $O(mn + MnN^{1.5C-1})$ time and $O(mn + nN^{C-1})$ space. C, which we term the topological complexity of a fold, is a value that characterizes the overall structure of the considered pairwise interactions in the fold, which are typically determined by a specified cutoff distance between the beta carbon atoms of a pair of amino acids in the fold. C is typically a small positive integer. N represents the maximum number of possible alignments between an individual core of the fold and the protein sequence when its neighboring cores are already aligned, and its value is significantly less than n. When interacting amino acids are required to see each other, C is bounded from above by a small integer no matter how large the cutoff distance is. This indicates that the protein threading problem is polynomial-time solvable if the condition that interacting amino acids see each other is sufficient for accurate fold recognition.
A number of extensions have been made to our basic threading algorithm to allow finding a globally-optimal threading under various constraints, including consistency with (1) specified secondary structures (both cores and loops), (2) disulfide bonds, and (3) active sites, among others.
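To get a feel for how the bounds quoted in the abstract grow with the topological complexity C, the two expressions can be evaluated directly. The following is a minimal sketch only: the function name `threading_cost_terms` and all sample values are hypothetical, constant factors hidden by the O-notation are ignored, and nothing here implements the threading algorithm itself.

```python
def threading_cost_terms(m, n, M, N, C):
    """Evaluate the abstract's time and space bounds with constants dropped.

    m: residues in the fold, n: residues in the sequence,
    M: number of cores, N: max alignments per core, C: topological complexity.
    """
    time_term = m * n + M * n * N ** (1.5 * C - 1)   # mn + M n N^(1.5C-1)
    space_term = m * n + n * N ** (C - 1)            # mn + n N^(C-1)
    return time_term, space_term


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the trend as C increases.
    for C in (2, 3, 4):
        t, s = threading_cost_terms(m=200, n=300, M=10, N=20, C=C)
        print(f"C={C}: time term ~ {t:.3g}, space term ~ {s:.3g}")
```

Because N appears with exponent 1.5C - 1 in the time bound, each unit increase in C multiplies the dominant term by roughly N^1.5, which is why a small upper bound on C matters for the polynomial-time claim.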