thesis

Algorithmic Aspects of Some Problems in Computational Biology

Abstract

Given a sequence of pairs of numbers ( a i , l i ), i = 1, 2, ..., n , with l i \u3e 0, and another pair of numbers L and U , the length-constrained maximum density segment problem is to find a subsequence [ a i , a j ]whose density is the maximum under the constraint L ≤ [Special characters omitted.] Sjs=i l s ≤ U . It has application to DNA sequence analysis in Computational Biology, particularly in the determination of the percentage of CG contents in a DNA sequence. A linear time geometric algorithm is presented that is more powerful than the existing linear time algorithms. The method is extended to solve the k length-constrained maximum density segments problem. Previously, there was no known algorithm with non-trivial time complexity for this problem. We present a linear time algorithm to solve the length-constrained maximum sum segment problem. It is extended to solve the k length-constrained maximum sum segments problem in O(n+k) time. The algorithms are extended to solve the problem of finding all the length-constrained segments satisfying user specified sum or density lower bound in O(n+h) time, where h is the size of the output. The point placement problem is to determine the positions of a linear set of points uniquely up to translation and reflection from the fewest possible distance queries between pairs of points. The motivation comes from a problem known as the restriction site mapping. We present 2-round algorithms with queries 10 n /7+ O (1), 4 n /3+ O (1) and 9 n /7+ O (1) respectively. The lower bound for 2 rounds is improved from 17 n /16 to 9 n /8. We also present a modification of a geometric method called MSPocket for detection of ligand binding sites on protein surfaces. Experimentation using 48 benchmark dataset of bound protein structures shows that the success rate of our method is slightly better than that of MSPocket. (Abstract shortened by UMI.

    Similar works