Quasiperiodic biosequences and modulo incidence matrices

Author: Enmin Song
Publication date: 01/01/2002

Algorithm development for finding quasiperiodic regions in sequences is at the core of many problems arising in biological sequence analysis. We solve an important problem in this area. Let $A$ be an alphabet of size $n$ and let $A^l$ denote the set of sequences of length $l$ over $A$. Given a sequence $S = s_1 s_2 \cdots s_l \in A^l$, a positive integer $p$ is called a period of $S$ if $s_i = s_{i+p}$ for $1 \le i \le l - p$. $S$ is called $p$-periodic if its minimum period is $p$. Let $\Pi_l(p)$ denote the set of $p$-periodic sequences in $A^l$. A natural measure of "nearness to $p$-periodicity" for $S$ is the average Hamming distance to the nearest $p$-periodic sequence: $D(S) = \min_{T \in \Pi_l(p)} D(S, T)$. If $T \in \Pi_l(p)$ satisfies $D(S, T) = D(S)$, then $T$ is called a nearest $p$-periodic sequence of $S$, and $S$ is called $p$-quasiperiodic with associated score $D(S)$. This paper develops an efficient algorithm for finding a nearest $p$-periodic sequence of $S$ by means of its modulo-$p$ incidence matrix. Let $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $\beta = (q+1, \ldots, q+1, q, \ldots, q)$, where $l = \alpha_1 + \alpha_2 + \cdots + \alpha_n$ is a partition of $l$, and $q$ is the quotient and $r$ the remainder when $l$ is divided by $p$. This paper shows that there exists a sequence in $A^l$ whose modulo-$p$ incidence matrix has row sum vector $\alpha$ and column sum vector $\beta$
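The definitions in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not confirmed by the record: the modulo-p incidence matrix is taken to be the n-by-p table whose entry (a, j) counts the positions i with s_i = a and i congruent to j modulo p; positions are 0-based here, whereas the abstract counts from 1; and the function names `incidence_matrix` and `nearest_with_period` are hypothetical. The per-column majority vote is a standard baseline, not the paper's algorithm, and it only guarantees that p is a period of the result, not that p is its minimum period.

```python
def incidence_matrix(s, p, alphabet):
    """Assumed definition: entry [k][j] counts positions i (0-based)
    with s[i] == alphabet[k] and i % p == j."""
    index = {a: k for k, a in enumerate(alphabet)}
    m = [[0] * p for _ in alphabet]
    for i, ch in enumerate(s):
        m[index[ch]][i % p] += 1
    return m


def nearest_with_period(s, p, alphabet):
    """Return (t, mismatches): a sequence t of the same length as s that has
    p as a period and minimises the Hamming distance to s.
    Note: p is a period of t, but not necessarily its minimum period."""
    m = incidence_matrix(s, p, alphabet)
    # In each column keep the most frequent letter; ties go to the letter
    # that comes first in `alphabet`.
    unit = [max(range(len(alphabet)), key=lambda k: m[k][j]) for j in range(p)]
    t = "".join(alphabet[unit[i % p]] for i in range(len(s)))
    mismatches = sum(a != b for a, b in zip(s, t))
    return t, mismatches


if __name__ == "__main__":
    s = "ACGACGACTACG"
    m = incidence_matrix(s, 3, "ACGT")
    print("row sums:", [sum(row) for row in m])
    print("column sums:", [sum(col) for col in zip(*m)])
    t, d = nearest_with_period(s, 3, "ACGT")
    print(t, d, d / len(s))
```

In this sketch the row sums are the letter counts of the sequence (the vector alpha in the abstract) and the column sums are the sizes of the residue classes modulo p (q + 1 or q each, as in beta). Dividing the mismatch count by the sequence length gives an average Hamming distance; whether the paper normalises in exactly this way is an assumption.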

CiteSeerX