Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications

Author: Alexandrov
Bentley
Bernardi
Bernardi
Charlesworth
Chung
Duret
Eyre-Walker
Eyre-Walker
Fields
Filipski
Francino
Fullerton
Greenberg
Guldberg
Hardison
Henke
Holmquist
Hsueh-I Lu
Huang
Ikehara
Inman
Jin
Kim
Lin
Macaya
Madsen
Michael H. Goldwasser
Ming-Yang Kao
Murata
Nekrutenko
Rice
Scotto
Sellers
Sharp
Soriano
Stojanovic
Sueoka
Wang
Wolfe
Wu
Zoubak
Publication venue: 'Elsevier BV'
Publication date: 04/11/2002

We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence A of pairs (a_i,w_i) for i = 1,...,n and w_i>0, a segment A(i,j) is a consecutive subsequence of A starting with index i and ending with index j. The width of A(i,j) is w(i,j) = sum_{i <= k <= j} w_k, and the density is (sum_{i <= k <= j} a_k)/w(i,j). The maximum-density segment problem takes A and two values L and U as input and asks for a segment of A with the largest possible density among those of width at least L and at most U. When U is unbounded, we provide a relatively simple, O(n)-time algorithm, improving upon the O(n \log L)-time algorithm by Lin, Jiang and Chao. When both L and U are specified, there are no previous nontrivial results. We solve the problem in O(n) time if w_i=1 for all i, and more generally in O(n+n\log(U-L+1)) time when w_i>=1 for all i. Comment: 23 pages, 13 figures. A significant portion of these results appeared under the title, "Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics," in Proceedings of the Second Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes in Computer Science (Springer-Verlag, Berlin), R. Guigo and D. Gusfield editors, 2002, pp. 157--17
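To make the problem statement in this abstract concrete, here is a minimal quadratic-time reference sketch in Python. It is not the linear-time algorithm the abstract describes; it only enumerates segments using the width and density definitions given above. The function name `max_density_segment_bruteforce` is hypothetical, and the sketch assumes integer values a_i and positive integer widths w_i so that densities can be compared exactly.

```python
from fractions import Fraction
from itertools import accumulate


def max_density_segment_bruteforce(pairs, L, U=None):
    """Return (density, i, j) for the densest segment A(i, j), or None.

    pairs: list of (a_k, w_k) with integer a_k and integer w_k > 0 (assumption).
    L, U:  width bounds; U=None means the width is unbounded above.
    Indices i, j are 1-based and inclusive, as in the abstract.
    """
    a_prefix = [0] + list(accumulate(a for a, _ in pairs))
    w_prefix = [0] + list(accumulate(w for _, w in pairs))
    n = len(pairs)
    best = None
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            width = w_prefix[j] - w_prefix[i - 1]
            if width < L:
                continue
            if U is not None and width > U:
                break  # widths only grow with j because every w_k > 0
            density = Fraction(a_prefix[j] - a_prefix[i - 1], width)
            if best is None or density > best[0]:
                best = (density, i, j)
    return best


result = max_density_segment_bruteforce(
    [(1, 1), (4, 1), (2, 1), (5, 1), (0, 1)], L=2, U=3
)
```

A checker like this is mainly useful for validating a faster implementation on small inputs, since its running time grows quadratically with n.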

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

National Taiwan University Repository

Algorithms for the Problems of Length-Constrained Heaviest Segments

Author: Alam Md. Shafiul
Mukhopadhyay Asish
Publication date: 24/08/2011

We present algorithms for length-constrained maximum sum segment and maximum density segment problems, in particular, and the problem of finding length-constrained heaviest segments, in general, for a sequence of real numbers. Given a sequence of n real numbers and two real parameters L and U (L <= U), the maximum sum segment problem is to find a consecutive subsequence, called a segment, of length at least L and at most U such that the sum of the numbers in the subsequence is maximum. The maximum density segment problem is to find a segment of length at least L and at most U such that the density of the numbers in the subsequence is the maximum. For the first problem with non-uniform width there is an algorithm with time and space complexities in O(n). We present an algorithm with time complexity in O(n) and space complexity in O(U). For the second problem with non-uniform width there is a combinatorial solution with time complexity in O(n) and space complexity in O(U). We present a simple geometric algorithm with the same time and space complexities. We extend our algorithms to respectively solve the length-constrained k maximum sum segments problem in O(n+k) time and O(max{U, k}) space, and the length-constrained k maximum density segments problem in O(n min{k, U-L}) time and O(U+k) space. We present extensions of our algorithms to find all the length-constrained segments having user-specified sum and density in O(n+m) and O(n log(U-L)+m) time respectively, where m is the number of outputs. Previously, there was no known algorithm with a non-trivial result for these problems. We indicate the extensions of our algorithms to higher dimensions. All the algorithms can be extended in a straightforward way to solve the problems with non-uniform width and non-uniform weight. Comment: 21 pages, 12 figure
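The length-constrained maximum sum segment problem defined in this abstract can be illustrated with a standard sliding-window-minimum technique over prefix sums. The sketch below is not taken from the paper and does not reproduce its O(U) space bound; it stores all prefix sums. The name `max_sum_segment` is hypothetical, and uniform width with integer bounds 1 <= L <= U is assumed.

```python
from collections import deque


def max_sum_segment(values, L, U):
    """Return (best_sum, i, j) with 0-based inclusive i, j and L <= j-i+1 <= U.

    Returns None when no segment of length at least L exists.
    Assumes integers 1 <= L <= U (uniform width).
    """
    n = len(values)
    prefix = [0] * (n + 1)
    for k, v in enumerate(values):
        prefix[k + 1] = prefix[k] + v

    best = None
    window = deque()  # admissible start positions, prefix values increasing
    for end in range(L, n + 1):
        start = end - L  # newest start position that keeps length >= L
        while window and prefix[window[-1]] >= prefix[start]:
            window.pop()  # dominated: later start with a smaller prefix sum
        window.append(start)
        while window[0] < end - U:
            window.popleft()  # too far left: length would exceed U
        s = window[0]  # start with the minimum prefix sum in range
        total = prefix[end] - prefix[s]
        if best is None or total > best[0]:
            best = (total, s, end - 1)
    return best


result = max_sum_segment([2, -5, 3, 4, -1, 6], L=2, U=3)
```

Each position enters and leaves the deque at most once, so the loop runs in O(n) time overall.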

arXiv.org e-Print Archive

CiteSeerX

Locating regions in a sequence under density constraints

Author: Benjamin A. Burton
Boztaş S.
Greenberg R. I.
Huang X.
Lin Y.-L.
Mathias Hiron
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2013

Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range: examples include the location of GC-rich regions, identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0s and 1s for a substring whose relative proportion of 1s lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we develop O(n log log n) algorithms, again improving upon the best-known O(n log n) results. Practical testing verifies that our new algorithms enjoy significantly smaller time and memory footprints, and can process sequences that are orders of magnitude longer as a result. Comment: 17 pages, 8 figures; v2: minor revisions, additional explanations; to appear in SIAM Journal on Computin
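As a concrete reading of the "longest substring within a density range" problem in this abstract, the following quadratic-time sketch checks every substring from longest to shortest using prefix counts of 1s. It is only a reference checker and not the O(n) algorithm the abstract reports. The function name `longest_substring_in_density_range` is hypothetical, and the bounds are assumed to be given as exact rationals.

```python
from fractions import Fraction


def longest_substring_in_density_range(bits, lo, hi):
    """Return half-open (start, end) of the longest substring whose
    proportion of '1' characters lies in [lo, hi], or None if none exists.

    bits: string over {'0', '1'}; lo, hi: rationals (assumption).
    Among equally long candidates, the leftmost is returned.
    """
    lo, hi = Fraction(lo), Fraction(hi)
    n = len(bits)
    ones = [0] * (n + 1)  # ones[k] = number of '1's in bits[:k]
    for k, ch in enumerate(bits):
        ones[k + 1] = ones[k] + (ch == "1")

    for length in range(n, 0, -1):  # try the longest lengths first
        for start in range(n - length + 1):
            count = ones[start + length] - ones[start]
            # Compare count/length with the bounds without dividing.
            if lo * length <= count <= hi * length:
                return (start, start + length)
    return None


result = longest_substring_in_density_range("1111001", Fraction(1, 4), Fraction(1, 2))
```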

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Queensland eSpace

An Optimal Algorithm for the Maximum-Density Segment Problem

Author: Holmquist G. P.
Hsueh-I Lu
Huang X.
Kai-min Chung
Scotto L.
Sueoka N.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 17/11/2003

We address a fundamental problem arising from analysis of biomolecular sequences. The input consists of two numbers w_{\min} and w_{\max} and a sequence S of n number pairs (a_i,w_i) with w_i>0. Let segment S(i,j) of S be the consecutive subsequence of S between indices i and j. The density of S(i,j) is d(i,j)=(a_i+a_{i+1}+...+a_j)/(w_i+w_{i+1}+...+w_j). The maximum-density segment problem is to find a maximum-density segment over all segments S(i,j) with w_{\min}\leq w_i+w_{i+1}+...+w_j \leq w_{\max}. The best previously known algorithm for the problem, due to Goldwasser, Kao, and Lu, runs in O(n\log(w_{\max}-w_{\min}+1)) time. In the present paper, we solve the problem in O(n) time. Our approach bypasses the complicated right-skew decomposition, introduced by Lin, Jiang, and Chao. As a result, our algorithm has the capability to process the input sequence in an online manner, which is an important feature for dealing with genome-scale sequences. Moreover, for a type of input sequences S representable in O(m) space, we show how to exploit the sparsity of S and solve the maximum-density segment problem for S in O(m) time. Comment: 15 pages, 12 figures, an early version of this paper was presented at 11th Annual European Symposium on Algorithms (ESA 2003), Budapest, Hungary, September 15-20, 200

arXiv.org e-Print Archive

CiteSeerX

Crossref

Searching a bitstream in linear time for the longest substring of any given density

Author: Benjamin A. Burton
D.E. Knuth
D.R. Musser
G. Bernardi
G. Marsaglia
K. Chen
L. Duret
M.H. Goldwasser
P. Erdős
P.M. Sharp
R. Arratia
R. Hardison
R.I. Greenberg
S. Boztaş
S. Zoubak
S.M. Fullerton
T.H. Cormen
Y.-H. Hsieh
Y.-L. Lin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/06/2010

Given an arbitrary bitstream, we consider the problem of finding the longest substring whose ratio of ones to zeroes equals a given value. The central result of this paper is an algorithm that solves this problem in linear time. The method involves (i) reformulating the problem as a constrained walk through a sparse matrix, and then (ii) developing a data structure for this sparse matrix that allows us to perform each step of the walk in amortised constant time. We also give a linear time algorithm to find the longest substring whose ratio of ones to zeroes is bounded below by a given value. Both problems have practical relevance to cryptography and bioinformatics. Comment: 22 pages, 19 figures; v2: minor edits and enhancement
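The "bounded below" variant mentioned in this abstract can be sketched with a well-known reduction, which is not claimed to be the paper's own method. For a rational bound p/q, scoring each 1 as +q and each 0 as -p turns "ones/zeroes >= p/q" into "segment sum >= 0", and the longest such segment can be found in linear time with a stack of prefix minima. The name `longest_substring_ratio_at_least` is hypothetical; p and q are assumed to be positive integers, and a substring with no zeroes is treated as satisfying the bound.

```python
def longest_substring_ratio_at_least(bits, p, q):
    """Return half-open (start, end) of a longest substring of `bits`
    with q*ones - p*zeroes >= 0 (i.e. ones/zeroes >= p/q), or None.

    bits: string over {'0', '1'}; p, q: positive integers (assumption).
    """
    n = len(bits)
    prefix = [0] * (n + 1)
    for k, ch in enumerate(bits):
        prefix[k + 1] = prefix[k] + (q if ch == "1" else -p)

    # Only positions where the prefix sum reaches a new strict minimum
    # can be the start of a longest valid substring.
    starts = []
    for i in range(n + 1):
        if not starts or prefix[i] < prefix[starts[-1]]:
            starts.append(i)

    best = None
    for end in range(n, 0, -1):  # scan ends from right to left
        while starts and prefix[starts[-1]] <= prefix[end]:
            start = starts.pop()  # `end` is the rightmost valid end for it
            if start < end and (best is None or end - start > best[1] - best[0]):
                best = (start, end)
    return best


result = longest_substring_ratio_at_least("1000110", 1, 1)
```

Every position is pushed and popped at most once, so the scan takes O(n) time after the prefix sums are built.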

arXiv.org e-Print Archive

Crossref

University of Queensland eSpace

ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data

Author: James Nicholas A.
Matteson David S.
Publication date: 23/11/2013

There are many different ways in which change point analysis can be performed, from purely parametric methods to those that are distribution free. The ecp package is designed to perform multiple change point analysis while making as few assumptions as possible. While many other change point methods are applicable only for univariate data, this R package is suitable for both univariate and multivariate observations. Estimation can be based upon either a hierarchical divisive or agglomerative algorithm. Divisive estimation sequentially identifies change points via a bisection algorithm. The agglomerative algorithm estimates change point locations by determining an optimal segmentation. Both approaches are able to detect any type of distributional change within the data. This provides an advantage over many existing change point algorithms which are only able to detect changes within the marginal distributions

arXiv.org e-Print Archive

CiteSeerX