
Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications

Author: Alexandrov
Bentley
Bernardi
Bernardi
Charlesworth
Chung
Duret
Eyre-Walker
Eyre-Walker
Fields
Filipski
Francino
Fullerton
Greenberg
Guldberg
Hardison
Henke
Holmquist
Hsueh-I Lu
Huang
Ikehara
Inman
Jin
Kim
Lin
Macaya
Madsen
Michael H. Goldwasser
Ming-Yang Kao
Murata
Nekrutenko
Rice
Scotto
Sellers
Sharp
Soriano
Stojanovic
Sueoka
Wang
Wolfe
Wu
Zoubak
Publication venue: 'Elsevier BV'
Publication date: 04/11/2002

We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence A of pairs (a_i, w_i) for i = 1, ..., n with w_i > 0, a segment A(i,j) is a consecutive subsequence of A starting with index i and ending with index j. The width of A(i,j) is w(i,j) = sum_{i <= k <= j} w_k, and its density is (sum_{i <= k <= j} a_k) / w(i,j). The maximum-density segment problem takes A and two values L and U as input and asks for a segment of A with the largest possible density among those of width at least L and at most U. When U is unbounded, we provide a relatively simple O(n)-time algorithm, improving upon the O(n log L)-time algorithm by Lin, Jiang and Chao. When both L and U are specified, there are no previous nontrivial results. We solve the problem in O(n) time if w_i = 1 for all i, and more generally in O(n + n log(U - L + 1)) time when w_i >= 1 for all i.

Comment: 23 pages, 13 figures. A significant portion of these results appeared under the title "Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics," in Proceedings of the Second Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes in Computer Science (Springer-Verlag, Berlin), R. Guigo and D. Gusfield, editors, 2002, pp. 157--17
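To make the problem definition concrete, here is a brute-force reference sketch in Python that follows the width and density formulas above directly. It is not the paper's linear-time algorithm: it checks all O(n^2) segments using prefix sums. Something like this could serve as a test oracle for a faster implementation. The function name max_density_segment, the 1-indexed (i, j) return convention, and the use of exact Fraction densities are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from typing import List, Optional, Tuple


def max_density_segment(
    pairs: List[Tuple[int, int]], L: int, U: Optional[int] = None
) -> Optional[Tuple[int, int, Fraction]]:
    """Brute-force reference (hypothetical helper): return (i, j, density) for a
    segment A(i, j) (1-indexed, inclusive) of maximum density among segments
    with L <= w(i, j) <= U, or None if none qualifies. U=None means unbounded.
    Runs in O(n^2) time; this is NOT the paper's linear-time algorithm."""
    n = len(pairs)
    # Prefix sums: sa[k] = a_1 + ... + a_k and sw[k] = w_1 + ... + w_k.
    sa = [0] * (n + 1)
    sw = [0] * (n + 1)
    for k, (a, w) in enumerate(pairs, start=1):
        if w <= 0:
            raise ValueError("every width w_i must be positive")
        sa[k] = sa[k - 1] + a
        sw[k] = sw[k - 1] + w

    best: Optional[Tuple[int, int, Fraction]] = None
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            width = sw[j] - sw[i - 1]  # w(i, j)
            if width < L:
                continue  # too narrow; extending j may still reach width L
            if U is not None and width > U:
                break  # widths are positive, so w(i, j) only grows with j
            density = Fraction(sa[j] - sa[i - 1], width)  # exact rational density
            if best is None or density > best[2]:
                best = (i, j, density)
    return best
```

For example, scoring G and C as 1 and A and T as 0 with unit widths, `max_density_segment([(1 if c in "GC" else 0, 1) for c in "ATGCGCAT"], 3)` returns `(3, 5, Fraction(1, 1))`. This is the all-GC window "GCG". Ties are broken in favour of the first segment found.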

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

National Taiwan University Repository

Locating regions in a sequence under density constraints

Author: Benjamin A. Burton
Boztaş S.
Greenberg R. I.
Huang X.
Lin Y.-L.
Mathias Hiron
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2013

Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range; examples include the location of GC-rich regions, the identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0s and 1s for a substring whose relative proportion of 1s lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we develop O(n log log n) algorithms, again improving upon the best-known O(n log n) results. Practical testing verifies that our new algorithms enjoy significantly smaller time and memory footprints, and can process sequences that are orders of magnitude longer as a result.

Comment: 17 pages, 8 figures; v2: minor revisions, additional explanations; to appear in SIAM Journal on Computing
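The 0/1 formulation above can be illustrated with a quadratic brute-force sketch. It is not the O(n) algorithm described in the abstract. It tries substring lengths from longest to shortest and returns the first substring whose proportion of 1s lies in [lo, hi]. The function name longest_in_density_range, the half-open (start, end) return value, and the use of Fraction bounds to keep comparisons exact are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Optional, Tuple


def longest_in_density_range(
    bits: str, lo: Fraction, hi: Fraction
) -> Optional[Tuple[int, int]]:
    """Hypothetical brute-force helper: return (start, end), 0-indexed and
    half-open, of the longest substring of `bits` whose proportion of '1's
    lies in [lo, hi], or None if there is none. O(n^2) time."""
    n = len(bits)
    prefix = [0] * (n + 1)  # prefix[k] = number of '1's in bits[:k]
    for k, c in enumerate(bits):
        if c not in "01":
            raise ValueError("expected a string of 0s and 1s")
        prefix[k + 1] = prefix[k] + (c == "1")
    # Scan lengths from longest to shortest, so the first hit is a longest one.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            ones = prefix[start + length] - prefix[start]
            # lo <= ones / length <= hi, rewritten without division
            if lo * length <= ones <= hi * length:
                return (start, start + length)
    return None
```

As a GC-content example, map G/C to 1 and A/T to 0: `"".join("1" if c in "GC" else "0" for c in "ATGCGCGTTA")` gives `"0011111000"`. Then `longest_in_density_range("0011111000", Fraction(3, 5), Fraction(1))` returns `(0, 8)`, an 8-character window that is 62.5% GC.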

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Queensland eSpace

Where Graph Topology Matters: The Robust Subgraph Problem

Author: Akoglu Leman
Chan Hau
Han Shuchu
Publication date: 08/01/2015

Robustness is a critical measure of the resilience of large networked systems, such as transportation and communication networks. Most prior works focus on the global robustness of a given graph at large, e.g., by measuring its overall vulnerability to external attacks or random failures. In this paper, we turn attention to local robustness and pose a novel problem along the lines of subgraph mining: given a large graph, how can we find its most robust local subgraph (RLS)? We define a robust subgraph as a subset of nodes with high communicability among them, and formulate the RLS-PROBLEM of finding a subgraph of given size with maximum robustness in the host graph. Our formulation is related to the recently proposed general framework for the densest subgraph problem, but differs from it substantially in that, besides the number of edges in the subgraph, robustness also concerns the placement of edges, i.e., the subgraph topology. We show that the RLS-PROBLEM is NP-hard and propose two heuristic algorithms based on top-down and bottom-up search strategies. Further, we present modifications of our algorithms to handle three practical variants of the RLS-PROBLEM. Experiments on synthetic and real-world graphs demonstrate that we find subgraphs with larger robustness than the densest subgraphs even at lower densities, suggesting that the existing approaches are not suitable for the new problem setting.

Comment: 13 pages, 10 figures, 3 tables, to appear at SDM 2015 (9 pages only)
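The abstract does not give the robustness formula. The sketch below therefore assumes a communicability-style score, natural connectivity ln(tr(e^A) / n), where A is the subgraph's adjacency matrix. Treat this as an illustrative choice, not as the paper's stated definition. Because tr(A^k) counts closed walks of length k, the truncated series sum_k tr(A^k) / k! weights closed walks, which is one way to capture "communicability among nodes". The function name natural_connectivity and the truncation parameter terms are hypothetical. The point of the example is the abstract's claim that edge placement matters: two subgraphs with the same node and edge counts, and hence the same edge density, can still receive different scores.

```python
import math
from typing import List, Tuple


def natural_connectivity(n: int, edges: List[Tuple[int, int]], terms: int = 40) -> float:
    """Assumed robustness score (hypothetical helper): ln(trace(exp(A)) / n) for
    an undirected graph on nodes 0..n-1. Uses the truncated series
    trace(exp(A)) = sum_{k < terms} trace(A^k) / k!, where trace(A^k) counts
    closed walks of length k."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = identity
    total = 0.0
    for k in range(terms):
        total += sum(power[i][i] for i in range(n)) / math.factorial(k)
        # power <- power * A (plain-Python matrix product, fine for small subgraphs)
        power = [
            [sum(power[i][m] * A[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return math.log(total / n)
```

For instance, the star `[(0, 1), (0, 2), (0, 3)]` and the path `[(0, 1), (1, 2), (2, 3)]` both have 4 nodes and 3 edges. Under this assumed measure the star scores higher (about 0.672 versus 0.647), so a density-only objective cannot tell them apart but a topology-aware one can.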

arXiv.org e-Print Archive

CiteSeerX

Crossref