Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications
We study an abstract optimization problem arising from biomolecular sequence
analysis. For a sequence A of pairs (a_i,w_i) for i = 1,..,n and w_i>0, a
segment A(i,j) is a consecutive subsequence of A starting with index i and
ending with index j. The width of A(i,j) is w(i,j) = sum_{i <= k <= j} w_k, and
the density is (sum_{i<= k <= j} a_k)/ w(i,j). The maximum-density segment
problem takes A and two values L and U as input and asks for a segment of A
with the largest possible density among those of width at least L and at most
U. When U is unbounded, we provide a relatively simple, O(n)-time algorithm,
improving upon the O(n \log L)-time algorithm by Lin, Jiang and Chao. When both
L and U are specified, there are no previous nontrivial results. We solve the
problem in O(n) time if w_i=1 for all i, and more generally in
O(n+n\log(U-L+1)) time when w_i>=1 for all i.
Comment: 23 pages, 13 figures. A significant portion of these results appeared
under the title, "Fast Algorithms for Finding Maximum-Density Segments of a
Sequence with Applications to Bioinformatics," in Proceedings of the Second
Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes
in Computer Science (Springer-Verlag, Berlin), R. Guigo and D. Gusfield
editors, 2002, pp. 157--17
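To make the definitions in the abstract above concrete, here is a minimal brute-force sketch of the maximum-density segment problem. It is not the linear-time algorithm the paper describes; it is only a quadratic-time baseline built on prefix sums. The function name `max_density_segment` and the 0-based indexing are assumptions made for illustration (the abstract indexes from 1).

```python
from itertools import accumulate


def max_density_segment(a, w, L, U=float("inf")):
    """Brute-force baseline (hypothetical helper, not the paper's method).

    Returns (density, i, j) for the densest segment a[i..j] (0-based,
    inclusive) whose width lies in [L, U], or None if no segment qualifies.
    Assumes every w[k] > 0, as in the problem statement.
    """
    n = len(a)
    # Prefix sums let us read any segment's value sum and width in O(1).
    prefix_a = [0, *accumulate(a)]
    prefix_w = [0, *accumulate(w)]
    best = None
    for i in range(n):
        for j in range(i, n):
            width = prefix_w[j + 1] - prefix_w[i]
            if width > U:
                break  # widths are positive, so extending j only widens
            if width >= L:
                density = (prefix_a[j + 1] - prefix_a[i]) / width
                if best is None or density > best[0]:
                    best = (density, i, j)
    return best


# Unit widths, width constrained to [2, 3]: the segment (5, 3) wins.
print(max_density_segment([1, 5, 3, 0], [1, 1, 1, 1], L=2, U=3))
```

A baseline like this is mainly useful as a reference for checking a faster implementation on small inputs.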
Locating regions in a sequence under density constraints
Several biological problems require the identification of regions in a
sequence where some feature occurs within a target density range: examples
include the location of GC-rich regions, identification of CpG islands, and
sequence matching. Mathematically, this corresponds to searching a string of 0s
and 1s for a substring whose relative proportion of 1s lies between given lower
and upper bounds. We consider the algorithmic problem of locating the longest
such substring, as well as other related problems (such as finding the shortest
substring or a maximal set of disjoint substrings). For locating the longest
such substring, we develop an algorithm that runs in O(n) time, improving upon
the previous best-known O(n log n) result. For the related problems we develop
O(n log log n) algorithms, again improving upon the best-known O(n log n)
results. Practical testing verifies that our new algorithms enjoy significantly
smaller time and memory footprints, and can process sequences that are orders
of magnitude longer as a result.
Comment: 17 pages, 8 figures; v2: minor revisions, additional explanations; to
appear in SIAM Journal on Computin
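The longest-substring problem stated in the abstract above can be illustrated with a small brute-force sketch. This is a quadratic-time reference only, not the O(n) algorithm the paper develops. The function name `longest_in_density_range` and the use of exact `Fraction` bounds are assumptions made for illustration.

```python
from fractions import Fraction


def longest_in_density_range(bits, lo, hi):
    """Brute-force baseline (hypothetical helper, not the paper's method).

    Returns (start, length) of a longest substring of the 0/1 string `bits`
    whose proportion of 1s lies in [lo, hi], or None if none exists.
    """
    n = len(bits)
    # prefix[k] holds the number of 1s among the first k characters.
    prefix = [0]
    for ch in bits:
        prefix.append(prefix[-1] + (ch == "1"))
    # Try lengths from longest to shortest; the first hit is a longest one.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            ones = prefix[start + length] - prefix[start]
            # Compare counts instead of dividing, to stay exact.
            if lo * length <= ones <= hi * length:
                return start, length
    return None


# Proportion of 1s must lie between 3/5 and 4/5.
print(longest_in_density_range("0011100", Fraction(3, 5), Fraction(4, 5)))
```

Passing the bounds as `Fraction` values avoids floating-point rounding at the boundary of the density range.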
Where Graph Topology Matters: The Robust Subgraph Problem
Robustness is a critical measure of the resilience of large networked
systems, such as transportation and communication networks. Most prior works
focus on the global robustness of a given graph at large, e.g., by measuring
its overall vulnerability to external attacks or random failures. In this
paper, we turn our attention to local robustness and pose a novel problem along
the lines of subgraph mining: given a large graph, how can we find its most robust
local subgraph (RLS)?
We define a robust subgraph as a subset of nodes with high communicability
among them, and formulate the RLS-PROBLEM of finding a subgraph of given size
with maximum robustness in the host graph. Our formulation is related to the
recently proposed general framework for the densest subgraph problem; however,
it differs from it substantially in that, besides the number of edges in the
subgraph, robustness also concerns the placement of edges, i.e., the
subgraph topology. We show that the RLS-PROBLEM is NP-hard and propose two
heuristic algorithms based on top-down and bottom-up search strategies.
Further, we present modifications of our algorithms to handle three practical
variants of the RLS-PROBLEM. Experiments on synthetic and real-world graphs
demonstrate that we find subgraphs with larger robustness than the densest
subgraphs even at lower densities, suggesting that the existing approaches are
not suitable for the new problem setting.
Comment: 13 pages, 10 Figures, 3 Tables, to appear at SDM 2015 (9 pages only
- …