145 research outputs found

    Closest string with outliers

    Background: Given n strings s1, …, sn each of length ℓ and a nonnegative integer d, the CLOSEST STRING problem asks to find a center string s such that none of the input strings has Hamming distance greater than d from s. Finding a common pattern in many – but not necessarily all – input strings is an important task that plays a role in many applications in bioinformatics. Results: Although the closest string model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the CLOSEST STRING WITH OUTLIERS (CSWO) problem, to overcome this limitation. This new model asks for a center string s that is within Hamming distance d to at least n – k of the n input strings, where k is a parameter describing the maximum number of outliers. A CSWO solution not only provides the center string as a representative for the set of strings but also reveals the outliers of the set. We provide fixed parameter algorithms for CSWO when d and k are parameters, for both bounded and unbounded alphabets. We also show that when the alphabet is unbounded the problem is W[1]-hard with respect to n – k, ℓ, and d. Conclusions: Our refined model abstractly models finding common patterns in several but not all input strings
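
The CSWO definition above can be made concrete with a small checker. The following is a minimal sketch, not the fixed-parameter algorithm from the paper: it only verifies a candidate center string and, for tiny inputs, searches all candidates exhaustively. The function names (`hamming`, `outliers`, `is_cswo_solution`, `brute_force_cswo`) are hypothetical and chosen for illustration.

```python
from itertools import product


def hamming(a, b):
    """Hamming distance between two strings of equal length."""
    return sum(x != y for x, y in zip(a, b))


def outliers(center, strings, d):
    """Input strings that are farther than d from the center."""
    return [s for s in strings if hamming(center, s) > d]


def is_cswo_solution(center, strings, d, k):
    """True if the center is within distance d of at least n - k strings."""
    return len(outliers(center, strings, d)) <= k


def brute_force_cswo(strings, d, k, alphabet):
    """Exhaustive search over all |alphabet|^l candidate centers.

    Exponential in the string length; an illustration of the problem
    statement only, not the parameterized algorithm from the paper.
    """
    length = len(strings[0])
    for candidate in product(alphabet, repeat=length):
        center = "".join(candidate)
        if is_cswo_solution(center, strings, d, k):
            return center
    return None


strings = ["ACGT", "ACGA", "ACCT", "TTTT"]
center = brute_force_cswo(strings, d=1, k=1, alphabet="ACGT")
print(center, outliers(center, strings, d=1))
```

With k = 0 the same input has no solution, since "ACGT" and "TTTT" differ in three positions and no single string can be within distance 1 of both; allowing one outlier makes the instance solvable and exposes "TTTT" as the outlier.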

    The Graph Motif problem parameterized by the structure of the input graph

    The Graph Motif problem was introduced in 2006 in the context of biological networks. It consists of deciding whether or not a multiset of colors occurs in a connected subgraph of a vertex-colored graph. Graph Motif has mostly been analyzed from the standpoint of parameterized complexity. The main parameters considered so far were the size of the multiset and the number of colors. However, in many applications of Graph Motif, the input graph originates from real-life data and has structure. Motivated by this prosaic observation, we systematically study its complexity relative to graph structural parameters. For a wide range of parameters, we give new or improved FPT algorithms, or show that the problem remains intractable. For the FPT cases, we also give some kernelization lower bounds as well as some ETH-based lower bounds on the worst-case running time. Interestingly, we establish that Graph Motif is W[1]-hard (while in W[P]) for the parameter max leaf number, which is, to the best of our knowledge, the first problem to behave this way. Comment: 24 pages, accepted in DAM, conference version in IPEC 201

    SLIDER: Mining correlated motifs in protein-protein interaction networks

    Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motif pairs, in interacting protein sequences. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we show that CMM is NP-hard for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the method SLIDER, which uses local search with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks

    Graph Motif Problems Parameterized by Dual

    Let G=(V,E) be a vertex-colored graph, where C is the set of colors used to color V. The Graph Motif (or GM) problem takes as input G, a multiset M of colors built from C, and asks whether there is a subset S ⊆ V such that (i) G[S] is connected and (ii) the multiset of colors obtained from S equals M. The Colorful Graph Motif problem (or CGM) is a constrained version of GM in which M=C, and the List-Colored Graph Motif problem (or LGM) is the extension of GM in which each vertex v of V may choose its color from a list L(v) of colors. We study the three problems GM, CGM and LGM, parameterized by l := |V| - |M|. In particular, for general graphs, we show that, assuming the strong exponential-time hypothesis, CGM has no (2-ε)^l · |V|^{O(1)}-time algorithm, which implies that a previous algorithm, running in O(2^l · |E|) time, is optimal. We also prove that LGM is W[1]-hard even if we restrict ourselves to lists of at most two colors. If we constrain the input graph to be a tree, then we show that, in contrast to CGM, GM can be solved in O(4^l · |V|) time but admits no polynomial kernel, while CGM can be solved in O(√2^l + |V|) time and admits a polynomial kernel
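
To make the Graph Motif definition concrete, here is a minimal brute-force sketch. It simply tries every vertex subset of size |M| and checks conditions (i) and (ii); it is not any of the parameterized algorithms mentioned in the abstract. The graph encoding (an adjacency dictionary `adj` and a `color` dictionary) and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def is_connected(vertices, adj):
    """Check that the subgraph induced by `vertices` is connected (DFS)."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def graph_motif(adj, color, motif):
    """Return a set S with G[S] connected and colors(S) == motif, else None.

    Tries all subsets of size |M|, so the running time is exponential;
    this only illustrates the problem statement.
    """
    target = Counter(motif)
    for subset in combinations(sorted(adj), len(motif)):
        same_colors = Counter(color[v] for v in subset) == target
        if same_colors and is_connected(subset, adj):
            return set(subset)
    return None


# Path 1 - 2 - 3 - 4 with colors red, green, red, blue.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
color = {1: "red", 2: "green", 3: "red", 4: "blue"}

print(graph_motif(adj, color, ["red", "green", "red"]))
print(graph_motif(adj, color, ["green", "blue"]))
```

In this toy instance the first motif has |M| = 3 on a graph with |V| = 4, so the dual parameter is l = |V| - |M| = 1. The second motif has no occurrence because the only green and blue vertices are not adjacent.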

    Parameterized (Modular) Counting and Cayley Graph Expanders

    We study the problem #EdgeSub(Φ) of counting k-edge subgraphs satisfying a given graph property Φ in a large host graph G. Building upon the breakthrough result of Curticapean, Dell and Marx (STOC 17), we express the number of such subgraphs as a finite linear combination of graph homomorphism counts and derive the complexity of computing this number by studying its coefficients. Our approach relies on novel constructions of low-degree Cayley graph expanders of p-groups, which might be of independent interest. The properties of those expanders allow us to analyse the coefficients in the aforementioned linear combinations over the field 𝔽_p, which gives us significantly more control over the cancellation behaviour of the coefficients. Our main result is an exhaustive and fine-grained complexity classification of #EdgeSub(Φ) for minor-closed properties Φ, closing the missing gap in previous work by Roth, Schmitt and Wellnitz (ICALP 21). Additionally, we observe that our methods also apply to modular counting. Among others, we obtain novel intractability results for the problems of counting k-forests and matroid bases modulo a prime p. Furthermore, from an algorithmic point of view, we construct algorithms for the problems of counting k-paths and k-cycles modulo 2 that outperform the best known algorithms for their non-modular counterparts. In the course of our investigations we also provide an exhaustive parameterized complexity classification for the problem of counting graph homomorphisms modulo a prime p

    Kernelization and Sparseness: the case of Dominating Set

    We prove that for every positive integer r and for every graph class 𝒢 of bounded expansion, the r-Dominating Set problem admits a linear kernel on graphs from 𝒢. Moreover, when 𝒢 is only assumed to be nowhere dense, then we give an almost linear kernel on 𝒢 for the classic Dominating Set problem, i.e., for the case r=1. These results generalize a line of previous research on finding linear kernels for Dominating Set and r-Dominating Set. However, the approach taken in this work, which is based on the theory of sparse graphs, is radically different and conceptually much simpler than the previous approaches. We complement our findings by showing that for the closely related Connected Dominating Set problem, the existence of such kernelization algorithms is unlikely, even though the problem is known to admit a linear kernel on H-topological-minor-free graphs. Also, we prove that for any somewhere dense class 𝒢, there is some r for which r-Dominating Set is W[2]-hard on 𝒢. Thus, our results fall short of proving a sharp dichotomy for the parameterized complexity of r-Dominating Set on subgraph-monotone graph classes: we conjecture that the border of tractability lies exactly between nowhere dense and somewhere dense graph classes. Comment: v2: new author, added results for r-Dominating Sets in bounded expansion graph
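
As a reminder of what r-Dominating Set asks, the sketch below checks whether a given vertex set D r-dominates a graph, meaning every vertex lies within distance r of some vertex of D. It is a plain verifier built on multi-source breadth-first search and says nothing about the kernelization results in the abstract. The adjacency-dictionary encoding and the name `is_r_dominating` are assumptions made for illustration.

```python
from collections import deque


def is_r_dominating(adj, dom, r):
    """True if every vertex of the graph is within distance r of `dom`.

    Runs one breadth-first search started from all vertices of `dom`
    at once and stops expanding once depth r is reached.
    """
    dist = {v: 0 for v in dom}
    queue = deque(dom)
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not explore beyond radius r
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return all(v in dist for v in adj)


# Path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(is_r_dominating(path, [2], r=2))
print(is_r_dominating(path, [2], r=1))
print(is_r_dominating(path, [1, 3], r=1))
```

On the five-vertex path, the middle vertex alone is a 2-dominating set but not a 1-dominating set, while {1, 3} is a classic (r = 1) dominating set.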

    Parameterized algorithms and hardness results for some graph motif problems

    We study the NP-complete Graph Motif problem: given a vertex-colored graph G = (V, E) and a multiset M of colors, does there exist an S ⊆ V such that G[S] is connected and carries exactly (also with respect to multiplicity) the colors in M? We present an improved randomized algorithm for Graph Motif with running time O(4.32 . We extend our algorithm to list-colored graph vertices and the case where the motif G[S] need not be connected. By way of contrast, we show that extending the request for motif connectedness to the somewhat "more robust" motif demands of biconnectedness or bridge-connectedness leads to W[1]-complete problems. Actually, we show that the even simpler problems of finding biconnected or bridge-connected subgraphs are W[1]-complete with respect to the subgraph size. Answering an open question from the literature, we further show that the parameter number of connected motif components leads to W[1]-hardness even when restricted to the very special case of graphs that are paths

    The Graph Motif Problem Parameterized by the Structure of the Input Graph

    The Graph Motif problem was introduced in 2006 in the context of biological networks. It consists of deciding whether or not a multiset of colors occurs in a connected subgraph of a vertex-colored graph. Graph Motif has been analyzed from the standpoint of parameterized complexity. The main parameters considered so far were the size of the multiset and the number of colors. However, in many applications of Graph Motif, the input graph originates from real-life data and has structure. Motivated by this prosaic observation, we systematically study its complexity relative to graph structural parameters. For a wide range of parameters, we give new or improved FPT algorithms, or show that the problem remains intractable. Interestingly, we establish that Graph Motif is W[1]-hard (while in W[P]) for the parameter max leaf number, which is, to the best of our knowledge, the first problem to behave this way