127 research outputs found

    Finding and counting vertex-colored subtrees

    Full text link
    The problems studied in this article originate from the Graph Motif problem introduced by Lacroix et al. in the context of biological networks. The problem is to decide if a vertex-colored graph has a connected subgraph whose colors equal a given multiset of colors MM. It is a graph pattern-matching problem variant, where the structure of the occurrence of the pattern is not of interest but the only requirement is the connectedness. Using an algebraic framework recently introduced by Koutis et al., we obtain new FPT algorithms for Graph Motif and variants, with improved running times. We also obtain results on the counting versions of this problem, proving that the counting problem is FPT if M is a set, but becomes W[1]-hard if M is a multiset with two colors. Finally, we present an experimental evaluation of this approach on real datasets, showing that its performance compares favorably with existing software.Comment: Conference version in International Symposium on Mathematical Foundations of Computer Science (MFCS), Brno : Czech Republic (2010) Journal Version in Algorithmic

    Motif counting beyond five nodes

    Get PDF
    Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price however. While MC is very efficient in terms of space, CC’s memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state-of-the-art, both in terms of the size of the input graph and of that of the graphlets

    Heuristic Algorithms for the Maximum Colorful Subtree Problem

    Get PDF
    In metabolomics, small molecules are structurally elucidated using tandem mass spectrometry (MS/MS); this computational task can be formulated as the Maximum Colorful Subtree problem, which is NP-hard. Unfortunately, data from a single metabolite requires us to solve hundreds or thousands of instances of this problem - and in a single Liquid Chromatography MS/MS run, hundreds or thousands of metabolites are measured. Here, we comprehensively evaluate the performance of several heuristic algorithms for the problem. Unfortunately, as is often the case in bioinformatics, the structure of the (chemically) true solution is not known to us; therefore we can only evaluate against the optimal solution of an instance. Evaluating the quality of a heuristic based on scores can be misleading: Even a slightly suboptimal solution can be structurally very different from the optimal solution, but it is the structure of a solution and not its score that is relevant for the downstream analysis. To this end, we propose a different evaluation setup: Given a set of candidate instances of which exactly one is known to be correct, the heuristic in question solves each instance to the best of its ability, producing a score for each instance, which is then used to rank the instances. We then evaluate whether the correct instance is ranked highly by the heuristic. We find that one particular heuristic consistently ranks the correct instance in a top position. We also find that the scores of the best heuristic solutions are very close to the optimal score; in contrast, the structure of the solutions can deviate significantly from the optimal structures. Integrating the heuristic allowed us to speed up computations in practice by a factor of 100-fold

    Graph Motif Problems Parameterized by Dual

    Get PDF
    Let G=(V,E) be a vertex-colored graph, where C is the set of colors used to color V. The Graph Motif (or GM) problem takes as input G, a multiset M of colors built from C, and asks whether there is a subset S subseteq V such that (i) G[S] is connected and (ii) the multiset of colors obtained from S equals M. The Colorful Graph Motif problem (or CGM) is a constrained version of GM in which M=C, and the List-Colored Graph Motif problem (or LGM) is the extension of GM in which each vertex v of V may choose its color from a list L(v) of colors. We study the three problems GM, CGM and LGM, parameterized by l:=|V|-|M|. In particular, for general graphs, we show that, assuming the strong exponential-time hypothesis, CGM has no (2-epsilon)^l * |V|^{O(1)}-time algorithm, which implies that a previous algorithm, running in O(2^lcdot |E|) time is optimal. We also prove that LGM is W[1]-hard even if we restrict ourselves to lists of at most two colors. If we constrain the input graph to be a tree, then we show that, in contrast to CGM, GM can be solved in O(4^l *|V|) time but admits no polynomial kernel, while CGM can be solved in O(sqrt{2}^l + |V|) time and admits a polynomial kernel

    Overlap-free Drawing of Generalized Pythagoras Trees for Hierarchy Visualization

    Full text link
    Generalized Pythagoras trees were developed for visualizing hierarchical data, producing organic, fractal-like representations. However, the drawback of the original layout algorithm is visual overlap of tree branches. To avoid such overlap, we introduce an adapted drawing algorithm using ellipses instead of circles to recursively place tree nodes representing the subhierarchies. Our technique is demonstrated by resolving overlap in diverse real-world and generated datasets, while comparing the results to the original approach
    • …
    corecore