Finding and counting vertex-colored subtrees
The problems studied in this article originate from the Graph Motif problem
introduced by Lacroix et al. in the context of biological networks. The problem
is to decide if a vertex-colored graph has a connected subgraph whose colors
equal a given multiset of colors M. It is a variant of graph pattern matching
in which the structure of an occurrence of the pattern is irrelevant; the
only requirement is that it be connected. Using an algebraic
framework recently introduced by Koutis et al., we obtain new FPT algorithms
for Graph Motif and variants, with improved running times. We also obtain
results on the counting versions of this problem, proving that the counting
problem is FPT if M is a set, but becomes W[1]-hard if M is a multiset with two
colors. Finally, we present an experimental evaluation of this approach on real
datasets, showing that its performance compares favorably with existing
software.
Comment: Conference version in International Symposium on Mathematical
Foundations of Computer Science (MFCS), Brno, Czech Republic (2010). Journal
version in Algorithmic
Motif counting beyond five nodes
Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price, however: while MC is very efficient in terms of space, CC's memory requirements become demanding as the input graph and the graphlets grow. And yet, our experiments show that CC can push the limits of the state of the art, in terms of both the size of the input graph and the size of the graphlets.
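The color-coding idea mentioned in this abstract can be illustrated on a much simpler target than graphlets. The following is a minimal sketch, not the authors' method: it assumes the pattern is a simple path on k vertices, colors the vertices uniformly at random with k colors, counts the "colorful" paths (all colors distinct) by dynamic programming over color subsets, and rescales by k^k / k!, the inverse of the probability that a fixed path is colorful. The function names `count_colorful_paths` and `estimate_paths` are hypothetical.

```python
import random
from math import factorial


def count_colorful_paths(adj, coloring, k):
    """Count undirected paths on k vertices whose k colors are all distinct."""
    full = (1 << k) - 1
    # dp[mask][v] = number of colorful paths ending at v that use color set `mask`
    dp = [dict() for _ in range(full + 1)]
    for v in adj:
        dp[1 << coloring[v]][v] = 1
    # Extending a path only adds a bit, so increasing mask order is a valid DP order.
    for mask in range(1, full + 1):
        for v, ways in dp[mask].items():
            for w in adj[v]:
                bit = 1 << coloring[w]
                if not mask & bit:
                    dp[mask | bit][w] = dp[mask | bit].get(w, 0) + ways
    directed = sum(dp[full].values())
    # Each undirected path with k > 1 is found once from each endpoint.
    return directed // 2 if k > 1 else directed


def estimate_paths(adj, k, trials=200, seed=0):
    """Unbiased estimate of the number of k-vertex paths (assumed setup, see lead-in)."""
    rng = random.Random(seed)
    scale = k ** k / factorial(k)  # 1 / P(a fixed k-vertex path is colorful)
    total = 0
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, coloring, k)
    return scale * total / trials


# A 4-cycle 0-1-2-3-0 with a fixed coloring using colors 0, 1, 2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
fixed = {0: 0, 1: 1, 2: 2, 3: 0}
colorful = count_colorful_paths(cycle, fixed, 3)
```

The table `dp` has one entry per (color subset, vertex) pair, which hints at why the abstract describes the memory requirements of CC as demanding when both the graph and the pattern grow.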
Heuristic Algorithms for the Maximum Colorful Subtree Problem
In metabolomics, small molecules are structurally elucidated using tandem mass spectrometry (MS/MS); this computational task can be formulated as the Maximum Colorful Subtree problem, which is NP-hard. Unfortunately, data from a single metabolite requires us to solve hundreds or thousands of instances of this problem, and in a single Liquid Chromatography MS/MS run, hundreds or thousands of metabolites are measured.
Here, we comprehensively evaluate the performance of several heuristic algorithms for the problem. Unfortunately, as is often the case in bioinformatics, the structure of the (chemically) true solution is not known to us; therefore we can only evaluate against the optimal solution of an instance. Evaluating the quality of a heuristic based on scores can be misleading: Even a slightly suboptimal solution can be structurally very different from the optimal solution, but it is the structure of a solution and not its score that is relevant for the downstream analysis. To this end, we propose a different evaluation setup: Given a set of candidate instances of which exactly one is known to be correct, the heuristic in question solves each instance to the best of its ability, producing a score for each instance, which is then used to rank the instances. We then evaluate whether the correct instance is ranked highly by the heuristic.
We find that one particular heuristic consistently ranks the correct instance in a top position. We also find that the scores of the best heuristic solutions are very close to the optimal score; in contrast, the structure of the solutions can deviate significantly from the optimal structures. Integrating the heuristic allowed us to speed up computations in practice by a factor of 100.
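The rank-based evaluation described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each query is assumed to be a mapping from candidate name to the score the heuristic achieved on that candidate's instance, higher scores are assumed to be better, and ties are counted against the correct candidate. The names `rank_of_correct` and `top_k_rate` and the example scores are hypothetical.

```python
def rank_of_correct(scores, correct):
    """1-based rank of `correct` by descending score; ties count against it (assumption)."""
    target = scores[correct]
    better_or_equal = sum(
        1 for name, s in scores.items() if name != correct and s >= target
    )
    return 1 + better_or_equal


def top_k_rate(queries, k):
    """Fraction of queries in which the correct candidate is ranked within the top k."""
    hits = sum(1 for scores, correct in queries if rank_of_correct(scores, correct) <= k)
    return hits / len(queries)


# Made-up scores for two queries; the second element names the correct candidate.
queries = [
    ({"A": 10.0, "B": 12.5, "C": 7.0}, "A"),  # correct candidate ranked 2nd
    ({"X": 3.0, "Y": 1.0}, "X"),              # correct candidate ranked 1st
]
top1 = top_k_rate(queries, 1)
top2 = top_k_rate(queries, 2)
```

Evaluating by rank rather than by score gap matches the abstract's point that a near-optimal score says little about whether the downstream identification is correct.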
Graph Motif Problems Parameterized by Dual
Let G=(V,E) be a vertex-colored graph, where C is the set of colors used to color V. The Graph Motif (or GM) problem takes as input G, a multiset M of colors built from C, and asks whether there is a subset S ⊆ V such that (i) G[S] is connected and (ii) the multiset of colors obtained from S equals M. The Colorful Graph Motif problem (or CGM) is a constrained version of GM in which M=C, and the List-Colored Graph Motif problem (or LGM) is the extension of GM in which each vertex v of V may choose its color from a list L(v) of colors.
We study the three problems GM, CGM and LGM, parameterized by l:=|V|-|M|. In particular, for general graphs, we show that, assuming the strong exponential-time hypothesis, CGM has no (2-ε)^l · |V|^{O(1)}-time algorithm, which implies that a previous algorithm, running in O(2^l · |E|) time, is optimal. We also prove that LGM is W[1]-hard even if we restrict ourselves to lists of at most two colors. If we constrain the input graph to be a tree, then we show that, in contrast to CGM, GM can be solved in O(4^l · |V|) time but admits no polynomial kernel, while CGM can be solved in O(√2^l + |V|) time and admits a polynomial kernel.
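To make the definition of GM concrete, here is a minimal brute-force sketch. It is exponential in |M| and is not one of the algorithms from the abstract; it only checks conditions (i) and (ii) directly for every vertex subset of size |M|. The graph is assumed to be given as an adjacency dictionary, and the names `is_connected` and `graph_motif_bruteforce` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_connected(adj, subset):
    """Return True if the subgraph induced by `subset` is non-empty and connected."""
    subset = set(subset)
    if not subset:
        return False
    start = next(iter(subset))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in subset and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == subset


def graph_motif_bruteforce(adj, color, motif):
    """Return a set S with G[S] connected and colors(S) == motif as multisets, else None."""
    target = Counter(motif)
    k = sum(target.values())
    for cand in combinations(sorted(adj), k):
        if Counter(color[v] for v in cand) == target and is_connected(adj, cand):
            return set(cand)
    return None


# Path a - b - c - d colored red, blue, red, green.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
color = {"a": "red", "b": "blue", "c": "red", "d": "green"}
found = graph_motif_bruteforce(adj, color, ["red", "blue", "red"])
missing = graph_motif_bruteforce(adj, color, ["blue", "green"])
```

In the first query the dual parameter is l = |V| - |M| = 4 - 3 = 1: a solution keeps all but one vertex, which is the regime the parameterization by l is designed to exploit.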
Overlap-free Drawing of Generalized Pythagoras Trees for Hierarchy Visualization
Generalized Pythagoras trees were developed for visualizing hierarchical
data, producing organic, fractal-like representations. However, the drawback of
the original layout algorithm is visual overlap of tree branches. To avoid such
overlap, we introduce an adapted drawing algorithm using ellipses instead of
circles to recursively place tree nodes representing the subhierarchies. Our
technique is demonstrated by resolving overlap in diverse real-world and
generated datasets and compared against the original approach.
- …