10,025 research outputs found

    Finding Even Subgraphs Even Faster

    Problems of the following kind have been the focus of much recent research in the realm of parameterized complexity: Given an input graph (digraph) on $n$ vertices and a positive integer parameter $k$, find if there exist $k$ edges (arcs) whose deletion results in a graph that satisfies some specified parity constraints. In particular, when the objective is to obtain a connected graph in which all the vertices have even degrees, so that the resulting graph is Eulerian, the problem is called Undirected Eulerian Edge Deletion. The corresponding problem in digraphs, where the resulting graph should be strongly connected and every vertex should have the same in-degree as its out-degree, is called Directed Eulerian Edge Deletion. Cygan et al. [Algorithmica, 2014] showed that these problems are fixed-parameter tractable (FPT), and gave algorithms with running time $2^{O(k \log k)}n^{O(1)}$. They also asked, as an open problem, whether there exist FPT algorithms which solve these problems in time $2^{O(k)}n^{O(1)}$. In this paper we answer their question in the affirmative: using the technique of computing representative families of co-graphic matroids, we design algorithms which solve these problems in time $2^{O(k)}n^{O(1)}$. The crucial insight we bring to these problems is to view the solution as an independent set of a co-graphic matroid. We believe that this viewpoint/approach will be useful in other problems where one of the constraints that need to be satisfied is that of connectivity.
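    To make the problem statement concrete, here is a minimal brute-force sketch of Undirected Eulerian Edge Deletion. It is not the representative-family algorithm from the abstract: it simply tries every set of at most k edges, so it runs in time exponential in the number of edges. The function names, the "at most k" reading of the parameter, and the convention that vertices are numbered 0..n-1 are assumptions made for illustration.

```python
from itertools import combinations


def is_connected_even(n, edges):
    """Return True if the graph on vertices 0..n-1 is connected
    and every vertex has even degree (i.e. the graph is Eulerian)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Parity constraint: every degree must be even.
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return False
    # Connectivity constraint: depth-first search from vertex 0.
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def eulerian_edge_deletion(n, edges, k):
    """Brute force (assumed reading: delete AT MOST k edges).
    Returns a smallest list of edges to delete, or None if none exists."""
    edges = list(edges)
    for size in range(k + 1):
        for removed in combinations(range(len(edges)), size):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if is_connected_even(n, kept):
                return [edges[i] for i in removed]
    return None


# A 4-cycle with one chord: deleting the chord leaves an Eulerian cycle.
cycle_with_chord = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
solution = eulerian_edge_deletion(4, cycle_with_chord, k=1)
```

    The exhaustive search is only practical for tiny instances; its purpose is to pin down what a valid solution looks like, not to reflect the running times quoted above.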

    Significant Subgraph Mining with Multiple Testing Correction

    The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. On real-world datasets, it was shown to lead to greater statistical power, that is, the discovery of more truly significant itemsets, than the standard Bonferroni correction. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power. Comment: 18 pages, 5 figures, accepted to the 2015 SIAM International Conference on Data Mining (SDM15)
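    The idea of pruning untestable hypotheses can be sketched with Tarone's procedure, which is one well-known way to do it; the abstract does not spell out its own variant, so treat the following as an illustration under assumptions. The sketch assumes a one-sided Fisher's exact test for enrichment, and the function names and the toy supports are hypothetical. A pattern is "untestable" at a given threshold if even its most extreme possible contingency table cannot reach that threshold.

```python
from math import comb


def min_attainable_pvalue(support, n_pos, n_total):
    """Smallest p-value a one-sided Fisher's exact test can produce for a
    pattern occurring in `support` of `n_total` samples, `n_pos` of which
    belong to the class of interest (assumed test, for illustration)."""
    a = min(support, n_pos)  # most extreme table: as many hits in the class as possible
    return comb(n_pos, a) * comb(n_total - n_pos, support - a) / comb(n_total, support)


def tarone_correction_factor(min_pvalues, alpha):
    """Smallest k such that at most k hypotheses are testable at level alpha / k."""
    for k in range(1, len(min_pvalues) + 1):
        testable = sum(1 for p in min_pvalues if p <= alpha / k)
        if testable <= k:
            return k
    return max(len(min_pvalues), 1)


# Toy data: ten candidate patterns, 20 samples, 10 in the class of interest.
supports = [1, 1, 2, 2, 2, 5, 8, 10, 15, 1]
min_ps = [min_attainable_pvalue(x, 10, 20) for x in supports]
alpha = 0.05
k = tarone_correction_factor(min_ps, alpha)
tarone_threshold = alpha / k
bonferroni_threshold = alpha / len(supports)
```

    In this toy example the corrected threshold is alpha / 4 rather than the Bonferroni value alpha / 10, which is the sense in which excluding untestable hypotheses can increase statistical power.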

    Evolutionary accessibility of mutational pathways

    Functional effects of different mutations are known to combine to the total effect in highly nontrivial ways. For the trait under evolutionary selection ('fitness'), measured values over all possible combinations of a set of mutations yield a fitness landscape that determines which mutational states can be reached from a given initial genotype. Understanding the accessibility properties of fitness landscapes is conceptually important in answering questions about the predictability and repeatability of evolutionary adaptation. Here we theoretically investigate accessibility of the globally optimal state on a wide variety of model landscapes, including landscapes with tunable ruggedness as well as neutral 'holey' landscapes. We define a mutational pathway to be accessible if it contains the minimal number of mutations required to reach the target genotype, and if fitness increases in each mutational step. Under this definition accessibility is high, in the sense that at least one accessible pathway exists with a substantial probability that approaches unity as the dimensionality of the fitness landscape (set by the number of mutational loci) becomes large. At the same time the number of alternative accessible pathways grows without bound. We test the model predictions against an empirical 8-locus fitness landscape obtained for the filamentous fungus Aspergillus niger. By analyzing subgraphs of the full landscape containing different subsets of mutations, we are able to probe the mutational distance scale in the empirical data. The predicted effect of high accessibility is supported by the empirical data and is very robust, which we argue reflects the generic topology of sequence spaces. Comment: 16 pages, 4 figures; supplementary material available on request
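    The definition of an accessible pathway given above can be turned into a short counting routine. This is a minimal sketch, assuming binary loci, genotypes encoded as integer bitmasks, and a fitness table indexed by genotype; the function name and the toy landscapes are hypothetical and are not taken from the paper.

```python
from functools import lru_cache


def count_accessible_paths(fitness, start, target):
    """Count shortest mutational pathways from `start` to `target` along
    which fitness strictly increases at every step.

    `fitness[g]` is the fitness of genotype g, where g is a bitmask of
    mutated loci (assumed encoding). A shortest pathway flips each locus
    on which start and target differ exactly once."""

    @lru_cache(maxsize=None)
    def count(g):
        if g == target:
            return 1
        total = 0
        remaining = g ^ target  # loci that still have to be mutated
        bit = 1
        while bit <= remaining:
            if remaining & bit:
                nxt = g ^ bit
                if fitness[nxt] > fitness[g]:  # the step must increase fitness
                    total += count(nxt)
            bit <<= 1
        return total

    return count(start)


# Three loci, additive landscape: all 3! = 6 orderings are accessible.
additive = [bin(g).count("1") for g in range(8)]
n_additive = count_accessible_paths(additive, 0b000, 0b111)

# Same landscape, but the single mutant 010 is deleterious,
# which blocks the two pathways that begin with that mutation.
one_blocked = list(additive)
one_blocked[0b010] = -1
n_blocked = count_accessible_paths(one_blocked, 0b000, 0b111)
```

    Memoizing on the current genotype keeps the work proportional to the number of genotypes times the number of loci, rather than to the factorially many pathways.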

    A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem

    Many graph mining applications rely on detecting subgraphs which are near-cliques. There exists a dichotomy between the results in the existing work related to this problem: on the one hand, the densest subgraph problem (DSP), which maximizes the average degree over all subgraphs, is solvable in polynomial time but for many networks fails to find subgraphs which are near-cliques. On the other hand, formulations that are geared towards finding near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem. In this work, we propose a formulation which combines the best of both worlds: it is solvable in polynomial time and finds near-cliques when the DSP fails. Surprisingly, our formulation is a simple variation of the DSP. Specifically, we define the triangle densest subgraph problem (TDSP): given $G(V,E)$, find a subset of vertices $S^*$ such that $\tau(S^*)=\max_{S \subseteq V} \frac{t(S)}{|S|}$, where $t(S)$ is the number of triangles induced by the set $S$. We provide various exact and approximation algorithms which solve the TDSP efficiently. Furthermore, we show how our algorithms adapt to the more general problem of maximizing the $k$-clique average density. Finally, we provide empirical evidence that the TDSP should be used whenever the DSP fails to output a near-clique. Comment: 42 pages
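    The TDSP objective $\tau(S) = t(S)/|S|$ can be illustrated with a small exhaustive search. This is a sketch for tiny graphs only: it enumerates all vertex subsets and is not one of the exact or approximation algorithms the abstract refers to. The function names and the example graph are assumptions made for illustration.

```python
from itertools import combinations


def triangle_count(S, edge_set):
    """Number of triangles induced by the vertex set S."""
    return sum(
        1
        for a, b, c in combinations(sorted(S), 3)
        if frozenset((a, b)) in edge_set
        and frozenset((a, c)) in edge_set
        and frozenset((b, c)) in edge_set
    )


def triangle_densest_bruteforce(vertices, edges):
    """Return (S*, tau(S*)) maximizing t(S) / |S| over non-empty S.
    Exponential-time enumeration, for illustration only."""
    edge_set = {frozenset(e) for e in edges}
    best_density, best_set = -1.0, None
    for size in range(1, len(vertices) + 1):
        for S in combinations(vertices, size):
            density = triangle_count(S, edge_set) / size
            if density > best_density:
                best_density, best_set = density, set(S)
    return best_set, best_density


# A 4-clique on {0, 1, 2, 3} with a path 3-4-5 attached to it.
clique_edges = list(combinations(range(4), 2))
edges = clique_edges + [(3, 4), (4, 5)]
best_set, best_density = triangle_densest_bruteforce(range(6), edges)
```

    On this example the search recovers exactly the 4-clique, whose 4 triangles over 4 vertices give a triangle density of 1.0; adding either path vertex only lowers the ratio.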