Finding Even Subgraphs Even Faster
Problems of the following kind have been the focus of much recent research in
the realm of parameterized complexity: Given an input graph (digraph) on $n$
vertices and a positive integer parameter $k$, find if there exist $k$ edges
(arcs) whose deletion results in a graph that satisfies some specified parity
constraints. In particular, when the objective is to obtain a connected graph
in which all the vertices have even degrees---where the resulting graph is
\emph{Eulerian}---the problem is called Undirected Eulerian Edge Deletion. The
corresponding problem in digraphs where the resulting graph should be strongly
connected and every vertex should have the same in-degree as its out-degree is
called Directed Eulerian Edge Deletion. Cygan et al. [\emph{Algorithmica,
2014}] showed that these problems are fixed parameter tractable (FPT), and gave
algorithms with the running time $2^{O(k \log k)} n^{O(1)}$. They also asked, as
an open problem, whether there exist FPT algorithms which solve these problems
in time $2^{O(k)} n^{O(1)}$. In this paper we answer their question in the
affirmative: using the technique of computing \emph{representative families of
co-graphic matroids} we design algorithms which solve these problems in time
$2^{O(k)} n^{O(1)}$. The crucial insight we bring to these problems is to view
the solution as an independent set of a co-graphic matroid. We believe that
this view-point/approach will be useful in other problems where one of the
constraints that need to be satisfied is that of connectivity.
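
The undirected problem described in this abstract can be made concrete with a small sketch. The Python below is illustrative only: it checks the Eulerian condition as the abstract states it (connected, every degree even) and then tries every set of edges of a given size by brute force. It is not the representative-family algorithm of the paper, and its running time is far worse. The function names, the 0-based vertex labels, and the reading of the parameter as the exact number of deleted edges are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def is_eulerian_undirected(n, edges):
    """True if the graph on vertices 0..n-1 is connected and all degrees are even."""
    adj = defaultdict(list)
    degree = [0] * n
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        degree[u] += 1
        degree[v] += 1
    if any(d % 2 for d in degree):
        return False
    if n == 0:
        return True
    # Depth-first search from vertex 0 to test connectivity.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def eulerian_edge_deletion_bruteforce(n, edges, k):
    """Return k edges whose deletion leaves an Eulerian graph, or None.

    Hypothetical helper: exhaustive search over all k-subsets of edges.
    """
    for removed in combinations(range(len(edges)), k):
        kept = [e for i, e in enumerate(edges) if i not in removed]
        if is_eulerian_undirected(n, kept):
            return [edges[i] for i in removed]
    return None


# Complete graph on 4 vertices: every vertex has odd degree 3.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
solution = eulerian_edge_deletion_bruteforce(4, k4, 2)
```

On the complete graph with four vertices, deleting a perfect matching leaves a 4-cycle, so a solution with two deleted edges exists while no single deletion suffices.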
Significant Subgraph Mining with Multiple Testing Correction
The problem of finding itemsets that are statistically significantly enriched
in a class of transactions is complicated by the need to correct for multiple
hypothesis testing. Pruning untestable hypotheses was recently proposed as a
strategy for this task of significant itemset mining. It was shown to lead to
greater statistical power (the discovery of more truly significant itemsets)
than the standard Bonferroni correction on real-world datasets. An open
question, however, is whether this strategy of excluding untestable hypotheses
also leads to greater statistical power in subgraph mining, in which the number
of hypotheses is much larger than in itemset mining. Here we answer this
question by an empirical investigation on eight popular graph benchmark
datasets. We propose a new efficient search strategy, which always returns the
same solution as the state-of-the-art approach and is approximately two orders
of magnitude faster. Moreover, we exploit the dependence between subgraphs by
considering the effective number of tests and thereby further increase the
statistical power.
Comment: 18 pages, 5 figures, accepted to the 2015 SIAM International
Conference on Data Mining (SDM15)
Evolutionary accessibility of mutational pathways
Functional effects of different mutations are known to combine to the total
effect in highly nontrivial ways. For the trait under evolutionary selection
(`fitness'), measured values over all possible combinations of a set of
mutations yield a fitness landscape that determines which mutational states can
be reached from a given initial genotype. Understanding the accessibility
properties of fitness landscapes is conceptually important in answering
questions about the predictability and repeatability of evolutionary
adaptation. Here we theoretically investigate accessibility of the globally
optimal state on a wide variety of model landscapes, including landscapes with
tunable ruggedness as well as neutral `holey' landscapes. We define a
mutational pathway to be accessible if it contains the minimal number of
mutations required to reach the target genotype, and if fitness increases in
each mutational step. Under this definition accessibility is high, in the sense
that at least one accessible pathway exists with a substantial probability that
approaches unity as the dimensionality of the fitness landscape (set by the
number of mutational loci) becomes large. At the same time the number of
alternative accessible pathways grows without bound. We test the model
predictions against an empirical 8-locus fitness landscape obtained for the
filamentous fungus \textit{Aspergillus niger}. By analyzing subgraphs of the
full landscape containing different subsets of mutations, we are able to probe
the mutational distance scale in the empirical data. The predicted effect of
high accessibility is supported by the empirical data and is very robust, which
we argue reflects the generic topology of sequence spaces.
Comment: 16 pages, 4 figures; supplementary material available on request
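
The definition of an accessible pathway given in this abstract (a shortest mutational path along which fitness increases at every step) can be sketched in a few lines. The Python below is a minimal illustration, not the authors' code: genotypes are assumed to be tuples of 0/1 alleles, the landscape is assumed to be a dictionary from genotype to fitness, and the two toy landscapes with their fitness values are invented for the example.

```python
from functools import lru_cache


def count_accessible_paths(fitness, start, target):
    """Count shortest paths from start to target with strictly increasing fitness.

    Each step changes exactly one locus that still differs from the target,
    so every counted path uses the minimal number of mutations.
    """

    @lru_cache(maxsize=None)
    def paths_from(genotype):
        if genotype == target:
            return 1
        total = 0
        for i, allele in enumerate(genotype):
            if allele != target[i]:
                mutant = genotype[:i] + (target[i],) + genotype[i + 1:]
                # Only fitness-increasing steps are allowed.
                if fitness[mutant] > fitness[genotype]:
                    total += paths_from(mutant)
        return total

    return paths_from(start)


# Hypothetical 2-locus landscapes (fitness values are made up).
smooth = {(0, 0): 1.0, (1, 0): 1.2, (0, 1): 1.1, (1, 1): 1.5}
rugged = {(0, 0): 1.0, (1, 0): 1.2, (0, 1): 0.8, (1, 1): 1.5}

n_smooth = count_accessible_paths(smooth, (0, 0), (1, 1))
n_rugged = count_accessible_paths(rugged, (0, 0), (1, 1))
```

In the smooth toy landscape both orderings of the two mutations are accessible; in the rugged one the path through the less fit single mutant is blocked, leaving one accessible pathway.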
A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem
Many graph mining applications rely on detecting subgraphs which are
near-cliques. There exists a dichotomy between the results in the existing work
related to this problem: on the one hand, the densest subgraph problem (DSP),
which maximizes the average degree over all subgraphs, is solvable in polynomial
time but for many networks fails to find subgraphs which are near-cliques. On
the other hand, formulations that are geared towards finding near-cliques are
NP-hard and frequently inapproximable due to connections with the Maximum
Clique problem.
In this work, we propose a formulation which combines the best of both
worlds: it is solvable in polynomial time and finds near-cliques when the DSP
fails. Surprisingly, our formulation is a simple variation of the DSP.
Specifically, we define the triangle densest subgraph problem (TDSP): given
$G(V,E)$, find a subset of vertices $S^*$ such that
$\tau(S^*)=\max_{S \subseteq V} \frac{t(S)}{|S|}$, where $t(S)$ is the number
of triangles induced by the set $S$. We provide various exact and approximation
algorithms which solve the TDSP efficiently. Furthermore, we show how our
algorithms adapt to the more general problem of maximizing the $k$-clique
average density. Finally,
we provide empirical evidence that the TDSP should be used whenever the DSP
fails to output a near-clique.
Comment: 42 pages
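
The objective in this abstract can be illustrated on a toy graph. The sketch below takes the triangle density of a vertex set to be the number of triangles it induces divided by its size, which is our reading of the definition, and maximizes it by exhaustive search. This is exponential-time and is not one of the exact or approximation algorithms the paper provides; the function names and the example graph are assumptions made for illustration.

```python
from itertools import combinations


def triangle_count(vertices, edge_set):
    """Number of triangles induced by the given vertices."""
    return sum(
        1
        for a, b, c in combinations(sorted(vertices), 3)
        if frozenset((a, b)) in edge_set
        and frozenset((a, c)) in edge_set
        and frozenset((b, c)) in edge_set
    )


def triangle_densest_bruteforce(vertices, edges):
    """Return (density, subset) maximizing triangles(S) / |S| over nonempty S."""
    edge_set = {frozenset(e) for e in edges}
    best_density, best_set = 0.0, ()
    for size in range(1, len(vertices) + 1):
        for subset in combinations(sorted(vertices), size):
            density = triangle_count(subset, edge_set) / size
            if density > best_density:
                best_density, best_set = density, subset
    return best_density, best_set


# Hypothetical example: a 4-clique on {0, 1, 2, 3} with a path 3-4-5 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
density, subset = triangle_densest_bruteforce(range(6), edges)
```

Here the 4-clique induces four triangles on four vertices, so it attains density 1.0, while adding the path vertices only dilutes the ratio.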