Axioms for graph clustering quality functions
We investigate properties that intuitively ought to be satisfied by graph
clustering quality functions, that is, functions that assign a score to a
clustering of a graph. Graph clustering, also known as network community
detection, is often performed by optimizing such a function. Two axioms
tailored for graph clustering quality functions are introduced, and the four
axioms introduced in previous work on distance based clustering are
reformulated and generalized for the graph setting. We show that modularity, a
standard quality function for graph clustering, does not satisfy all of these
six properties. This motivates the derivation of a new family of quality
functions, adaptive scale modularity, which does satisfy the proposed axioms.
Adaptive scale modularity has two parameters, which give greater flexibility in
the kinds of clusterings that can be found. Standard graph clustering quality
functions, such as normalized cut and unnormalized cut, are obtained as special
cases of adaptive scale modularity.
In general, the results of our investigation indicate that the considered
axiomatic framework covers existing `good' quality functions for graph
clustering, and can be used to derive an interesting new family of quality
functions.
Comment: 23 pages. Full text and sources available on:
http://www.cs.ru.nl/~T.vanLaarhoven/graph-clustering-axioms-2014
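To make concrete what a "graph clustering quality function" computes, here is a minimal sketch of standard (Newman-Girvan) modularity, the baseline quality function the abstract shows fails some of the axioms. The graph and clustering below are illustrative examples, not from the paper.

```python
from collections import defaultdict

def modularity(edges, clustering):
    """Standard modularity: sum over clusters of (internal edge fraction)
    minus (expected fraction under the degree-preserving null model)."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in a cluster
    degree_sum = defaultdict(int)  # total degree of each cluster
    for u, v in edges:
        degree_sum[clustering[u]] += 1
        degree_sum[clustering[v]] += 1
        if clustering[u] == clustering[v]:
            internal[clustering[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Two triangles joined by a bridge, clustered as the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}
```

For this graph the natural two-triangle clustering scores 5/14, higher than the all-in-one clustering, which is the behavior a quality function is supposed to reward.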
Violating the Shannon capacity of metric graphs with entanglement
The Shannon capacity of a graph G is the maximum asymptotic rate at which
messages can be sent with zero probability of error through a noisy channel
with confusability graph G. This extensively studied graph parameter disregards
the fact that on atomic scales, Nature behaves in line with quantum mechanics.
Entanglement, arguably the most counterintuitive feature of the theory, turns
out to be a useful resource for communication across noisy channels. Recently,
Leung, Mancinska, Matthews, Ozols and Roy [Comm. Math. Phys. 311, 2012]
presented two examples of graphs whose Shannon capacity is strictly less than
the capacity attainable if the sender and receiver have entangled quantum
systems. Here we give new, possibly infinite, families of graphs for which the
entangled capacity exceeds the Shannon capacity.
Comment: 15 pages, 2 figures
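The Shannon capacity is defined via independent sets in strong powers of the confusability graph. A small sketch of the classical (non-entangled) lower-bound mechanism, using the textbook example of the 5-cycle C5 (not one of the graphs from the paper): an independent set of size 5 in the strong product C5 * C5 certifies capacity at least sqrt(5).

```python
from itertools import combinations

# Edges of the 5-cycle C5.
C5 = {frozenset((i, (i + 1) % 5)) for i in range(5)}

def adjacent_strong(u, v):
    """Adjacency in the strong product C5 * C5: distinct vertices are
    adjacent iff each coordinate is equal or adjacent in C5."""
    if u == v:
        return False
    return all(a == b or frozenset((a, b)) in C5 for a, b in zip(u, v))

# A size-5 independent set in C5 * C5. Since alpha(C5) = 2, this shows the
# Shannon capacity of C5 is at least 5 ** (1/2) (Lovasz: it equals sqrt(5)).
S = [(i, (2 * i) % 5) for i in range(5)]
independent = all(not adjacent_strong(u, v) for u, v in combinations(S, 2))
```

The entangled capacity studied in the abstract generalizes exactly this quantity by allowing the sender and receiver to share quantum correlations.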
Assessing the Computational Complexity of Multi-Layer Subgraph Detection
Multi-layer graphs consist of several graphs (layers) over the same vertex
set. They are motivated by real-world problems where entities (vertices) are
associated via multiple types of relationships (edges in different layers). We
chart the border of computational (in)tractability for the class of subgraph
detection problems on multi-layer graphs, including fundamental problems such
as maximum matching, finding certain clique relaxations (motivated by community
detection), or path problems. While we mostly encounter hardness results,
sometimes even for two or three layers, we also identify some islands of
tractability.
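One natural formalization of multi-layer matching (a hypothetical toy version, chosen for illustration; the paper studies several variants and shows many are hard) asks for the largest set of vertex-disjoint edges present in every layer. A brute-force sketch:

```python
from itertools import combinations

def max_common_matching(layers):
    """Largest set of vertex-disjoint edges appearing in every layer.
    Exponential-time brute force over subsets of the common edge set."""
    common = list(set.intersection(
        *({frozenset(e) for e in layer} for layer in layers)))
    for size in range(len(common), 0, -1):
        for subset in combinations(common, size):
            covered = [v for e in subset for v in e]
            if len(covered) == len(set(covered)):  # vertex-disjoint
                return [tuple(sorted(e)) for e in subset]
    return []

layer1 = [(0, 1), (1, 2), (2, 3), (3, 0)]
layer2 = [(0, 1), (2, 3), (1, 3)]
```

Here the layers share edges {0,1} and {2,3}, which are disjoint, so the common matching has size 2.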
A Tutorial on Clique Problems in Communications and Signal Processing
Since its first use by Euler on the problem of the seven bridges of
K\"onigsberg, graph theory has shown excellent abilities in solving and
unveiling the properties of multiple discrete optimization problems. The study
of the structure of some integer programs reveals equivalence with graph theory
problems making a large body of the literature readily available for solving
and characterizing the complexity of these problems. This tutorial presents a
framework for utilizing a particular graph theory problem, known as the clique
problem, for solving communications and signal processing problems. In
particular, the paper aims to illustrate the structural properties of integer
programs that can be formulated as clique problems through multiple examples in
communications and signal processing. To that end, the first part of the
tutorial provides various optimal and heuristic solutions for the maximum
clique, maximum weight clique, and k-clique problems. The tutorial further
illustrates the use of the clique formulation through numerous contemporary
examples in communications and signal processing, mainly in maximum access for
non-orthogonal multiple access networks, throughput maximization using index
and instantly decodable network coding, collision-free radio frequency
identification networks, and resource allocation in cloud-radio access
networks. Finally, the tutorial sheds light on recent advances in such
applications and provides technical insights into ways of dealing with mixed
discrete-continuous optimization problems.
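The maximum clique problem at the heart of the tutorial can be solved exactly on small instances by Bron-Kerbosch enumeration; a minimal sketch (the tutorial itself surveys many optimal and heuristic solvers, this is just the textbook exact method):

```python
def max_clique(adj):
    """Largest clique via basic Bron-Kerbosch enumeration (no pivoting).
    adj: dict vertex -> set of neighbours. Exponential time in general."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:          # r is a maximal clique
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand([], set(adj), set())
    return best

# A 4-clique {0,1,2,3} plus a pendant path 3-4-5.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = {v: set() for e in edges for v in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
```

The communications applications in the tutorial (scheduling, index coding, resource allocation) reduce to exactly this search on suitably constructed conflict graphs.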
Optimality Clue for Graph Coloring Problem
In this paper, we present a new approach that determines whether a solution
found by a heuristic is a potential optimal solution. Our approach is based on
the following observation: for a minimization problem, the number of admissible
solutions decreases with the value of the objective function. For the Graph
Coloring Problem (GCP), we confirm this observation and present a new way to
prove optimality. This proof is based on the counting of the number of
different k-colorings and the number of independent sets of a given graph G.
Exactly counting such solutions is a difficult problem (\#P-complete).
However, we show that, using only randomized heuristics, it is possible to
estimate an upper bound on the number of k-colorings. This
estimate has been calibrated on a large benchmark of graph instances for which
the exact number of optimal k-colorings is known. Our approach, called
optimality clue, builds a sample of k-colorings of a given graph by running
one randomized heuristic many times on the same graph instance. We use the
evolutionary algorithm HEAD [Moalic and Gondran, 2018], one of the most
efficient heuristics for GCP. Optimality clue matches the standard definition
of optimality on a large number of instances from the DIMACS and RBCII
benchmarks where the optimality is known. We then exhibit a clue of optimality
for another set of graph instances.
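The quantity being estimated above is the exact number of proper k-colorings, which can be computed by brute force on tiny graphs (a sketch of the counting problem only, not of the paper's sampling estimator):

```python
from itertools import product

def count_colorings(n, edges, k):
    """Exact number of proper k-colorings of a graph on vertices 0..n-1.
    Brute force over all k**n assignments; the general problem is
    #P-complete, which is why the paper resorts to estimation."""
    return sum(all(c[u] != c[v] for u, v in edges)
               for c in product(range(k), repeat=n))

# 5-cycle: chromatic polynomial (k-1)^n + (-1)^n (k-1) gives 30 for k = 3.
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
```

Comparing such exact counts on calibration instances against estimates from repeated heuristic runs is the core idea behind the optimality clue.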
Algorithmic and enumerative aspects of the Moser-Tardos distribution
Moser & Tardos have developed a powerful algorithmic approach (henceforth
"MT") to the Lovasz Local Lemma (LLL); the basic operation done in MT and its
variants is a search for "bad" events in a current configuration. In the
initial stage of MT, the variables are set independently. We examine the
distributions on these variables which arise during intermediate stages of MT.
We show that these configurations have a more or less "random" form, building
further on the "MT-distribution" concept of Haeupler et al. in understanding
the (intermediate and) output distribution of MT. This has a variety of
algorithmic applications; the most important is that bad events can be found
relatively quickly, improving upon MT across the complexity spectrum: it makes
some polynomial-time algorithms sub-linear (e.g., for Latin transversals, which
are of basic combinatorial interest), gives lower-degree polynomial run-times
in some settings, transforms certain super-polynomial-time algorithms into
polynomial-time ones, and leads to Las Vegas algorithms for some coloring
problems for which only Monte Carlo algorithms were known.
We show that in certain conditions when the LLL condition is violated, a
variant of the MT algorithm can still produce a distribution which avoids most
of the bad events. We show in some cases this MT variant can run faster than
the original MT algorithm itself, and develop the first-known criterion for the
case of the asymmetric LLL. This can be used to find partial Latin transversals
-- improving upon earlier bounds of Stein (1975) -- among other applications.
We furthermore give applications in enumeration, showing that most applications
(where we aim for all or most of the bad events to be avoided) have many more
solutions than known before by proving that the MT-distribution has "large"
min-entropy and hence that its support size is large.
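The basic Moser-Tardos loop referred to throughout is short enough to sketch: sample all variables independently, then repeatedly resample the variables of any currently-violated bad event. The instance below (a tiny 3-SAT formula, where each bad event is a falsified clause) is a made-up example, not from the paper.

```python
import random

def moser_tardos(n, bad_events, rng):
    """Moser-Tardos: independent initial assignment, then resample the
    variables of a violated bad event until none occurs.
    bad_events: list of (variables, predicate); predicate(x) is True when
    the bad event occurs under assignment x."""
    x = [rng.random() < 0.5 for _ in range(n)]
    while True:
        bad = next(((vs, p) for vs, p in bad_events if p(x)), None)
        if bad is None:
            return x
        for v in bad[0]:                 # resample only this event's variables
            x[v] = rng.random() < 0.5

# Bad event per clause: all three literals false.
clauses = [[(0, True), (1, False), (2, True)],
           [(1, True), (3, True), (4, False)],
           [(2, False), (4, True), (5, True)]]

def falsified(clause):
    return lambda x: all(x[v] != sign for v, sign in clause)

events = [([v for v, _ in c], falsified(c)) for c in clauses]
assignment = moser_tardos(6, events, random.Random(0))
```

The results in the abstract concern the distribution of `x` at intermediate stages of this loop and at termination, not the loop itself.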
Two extensions of Ramsey's theorem
Ramsey's theorem, in the version of Erd\H{o}s and Szekeres, states that every
2-coloring of the edges of the complete graph on {1, 2,...,n} contains a
monochromatic clique of order 1/2\log n. In this paper, we consider two
well-studied extensions of Ramsey's theorem.
Improving a result of R\"odl, we show that there is a constant c such
that every 2-coloring of the edges of the complete graph on \{2, 3,...,n\}
contains a monochromatic clique S for which the sum of 1/\log i over all
vertices i \in S is at least c\log\log\log n. This is tight up to the constant
factor c and answers a question of Erd\H{o}s from 1981.
Motivated by a problem in model theory, V\"a\"an\"anen asked whether for
every k there is an n such that the following holds. For every permutation \pi
of 1,...,k-1, every 2-coloring of the edges of the complete graph on {1, 2,
..., n} contains a monochromatic clique a_1<...<a_k with
a_{\pi(1)+1}-a_{\pi(1)}>a_{\pi(2)+1}-a_{\pi(2)}>...>a_{\pi(k-1)+1}-a_{\pi(k-1)}.
That is, not only do we want a monochromatic clique, but the differences
between consecutive vertices must satisfy a prescribed order. Alon and,
independently, Erd\H{o}s, Hajnal and Pach answered this question affirmatively.
Alon further conjectured that the true growth rate should be exponential in k.
We make progress towards this conjecture, obtaining an upper bound on n which
is exponential in a power of k. This improves a result of Shelah, who showed
that n is at most double-exponential in k.
Comment: 21 pages, accepted for publication in Duke Math.
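The flavor of statement being extended can be verified by brute force in the smallest nontrivial case, R(3,3) = 6: every 2-coloring of the edges of K_6 contains a monochromatic triangle, while K_5 admits a coloring with none. (A sanity-check sketch of classical Ramsey theory, unrelated to the paper's asymptotic bounds.)

```python
from itertools import combinations, product

def has_mono_triangle(coloring, n):
    """coloring maps each edge (i, j) with i < j to color 0 or 1."""
    return any(len({coloring[(a, b)], coloring[(a, c)],
                    coloring[(b, c)]}) == 1
               for a, b, c in combinations(range(n), 3))

def ramsey_check(n):
    """True iff every 2-coloring of K_n has a monochromatic triangle.
    Exhaustive over all 2**binom(n,2) colorings; feasible only for tiny n."""
    edges = list(combinations(range(n), 2))
    return all(has_mono_triangle(dict(zip(edges, colors)), n)
               for colors in product((0, 1), repeat=len(edges)))
```

`ramsey_check(5)` is False (the pentagon/pentagram coloring avoids monochromatic triangles) and `ramsey_check(6)` is True.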
Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering
Graph clustering, or community detection, is the task of identifying groups
of closely related objects in a large network. In this paper we introduce a new
community-detection framework called LambdaCC that is based on a specially
weighted version of correlation clustering. A key component in our methodology
is a clustering resolution parameter, \lambda, which implicitly controls the
size and structure of clusters formed by our framework. We show that, by
increasing this parameter, our objective effectively interpolates between two
different strategies in graph clustering: finding a sparse cut and forming
dense subgraphs. Our methodology unifies and generalizes a number of other
important clustering quality functions including modularity, sparsest cut, and
cluster deletion, and places them all within the context of an optimization
problem that has been well studied from the perspective of approximation
algorithms. Our approach is particularly relevant in the regime of finding
dense clusters, as it leads to a 2-approximation for the cluster deletion
problem. We use our approach to cluster several graphs, including large
collaboration networks and social networks.
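A sketch of the simplest unit-weight form of such a correlation-clustering objective, where cutting an edge costs (1 - lambda) and merging a non-adjacent pair costs lambda. This is an illustrative reconstruction of the idea, not the paper's exact (e.g. degree-weighted) formulation.

```python
from itertools import combinations

def lambda_cc_cost(n, edges, clustering, lam):
    """Disagreement cost under a LambdaCC-style weighting:
    each cut edge costs (1 - lam); each merged non-edge costs lam."""
    E = {frozenset(e) for e in edges}
    cost = 0.0
    for i, j in combinations(range(n), 2):
        together = clustering[i] == clustering[j]
        if frozenset((i, j)) in E and not together:
            cost += 1 - lam
        elif frozenset((i, j)) not in E and together:
            cost += lam
    return cost

# Two triangles joined by a bridge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
triangles = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
merged = {v: 0 for v in range(6)}
```

At lambda = 0.5 the two-triangle split costs 0.5 while the single cluster costs 4.0; as lambda shrinks, merging becomes cheap, which is the interpolation between sparse cuts and dense subgraphs described above.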
High-dimensional learning of linear causal networks via inverse covariance estimation
We establish a new framework for statistical estimation of directed acyclic
graphs (DAGs) when data are generated from a linear, possibly non-Gaussian
structural equation model. Our framework consists of two parts: (1) inferring
the moralized graph from the support of the inverse covariance matrix; and (2)
selecting the best-scoring graph amongst DAGs that are consistent with the
moralized graph. We show that when the error variances are known or estimated
to close enough precision, the true DAG is the unique minimizer of the score
computed using the reweighted squared l_2-loss. Our population-level results
have implications for the identifiability of linear SEMs when the error
covariances are specified up to a constant multiple. On the statistical side,
we establish rigorous conditions for high-dimensional consistency of our
two-part algorithm, defined in terms of a "gap" between the true DAG and the
next best candidate. Finally, we demonstrate that dynamic programming may be
used to select the optimal DAG in linear time when the treewidth of the
moralized graph is bounded.
Comment: 41 pages, 7 figures
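A toy end-to-end sketch of step (1) on a three-variable chain X1 -> X2 -> X3: the support of the inverse covariance matrix recovers the moralized graph, which for a chain (no v-structures) is just the two chain edges. Sample size, coefficients, and the threshold are all illustrative choices, not values from the paper.

```python
import random

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (enough for this sketch)."""
    a, b, c = m[0]; d, e, f = m[1]; g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

rng = random.Random(0)
n = 10000
# Linear Gaussian SEM along the chain, unit edge weights and noise.
data = []
for _ in range(n):
    x1 = rng.gauss(0, 1)
    x2 = x1 + rng.gauss(0, 1)
    x3 = x2 + rng.gauss(0, 1)
    data.append((x1, x2, x3))

means = [sum(col) / n for col in zip(*data)]
cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in data) / n
        for j in range(3)] for i in range(3)]
theta = inv3(cov)
# Thresholded support of the precision matrix = estimated moralized graph.
moral_edges = {(i, j) for i in range(3) for j in range(i + 1, 3)
               if abs(theta[i][j]) > 0.2}
```

The population precision matrix here is [[2, -1, 0], [-1, 2, -1], [0, -1, 1]], so thresholding correctly keeps edges 1-2 and 2-3 and drops 1-3; step (2) of the framework then scores DAGs consistent with this skeleton.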