Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage
We propose a fast, parallel maximum clique algorithm for large sparse graphs
that is designed to exploit characteristics of social and information networks.
The method exhibits a roughly linear runtime scaling over real-world networks
ranging from 1000 to 100 million nodes. In a test on a social network with 1.8
billion edges, the algorithm finds the largest clique in about 20 minutes. Our
method employs a branch and bound strategy with novel and aggressive pruning
techniques. For instance, we use the core number of a vertex in combination
with a good heuristic clique finder to efficiently remove the vast majority of
the search space. In addition, we parallelize the exploration of the search
tree. During the search, processes immediately communicate changes to upper and
lower bounds on the size of the maximum clique, which occasionally results in a
super-linear speedup because vertices with large search spaces can be pruned by
other processes. We apply the algorithm to two problems: to compute temporal
strong components and to compress graphs.
Comment: 11 pages.
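The pruning strategy described above can be illustrated with a small sequential sketch (the parallel search and the sharing of bounds between processes are omitted, and all function names are illustrative rather than the authors' code): a greedy heuristic supplies a lower bound, core numbers discard vertices that cannot lie in a larger clique, and branch and bound finishes the job.

```python
def core_numbers(adj):
    """Peel minimum-degree vertices; a vertex's core number is the
    largest degree bound seen up to its removal."""
    deg = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=deg.get)
        k = max(k, deg[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core

def greedy_clique(adj):
    """Cheap heuristic clique: grow greedily from each vertex."""
    best = []
    for v in adj:
        clique = [v]
        for u in sorted(adj[v], key=lambda x: -len(adj[x])):
            if all(u in adj[w] for w in clique):
                clique.append(u)
        if len(clique) > len(best):
            best = clique
    return best

def max_clique(adj):
    """Branch and bound, seeded by the heuristic lower bound and pruned
    by core numbers: a clique strictly larger than |best| needs every
    vertex to have core number >= |best|."""
    best = greedy_clique(adj)
    core = core_numbers(adj)
    cand = [v for v in adj if core[v] >= len(best)]

    def expand(clique, cands):
        nonlocal best
        if len(clique) > len(best):
            best = clique[:]
        for i, v in enumerate(cands):
            if len(clique) + len(cands) - i <= len(best):
                return  # too few candidates left to beat the bound
            expand(clique + [v], [u for u in cands[i + 1:] if u in adj[v]])

    expand([], cand)
    return best
```

On many graphs the heuristic plus the core-number filter eliminate nearly the entire search space before branching begins, which is the effect the abstract reports at scale.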
Clique versus Independent Set
Yannakakis' Clique versus Independent Set problem (CL-IS) in communication
complexity asks for the minimum number of cuts separating cliques from stable
sets in a graph, called a CS-separator. Yannakakis provides a quasi-polynomial
CS-separator, i.e. one of size $n^{O(\log n)}$, and addresses the problem of
finding a polynomial CS-separator. This question is still open even for perfect
graphs. We show that a polynomial CS-separator almost surely exists for random
graphs. Besides, if H is a split graph (i.e. has a vertex-partition into a
clique and a stable set) then there exists a constant $c_H$ for which we find a
$O(n^{c_H})$ CS-separator on the class of H-free graphs. This generalizes a
result of Yannakakis on comparability graphs. We also provide a $O(n^{c_k})$
CS-separator on the class of graphs without an induced path of length k and its
complement. Observe that on one side, $c_H$ arises from a Vapnik-Chervonenkis
dimension argument, while on the other side, $c_k$ is exponential in $k$.
One of the main reasons why Yannakakis' CL-IS problem is fascinating is that
it admits several equivalent formulations. Our main result in this respect is
to show that a polynomial CS-separator is equivalent to the polynomial
Alon-Saks-Seymour Conjecture, asserting that if a graph has an edge-partition
into k complete bipartite graphs, then its chromatic number is polynomially
bounded in terms of k. We also show that the classical approach to the stubborn
problem (arising in CSP), which consists in covering the set of all solutions
by polynomially many instances of 2-SAT, is again equivalent to the existence
of a polynomial CS-separator.
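For intuition, a CS-separator can be computed by brute force on tiny graphs: a cut $A$ handles a disjoint pair (clique $C$, stable set $S$) when $C \subseteq A$ and $S \cap A = \emptyset$, so finding a small separator is a set-cover instance over all $2^n$ cuts. The sketch below (illustrative only, exponential in $n$) applies the greedy set-cover heuristic:

```python
from itertools import chain, combinations

def subsets(items):
    return chain.from_iterable(combinations(items, r)
                               for r in range(len(items) + 1))

def cs_separator(G):
    """Greedy set cover over cuts: every disjoint (clique, stable set)
    pair must be separated by some chosen cut. G maps vertex -> neighbor set."""
    V = list(G)
    is_clique = lambda s: all(v in G[u] for u, v in combinations(s, 2))
    is_stable = lambda s: not any(v in G[u] for u, v in combinations(s, 2))
    cliques = [frozenset(s) for s in subsets(V) if s and is_clique(s)]
    stables = [frozenset(s) for s in subsets(V) if s and is_stable(s)]
    pairs = [(C, S) for C in cliques for S in stables if not C & S]
    cuts = [frozenset(s) for s in subsets(V)]
    separates = lambda A, C, S: C <= A and not (S & A)
    chosen, uncovered = [], set(range(len(pairs)))
    while uncovered:
        # pick the cut separating the most still-uncovered pairs
        A = max(cuts, key=lambda A: sum(separates(A, *pairs[i])
                                        for i in uncovered))
        chosen.append(A)
        uncovered -= {i for i in uncovered if separates(A, *pairs[i])}
    return chosen, pairs
```

Every disjoint pair is coverable (take the cut $A = C$), so the greedy loop always terminates; of course this brute force says nothing about polynomial-size separators in general.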
Coloring in the Congested Clique Model
In this paper, we present improved algorithms for the $(\Delta+1)$ (vertex)
coloring problem in the Congested-Clique model of distributed computing. In
this model, the input is a graph on $n$ nodes, initially each node knows only
its incident edges, and per round each pair of nodes can exchange $O(\log n)$
bits of information.
Our key result is a randomized $(\Delta+1)$ vertex coloring algorithm that
works in $O(\log\log\Delta \cdot \log^*\Delta)$ rounds. This is achieved by
combining the recent breakthrough result of [Chang-Li-Pettie, STOC'18] in the
LOCAL model with a degree reduction technique. We also get the following
results with high probability: (1) a $(\Delta+1)$-coloring in fewer rounds when
$\Delta$ is sufficiently large, and (2) a coloring with somewhat more colors in
fewer rounds still. Turning to deterministic algorithms, we also give a
deterministic coloring procedure.
Comment: Appeared in ICALP'18 (the updated version adds a missing part in the
deterministic coloring procedure).
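The abstract's algorithm is far more involved, but the basic randomized trial-coloring primitive that such distributed coloring algorithms build on is easy to simulate centrally: each round, every uncolored node proposes a color its colored neighbors do not hold, and keeps it if no neighbor proposed the same color. A toy round-by-round simulation (not the paper's algorithm):

```python
import random

def randomized_coloring(adj, seed=0):
    """Simulate synchronous rounds of trial (Delta+1)-coloring.
    With Delta+1 colors, every node always has a free color."""
    rng = random.Random(seed)
    palette = range(max(len(adj[v]) for v in adj) + 1)
    color, uncolored = {}, set(adj)
    while uncolored:
        # every uncolored node proposes a color its colored neighbors avoid
        proposal = {
            v: rng.choice([c for c in palette
                           if all(color.get(u) != c for u in adj[v])])
            for v in uncolored
        }
        # keep the proposal unless some neighbor proposed the same color
        for v in list(uncolored):
            if all(proposal.get(u) != proposal[v] for u in adj[v]):
                color[v] = proposal[v]
                uncolored.discard(v)
    return color
```

Since a node of degree at most $\Delta$ can see at most $\Delta$ occupied colors, the palette of size $\Delta+1$ is never exhausted, and the process terminates with probability 1.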
Algorithmic and enumerative aspects of the Moser-Tardos distribution
Moser & Tardos have developed a powerful algorithmic approach (henceforth
"MT") to the Lovasz Local Lemma (LLL); the basic operation done in MT and its
variants is a search for "bad" events in a current configuration. In the
initial stage of MT, the variables are set independently. We examine the
distributions on these variables which arise during intermediate stages of MT.
We show that these configurations have a more or less "random" form, building
further on the "MT-distribution" concept of Haeupler et al. in understanding
the (intermediate and) output distribution of MT. This has a variety of
algorithmic applications; the most important is that bad events can be found
relatively quickly, improving upon MT across the complexity spectrum: it makes
some polynomial-time algorithms sub-linear (e.g., for Latin transversals, which
are of basic combinatorial interest), gives lower-degree polynomial run-times
in some settings, transforms certain super-polynomial-time algorithms into
polynomial-time ones, and leads to Las Vegas algorithms for some coloring
problems for which only Monte Carlo algorithms were known.
We show that under certain conditions, even when the LLL condition is violated, a
variant of the MT algorithm can still produce a distribution which avoids most
of the bad events. We show in some cases this MT variant can run faster than
the original MT algorithm itself, and develop the first-known criterion for the
case of the asymmetric LLL. This can be used to find partial Latin transversals
-- improving upon earlier bounds of Stein (1975) -- among other applications.
We furthermore give applications in enumeration, showing that most applications
(where we aim for all or most of the bad events to be avoided) have many more
solutions than known before by proving that the MT-distribution has "large"
min-entropy and hence that its support size is large.
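The basic MT operation the abstract refers to (find a currently-bad event, resample its variables) fits in a few lines. Here is a minimal sketch for bad events given as forbidden patterns over Boolean variables; it is illustrative only, since the paper's contribution concerns how fast bad events can be found and what the intermediate distributions look like, not this loop itself:

```python
import random

def moser_tardos(n, bad_events, seed=0):
    """bad_events: each event is a list of (index, forbidden_value) pairs
    and 'holds' when every listed variable takes its forbidden value.
    Resample the variables of any bad event that holds, until none does."""
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n)]  # independent initial sample
    resamples = 0
    while True:
        bad = next((e for e in bad_events
                    if all(x[i] == forbidden for i, forbidden in e)), None)
        if bad is None:
            return x, resamples
        for i, _ in bad:            # the core MT step: independent resampling
            x[i] = rng.random() < 0.5
        resamples += 1
```

Under the LLL condition the expected number of resamplings is bounded; the abstract's variant targets regimes where that condition fails and most, though not all, bad events can still be avoided.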
Distributed Connectivity Decomposition
We present time-efficient distributed algorithms for decomposing graphs with
large edge or vertex connectivity into multiple spanning or dominating trees,
respectively. As their primary applications, these decompositions allow us to
achieve information flow with size close to the connectivity by parallelizing
it along the trees. More specifically, our distributed decomposition algorithms
are as follows:
(I) A decomposition of each undirected graph with vertex-connectivity $k$
into (fractionally) vertex-disjoint weighted dominating trees with total weight
$\Omega(k/\log n)$, in $\tilde{O}(D+\sqrt{n})$ rounds.
(II) A decomposition of each undirected graph with edge-connectivity $\lambda$
into (fractionally) edge-disjoint weighted spanning trees with total
weight $\Omega(\lambda)$, in $\tilde{O}(D+\sqrt{n})$ rounds.
We also show round complexity lower bounds of $\tilde{\Omega}(D+\sqrt{n/k})$
and $\tilde{\Omega}(D+\sqrt{n/\lambda})$ for the above two decompositions,
using techniques of [Das Sarma et al., STOC'11]. Moreover, our
vertex-connectivity decomposition extends to centralized algorithms and
improves the time complexity of [Censor-Hillel et al., SODA'14] to
near-optimal $\tilde{O}(m)$.
As corollaries, we also get distributed oblivious routing broadcast with
polylogarithmic-competitive edge-congestion and vertex-congestion. Furthermore,
the vertex connectivity decomposition leads to a near-time-optimal
$O(\log n)$-approximation of vertex connectivity: centralized $\tilde{O}(m)$
and distributed $\tilde{O}(D+\sqrt{n})$. The former moves toward the 1974
conjecture of Aho, Hopcroft, and Ullman postulating an $O(m)$ centralized exact
algorithm, while the latter is the first distributed vertex connectivity
approximation.
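To see what an edge-connectivity-based spanning tree decomposition means concretely, here is a tiny centralized, unweighted sketch (the paper's fractional, weighted, distributed construction is much stronger): repeatedly extract a spanning tree from the remaining edges with union-find until the leftover graph no longer spans.

```python
def extract_spanning_trees(n, edges):
    """Greedily peel off edge-disjoint spanning trees of the graph
    on vertices 0..n-1; stop when the remaining edges do not span."""
    trees, remaining = [], list(edges)
    while True:
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        tree, rest = [], []
        for u, v in remaining:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                tree.append((u, v))
            else:
                rest.append((u, v))
        if len(tree) < n - 1:   # leftover edges no longer span the graph
            return trees
        trees.append(tree)
        remaining = rest
```

On $K_4$ (edge-connectivity 3, six edges) this greedy happens to find two edge-disjoint spanning trees, the maximum possible since each tree uses three edges; in general the greedy extraction order matters and guarantees only a weaker packing than optimal tree packings.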
A Stronger LP Bound for Formula Size Lower Bounds via Clique Constraints
We introduce a new technique for proving formula size lower bounds based on the linear programming bound originally introduced by Karchmer, Kushilevitz and Nisan (1995) and the theory of the stable set polytope. We apply it to majority functions and prove formula size lower bounds for them that improve on the classical result of Khrapchenko (1971). Moreover, we introduce a notion of unbalanced recursive ternary majority functions, motivated by a decomposition theory of monotone self-dual functions, and give exactly matching upper and lower bounds on their formula size. We also show monotone formula size lower bounds for balanced recursive ternary majority functions that improve on the quantum adversary bound of Laplante, Lee and Szegedy (2006).
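For readers unfamiliar with the function family in question: the balanced recursive ternary majority function on $3^h$ bits applies 3-bit majority at every node of a complete ternary tree of height $h$. A small sketch (the unbalanced variants in the paper feed single variables into some positions instead of full subtrees):

```python
def maj3(a, b, c):
    """Majority of three bits."""
    return int(a + b + c >= 2)

def recursive_maj3(bits):
    """Balanced recursive ternary majority on 3^h input bits:
    split into thirds, recurse, and take the 3-bit majority."""
    if len(bits) == 1:
        return bits[0]
    third = len(bits) // 3
    return maj3(recursive_maj3(bits[:third]),
                recursive_maj3(bits[third:2 * third]),
                recursive_maj3(bits[2 * third:]))
```

The formula size of exactly this kind of composed majority is what the LP bound with clique constraints pins down.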
Tight Distributed Listing of Cliques
Much progress has recently been made in understanding the complexity
landscape of subgraph finding problems in the CONGEST model of distributed
computing. However, so far, very few tight bounds are known in this area. For
triangle (i.e., 3-clique) listing, an optimal $\tilde{O}(n^{1/3})$-round
distributed algorithm has been constructed by Chang et al.~[SODA 2019, PODC
2019]. Recent works of Eden et al.~[DISC 2019] and of Censor-Hillel et
al.~[PODC 2020] have shown sublinear algorithms for $K_p$-listing, for each
$p \ge 4$, but still leaving a significant gap between the upper bounds and the
known lower bounds of the problem.
In this paper, we completely close this gap. We show that for each $p \ge 4$,
there is an $\tilde{O}(n^{1-2/p})$-round distributed algorithm that lists
all $p$-cliques in the communication network. Our algorithm is
\emph{optimal} up to a polylogarithmic factor, due to the
$\tilde{\Omega}(n^{1-2/p})$-round lower bound of Fischer et al.~[SPAA 2018],
which holds even in the CONGESTED CLIQUE model. Together with the
triangle-listing algorithm by Chang et al.~[SODA 2019, PODC 2019], our result
thus shows that the round complexity of $K_p$-listing, for all $p \ge 3$, is
the same in both the CONGEST and CONGESTED CLIQUE models, at
$\tilde{\Theta}(n^{1-2/p})$ rounds.
For $p = 4$, our result additionally matches the $\tilde{\Omega}(\sqrt{n})$
lower bound for $K_4$-\emph{detection} by Czumaj and Konrad [DISC 2018],
implying that the round complexities for detection and listing of $K_4$ are
equivalent in the CONGEST model.
Comment: 21 pages. To appear in SODA 2021.
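The object being computed, listing every clique exactly once, is easy to state centrally; for triangles the standard sequential enumeration orders the vertices so each triangle is emitted once. The paper's difficulty is doing this under CONGEST bandwidth limits, which this sketch ignores entirely:

```python
def list_triangles(adj):
    """List each triangle (u, v, w) with u < v < w exactly once.
    adj maps a vertex to the set of its neighbors."""
    triangles = []
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            for w in adj[u] & adj[v]:   # common neighbors of u and v
                if w > v:
                    triangles.append((u, v, w))
    return triangles
```

The ordering trick generalizes to $p$-clique listing by extending an ordered partial clique one vertex at a time through common neighborhoods.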
Ordered Biclique Partitions and Communication Complexity Problems
An ordered biclique partition of the complete graph $K_n$ on $n$ vertices is
a collection of bicliques (i.e., complete bipartite graphs) such that (i) every
edge of $K_n$ is covered by at least one and at most two bicliques in the
collection, and (ii) if an edge $e$ is covered by two bicliques then each
endpoint of $e$ is in the first class in one of these bicliques and in the
second class in the other one. In this note, we give an explicit construction
of such a collection whose size improves the bound shown in the previous work
[Disc. Appl. Math., 2014].
As immediate consequences of this result, we show (i) a construction of
0/1 matrices with a fooling set whose size is almost quadratic in the rank,
i.e., the gap between rank and fooling set size can be at least almost
quadratic, and (ii) an improved lower bound on the
nondeterministic communication complexity of the clique vs. independent set
problem, which matches the best known lower bound on the deterministic version
of the problem shown by Kushilevitz, Linial and Ostrovsky [Combinatorica,
1999].
Comment: 8 pages; the version submitted to a journal.
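Conditions (i) and (ii) are easy to check mechanically on small examples. The sketch below (names illustrative) validates a candidate collection of ordered bicliques, each given as an ordered pair of vertex classes, against the definition:

```python
from itertools import combinations

def is_ordered_biclique_partition(n, bicliques):
    """Check conditions (i) and (ii) for the complete graph on 0..n-1.
    Each biclique is a pair (first_class, second_class) of vertex sets."""
    for u, v in combinations(range(n), 2):
        # record each covering biclique's orientation of the edge {u, v}
        orientations = []
        for first, second in bicliques:
            if u in first and v in second:
                orientations.append((u, v))
            elif v in first and u in second:
                orientations.append((v, u))
        if not 1 <= len(orientations) <= 2:
            return False           # condition (i) fails
        if len(orientations) == 2 and orientations[0] == orientations[1]:
            return False           # condition (ii): orientations must flip
    return True
```

Since an edge admits only two orientations, condition (ii) for a doubly-covered edge reduces to the two covering bicliques orienting it oppositely.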
Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts
In the domains of dataset construction and crowdsourcing, a notable challenge
is to aggregate labels from a heterogeneous set of labelers, each of whom is
potentially an expert in some subset of tasks (and less reliable in others). To
reduce costs of hiring human labelers or training automated labeling systems,
it is of interest to minimize the number of labelers while ensuring the
reliability of the resulting dataset. We model this as the problem of
performing $K$-class classification using the predictions of smaller
classifiers, each trained on a subset of the $K$ classes, and derive bounds on the number
of classifiers needed to accurately infer the true class of an unlabeled sample
under both adversarial and stochastic assumptions. By exploiting a connection
to the classical set cover problem, we produce a near-optimal scheme for
designing such configurations of classifiers which recovers the well known
one-vs.-one classification approach as a special case. Experiments with the
MNIST and CIFAR-10 datasets demonstrate the favorable accuracy (compared to a
centralized classifier) of our aggregation scheme applied to classifiers
trained on subsets of the data. These results suggest a new way to
automatically label data or adapt an existing set of local classifiers to
larger-scale multiclass problems.
Comment: 27 pages, 8 figures, to be published in IEEE Journal on Selected
Areas in Information Theory (JSAIT) - Special Issue on Estimation and
Inference.
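The one-vs.-one special case recovered above is simple to state: train one binary expert per pair of classes and let every expert vote. If each expert answers correctly whenever the true class belongs to its pair, the true class collects $K-1$ votes while every other class collects at most $K-2$, so plurality voting recovers it. A hypothetical sketch in which the experts are stand-ins rather than trained models:

```python
from itertools import combinations

def one_vs_one_predict(pairwise, classes, x):
    """pairwise maps frozenset({a, b}) -> binary classifier returning a or b.
    Aggregate all pairwise votes and return the plurality winner."""
    votes = {c: 0 for c in classes}
    for clf in pairwise.values():
        votes[clf(x)] += 1
    return max(votes, key=votes.get)

def make_expert(a, b, oracle):
    """Toy expert for classes {a, b}: answers correctly when the true
    class is in its pair, otherwise falls back to min(a, b)."""
    def clf(x):
        y = oracle(x)
        return y if y in (a, b) else min(a, b)
    return clf
```

The paper's set-cover view generalizes this: it asks how few such subset-trained experts suffice when they need not cover every pair.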
Training Complex Models with Multi-Task Weak Supervision
As machine learning models continue to increase in complexity, collecting
large hand-labeled training sets has become one of the biggest roadblocks in
practice. Instead, weaker forms of supervision that provide noisier but cheaper
labels are often used. However, these weak supervision sources have diverse and
unknown accuracies, may output correlated labels, and may label different tasks
or apply at different levels of granularity. We propose a framework for
integrating and modeling such weak supervision sources by viewing them as
labeling different related sub-tasks of a problem, which we refer to as the
multi-task weak supervision setting. We show that by solving a matrix
completion-style problem, we can recover the accuracies of these multi-task
sources given their dependency structure, but without any labeled data, leading
to higher-quality supervision for training an end model. Theoretically, we show
that the generalization error of models trained with this approach improves
with the number of unlabeled data points, and characterize the scaling with
respect to the task and dependency structures. On three fine-grained
classification problems, we show that our approach leads to average gains of
20.2 points in accuracy over a traditional supervised approach, 6.8 points over
a majority vote baseline, and 4.1 points over a previously proposed weak
supervision method that models tasks separately.
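The framework's end product, per-source accuracies learned without labels, is typically consumed by a weighted vote. For conditionally independent binary sources with known accuracies, the standard combination weights each vote by its log-odds; a simplified sketch of that final step (not the paper's full multi-task model):

```python
import math

def weighted_vote(labels, accuracies):
    """Combine noisy binary labels in {+1, -1}: a source with accuracy p
    contributes its vote with weight log(p / (1 - p)), so a single
    highly reliable source can outvote several mediocre ones."""
    score = sum(y * math.log(p / (1 - p))
                for y, p in zip(labels, accuracies))
    return 1 if score >= 0 else -1
```

With all accuracies equal this reduces to a majority vote, which is the baseline the paper's numbers are compared against.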