Search CORE

767 research outputs found

Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage

Author: Ali Patwary
Assefaw H. Gebremedhin
David F. Gleich
Md. Mostofa
Ryan A. Rossi
Publication venue
Publication date: 25/12/2013
Field of study

We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method employs a branch and bound strategy with novel and aggressive pruning techniques. For instance, we use the core number of a vertex in combination with a good heuristic clique finder to efficiently remove the vast majority of the search space. In addition, we parallelize the exploration of the search tree. During the search, processes immediately communicate changes to upper and lower bounds on the size of maximum clique, which occasionally results in a super-linear speedup because vertices with large search spaces can be pruned by other processes. We apply the algorithm to two problems: to compute temporal strong components and to compress graphs.Comment: 11 page

arXiv.org e-Print Archive

CiteSeerX

Clique versus Independent Set

Author: Bousquet Nicolas
Lagoutte Aurélie
Thomassé Stéphan
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014
Field of study

Yannakakis' Clique versus Independent Set problem (CL-IS) in communication complexity asks for the minimum number of cuts separating cliques from stable sets in a graph, called CS-separator. Yannakakis provides a quasi-polynomial CS-separator, i.e. of size

O(n^{\log n})

, and addresses the problem of finding a polynomial CS-separator. This question is still open even for perfect graphs. We show that a polynomial CS-separator almost surely exists for random graphs. Besides, if H is a split graph (i.e. has a vertex-partition into a clique and a stable set) then there exists a constant

c_H

for which we find a

O(n^{c_H})

CS-separator on the class of H-free graphs. This generalizes a result of Yannakakis on comparability graphs. We also provide a

O(n^{c_k})

CS-separator on the class of graphs without induced path of length k and its complement. Observe that on one side,

c_H

is of order

O(|H| \log |H|)

resulting from Vapnik-Chervonenkis dimension, and on the other side,

c_k

is exponential. One of the main reason why Yannakakis' CL-IS problem is fascinating is that it admits equivalent formulations. Our main result in this respect is to show that a polynomial CS-separator is equivalent to the polynomial Alon-Saks-Seymour Conjecture, asserting that if a graph has an edge-partition into k complete bipartite graphs, then its chromatic number is polynomially bounded in terms of k. We also show that the classical approach to the stubborn problem (arising in CSP) which consists in covering the set of all solutions by

O(n^{\log n})

instances of 2-SAT is again equivalent to the existence of a polynomial CS-separator

arXiv.org e-Print Archive

HAL-ENS-LYON

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

$(\Delta+1)$ Coloring in the Congested Clique Model

Author: Parter Merav
Publication venue
Publication date: 12/01/2020
Field of study

In this paper, we present improved algorithms for the

(\Delta+1)

(vertex) coloring problem in the Congested-Clique model of distributed computing. In this model, the input is a graph on

n

nodes, initially each node knows only its incident edges, and per round each two nodes can exchange

O(\log n)

bits of information. Our key result is a randomized

(\Delta+1)

vertex coloring algorithm that works in

O(\log\log \Delta \cdot \log^* \Delta)

-rounds. This is achieved by combining the recent breakthrough result of [Chang-Li-Pettie, STOC'18] in the \local\ model and a degree reduction technique. We also get the following results with high probability: (1)

(\Delta+1)

-coloring for

\Delta=O((n/\log n)^{1-\epsilon})

for any

\epsilon \in (0,1)

, within

O(\log(1/\epsilon)\log^* \Delta)

rounds, and (2)

(\Delta+\Delta^{1/2+o(1)})

-coloring within

O(\log^* \Delta)

rounds. Turning to deterministic algorithms, we show a

(\Delta+1)

-coloring algorithm that works in

O(\log \Delta)

rounds.Comment: Appeared in ICALP'18 (the update version adds a missing part in the deterministic coloring procedure

arXiv.org e-Print Archive

Algorithmic and enumerative aspects of the Moser-Tardos distribution

Author: Harris David G.
Srinivasan Aravind
Publication venue
Publication date: 16/02/2017
Field of study

Moser & Tardos have developed a powerful algorithmic approach (henceforth "MT") to the Lovasz Local Lemma (LLL); the basic operation done in MT and its variants is a search for "bad" events in a current configuration. In the initial stage of MT, the variables are set independently. We examine the distributions on these variables which arise during intermediate stages of MT. We show that these configurations have a more or less "random" form, building further on the "MT-distribution" concept of Haeupler et al. in understanding the (intermediate and) output distribution of MT. This has a variety of algorithmic applications; the most important is that bad events can be found relatively quickly, improving upon MT across the complexity spectrum: it makes some polynomial-time algorithms sub-linear (e.g., for Latin transversals, which are of basic combinatorial interest), gives lower-degree polynomial run-times in some settings, transforms certain super-polynomial-time algorithms into polynomial-time ones, and leads to Las Vegas algorithms for some coloring problems for which only Monte Carlo algorithms were known. We show that in certain conditions when the LLL condition is violated, a variant of the MT algorithm can still produce a distribution which avoids most of the bad events. We show in some cases this MT variant can run faster than the original MT algorithm itself, and develop the first-known criterion for the case of the asymmetric LLL. This can be used to find partial Latin transversals -- improving upon earlier bounds of Stein (1975) -- among other applications. We furthermore give applications in enumeration, showing that most applications (where we aim for all or most of the bad events to be avoided) have many more solutions than known before by proving that the MT-distribution has "large" min-entropy and hence that its support-size is large

arXiv.org e-Print Archive

Distributed Connectivity Decomposition

Author: Censor-Hillel Keren
Ghaffari Mohsen
Kuhn Fabian
Publication venue
Publication date: 21/11/2013
Field of study

We present time-efficient distributed algorithms for decomposing graphs with large edge or vertex connectivity into multiple spanning or dominating trees, respectively. As their primary applications, these decompositions allow us to achieve information flow with size close to the connectivity by parallelizing it along the trees. More specifically, our distributed decomposition algorithms are as follows: (I) A decomposition of each undirected graph with vertex-connectivity

k

into (fractionally) vertex-disjoint weighted dominating trees with total weight

\Omega(\frac{k}{\log n})

, in

\widetilde{O}(D+\sqrt{n})

rounds. (II) A decomposition of each undirected graph with edge-connectivity

\lambda

into (fractionally) edge-disjoint weighted spanning trees with total weight

\lceil\frac{\lambda-1}{2}\rceil(1-\varepsilon)

, in

\widetilde{O}(D+\sqrt{n\lambda})

rounds. We also show round complexity lower bounds of

\tilde{\Omega}(D+\sqrt{\frac{n}{k}})

and

\tilde{\Omega}(D+\sqrt{\frac{n}{\lambda}})

for the above two decompositions, using techniques of [Das Sarma et al., STOC'11]. Moreover, our vertex-connectivity decomposition extends to centralized algorithms and improves the time complexity of [Censor-Hillel et al., SODA'14] from

O(n^3)

to near-optimal

\tilde{O}(m)

. As corollaries, we also get distributed oblivious routing broadcast with

O(1)

-competitive edge-congestion and

O(\log n)

-competitive vertex-congestion. Furthermore, the vertex connectivity decomposition leads to near-time-optimal

O(\log n)

-approximation of vertex connectivity: centralized

\widetilde{O}(m)

and distributed

\tilde{O}(D+\sqrt{n})

. The former moves toward the 1974 conjecture of Aho, Hopcroft, and Ullman postulating an

O(m)

centralized exact algorithm while the latter is the first distributed vertex connectivity approximation

arXiv.org e-Print Archive

CiteSeerX

A Stronger LP Bound for Formula Size Lower Bounds via Clique Constraints

Author: Ueno Kenya
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 26th International Symposium on Theoretical Aspects of Computer Science
Publication date: 01/01/2009
Field of study

We introduce a new technique proving formula size lower bounds based on the linear programming bound originally introduced by Karchmer, Kushilevitz and Nisan (1995) and the theory of stable set polytope. We apply it to majority functions and prove their formula size lower bounds improved from the classical result of Khrapchenko (1971). Moreover, we introduce a notion of unbalanced recursive ternary majority functions motivated by a decomposition theory of monotone self-dual functions and give integrally matching upper and lower bounds of their formula size. We also show monotone formula size lower bounds of balanced recursive ternary majority functions improved from the quantum adversary bound of Laplante, Lee and Szegedy (2006)

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Tight Distributed Listing of Cliques

Author: Censor-Hillel Keren
Chang Yi-Jun
Gall François Le
Leitersdorf Dean
Publication venue
Publication date: 14/11/2020
Field of study

Much progress has recently been made in understanding the complexity landscape of subgraph finding problems in the CONGEST model of distributed computing. However, so far, very few tight bounds are known in this area. For triangle (i.e., 3-clique) listing, an optimal

\tilde{O}(n^{1/3})

-round distributed algorithm has been constructed by Chang et al.~[SODA 2019, PODC 2019]. Recent works of Eden et al.~[DISC 2019] and of Censor-Hillel et al.~[PODC 2020] have shown sublinear algorithms for

K_p

-listing, for each

p \geq 4

, but still leaving a significant gap between the upper bounds and the known lower bounds of the problem. In this paper, we completely close this gap. We show that for each

p \geq 4

, there is an

\tilde{O}(n^{1 - 2/p})

-round distributed algorithm that lists all

p

-cliques

K_p

in the communication network. Our algorithm is \emph{optimal} up to a polylogarithmic factor, due to the

\tilde{\Omega}(n^{1 - 2/p})

-round lower bound of Fischer et al.~[SPAA 2018], which holds even in the CONGESTED CLIQUE model. Together with the triangle-listing algorithm by Chang et al.~[SODA 2019, PODC 2019], our result thus shows that the round complexity of

K_p

-listing, for all

p

, is the same in both the CONGEST and CONGESTED CLIQUE models, at

\tilde{\Theta}(n^{1 - 2/p})

rounds. For

p=4

, our result additionally matches the

\tilde{\Omega}(n^{1/2})

lower bound for

K_4

-\emph{detection} by Czumaj and Konrad [DISC 2018], implying that the round complexities for detection and listing of

K_4

are equivalent in the CONGEST model.Comment: 21 pages. To appear in SODA 202

arXiv.org e-Print Archive

Ordered Biclique Partitions and Communication Complexity Problems

Author: Amano Kazuyuki
Shigeta Manami
Publication venue
Publication date: 26/12/2013
Field of study

An ordered biclique partition of the complete graph

K_n

n

vertices is a collection of bicliques (i.e., complete bipartite graphs) such that (i) every edge of

K_n

is covered by at least one and at most two bicliques in the collection, and (ii) if an edge

e

is covered by two bicliques then each endpoint of

e

is in the first class in one of these bicliques and in the second class in other one. In this note, we give an explicit construction of such a collection of size

n^{1/2+o(1)}

, which improves the

O(n^{2/3})

bound shown in the previous work [Disc. Appl. Math., 2014]. As the immediate consequences of this result, we show (i) a construction of

n \times n

0/1 matrices of rank

n^{1/2+o(1)}

which have a fooling set of size

n

, i.e., the gap between rank and fooling set size can be at least almost quadratic, and (ii) an improved lower bound

(2-o(1)) \log N

on the nondeterministic communication complexity of the clique vs. independent set problem, which matches the best known lower bound on the deterministic version of the problem shown by Kushilevitz, Linial and Ostrovsky [Combinatorica, 1999].Comment: 8 pages; the version submitted to a journa

arXiv.org e-Print Archive

Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts

Author: Ahn Surin
Ozgur Ayfer
Pilanci Mert
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/01/2021
Field of study

In the domains of dataset construction and crowdsourcing, a notable challenge is to aggregate labels from a heterogeneous set of labelers, each of whom is potentially an expert in some subset of tasks (and less reliable in others). To reduce costs of hiring human labelers or training automated labeling systems, it is of interest to minimize the number of labelers while ensuring the reliability of the resulting dataset. We model this as the problem of performing

K

-class classification using the predictions of smaller classifiers, each trained on a subset of

[K]

, and derive bounds on the number of classifiers needed to accurately infer the true class of an unlabeled sample under both adversarial and stochastic assumptions. By exploiting a connection to the classical set cover problem, we produce a near-optimal scheme for designing such configurations of classifiers which recovers the well known one-vs.-one classification approach as a special case. Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy (compared to a centralized classifier) of our aggregation scheme applied to classifiers trained on subsets of the data. These results suggest a new way to automatically label data or adapt an existing set of local classifiers to larger-scale multiclass problems.Comment: 27 pages, 8 figures, to be published in IEEE Journal on Selected Areas in Information Theory (JSAIT) - Special Issue on Estimation and Inferenc

arXiv.org e-Print Archive

Training Complex Models with Multi-Task Weak Supervision

Author: Dunnmon Jared
Hancock Braden
Pandey Shreyash
Ratner Alexander
Ré Christopher
Sala Frederic
Publication venue
Publication date: 07/12/2018
Field of study

As machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels are often used. However, these weak supervision sources have diverse and unknown accuracies, may output correlated labels, and may label different tasks or apply at different levels of granularity. We propose a framework for integrating and modeling such weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting. We show that by solving a matrix completion-style problem, we can recover the accuracies of these multi-task sources given their dependency structure, but without any labeled data, leading to higher-quality supervision for training an end model. Theoretically, we show that the generalization error of models trained with this approach improves with the number of unlabeled data points, and characterize the scaling with respect to the task and dependency structures. On three fine-grained classification problems, we show that our approach leads to average gains of 20.2 points in accuracy over a traditional supervised approach, 6.8 points over a majority vote baseline, and 4.1 points over a previously proposed weak supervision method that models tasks separately

arXiv.org e-Print Archive