
    Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage

    We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits roughly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method employs a branch-and-bound strategy with novel and aggressive pruning techniques. For instance, we use the core number of a vertex in combination with a good heuristic clique finder to efficiently remove the vast majority of the search space. In addition, we parallelize the exploration of the search tree. During the search, processes immediately communicate changes to the upper and lower bounds on the size of the maximum clique, which occasionally results in a super-linear speedup because vertices with large search spaces can be pruned by other processes. We apply the algorithm to two problems: computing temporal strong components and compressing graphs. Comment: 11 pages.
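
    As a point of reference, here is a minimal sequential sketch of the branch-and-bound scheme the abstract describes: a greedy heuristic clique supplies the lower bound, and core numbers prune vertices that cannot appear in a larger clique. It assumes networkx and only illustrates the pruning idea, not the paper's parallel implementation.

```python
import networkx as nx  # assumed available; G is a simple undirected nx.Graph

def heuristic_clique(G):
    # Greedy lower bound: repeatedly add a highest-degree vertex adjacent to all chosen so far.
    clique, cand = set(), set(G)
    while cand:
        v = max(cand, key=G.degree)
        clique.add(v)
        cand &= set(G[v])
    return clique

def max_clique(G):
    best = heuristic_clique(G)
    core = nx.core_number(G)

    def expand(clique, cand):
        nonlocal best
        while cand:
            if len(clique) + len(cand) <= len(best):
                return                          # bound: this branch cannot beat the incumbent
            v = cand.pop()
            expand(clique | {v}, cand & set(G[v]))
        if len(clique) > len(best):
            best = clique                       # leaf: candidate set exhausted

    # A vertex of a clique larger than |best| must have core number >= |best|.
    expand(set(), {v for v in G if core[v] >= len(best)})
    return best
```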

    Clique versus Independent Set

    Yannakakis' Clique versus Independent Set problem (CL-IS) in communication complexity asks for the minimum number of cuts separating cliques from stable sets in a graph, called a CS-separator. Yannakakis provides a quasi-polynomial CS-separator, i.e. of size $O(n^{\log n})$, and asks whether a polynomial CS-separator exists. This question is still open even for perfect graphs. We show that a polynomial CS-separator almost surely exists for random graphs. Besides, if $H$ is a split graph (i.e. has a vertex partition into a clique and a stable set) then there exists a constant $c_H$ for which we find an $O(n^{c_H})$ CS-separator on the class of $H$-free graphs. This generalizes a result of Yannakakis on comparability graphs. We also provide an $O(n^{c_k})$ CS-separator on the class of graphs with no induced path of length $k$ and no complement of such a path. Observe that on one side, $c_H$ is of order $O(|H| \log |H|)$, resulting from Vapnik-Chervonenkis dimension, while on the other side, $c_k$ is exponential. One of the main reasons why Yannakakis' CL-IS problem is fascinating is that it admits equivalent formulations. Our main result in this respect is to show that a polynomial CS-separator is equivalent to the polynomial Alon-Saks-Seymour Conjecture, asserting that if a graph has an edge partition into $k$ complete bipartite graphs, then its chromatic number is polynomially bounded in terms of $k$. We also show that the classical approach to the stubborn problem (arising in CSP), which consists in covering the set of all solutions by $O(n^{\log n})$ instances of 2-SAT, is again equivalent to the existence of a polynomial CS-separator.
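
    To make the central definition concrete, the sketch below brute-force checks whether a family of cuts is a CS-separator on a toy graph; the dict-based graph encoding is ours and the check is feasible only for very small graphs.

```python
from itertools import combinations

def is_clique(G, s):
    return all(frozenset(p) in G["edges"] for p in combinations(s, 2))

def is_stable(G, s):
    return all(frozenset(p) not in G["edges"] for p in combinations(s, 2))

def subsets(vs):
    for r in range(len(vs) + 1):
        yield from map(set, combinations(sorted(vs), r))

def is_cs_separator(G, cuts):
    cliques = [c for c in subsets(G["vertices"]) if is_clique(G, c)]
    stables = [s for s in subsets(G["vertices"]) if is_stable(G, s)]
    # Every disjoint (clique, stable set) pair must be split by some cut A:
    # the clique inside A, the stable set fully outside A.
    return all(any(k <= A and s.isdisjoint(A) for A in cuts)
               for k in cliques for s in stables if k.isdisjoint(s))

# Toy example: on the path a-b-c, these six cuts form a CS-separator.
G = {"vertices": {"a", "b", "c"}, "edges": {frozenset("ab"), frozenset("bc")}}
print(is_cs_separator(G, [{"a", "b"}, {"b", "c"}, {"a"}, {"b"}, {"c"}, set()]))  # True
```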

    $(\Delta+1)$ Coloring in the Congested Clique Model

    In this paper, we present improved algorithms for the $(\Delta+1)$ (vertex) coloring problem in the Congested-Clique model of distributed computing. In this model, the input is a graph on $n$ nodes, initially each node knows only its incident edges, and per round each pair of nodes can exchange $O(\log n)$ bits of information. Our key result is a randomized $(\Delta+1)$ vertex coloring algorithm that works in $O(\log\log \Delta \cdot \log^* \Delta)$ rounds. This is achieved by combining the recent breakthrough result of [Chang-Li-Pettie, STOC'18] in the LOCAL model with a degree reduction technique. We also obtain the following results with high probability: (1) $(\Delta+1)$-coloring for $\Delta = O((n/\log n)^{1-\epsilon})$ for any $\epsilon \in (0,1)$, within $O(\log(1/\epsilon)\log^* \Delta)$ rounds, and (2) $(\Delta+\Delta^{1/2+o(1)})$-coloring within $O(\log^* \Delta)$ rounds. Turning to deterministic algorithms, we show a $(\Delta+1)$-coloring algorithm that works in $O(\log \Delta)$ rounds. Comment: Appeared in ICALP'18 (the updated version adds a missing part in the deterministic coloring procedure).
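
    For orientation, here is the standard sequential analogue of the randomized color-trial step that $(\Delta+1)$-coloring algorithms in this line of work build on: each uncolored vertex proposes a random color not held by a neighbor and keeps it if no neighbor conflicts. This is a generic sketch, not the paper's Congested-Clique algorithm.

```python
import random

def color_trial_round(adj, color):
    """One synchronous trial round over a graph given as {vertex: set_of_neighbors}."""
    proposals = {}
    for v in adj:
        if color[v] is None:
            # Palette of size deg(v)+1 minus colors already held by neighbors is never empty.
            free = set(range(len(adj[v]) + 1)) - {color[u] for u in adj[v]}
            proposals[v] = random.choice(sorted(free))
    for v, c in proposals.items():
        # Keep the proposal only if no neighbor holds or simultaneously proposed color c.
        if all(color[u] != c and proposals.get(u) != c for u in adj[v]):
            color[v] = c

def randomized_coloring(adj):
    color = {v: None for v in adj}
    while any(c is None for c in color.values()):
        color_trial_round(adj, color)
    return color   # uses at most max_v deg(v)+1 <= Delta+1 colors
```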

    Algorithmic and enumerative aspects of the Moser-Tardos distribution

    Moser & Tardos have developed a powerful algorithmic approach (henceforth "MT") to the Lovász Local Lemma (LLL); the basic operation done in MT and its variants is a search for "bad" events in a current configuration. In the initial stage of MT, the variables are set independently. We examine the distributions on these variables which arise during intermediate stages of MT. We show that these configurations have a more or less "random" form, building further on the "MT-distribution" concept of Haeupler et al. in understanding the intermediate and output distributions of MT. This has a variety of algorithmic applications; the most important is that bad events can be found relatively quickly, improving upon MT across the complexity spectrum: it makes some polynomial-time algorithms sub-linear (e.g., for Latin transversals, which are of basic combinatorial interest), gives lower-degree polynomial run-times in some settings, transforms certain super-polynomial-time algorithms into polynomial-time ones, and leads to Las Vegas algorithms for some coloring problems for which only Monte Carlo algorithms were known. We show that under certain conditions in which the LLL condition is violated, a variant of the MT algorithm can still produce a distribution which avoids most of the bad events. We show that in some cases this MT variant can run faster than the original MT algorithm itself, and develop the first known criterion for the case of the asymmetric LLL. This can be used to find partial Latin transversals -- improving upon earlier bounds of Stein (1975) -- among other applications. We furthermore give applications in enumeration, showing that most applications (where we aim for all or most of the bad events to be avoided) have many more solutions than previously known, by proving that the MT-distribution has "large" min-entropy and hence that its support size is large.
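
    The basic algorithm under discussion is short enough to state in full; the sketch below is the vanilla Moser-Tardos resampler (the event representation is ours, chosen for brevity).

```python
import random

def moser_tardos(variables, sampler, bad_events):
    """variables: iterable of names; sampler: dict name -> zero-argument sampling function;
    bad_events: list of (event_vars, violated) with violated(assignment) -> bool."""
    assignment = {x: sampler[x]() for x in variables}   # initial stage: independent samples
    while True:
        bad = next((e for e in bad_events if e[1](assignment)), None)
        if bad is None:
            return assignment       # no bad event holds; under the LLL condition,
                                    # this is reached in expected polynomial resamplings
        for x in bad[0]:            # resample only the violated event's variables
            assignment[x] = sampler[x]()
```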

    Distributed Connectivity Decomposition

    We present time-efficient distributed algorithms for decomposing graphs with large edge or vertex connectivity into multiple spanning or dominating trees, respectively. As their primary applications, these decompositions allow us to achieve information flow with size close to the connectivity by parallelizing it along the trees. More specifically, our distributed decomposition algorithms are as follows: (I) A decomposition of each undirected graph with vertex connectivity $k$ into (fractionally) vertex-disjoint weighted dominating trees with total weight $\Omega(\frac{k}{\log n})$, in $\widetilde{O}(D+\sqrt{n})$ rounds. (II) A decomposition of each undirected graph with edge connectivity $\lambda$ into (fractionally) edge-disjoint weighted spanning trees with total weight $\lceil\frac{\lambda-1}{2}\rceil(1-\varepsilon)$, in $\widetilde{O}(D+\sqrt{n\lambda})$ rounds. We also show round complexity lower bounds of $\tilde{\Omega}(D+\sqrt{\frac{n}{k}})$ and $\tilde{\Omega}(D+\sqrt{\frac{n}{\lambda}})$ for the above two decompositions, using techniques of [Das Sarma et al., STOC'11]. Moreover, our vertex-connectivity decomposition extends to centralized algorithms and improves the time complexity of [Censor-Hillel et al., SODA'14] from $O(n^3)$ to near-optimal $\tilde{O}(m)$. As corollaries, we also get distributed oblivious routing broadcast with $O(1)$-competitive edge congestion and $O(\log n)$-competitive vertex congestion. Furthermore, the vertex-connectivity decomposition leads to a near-time-optimal $O(\log n)$-approximation of vertex connectivity: centralized in $\widetilde{O}(m)$ and distributed in $\tilde{O}(D+\sqrt{n})$. The former moves toward the 1974 conjecture of Aho, Hopcroft, and Ullman postulating an $O(m)$ centralized exact algorithm, while the latter is the first distributed approximation of vertex connectivity.
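
    As a toy illustration of the object in item (II), the sketch below greedily peels edge-disjoint spanning trees off a graph, sequentially and with networkx. By the Nash-Williams/Tutte theorem a $\lambda$-edge-connected graph admits $\lfloor \lambda/2 \rfloor = \lceil\frac{\lambda-1}{2}\rceil$ such trees; greedy peeling need not reach that number, and the paper's fractional, distributed construction is far more involved.

```python
import networkx as nx  # assumed available; G is a connected, undirected nx.Graph

def peel_spanning_trees(G):
    """Greedily remove edge-disjoint spanning trees while the remainder stays connected."""
    H, trees = G.copy(), []
    while H.number_of_nodes() > 1 and nx.is_connected(H):
        T = nx.minimum_spanning_tree(H)   # any spanning tree of the remainder works
        trees.append(T)
        H.remove_edges_from(T.edges())
    return trees
```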

    A Stronger LP Bound for Formula Size Lower Bounds via Clique Constraints

    We introduce a new technique for proving formula size lower bounds, based on the linear programming bound originally introduced by Karchmer, Kushilevitz and Nisan (1995) and the theory of the stable set polytope. We apply it to majority functions and prove formula size lower bounds for them that improve on the classical result of Khrapchenko (1971). Moreover, we introduce a notion of unbalanced recursive ternary majority functions, motivated by a decomposition theory of monotone self-dual functions, and give integrally matching upper and lower bounds on their formula size. We also show monotone formula size lower bounds for balanced recursive ternary majority functions that improve on the quantum adversary bound of Laplante, Lee and Szegedy (2006).
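
    For orientation, Khrapchenko's bound, the classical baseline being improved here, reads as follows in standard notation:

```latex
% Khrapchenko (1971): for a Boolean function f, pick A \subseteq f^{-1}(0),
% B \subseteq f^{-1}(1), and let C be the set of pairs in A \times B
% differing in exactly one bit. Then the formula size L(f) satisfies
\[
  \mathrm{L}(f) \;\ge\; \frac{|C|^{2}}{|A|\,|B|}.
\]
% For parity on n bits this already gives the tight bound L(f) >= n^2.
```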

    Tight Distributed Listing of Cliques

    Much progress has recently been made in understanding the complexity landscape of subgraph finding problems in the CONGEST model of distributed computing. However, so far, very few tight bounds are known in this area. For triangle (i.e., 3-clique) listing, an optimal $\tilde{O}(n^{1/3})$-round distributed algorithm has been constructed by Chang et al. [SODA 2019, PODC 2019]. Recent works of Eden et al. [DISC 2019] and of Censor-Hillel et al. [PODC 2020] have shown sublinear algorithms for $K_p$-listing, for each $p \geq 4$, but still leave a significant gap between the upper bounds and the known lower bounds for the problem. In this paper, we completely close this gap. We show that for each $p \geq 4$, there is an $\tilde{O}(n^{1-2/p})$-round distributed algorithm that lists all $p$-cliques $K_p$ in the communication network. Our algorithm is optimal up to a polylogarithmic factor, due to the $\tilde{\Omega}(n^{1-2/p})$-round lower bound of Fischer et al. [SPAA 2018], which holds even in the CONGESTED CLIQUE model. Together with the triangle-listing algorithm of Chang et al. [SODA 2019, PODC 2019], our result thus shows that the round complexity of $K_p$-listing, for all $p$, is the same in both the CONGEST and CONGESTED CLIQUE models, at $\tilde{\Theta}(n^{1-2/p})$ rounds. For $p=4$, our result additionally matches the $\tilde{\Omega}(n^{1/2})$ lower bound for $K_4$-detection by Czumaj and Konrad [DISC 2018], implying that the round complexities for detection and listing of $K_4$ are equivalent in the CONGEST model. Comment: 21 pages. To appear in SODA 2021.
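
    As a sequential reference point, the sketch below lists every $p$-clique of a graph exactly once by recursing on "later" neighborhoods under a fixed vertex order; it only illustrates the listing task itself, not the CONGEST algorithm.

```python
def list_p_cliques(adj, p):
    """adj: dict mapping each vertex to a set of neighbors; yields each p-clique once."""
    order = {v: i for i, v in enumerate(adj)}   # any fixed order makes each clique unique
    def extend(clique, cand):
        if len(clique) == p:
            yield tuple(clique)
            return
        for v in cand:
            # Restrict to common neighbors that come later than v in the order.
            yield from extend(clique + [v],
                              [u for u in cand if u in adj[v] and order[u] > order[v]])
    for v in adj:
        yield from extend([v], [u for u in adj[v] if order[u] > order[v]])

# Example: triangles (p = 3) of a 4-cycle 0-1-2-3 with the chord 1-3.
adj = {0: {1, 3}, 1: {0, 2, 3}, 2: {1, 3}, 3: {0, 1, 2}}
print(list(list_p_cliques(adj, 3)))   # [(0, 1, 3), (1, 2, 3)]
```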

    Ordered Biclique Partitions and Communication Complexity Problems

    An ordered biclique partition of the complete graph $K_n$ on $n$ vertices is a collection of bicliques (i.e., complete bipartite graphs) such that (i) every edge of $K_n$ is covered by at least one and at most two bicliques in the collection, and (ii) if an edge $e$ is covered by two bicliques, then each endpoint of $e$ is in the first class in one of these bicliques and in the second class in the other. In this note, we give an explicit construction of such a collection of size $n^{1/2+o(1)}$, which improves the $O(n^{2/3})$ bound shown in previous work [Disc. Appl. Math., 2014]. As immediate consequences of this result, we show (i) a construction of $n \times n$ 0/1 matrices of rank $n^{1/2+o(1)}$ which have a fooling set of size $n$, i.e., the gap between rank and fooling set size can be at least almost quadratic, and (ii) an improved lower bound of $(2-o(1)) \log N$ on the nondeterministic communication complexity of the clique vs. independent set problem, which matches the best known lower bound on the deterministic version of the problem, shown by Kushilevitz, Linial and Ostrovsky [Combinatorica, 1999]. Comment: 8 pages; the version submitted to a journal.
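
    Since the definition has two interacting conditions, a small checker helps fix it in the mind; the sketch below verifies conditions (i) and (ii) for a candidate collection of ordered bicliques $(A_i, B_i)$, using our own encoding purely for illustration.

```python
from itertools import combinations

def is_ordered_biclique_partition(n, bicliques):
    """bicliques: list of (A, B) vertex-set pairs over the vertices 0..n-1 of K_n."""
    for u, v in combinations(range(n), 2):
        hits = [(A, B) for A, B in bicliques
                if (u in A and v in B) or (v in A and u in B)]
        if not 1 <= len(hits) <= 2:          # condition (i): covered once or twice
            return False
        if len(hits) == 2:                   # condition (ii): opposite orientations
            (A1, B1), (A2, B2) = hits
            if not ((u in A1 and v in B1 and u in B2 and v in A2) or
                    (v in A1 and u in B1 and v in B2 and u in A2)):
                return False
    return True
```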

    Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts

    In the domains of dataset construction and crowdsourcing, a notable challenge is to aggregate labels from a heterogeneous set of labelers, each of whom is potentially an expert in some subset of tasks (and less reliable in others). To reduce the costs of hiring human labelers or training automated labeling systems, it is of interest to minimize the number of labelers while ensuring the reliability of the resulting dataset. We model this as the problem of performing $K$-class classification using the predictions of smaller classifiers, each trained on a subset of $[K]$, and derive bounds on the number of classifiers needed to accurately infer the true class of an unlabeled sample under both adversarial and stochastic assumptions. By exploiting a connection to the classical set cover problem, we produce a near-optimal scheme for designing such configurations of classifiers, which recovers the well-known one-vs.-one classification approach as a special case. Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy (compared to a centralized classifier) of our aggregation scheme applied to classifiers trained on subsets of the data. These results suggest a new way to automatically label data or adapt an existing set of local classifiers to larger-scale multiclass problems. Comment: 27 pages, 8 figures, to be published in IEEE Journal on Selected Areas in Information Theory (JSAIT) - Special Issue on Estimation and Inference.
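
    A hedged sketch of one natural reading of the set-cover connection (our framing, not necessarily the paper's exact reduction): every pair of classes must co-occur in some local classifier's class subset so the pair can be told apart, and greedily choosing subsets is then greedy set cover over class pairs.

```python
from itertools import combinations

def greedy_classifier_cover(K, candidates):
    """Pick class subsets until every pair of classes co-occurs in a chosen subset.
    Assumes the candidates can jointly cover all pairs; candidates are sets over range(K)."""
    uncovered = {frozenset(p) for p in combinations(range(K), 2)}
    chosen = []
    while uncovered:
        S = max(candidates, key=lambda S: sum(p <= S for p in uncovered))
        chosen.append(S)
        uncovered = {p for p in uncovered if not p <= S}
    return chosen

# One-vs.-one falls out when the candidates are exactly the 2-class subsets:
pairs = [set(p) for p in combinations(range(4), 2)]
print(greedy_classifier_cover(4, pairs))   # all six pairwise classifiers are chosen
```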

    Training Complex Models with Multi-Task Weak Supervision

    As machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels are often used. However, these weak supervision sources have diverse and unknown accuracies, may output correlated labels, and may label different tasks or apply at different levels of granularity. We propose a framework for integrating and modeling such weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting. We show that by solving a matrix completion-style problem, we can recover the accuracies of these multi-task sources given their dependency structure, but without any labeled data, leading to higher-quality supervision for training an end model. Theoretically, we show that the generalization error of models trained with this approach improves with the number of unlabeled data points, and characterize the scaling with respect to the task and dependency structures. On three fine-grained classification problems, we show that our approach leads to average gains of 20.2 points in accuracy over a traditional supervised approach, 6.8 points over a majority vote baseline, and 4.1 points over a previously proposed weak supervision method that models tasks separately
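
    To fix ideas on the aggregation side, here are the two baselines named in the comparison: unweighted majority vote and an accuracy-weighted vote. The paper's actual contribution, recovering the accuracy weights from unlabeled data via a matrix completion-style problem, is not reproduced in this sketch.

```python
from collections import Counter

def majority_vote(votes):
    """votes: the labels proposed by the weak sources for one sample."""
    return Counter(votes).most_common(1)[0][0]

def weighted_vote(votes, accuracies):
    """Weight each source's vote by an (estimated) accuracy score."""
    scores = Counter()
    for label, acc in zip(votes, accuracies):
        scores[label] += acc
    return scores.most_common(1)[0][0]

print(majority_vote(["cat", "dog", "cat"]))                    # cat
print(weighted_vote(["cat", "dog", "dog"], [0.9, 0.6, 0.55]))  # dog: 1.15 beats 0.9
```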