4,333 research outputs found

    A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem

    Many graph mining applications rely on detecting subgraphs which are near-cliques. There exists a dichotomy between the results in the existing work related to this problem: on the one hand, the densest subgraph problem (DSP), which maximizes the average degree over all subgraphs, is solvable in polynomial time but for many networks fails to find subgraphs which are near-cliques. On the other hand, formulations that are geared towards finding near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem. In this work, we propose a formulation which combines the best of both worlds: it is solvable in polynomial time and finds near-cliques when the DSP fails. Surprisingly, our formulation is a simple variation of the DSP. Specifically, we define the triangle densest subgraph problem (TDSP): given $G(V,E)$, find a subset of vertices $S^*$ such that $\tau(S^*)=\max_{S \subseteq V} \frac{t(S)}{|S|}$, where $t(S)$ is the number of triangles induced by the set $S$. We provide various exact and approximation algorithms which solve the TDSP efficiently. Furthermore, we show how our algorithms adapt to the more general problem of maximizing the $k$-clique average density. Finally, we provide empirical evidence that the TDSP should be used whenever the DSP fails to output a near-clique. Comment: 42 page
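
The triangle density $t(S)/|S|$ defined in this abstract can be evaluated directly on small graphs. The sketch below is a brute-force illustration only, not the exact or approximation algorithms of the paper: it enumerates every vertex subset, so it runs in exponential time. The function names (`triangle_count`, `triangle_density`, `brute_force_tdsp`) and the edge-list input format are assumptions made for this example.

```python
from itertools import combinations


def triangle_count(edges, subset):
    """Count the triangles of the subgraph induced by `subset`."""
    nodes = set(subset)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return sum(
        1
        for a, b, c in combinations(sorted(nodes), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def triangle_density(edges, subset):
    """Return t(S) / |S| for a non-empty vertex subset S."""
    return triangle_count(edges, subset) / len(subset)


def brute_force_tdsp(edges):
    """Try every non-empty subset (exponential time; tiny graphs only)."""
    vertices = sorted({v for e in edges for v in e})
    best_set, best_val = None, -1.0
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            val = triangle_density(edges, subset)
            if val > best_val:
                best_set, best_val = set(subset), val
    return best_set, best_val


# A 4-clique on {0, 1, 2, 3} with a pendant path 3-4-5 attached.
example_edges = list(combinations(range(4), 2)) + [(3, 4), (4, 5)]
best_subset, best_density = brute_force_tdsp(example_edges)
```

On this example the 4-clique contains four triangles on four vertices, so it is the unique maximizer; adding the path vertices only lowers the ratio.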

    Streaming Verification of Graph Properties

    Streaming interactive proofs (SIPs) are a framework for outsourced computation. A computationally limited streaming client (the verifier) hands over a large data set to an untrusted server (the prover) in the cloud, and the two parties run a protocol to confirm the correctness of the result with high probability. SIPs are particularly interesting for problems that are hard to solve (or even approximate) well in a streaming setting. The most notable of these problems is finding maximum matchings, which has received intense interest in recent years but has strong lower bounds even for constant factor approximations. In this paper, we present efficient streaming interactive proofs that can verify maximum matchings exactly. Our results cover all flavors of matchings (bipartite/non-bipartite and weighted). In addition, we also present streaming verifiers for approximate metric TSP. In particular, these are the first efficient results for weighted matchings and for metric TSP in any streaming verification model. Comment: 26 pages, 2 figure, 1 tabl

    Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals

    Max-stable random sketches can be computed efficiently on fast streaming positive data sets by using only sequential access to the data. They can be used to answer point and Lp-norm queries for the signal. There is an intriguing connection between the so-called p-stable (or sum-stable) and the max-stable sketches. Rigorous performance guarantees through error-probability estimates are derived and the algorithmic implementation is discussed.

    Robust Densest Subgraph Discovery

    Dense subgraph discovery is an important primitive in graph mining, which has a wide variety of applications in diverse domains. In the densest subgraph problem, given an undirected graph $G=(V,E)$ with an edge-weight vector $w=(w_e)_{e\in E}$, we aim to find $S\subseteq V$ that maximizes the density, i.e., $w(S)/|S|$, where $w(S)$ is the sum of the weights of the edges in the subgraph induced by $S$. Although the densest subgraph problem is one of the most well-studied optimization problems for dense subgraph discovery, there is an implicit strong assumption: the weights of all the edges are assumed to be known exactly as input. In real-world applications, there are often cases where we have only uncertain information about the edge weights. In this study, we provide a framework for dense subgraph discovery under the uncertainty of edge weights. Specifically, we address such an uncertainty issue using the theory of robust optimization. First, we formulate our fundamental problem, the robust densest subgraph problem, and present a simple algorithm. We then formulate the robust densest subgraph problem with sampling oracle, which models dense subgraph discovery using an edge-weight sampling oracle, and present an algorithm with a strong theoretical performance guarantee. Computational experiments using both synthetic graphs and popular real-world graphs demonstrate the effectiveness of our proposed algorithms. Comment: 10 pages; Accepted to ICDM 201
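
The density objective $w(S)/|S|$ from this abstract can be sketched for the case where the edge weights are known exactly. The example below uses the standard greedy peeling heuristic for the (non-robust) densest subgraph problem, which repeatedly deletes a minimum weighted-degree vertex; it is a baseline chosen for illustration and is not the robust algorithm proposed in the paper. The dictionary input format and the names `weighted_density` and `greedy_peeling` are assumptions of this example, as is the absence of self-loops.

```python
def weighted_density(weights, subset):
    """Return w(S) / |S|: total induced edge weight divided by |S|."""
    nodes = set(subset)
    total = sum(w for (u, v), w in weights.items() if u in nodes and v in nodes)
    return total / len(nodes)


def greedy_peeling(weights):
    """Peel minimum weighted-degree vertices; keep the densest set seen."""
    nodes = {v for edge in weights for v in edge}
    best_set, best_val = set(nodes), weighted_density(weights, nodes)
    while len(nodes) > 1:
        # Recompute weighted degrees inside the current vertex set.
        degree = {v: 0.0 for v in nodes}
        for (u, v), w in weights.items():
            if u in nodes and v in nodes:
                degree[u] += w
                degree[v] += w
        # Ties are broken by vertex order so the run is deterministic.
        nodes.remove(min(sorted(degree), key=degree.get))
        val = weighted_density(weights, nodes)
        if val > best_val:
            best_set, best_val = set(nodes), val
    return best_set, best_val


# A triangle with heavy edges plus one light pendant edge.
example_weights = {
    ("a", "b"): 2.0,
    ("a", "c"): 2.0,
    ("b", "c"): 2.0,
    ("c", "d"): 1.0,
}
dense_set, dense_val = greedy_peeling(example_weights)
```

Here the full graph has density 7/4, and peeling the pendant vertex `d` leaves the triangle with density 6/3.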

    Randomized Composable Core-sets for Distributed Submodular Maximization

    An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of composable core-sets, and has recently been applied to solve diversity maximization problems as well as several clustering problems. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique [IMMM14]. In this paper, we focus on efficient construction of a randomized variant of composable core-sets where the above idea is applied on a random clustering of the data. We employ this technique for the coverage, monotone and non-monotone submodular maximization problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in distributed and streaming settings. In summary, we show that a simple greedy algorithm results in a $1/3$-approximate randomized composable core-set for submodular maximization under a cardinality constraint. This is in contrast to a known $O\left(\frac{\log k}{\sqrt{k}}\right)$ impossibility result for (non-randomized) composable core-sets. Our result also extends to non-monotone submodular functions, and leads to the first 2-round MapReduce-based constant-factor approximation algorithm with $O(n)$ total communication complexity for either monotone or non-monotone functions. Finally, using an improved analysis technique and a new algorithm $\mathsf{PseudoGreedy}$, we present an improved $0.545$-approximation algorithm for monotone submodular maximization, which is in turn the first MapReduce-based algorithm beating factor $1/2$ in a constant number of rounds.
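
The partition-then-merge pattern described in this abstract can be sketched for a coverage objective. The example below randomly assigns sets to machines, runs the classical greedy rule on each part, and then runs greedy again on the union of the per-part selections. It is a simplified single-process illustration under assumed names (`greedy_cover`, `two_round_greedy`, `num_machines`); it does not reproduce the paper's analysis or its $\mathsf{PseudoGreedy}$ algorithm.

```python
import random


def greedy_cover(sets, candidates, k):
    """Pick up to k candidate sets, each time taking the largest marginal gain."""
    chosen, covered = [], set()
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds a new element
        chosen.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return chosen, covered


def two_round_greedy(sets, k, num_machines, seed=0):
    """Round 1: greedy on each random part. Round 2: greedy on the union."""
    rng = random.Random(seed)
    parts = [[] for _ in range(num_machines)]
    for name in sets:
        parts[rng.randrange(num_machines)].append(name)
    core = [name for part in parts for name in greedy_cover(sets, part, k)[0]]
    return greedy_cover(sets, core, k)


example_sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}, "D": {1, 2}}
single_machine = two_round_greedy(example_sets, k=2, num_machines=1)
two_machines = two_round_greedy(example_sets, k=2, num_machines=2, seed=1)
```

With a single machine the procedure reduces to plain greedy; with several machines each part forwards at most `k` sets, so the second round works on at most `k * num_machines` candidates.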

    Recursive Sketching For Frequency Moments

    In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to compute $F_k$ (for $k>2$) in space complexity $O(\mathrm{polylog}(n,m)\cdot n^{1-\frac{2}{k}})$, which is optimal up to (large) poly-logarithmic factors in $n$ and $m$, where $m$ is the length of the stream and $n$ is the upper bound on the number of distinct elements in a stream. The best known lower bound for large moments is $\Omega(\log(n)\, n^{1-\frac{2}{k}})$. A follow-up work of Bhuvanagiri, Ganguly, Kesh and Saha (SODA 2006) reduced the poly-logarithmic factors of Indyk and Woodruff to $O(\log^2(m)\cdot (\log n+ \log m)\cdot n^{1-\frac{2}{k}})$. Further reduction of poly-log factors has been an elusive goal since 2006, when the Indyk and Woodruff method seemed to hit a natural "barrier." Using our simple recursive sketch, we provide a different yet simple approach to obtain an $O(\log(m)\log(nm)\cdot (\log\log n)^4\cdot n^{1-\frac{2}{k}})$ algorithm for constant $\epsilon$ (our bound is, in fact, somewhat stronger, where the $(\log\log n)$ term can be replaced by any constant number of $\log$ iterations instead of just two or three, thus approaching $\log^* n$). Our bound also works for non-constant $\epsilon$ (for details see the body of the paper). Further, our algorithm requires only $4$-wise independence, in contrast to existing methods that use pseudo-random generators for computing large frequency moments.
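
For reference, the quantity being approximated is the frequency moment $F_k=\sum_i f_i^k$, where $f_i$ is the number of occurrences of element $i$ in the stream. The sketch below computes it exactly by storing one counter per distinct element, which is the linear-space baseline that the sublinear-space sketches in this abstract aim to avoid. The function name `frequency_moment` is an assumption of this example.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k: sum of f_i ** k over the distinct elements of the stream."""
    counts = Counter(stream)
    return sum(f ** k for f in counts.values())


example_stream = [1, 1, 2, 3, 3, 3]
f3 = frequency_moment(example_stream, 3)  # 2**3 + 1**3 + 3**3
```

Two special cases are worth noting: $F_1$ equals the stream length $m$, and $F_0$ equals the number of distinct elements.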

    Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem

    In this paper, we study linear programming based approaches to the maximum matching problem in the semi-streaming model. The semi-streaming model has gained attention as a model for processing massive graphs as the importance of such graphs has increased. This is a model where edges are streamed in an adversarial order and we are allowed space proportional to the number of vertices in a graph. In recent years, there have been several new results in this semi-streaming model. However, broad techniques such as linear programming have not been adapted to this model. We present several techniques to adapt and optimize linear programming based approaches in the semi-streaming model with an application to the maximum matching problem. As a consequence, we improve (almost) all previous results on this problem, and also prove new results on interesting variants.
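
To make the semi-streaming setting concrete, the sketch below shows the standard one-pass greedy maximal matching, which reads each edge once and stores only the matched vertices and the matching itself, so its memory is proportional to the number of vertices. It is a well-known baseline that guarantees at least half the size of a maximum matching; it is not the linear programming approach of the paper. The name `greedy_streaming_matching` is an assumption of this example.

```python
def greedy_streaming_matching(edge_stream):
    """One pass over the edges; keep an edge if both endpoints are still free."""
    matched = set()
    matching = []
    for u, v in edge_stream:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# The same path 1-2-3-4 presented in two different arrival orders.
good_order = greedy_streaming_matching([(1, 2), (2, 3), (3, 4)])
bad_order = greedy_streaming_matching([(2, 3), (1, 2), (3, 4)])
```

The second order shows why arrival order matters: taking the middle edge first blocks both remaining edges, leaving a matching of size one where size two is possible.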