312 research outputs found

    Streaming Verification of Graph Computations via Graph Structure

    Get PDF
    We give new algorithms in the annotated data streaming setting - also known as verifiable data stream computation - for certain graph problems. This setting is meant to model outsourced computation, where a space-bounded verifier limited to sequential data access seeks to overcome its computational limitations by engaging a powerful prover, without needing to trust the prover. As is well established, several problems that admit no sublinear-space algorithms under traditional streaming do allow protocols using a sublinear amount of prover/verifier communication and sublinear-space verification. We give algorithms for many well-studied graph problems including triangle counting, its generalization to subgraph counting, maximum matching, problems about the existence (or not) of short paths, finding the shortest path between two vertices, and testing for an independent set. While some of these problems have been studied before, our results achieve new tradeoffs between space and communication costs that were hitherto unknown. In particular, two of our results disprove explicit conjectures of Thaler (ICALP, 2016) by giving triangle counting and maximum matching algorithms for n-vertex graphs, using o(n) space and o(n^2) communication

    Graph Sample and Hold: A Framework for Big-Graph Analytics

    Full text link
    Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g. web graphs, social networks etc), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes used to estimate certain graph properties (e.g. triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, called Graph Sample and Hold (gSH). To begin, the proposed framework samples from massive graphs sequentially in a single pass, one edge at a time, while maintaining a small state. We then show how to produce unbiased estimators for various graph properties from the sample. Given that the graph analysis algorithms will run on a sample instead of the whole population, the runtime complexity of these algorithm is kept under control. Moreover, given that the estimators of graph properties are unbiased, the approximation error is kept under control. Finally, we show the performance of the proposed framework (gSH) on various types of graphs, such as social graphs, among others

    Scaling Up Network Analysis and Mining: Statistical Sampling, Estimation, and Pattern Discovery

    Get PDF
    Network analysis and graph mining play a prominent role in providing insights and studying phenomena across various domains, including social, behavioral, biological, transportation, communication, and financial domains. Across all these domains, networks arise as a natural and rich representation for data. Studying these real-world networks is crucial for solving numerous problems that lead to high-impact applications. For example, identifying the behavior and interests of users in online social networks (e.g., viral marketing), monitoring and detecting virus outbreaks in human contact networks, predicting protein functions in biological networks, and detecting anomalous behavior in computer networks. A key characteristic of these networks is that their complex structure is massive and continuously evolving over time, which makes it challenging and computationally intensive to analyze, query, and model these networks in their entirety. In this dissertation, we propose sampling as well as fast, efficient, and scalable methods for network analysis and mining in both static and streaming graphs

    Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution

    Full text link
    The degree distribution is one of the most fundamental graph properties of interest for real-world graphs. It has been widely observed in numerous domains that graphs typically have a tailed or scale-free degree distribution. While the average degree is usually quite small, the variance is quite high and there are vertices with degrees at all scales. We focus on the problem of approximating the degree distribution of a large streaming graph, with small storage. We design an algorithm headtail, whose main novelty is a new estimator of infrequent degrees using truncated geometric random variables. We give a mathematical analysis of headtail and show that it has excellent behavior in practice. We can process streams will millions of edges with storage less than 1% and get extremely accurate approximations for all scales in the degree distribution. We also introduce a new notion of Relative Hausdorff distance between tailed histograms. Existing notions of distances between distributions are not suitable, since they ignore infrequent degrees in the tail. The Relative Hausdorff distance measures deviations at all scales, and is a more suitable distance for comparing degree distributions. By tracking this new measure, we are able to give strong empirical evidence of the convergence of headtail
    • …
    corecore