2,571 research outputs found

    Every property is testable on a natural class of scale-free multigraphs

    Get PDF
    In this paper, we introduce a natural class of multigraphs called hierarchical-scale-free (HSF) multigraphs, and consider constant-time testability on the class. We show that a very wide subclass, specifically, that in which the power-law exponent is greater than two, of HSF is hyperfinite. Based on this result, an algorithm for a deterministic partitioning oracle can be constructed. We conclude by showing that every property is constant-time testable on the above subclass of HSF. This algorithm utilizes findings by Newman and Sohler of STOC'11. However, their algorithm is based on the bounded-degree model, while it is known that actual scale-free networks usually include hubs, which have a very large degree. HSF is based on scale-free properties and includes such hubs. This is the first universal result of constant-time testability on the general graph model, and it has the potential to be applicable on a very wide range of scale-free networks.Comment: 13 pages, one figure. Difference from ver. 1: Definitions of HSF and SF become more general. Typos were fixe

    Pruning based Distance Sketches with Provable Guarantees on Random Graphs

    Full text link
    Measuring the distances between vertices on graphs is one of the most fundamental components in network analysis. Since finding shortest paths requires traversing the graph, it is challenging to obtain distance information on large graphs very quickly. In this work, we present a preprocessing algorithm that is able to create landmark based distance sketches efficiently, with strong theoretical guarantees. When evaluated on a diverse set of social and information networks, our algorithm significantly improves over existing approaches by reducing the number of landmarks stored, preprocessing time, or stretch of the estimated distances. On Erd\"{o}s-R\'{e}nyi graphs and random power law graphs with degree distribution exponent 2<β<32 < \beta < 3, our algorithm outputs an exact distance data structure with space between Θ(n5/4)\Theta(n^{5/4}) and Θ(n3/2)\Theta(n^{3/2}) depending on the value of β\beta, where nn is the number of vertices. We complement the algorithm with tight lower bounds for Erdos-Renyi graphs and the case when β\beta is close to two.Comment: Full version for the conference paper to appear in The Web Conference'1

    Privacy-preserving queries on encrypted databases

    Get PDF
    In today's Internet, with the advent of cloud computing, there is a natural desire for enterprises, organizations, and end users to outsource increasingly large amounts of data to a cloud provider. Therefore, ensuring security and privacy is becoming a significant challenge for cloud computing, especially for users with sensitive and valuable data. Recently, many efficient and scalable query processing methods over encrypted data have been proposed. Despite that, numerous challenges remain to be addressed due to the high complexity of many important queries on encrypted large-scale datasets. This thesis studies the problem of privacy-preserving database query processing on structured data (e.g., relational and graph databases). In particular, this thesis proposes several practical and provable secure structured encryption schemes that allow the data owner to encrypt data without losing the ability to query and retrieve it efficiently for authorized clients. This thesis includes two parts. The first part investigates graph encryption schemes. This thesis proposes a graph encryption scheme for approximate shortest distance queries. Such scheme allows the client to query the shortest distance between two nodes in an encrypted graph securely and efficiently. Moreover, this thesis also explores how the techniques can be applied to other graph queries. The second part of this thesis proposes secure top-k query processing schemes on encrypted relational databases. Furthermore, the thesis develops a scheme for the top-k join queries over multiple encrypted relations. Finally, this thesis demonstrates the practicality of the proposed encryption schemes by prototyping the encryption systems to perform queries on real-world encrypted datasets

    Scalable Facility Location for Massive Graphs on Pregel-like Systems

    Full text link
    We propose a new scalable algorithm for facility location. Facility location is a classic problem, where the goal is to select a subset of facilities to open, from a set of candidate facilities F , in order to serve a set of clients C. The objective is to minimize the total cost of opening facilities plus the cost of serving each client from the facility it is assigned to. In this work, we are interested in the graph setting, where the cost of serving a client from a facility is represented by the shortest-path distance on the graph. This setting allows to model natural problems arising in the Web and in social media applications. It also allows to leverage the inherent sparsity of such graphs, as the input is much smaller than the full pairwise distances between all vertices. To obtain truly scalable performance, we design a parallel algorithm that operates on clusters of shared-nothing machines. In particular, we target modern Pregel-like architectures, and we implement our algorithm on Apache Giraph. Our solution makes use of a recent result to build sketches for massive graphs, and of a fast parallel algorithm to find maximal independent sets, as building blocks. In so doing, we show how these problems can be solved on a Pregel-like architecture, and we investigate the properties of these algorithms. Extensive experimental results show that our algorithm scales gracefully to graphs with billions of edges, while obtaining values of the objective function that are competitive with a state-of-the-art sequential algorithm

    When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors

    Full text link
    Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector that represents the user's network connections, where pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold τ\tau. In contrast to previous work where τ\tau is assumed to be quite close to 1, we focus on recommendation applications where τ\tau is small, but still meaningful. The all pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small τ\tau. To the best of our knowledge, there is no practical solution for computing all user pairs with, say τ=0.2\tau = 0.2 on large social networks, even using the power of distributed algorithms. Our work directly addresses this challenge by introducing a new algorithm --- WHIMP --- that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP's scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges
    corecore