37 research outputs found

    Sublinear Random Access Generators for Preferential Attachment Graphs

    Local Access to Huge Random Objects Through Partial Sampling

    © Amartya Shankha Biswas, Ronitt Rubinfeld, and Anak Yodpinyanee. Consider an algorithm performing a computation on a huge random object (for example a random graph or a “long” random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally “on-the-fly” (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear and thus, sampling the entire object up front would ruin its efficiency. Our first set of results focuses on undirected graphs with independent edge probabilities, i.e. each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdős-Rényi G(n, p) model for all values of p, and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg’s Small-World model. Our implementations require no pre-processing time, and answer each query using O(poly(log n)) time, random bits, and additional space. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well bracketed expressions (the Dyck language). Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation, to be consistent with each other without the need for communication, (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with maximum degree ∆. This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with O(1) parameters (for example, n and p in the G(n, p) model). The distribution over valid colorings is instead specified via a “huge” input (the underlying graph G), that is far too large to be read by a sub-linear time algorithm. Instead, our implementation accesses G through local neighborhood probes, and is able to answer queries to the color of any given vertex in sub-linear time for q ≥ 9∆, in a manner that is consistent with a specific random valid coloring of G. Furthermore, the implementation is memory-less, and can maintain consistency with non-communicating copies of itself.
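    The idea of answering queries "on-the-fly" can be illustrated with a minimal Python sketch for Vertex-Pair queries on G(n, p). This is only the naive memoizing approach, not the implementation from the paper above: it does not support efficient Next-Neighbor or Random-Neighbor queries, and the class name `LazyGnp` and its method names are hypothetical.

```python
import random


class LazyGnp:
    """Naive lazy sampler for G(n, p) (hypothetical illustration only).

    Each potential edge is an independent Bernoulli(p) variable that is
    sampled the first time it is queried and then remembered, so repeated
    queries stay consistent with one fixed random graph.
    """

    def __init__(self, n, p, seed=None):
        self.n = n
        self.p = p
        self._rng = random.Random(seed)
        self._edges = {}  # (smaller vertex, larger vertex) -> bool

    def vertex_pair(self, u, v):
        """Return True if the edge {u, v} is present in the sampled graph."""
        if not (0 <= u < self.n and 0 <= v < self.n):
            raise ValueError("vertex out of range")
        if u == v:
            return False  # no self-loops in G(n, p)
        key = (u, v) if u < v else (v, u)
        if key not in self._edges:
            # First query for this pair: flip the Bernoulli(p) coin now.
            self._edges[key] = self._rng.random() < self.p
        return self._edges[key]

    def sampled_pairs(self):
        """Number of vertex pairs whose status has been decided so far."""
        return len(self._edges)
```

    Only the pairs that are actually queried consume random bits and memory, so an algorithm that inspects few pairs never pays for generating the whole graph.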

    Analyzing and Modeling Real-World Phenomena with Complex Networks: A Survey of Applications

    The success of new scientific areas can be assessed by their potential for contributing to new theoretical approaches and to applications to real-world problems. Complex networks have fared extremely well in both of these aspects, with their sound theoretical basis developed over the years and with a variety of applications. In this survey, we analyze the applications of complex networks to real-world problems and data, with emphasis on representation, analysis and modeling, after an introduction to the main concepts and models. A diversity of phenomena are surveyed, which may be classified into no fewer than 22 areas, providing a clear indication of the impact of the field of complex networks. Comment: 103 pages, 3 figures and 7 tables. A working manuscript, suggestions are welcome.

    Scalable Parallel Algorithms for Massive Scale-free Graphs

    Efficiently storing and processing massive graph data sets is a challenging problem as researchers seek to leverage “Big Data” to answer next-generation scientific questions. New techniques are required to process large scale-free graphs in shared, distributed, and external memory. This dissertation develops new techniques to parallelize the storage, computation, and communication for scale-free graphs with high-degree vertices. Our work facilitates the processing of large real-world graph datasets through the development of parallel algorithms and tools that scale to large computational and memory resources, overcoming challenges not addressed by existing techniques. Our aim is to scale to trillions of edges, and our research is targeted at leadership class supercomputers, clusters with local non-volatile memory, and shared memory systems. We present three novel techniques to address scaling challenges in processing large scale-free graphs. We apply an asynchronous graph traversal technique using prioritized visitor queues that is capable of tolerating data latencies to the external graph storage media and message passing communication. To accommodate large high-degree vertices, we present an edge list partitioning technique that evenly partitions graphs containing high-degree vertices. Finally, we propose a technique we call distributed delegates that distributes and parallelizes the storage, computation, and communication when processing high-degree vertices. The edges of high-degree vertices are distributed, providing additional opportunities for parallelism not present in existing methods. We apply our techniques to multiple graph algorithms: Breadth-First Search, Single Source Shortest Path, Connected Components, K-Core decomposition, Triangle Counting, and Page Rank. Our experimental study of these algorithms demonstrates excellent scalability on supercomputers, clusters with non-volatile memory, and shared memory systems. Our study includes multiple synthetic scale-free graph models, the largest of which has a trillion edges, and real-world input graphs. On a supercomputer, we demonstrate scalability up to 131K processors, and improve the best known Graph500 results for IBM BG/P Intrepid by 15%.

    Diameter and Rumour Spreading in Real-World Network Models

    The so-called 'small-world phenomenon', observed in many real-world networks, is that there is a short path between any two nodes of a network, whose length is much smaller than the network's size, typically growing as a logarithmic function. Several mathematical models have been defined for social networks, the WWW, etc., and this phenomenon translates to proving that such models have a small diameter. In the first part of this thesis, we rigorously analyze the diameters of several random graph classes that are introduced specifically to model complex networks, verifying whether this phenomenon occurs in them. In Chapter 3 we develop a versatile technique for proving upper bounds for diameters of evolving random graph models, which is based on defining a coupling between these models and variants of random recursive trees. Using this technique we prove, for the first time, logarithmic upper bounds for the diameters of seven well-known models. This technique gives unified simple proofs for known results, provides lots of new ones, and will help in proving that many forthcoming network models are small-world. Perhaps, for any given model, one can come up with an ad hoc argument that the diameter is O(log n), but it is interesting that a unified technique works for such a wide variety of models, and our first major contribution is introducing such a technique. In Chapter 4 we estimate the diameter of random Apollonian networks, a class of random planar graphs. We also give lower and upper bounds for the length of their longest paths. In Chapter 5 we study the diameter of another random graph model, called the random surfer Web-graph model. We find logarithmic upper bounds for the diameter, which are almost tight in the special case when the growing graph is a tree. Although the two models are quite different, surprisingly the same engine is used for proving these results, namely the powerful technique of Broutin and Devroye (Large deviations for the weighted height of an extended class of trees, Algorithmica 2006) for analyzing weighted heights of random trees, which we have adapted and applied to the two random graph models. Our second major contribution is demonstrating the flexibility of this technique via providing two significant applications. In the second part of the thesis, we study rumour spreading in networks. Suppose that initially a node has a piece of information and wants to spread it to all nodes in a network quickly. The problem of designing an efficient protocol performing this task is a fundamental one in distributed computing and has applications in maintenance of replicated databases, broadcasting algorithms, analyzing news propagation in social networks and the spread of viruses on the Internet. Given a rumour spreading protocol, its spread time is the time it takes for the rumour to spread in the whole graph. In Chapter 6 we prove several tight lower and upper bounds for the spread times of two well-known randomized rumour spreading protocols, namely the synchronous push&pull protocol and the asynchronous push&pull protocol. In particular, we show the average spread time in both protocols is always at most linear. In Chapter 7 we study the performance of the synchronous push&pull protocol on random k-trees. We show that a.a.s. after a polylogarithmic amount of time, 99 percent of the nodes are informed, but to inform all vertices, a polynomial amount of time is required. Our third major contribution is giving analytical proofs for two experimentally verified statements: firstly, the asynchronous push&pull protocol is typically faster than its synchronous variant, and secondly, it takes considerably more time to inform the last 1 percent of the vertices in a social network than the first 99 percent. We hope that our work on the asynchronous push&pull protocol attracts attention to this fascinating model.
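    The spread time of the synchronous push&pull protocol can be estimated by simulation. The Python sketch below assumes the standard formulation, in which every node contacts one uniformly random neighbour per round and the rumour crosses the contact if exactly one endpoint knew it at the start of the round. The function name `push_pull_spread_time` and the adjacency-dictionary input are hypothetical, and the graph is assumed to be connected with at least one neighbour per node (otherwise the loop never terminates).

```python
import random


def push_pull_spread_time(adj, source, seed=None):
    """Simulate synchronous push&pull and return the number of rounds
    until every node is informed.

    adj:    dict mapping each node to a non-empty list of its neighbours
            (assumed to describe a connected undirected graph).
    source: the node that initially holds the rumour.
    """
    rng = random.Random(seed)
    informed = {source}
    rounds = 0
    while len(informed) < len(adj):
        newly_informed = set()
        for u in adj:
            v = rng.choice(adj[u])  # u contacts a uniformly random neighbour
            # Decisions use the informed set from the start of the round,
            # which is what makes the protocol synchronous.
            if u in informed and v not in informed:
                newly_informed.add(v)  # push: u tells v
            elif v in informed and u not in informed:
                newly_informed.add(u)  # pull: u learns from v
        informed |= newly_informed
        rounds += 1
    return rounds
```

    On a star graph the outcome is deterministic, since every leaf can only contact the centre: the rumour takes one round when it starts at the centre and two rounds when it starts at a leaf.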