
    Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the work that does process the Hyperlink graph uses distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly available as the Graph-Based Benchmark Suite (GBBS).
    Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 201
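    One way to picture the frontier-based pattern shared by many such implementations is a level-synchronous BFS. The sketch below is a purely illustrative, sequential Python version of that pattern, not GBBS code (GBBS itself is parallel C++ over compressed graph formats); the `adj` adjacency-list representation and the sample graph are assumptions made for the example.

```python
# Minimal sketch (not from GBBS): level-synchronous BFS over a frontier,
# the pattern many theoretically-efficient parallel graph algorithms build on.
# Each frontier expansion is written as a sequential loop here for clarity;
# in a parallel implementation the frontier is processed in parallel.
from collections import defaultdict

def bfs_levels(adj, source):
    """adj: dict mapping vertex -> iterable of neighbors (assumed format)."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:                  # processed in parallel in practice
            for v in adj[u]:
                if v not in level:          # "unvisited" check
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

if __name__ == "__main__":
    g = defaultdict(list)
    for u, v in [(0, 1), (1, 2), (2, 3), (0, 3), (3, 4)]:
        g[u].append(v)
        g[v].append(u)
    print(bfs_levels(g, 0))   # {0: 0, 1: 1, 3: 1, 2: 2, 4: 2}
```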

    Parallel Batch-Dynamic Graph Connectivity

    In this paper, we study batch parallel algorithms for the dynamic connectivity problem, a fundamental problem that has received considerable attention in the sequential setting. The most well-known sequential algorithm for dynamic connectivity is the elegant level-set algorithm of Holm, de Lichtenberg and Thorup (HDT), which achieves $O(\log^2 n)$ amortized time per edge insertion or deletion, and $O(\log n / \log\log n)$ time per query. We design a parallel batch-dynamic connectivity algorithm that is work-efficient with respect to the HDT algorithm for small batch sizes, and is asymptotically faster when the average batch size is sufficiently large. Given a sequence of batched updates, where $\Delta$ is the average batch size of all deletions, our algorithm achieves $O(\log n \log(1 + n / \Delta))$ expected amortized work per edge insertion and deletion and $O(\log^3 n)$ depth w.h.p. Our algorithm answers a batch of $k$ connectivity queries in $O(k \log(1 + n/k))$ expected work and $O(\log n)$ depth w.h.p. To the best of our knowledge, our algorithm is the first parallel batch-dynamic algorithm for connectivity.
    Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 201
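    To make the batch interface concrete, here is a minimal Python toy (an assumption-laden sketch, not the paper's algorithm): it supports batched edge insertions and batched connectivity queries via union-find, whereas the paper's parallel HDT-style level structure additionally handles batched deletions work-efficiently, which this toy does not attempt.

```python
# Illustrative sketch only: batched insertions and batched connectivity
# queries answered with a union-find forest. Union-find supports insertions
# but NOT deletions; the paper's batch-dynamic structure is not reproduced.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry

    def insert_batch(self, edges):
        for u, v in edges:        # a batch would be processed in parallel in the paper
            self.union(u, v)

    def query_batch(self, pairs):
        return [self.find(u) == self.find(v) for u, v in pairs]

if __name__ == "__main__":
    uf = UnionFind(6)
    uf.insert_batch([(0, 1), (1, 2), (4, 5)])
    print(uf.query_batch([(0, 2), (0, 4)]))   # [True, False]
```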

    A geometrical analysis of global stability in trained feedback networks

    Recurrent neural networks have been extensively studied in the context of neuroscience and machine learning due to their ability to implement complex computations. While substantial progress in designing effective learning algorithms has been achieved in recent years, a full understanding of trained recurrent networks is still lacking. Specifically, the mechanisms that allow computations to emerge from the underlying recurrent dynamics are largely unknown. Here we focus on a simple, yet underexplored computational setup: a feedback architecture trained to associate a stationary output to a stationary input. As a starting point, we derive an approximate analytical description of global dynamics in trained networks which assumes uncorrelated connectivity weights in the feedback and in the random bulk. The resulting mean-field theory suggests that the task admits several classes of solutions, which imply different stability properties. Different classes are characterized in terms of the geometrical arrangement of the readout with respect to the input vectors, defined in the high-dimensional space spanned by the network population. We find that such an approximate theoretical approach can be used to understand how standard training techniques implement the input-output task in finite-size feedback networks. In particular, our simplified description captures the local and the global stability properties of the target solution, and thus predicts training performance.
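    As a purely illustrative companion to this setup, the following Python sketch simulates a rate network with a random bulk and a rank-one feedback loop driven by a stationary input; the feedback and readout vectors are drawn at random rather than trained, and all parameter values are assumptions chosen for the example.

```python
# Hypothetical numerical sketch of the feedback-architecture setup:
# dx/dt = -x + J phi(x) + m z + I, with readout z = n . phi(x) fed back.
# Vectors m, n and input I are random (NOT trained); parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, g, dt, T = 500, 0.8, 0.05, 2000     # network size, bulk gain, Euler step, steps

J = g * rng.standard_normal((N, N)) / np.sqrt(N)   # random bulk connectivity
m = rng.standard_normal(N)                         # feedback input vector
n = rng.standard_normal(N) / N                     # readout vector
I = rng.standard_normal(N)                         # stationary external input

x = np.zeros(N)
for _ in range(T):
    z = n @ np.tanh(x)                              # readout fed back into the network
    x = x + dt * (-x + J @ np.tanh(x) + m * z + I)  # Euler step of the rate dynamics

print("readout at end of simulation:", n @ np.tanh(x))
```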

    Implicit Decomposition for Write-Efficient Connectivity Algorithms

    The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of latency, bandwidth, and energy. Motivated by this trend, we propose sequential and parallel algorithms to solve graph connectivity problems using significantly fewer writes than conventional algorithms. Our primary algorithmic tool is the construction of an $o(n)$-sized "implicit decomposition" of a bounded-degree graph $G$ on $n$ nodes, which combined with read-only access to $G$ enables fast answers to connectivity and biconnectivity queries on $G$. The construction breaks the linear-write "barrier", resulting in costs that are asymptotically lower than conventional algorithms while adding only a modest cost to querying time. For general non-sparse graphs on $m$ edges, we also provide the first parallel algorithms for connectivity and biconnectivity that use $o(m)$ writes and $O(m)$ operations. These algorithms provide insight into how applications can efficiently process computations on large graphs in systems with read-write asymmetry.
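    To give a rough sense of the query-time idea (a small persistent structure plus read-only access to the graph), here is a toy Python sketch. It is an assumption-heavy simplification: it stores connectivity labels only for a small set of sampled "center" vertices and answers queries by walking the read-only graph, but it does not reproduce the paper's $o(n)$-write construction or its bounds on walk lengths.

```python
# Toy sketch, not the paper's implicit decomposition: keep a component label
# only for sampled centers; at query time, walk the read-only graph from each
# endpoint until a center is reached, then compare center labels.
from collections import deque

def label_centers(adj, centers):
    """Toy preprocessing: one full traversal assigning each center a component id.
    (The paper's point is to do this part with o(n) writes; this toy does not.)"""
    label, comp, seen = {}, 0, set()
    for s in adj:
        if s in seen:
            continue
        comp += 1
        stack, _ = [s], seen.add(s)
        while stack:
            u = stack.pop()
            if u in centers:
                label[u] = comp
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return label

def nearest_center(adj, centers, start):
    """BFS in the read-only graph until some center is found (or none exists)."""
    seen, q = {start}, deque([start])
    while q:
        u = q.popleft()
        if u in centers:
            return u, seen
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return None, seen          # the component of `start` contains no center

def connected(adj, centers, label, u, v):
    cu, seen_u = nearest_center(adj, centers, u)
    if cu is None:             # u's whole (center-free) component was explored
        return v in seen_u
    cv, _ = nearest_center(adj, centers, v)
    if cv is None:             # v's component has no center, u's does: disjoint
        return False
    return label[cu] == label[cv]

if __name__ == "__main__":
    adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
    centers = {0, 3}                      # toy choice of sampled centers
    label = label_centers(adj, centers)
    print(connected(adj, centers, label, 2, 0))  # True
    print(connected(adj, centers, label, 2, 4))  # False
    print(connected(adj, centers, label, 5, 5))  # True (center-free component)
```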

    Distributed Regression in Sensor Networks: Training Distributively with Alternating Projections

    Wireless sensor networks (WSNs) have attracted considerable attention in recent years and motivate a host of new challenges for distributed signal processing. The problem of distributed or decentralized estimation has often been considered in the context of parametric models. However, the success of parametric methods is limited by the appropriateness of the strong statistical assumptions made by the models. In this paper, a more flexible nonparametric model for distributed regression is considered that is applicable in a variety of WSN applications including field estimation. Here, starting with the standard regularized kernel least-squares estimator, a message-passing algorithm for distributed estimation in WSNs is derived. The algorithm can be viewed as an instantiation of the successive orthogonal projection (SOP) algorithm. Various practical aspects of the algorithm are discussed and several numerical simulations validate the potential of the approach.
    Comment: To appear in the Proceedings of the SPIE Conference on Advanced Signal Processing Algorithms, Architectures and Implementations XV, San Diego, CA, July 31 - August 4, 200
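    As context for the starting point named in the abstract, here is a minimal Python sketch of a centralized regularized kernel least-squares (kernel ridge) estimator over synthetic sensor data. The distributed, SOP-style message passing that the paper derives is not reproduced; the RBF kernel, regularization scaling, and data below are assumptions made for the example.

```python
# Sketch of the centralized starting point only: regularized kernel
# least-squares (kernel ridge regression) on synthetic "sensor" readings.
import numpy as np

def rbf_kernel(A, B, bandwidth=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def kernel_ridge_fit(X, y, lam=1e-2, bandwidth=0.5):
    """Solve (K + lam*n*I) alpha = y; the lam*n scaling is one common convention."""
    n = len(X)
    K = rbf_kernel(X, X, bandwidth)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def kernel_ridge_predict(X_train, alpha, X_new, bandwidth=0.5):
    return rbf_kernel(X_new, X_train, bandwidth) @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(0, 1, size=(50, 2))                        # sensor locations
    y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(50)    # noisy field readings
    alpha = kernel_ridge_fit(X, y)
    grid = np.array([[0.2, 0.5], [0.8, 0.5]])
    print(kernel_ridge_predict(X, alpha, grid))                # field estimates
```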