10,413 research outputs found

    Parallel Processing of Large Graphs

    Full text link
    More and more large data collections are gathered worldwide in various IT systems. Many of them possess the networked nature and need to be processed and analysed as graph structures. Due to their size they require very often usage of parallel paradigm for efficient computation. Three parallel techniques have been compared in the paper: MapReduce, its map-side join extension and Bulk Synchronous Parallel (BSP). They are implemented for two different graph problems: calculation of single source shortest paths (SSSP) and collective classification of graph nodes by means of relational influence propagation (RIP). The methods and algorithms are applied to several network datasets differing in size and structural profile, originating from three domains: telecommunication, multimedia and microblog. The results revealed that iterative graph processing with the BSP implementation always and significantly, even up to 10 times outperforms MapReduce, especially for algorithms with many iterations and sparse communication. Also MapReduce extension based on map-side join usually noticeably presents better efficiency, although not as much as BSP. Nevertheless, MapReduce still remains the good alternative for enormous networks, whose data structures do not fit in local memories.Comment: Preprint submitted to Future Generation Computer System

    On Compact Routing for the Internet

    Full text link
    While there exist compact routing schemes designed for grids, trees, and Internet-like topologies that offer routing tables of sizes that scale logarithmically with the network size, we demonstrate in this paper that in view of recent results in compact routing research, such logarithmic scaling on Internet-like topologies is fundamentally impossible in the presence of topology dynamics or topology-independent (flat) addressing. We use analytic arguments to show that the number of routing control messages per topology change cannot scale better than linearly on Internet-like topologies. We also employ simulations to confirm that logarithmic routing table size scaling gets broken by topology-independent addressing, a cornerstone of popular locator-identifier split proposals aiming at improving routing scaling in the presence of network topology dynamics or host mobility. These pessimistic findings lead us to the conclusion that a fundamental re-examination of assumptions behind routing models and abstractions is needed in order to find a routing architecture that would be able to scale ``indefinitely.''Comment: This is a significantly revised, journal version of cs/050802

    A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs

    Full text link
    We present a space and time efficient practical parallel algorithm for approximating the diameter of massive weighted undirected graphs on distributed platforms supporting a MapReduce-like abstraction. The core of the algorithm is a weighted graph decomposition strategy generating disjoint clusters of bounded weighted radius. Theoretically, our algorithm uses linear space and yields a polylogarithmic approximation guarantee; moreover, for important practical classes of graphs, it runs in a number of rounds asymptotically smaller than those required by the natural approximation provided by the state-of-the-art Δ\Delta-stepping SSSP algorithm, which is its only practical linear-space competitor in the aforementioned computational scenario. We complement our theoretical findings with an extensive experimental analysis on large benchmark graphs, which demonstrates that our algorithm attains substantial improvements on a number of key performance indicators with respect to the aforementioned competitor, while featuring a similar approximation ratio (a small constant less than 1.4, as opposed to the polylogarithmic theoretical bound)

    Scalable Facility Location for Massive Graphs on Pregel-like Systems

    Full text link
    We propose a new scalable algorithm for facility location. Facility location is a classic problem, where the goal is to select a subset of facilities to open, from a set of candidate facilities F , in order to serve a set of clients C. The objective is to minimize the total cost of opening facilities plus the cost of serving each client from the facility it is assigned to. In this work, we are interested in the graph setting, where the cost of serving a client from a facility is represented by the shortest-path distance on the graph. This setting allows to model natural problems arising in the Web and in social media applications. It also allows to leverage the inherent sparsity of such graphs, as the input is much smaller than the full pairwise distances between all vertices. To obtain truly scalable performance, we design a parallel algorithm that operates on clusters of shared-nothing machines. In particular, we target modern Pregel-like architectures, and we implement our algorithm on Apache Giraph. Our solution makes use of a recent result to build sketches for massive graphs, and of a fast parallel algorithm to find maximal independent sets, as building blocks. In so doing, we show how these problems can be solved on a Pregel-like architecture, and we investigate the properties of these algorithms. Extensive experimental results show that our algorithm scales gracefully to graphs with billions of edges, while obtaining values of the objective function that are competitive with a state-of-the-art sequential algorithm

    Scalable Routing Easy as PIE: a Practical Isometric Embedding Protocol (Technical Report)

    Get PDF
    We present PIE, a scalable routing scheme that achieves 100% packet delivery and low path stretch. It is easy to implement in a distributed fashion and works well when costs are associated to links. Scalability is achieved by using virtual coordinates in a space of concise dimensionality, which enables greedy routing based only on local knowledge. PIE is a general routing scheme, meaning that it works on any graph. We focus however on the Internet, where routing scalability is an urgent concern. We show analytically and by using simulation that the scheme scales extremely well on Internet-like graphs. In addition, its geometric nature allows it to react efficiently to topological changes or failures by finding new paths in the network at no cost, yielding better delivery ratios than standard algorithms. The proposed routing scheme needs an amount of memory polylogarithmic in the size of the network and requires only local communication between the nodes. Although each node constructs its coordinates and routes packets locally, the path stretch remains extremely low, even lower than for centralized or less scalable state-of-the-art algorithms: PIE always finds short paths and often enough finds the shortest paths.Comment: This work has been previously published in IEEE ICNP'11. The present document contains an additional optional mechanism, presented in Section III-D, to further improve performance by using route asymmetry. It also contains new simulation result
    • …
    corecore