Parallel Processing of Large Graphs
More and more large data collections are gathered worldwide in various IT
systems. Many of them are networked in nature and need to be processed and
analysed as graph structures. Because of their size, they very often require a
parallel paradigm for efficient computation. Three parallel techniques are
compared in the paper: MapReduce, its map-side join extension and Bulk
Synchronous Parallel (BSP). They are implemented for two different graph
problems: calculation of single source shortest paths (SSSP) and collective
classification of graph nodes by means of relational influence propagation
(RIP). The methods and algorithms are applied to several network datasets
differing in size and structural profile, originating from three domains:
telecommunication, multimedia and microblog. The results revealed that
iterative graph processing with the BSP implementation always and
significantly outperforms MapReduce, by up to 10 times, especially for
algorithms with many iterations and sparse communication. The MapReduce
extension based on map-side join is also usually noticeably more efficient
than plain MapReduce, although not by as much as BSP. Nevertheless, MapReduce
remains a good alternative for enormous networks whose data structures do not
fit in local memories.
Comment: Preprint submitted to Future Generation Computer System
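
As a rough illustration of the BSP style of SSSP mentioned in this abstract, the following is a minimal single-process sketch, not the paper's implementation. The function name `bsp_sssp`, the adjacency-list layout, and the in-memory `inbox`/`outbox` dictionaries are assumptions made for the example; a real BSP system would partition the vertices across workers and exchange the messages over the network at each barrier.

```python
import math
from collections import defaultdict

def bsp_sssp(graph, source):
    """Vertex-centric SSSP in the BSP style: compute, send messages, barrier.

    graph: dict mapping every vertex to a list of (neighbor, weight) pairs
    (hypothetical layout; every vertex must appear as a key).
    """
    dist = {v: math.inf for v in graph}
    inbox = {source: [0]}              # messages visible in the current superstep
    while inbox:                       # halt once no vertex receives a message
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:         # only vertices that improved send messages
                dist[v] = best
                for u, w in graph[v]:
                    outbox[u].append(best + w)
        inbox = outbox                 # barrier: messages are delivered next superstep
    return dist
```

Because only vertices whose distance improved send messages, communication becomes sparse in later supersteps, which is the regime where the abstract reports the largest gains for BSP.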
On Compact Routing for the Internet
While there exist compact routing schemes designed for grids, trees, and
Internet-like topologies that offer routing tables of sizes that scale
logarithmically with the network size, we demonstrate in this paper that in
view of recent results in compact routing research, such logarithmic scaling on
Internet-like topologies is fundamentally impossible in the presence of
topology dynamics or topology-independent (flat) addressing. We use analytic
arguments to show that the number of routing control messages per topology
change cannot scale better than linearly on Internet-like topologies. We also
employ simulations to confirm that logarithmic routing table size scaling gets
broken by topology-independent addressing, a cornerstone of popular
locator-identifier split proposals aiming at improving routing scaling in the
presence of network topology dynamics or host mobility. These pessimistic
findings lead us to the conclusion that a fundamental re-examination of
assumptions behind routing models and abstractions is needed in order to find a
routing architecture that would be able to scale "indefinitely."
Comment: This is a significantly revised, journal version of cs/050802
A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs
We present a space- and time-efficient practical parallel algorithm for
approximating the diameter of massive weighted undirected graphs on distributed
platforms supporting a MapReduce-like abstraction. The core of the algorithm is
a weighted graph decomposition strategy generating disjoint clusters of bounded
weighted radius. Theoretically, our algorithm uses linear space and yields a
polylogarithmic approximation guarantee; moreover, for important practical
classes of graphs, it runs in a number of rounds asymptotically smaller than
those required by the natural approximation provided by the state-of-the-art
Δ-stepping SSSP algorithm, which is its only practical linear-space
competitor in the aforementioned computational scenario. We complement our
theoretical findings with an extensive experimental analysis on large benchmark
graphs, which demonstrates that our algorithm attains substantial improvements
on a number of key performance indicators with respect to the aforementioned
competitor, while featuring a similar approximation ratio (a small constant
less than 1.4, as opposed to the polylogarithmic theoretical bound).
Scalable Facility Location for Massive Graphs on Pregel-like Systems
We propose a new scalable algorithm for facility location. Facility location
is a classic problem, where the goal is to select a subset of facilities to
open, from a set of candidate facilities F, in order to serve a set of clients
C. The objective is to minimize the total cost of opening facilities plus the
cost of serving each client from the facility it is assigned to. In this work,
we are interested in the graph setting, where the cost of serving a client from
a facility is represented by the shortest-path distance on the graph. This
setting allows us to model natural problems arising in the Web and in social media
applications. It also allows us to leverage the inherent sparsity of such graphs,
as the input is much smaller than the full pairwise distances between all
vertices.
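
To make the objective concrete, here is a small sequential sketch that evaluates the cost of a given set of opened facilities on an undirected weighted graph. It only scores a candidate solution; it is not the parallel algorithm the abstract proposes. The names `nearest_facility_distances` and `facility_location_cost`, and the adjacency-list layout, are assumptions made for the example.

```python
import heapq
import math

def nearest_facility_distances(graph, opened):
    """Multi-source Dijkstra: shortest-path distance from each vertex to its
    nearest opened facility. graph maps vertex -> list of (neighbor, weight)
    and is assumed undirected (each edge listed in both directions)."""
    dist = {v: math.inf for v in graph}
    heap = []
    for f in opened:
        dist[f] = 0
        heap.append((0, f))
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                   # stale heap entry, already improved
        for u, w in graph[v]:
            if d + w < dist[u]:
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist

def facility_location_cost(graph, opening_cost, opened, clients):
    """Total cost = sum of opening costs + sum of client service distances."""
    dist = nearest_facility_distances(graph, opened)
    return sum(opening_cost[f] for f in opened) + sum(dist[c] for c in clients)
```

Working from the adjacency lists rather than a full distance matrix is what lets the input stay proportional to the number of edges, as the abstract notes.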
To obtain truly scalable performance, we design a parallel algorithm that
operates on clusters of shared-nothing machines. In particular, we target
modern Pregel-like architectures, and we implement our algorithm on Apache
Giraph. Our solution makes use of a recent result to build sketches for massive
graphs, and of a fast parallel algorithm to find maximal independent sets, as
building blocks. In so doing, we show how these problems can be solved on a
Pregel-like architecture, and we investigate the properties of these
algorithms. Extensive experimental results show that our algorithm scales
gracefully to graphs with billions of edges, while obtaining values of the
objective function that are competitive with a state-of-the-art sequential
algorithm.
Scalable Routing Easy as PIE: a Practical Isometric Embedding Protocol (Technical Report)
We present PIE, a scalable routing scheme that achieves 100% packet delivery
and low path stretch. It is easy to implement in a distributed fashion and
works well when costs are associated with links. Scalability is achieved by using
virtual coordinates in a space of concise dimensionality, which enables greedy
routing based only on local knowledge. PIE is a general routing scheme, meaning
that it works on any graph. We focus however on the Internet, where routing
scalability is an urgent concern. We show analytically and by using simulation
that the scheme scales extremely well on Internet-like graphs. In addition, its
geometric nature allows it to react efficiently to topological changes or
failures by finding new paths in the network at no cost, yielding better
delivery ratios than standard algorithms. The proposed routing scheme needs an
amount of memory polylogarithmic in the size of the network and requires only
local communication between the nodes. Although each node constructs its
coordinates and routes packets locally, the path stretch remains extremely low,
even lower than for centralized or less scalable state-of-the-art algorithms:
PIE always finds short paths and often enough finds the shortest paths.
Comment: This work has been previously published in IEEE ICNP'11. The present
document contains an additional optional mechanism, presented in Section
III-D, to further improve performance by using route asymmetry. It also
contains new simulation result
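
The idea of greedy routing on virtual coordinates using only local knowledge can be sketched generically as follows. This is not PIE's embedding or protocol: the Euclidean coordinates, the function name `greedy_route`, and the data layout are assumptions made for the example. With arbitrary coordinates, plain greedy forwarding can get stuck in a local minimum, which the sketch reports by returning `None`.

```python
import math

def greedy_route(neighbors, coords, src, dst):
    """Forward hop by hop to the neighbor whose coordinates are closest to dst.

    neighbors: dict vertex -> list of adjacent vertices (hypothetical layout)
    coords:    dict vertex -> tuple of coordinates (hypothetical embedding)
    Returns the path as a list, or None if no neighbor is strictly closer.
    """
    path = [src]
    current = src
    while current != dst:
        here = math.dist(coords[current], coords[dst])
        nxt = min(
            neighbors[current],
            key=lambda n: math.dist(coords[n], coords[dst]),
            default=None,
        )
        if nxt is None or math.dist(coords[nxt], coords[dst]) >= here:
            return None                # local minimum: greedy forwarding fails
        current = nxt
        path.append(current)
    return path
```

Each node consults only its own neighbors' coordinates and the destination's, so per-node state stays small; the quality of the embedding determines whether delivery is guaranteed.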