A Divide-and-Conquer Algorithm for Betweenness Centrality
The problem of efficiently computing the betweenness centrality of nodes has
been researched extensively. To date, the best known exact and centralized
algorithm for this task is an algorithm proposed in 2001 by Brandes. The
contribution of our paper is Brandes++, an algorithm for exact efficient
computation of betweenness centrality. The crux of our algorithm is that we
create a sketch of the graph, that we call the skeleton, by replacing subgraphs
with simpler graph structures. Depending on the underlying graph structure,
using this skeleton and keeping appropriate summaries, Brandes++ can
achieve significantly lower running times, depending on the underlying graph
structure. Extensive experimental evaluation on real-life datasets
demonstrates the efficacy of our algorithm for different types of graphs. We
release our code for the benefit of the research community.
Comment: A shorter version of this paper appeared in SIAM Data Mining 201
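As context for the skeleton-based speed-ups described above, the baseline being improved upon is Brandes-style exact betweenness. A minimal sketch for an unweighted graph might look as follows (the adjacency-dict representation and function name are illustrative, not taken from the released code):

```python
from collections import deque

def betweenness(adj):
    """Exact betweenness centrality via Brandes-style accumulation
    (unweighted graph). adj maps each vertex to its neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # single-source BFS, counting shortest-path multiplicities
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order, q = [], deque([s])
        while q:
            v = q.popleft(); order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        # dependency accumulation in reverse BFS order
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc  # on undirected graphs, halve each value if desired
```

On a path a-b-c, the middle vertex b gets score 2.0 (the pair (a, c) counted from both endpoints); the skeleton idea above speeds this up by replacing whole subgraphs with smaller summarized structures before running the accumulation.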
Discriminative Distance-Based Network Indices with Application to Link Prediction
In large networks, using the length of shortest paths as the distance measure
has shortcomings. A well-studied shortcoming is that extending it to
disconnected graphs and directed graphs is controversial. The second
shortcoming is that a huge number of vertices may have exactly the same score.
The third shortcoming is that in many applications, the distance between two
vertices not only depends on the length of shortest paths, but also on the
number of shortest paths. In this paper, first we develop a new distance
measure between vertices of a graph that yields discriminative distance-based
centrality indices. This measure is proportional to the length of shortest
paths and inversely proportional to the number of shortest paths. We present
algorithms for exact computation of the proposed discriminative indices.
Second, we develop randomized algorithms that precisely estimate average
discriminative path length and average discriminative eccentricity and show
that they give (ε, δ)-approximations of these indices. Third, we
perform extensive experiments over several real-world networks from different
domains. In our experiments, we first show that, compared to the traditional
indices, the discriminative indices are usually far more discriminative. Then,
we show that our randomized algorithms can very precisely estimate average
discriminative path length and average discriminative eccentricity using only
a few samples. We also show that real-world networks usually have a tiny
average discriminative path length, bounded by a constant (e.g., 2). Fourth,
to better motivate the usefulness of our proposed distance measure, we present
a novel link prediction method that uses discriminative distance to decide
which vertices are more likely to form a link in the future, and show its
superior performance compared to well-known existing measures.
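One natural reading of the measure described above — proportional to shortest-path length and inversely proportional to the number of shortest paths — can be sketched with a single BFS per source. The ratio len/count and the function name here are illustrative assumptions, not necessarily the paper's exact definition:

```python
from collections import deque

def discriminative_distances(adj, s):
    """From source s, return {v: length / count} for every other
    reachable vertex v, where length is the shortest-path distance
    and count the number of distinct shortest paths to v."""
    dist = {s: 0}
    sigma = {s: 1}  # number of shortest paths from s
    q = deque([s])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:           # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                q.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
    return {v: dist[v] / sigma[v] for v in dist if v != s}
```

On a 4-cycle, the opposite vertex sits at distance 2 but is reached by 2 shortest paths, so its discriminative distance equals that of an adjacent vertex (1.0) — illustrating how the path count breaks the ties that plain distance cannot.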
The Parameterized Complexity of Centrality Improvement in Networks
The centrality of a vertex v in a network intuitively captures how important
v is for communication in the network. The task of improving the centrality of
a vertex has many applications, as a higher centrality often implies a larger
impact on the network or less transportation or administration cost. In this
work we study the parameterized complexity of the NP-complete problems
Closeness Improvement and Betweenness Improvement in which we ask to improve a
given vertex's closeness or betweenness centrality by a given amount through
adding a given number of edges to the network. Herein, the closeness of a
vertex v sums the multiplicative inverses of distances of other vertices to v
and the betweenness sums for each pair of vertices the fraction of shortest
paths going through v. Unfortunately, for the natural parameter "number of
edges to add" we obtain hardness results, even in rather restricted cases. On
the positive side, we also give an island of tractability for the parameter
measuring the vertex deletion distance to cluster graphs.
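The closeness variant defined above sums the multiplicative inverses of distances, so unreachable vertices simply contribute zero. It can be computed with one BFS; this sketch assumes an unweighted, undirected graph (for directed graphs one would traverse reversed edges to get distances *to* v):

```python
from collections import deque

def harmonic_closeness(adj, v):
    """Closeness of v as defined above: sum over every other vertex u
    of 1 / dist(u, v); unreachable vertices contribute 0."""
    dist = {v: 0}
    q = deque([v])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return sum(1.0 / d for u, d in dist.items() if u != v)
```

The improvement problems then ask, for instance, how few extra edges suffice to raise this sum by a given amount — on a path a-b-c, the middle vertex scores 1 + 1 = 2.0 while an endpoint scores 1 + 1/2 = 1.5.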
Fast Shortest Path Distance Estimation in Large Networks
We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well studied problem, but exact algorithms do not scale to huge graphs encountered on the web, social networks, and other applications.
In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks.
We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature which considers selecting landmarks at random.
Finally, we study applications of our method in two problems arising naturally in large-scale networks, namely, social search and community detection.
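The landmark scheme described above has two phases: an offline BFS from each landmark, and an online estimate that combines the two precomputed distances via the triangle inequality. A minimal sketch (function names are illustrative; practical implementations store the index far more compactly):

```python
from collections import deque

def bfs_distances(adj, src):
    """Unweighted single-source shortest-path distances from src."""
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

def build_index(adj, landmarks):
    """Offline phase: one BFS per landmark."""
    return {l: bfs_distances(adj, l) for l in landmarks}

def estimate(index, u, v):
    """Online phase: triangle-inequality upper bound on dist(u, v);
    it is exact when some landmark lies on a shortest u-v path."""
    return min(d[u] + d[v] for d in index.values())
```

This is why landmark selection matters so much: a central landmark lies on many shortest paths, making the bound tight for many queries, while a randomly chosen one often does not.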
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature reports results on much smaller graphs, and the studies
that do handle the Hyperlink graph use distributed or external memory.
Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).
Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
Considerations about multistep community detection
The problem and implications of community detection in networks have raised a
huge attention, for its important applications in both natural and social
sciences. A number of algorithms have been developed to solve this problem,
addressing either speed optimization or the quality of the partitions
calculated. In this paper we propose a multi-step procedure bridging the
fastest, but less accurate algorithms (coarse clustering), with the slowest,
most effective ones (refinement). By adopting heuristic ranking of the nodes,
and classifying a fraction of them as `critical', a refinement step can be
restricted to this subset of the network, thus saving computational time.
Preliminary numerical results are discussed, showing improvement of the final
partition.
Comment: 12 pages
Boosting Local Search for the Maximum Independent Set Problem
An independent set of a graph G = (V, E) with vertices V and edges E is a subset S ⊆ V such that the subgraph induced by S does not contain any edges. The goal of the maximum independent set problem (MIS problem) is to find an independent set of maximum size. It is equivalent to the well-known vertex cover problem (VC problem) and the maximum clique problem.
This thesis consists of two main parts. In the first we compare the currently best algorithms for finding near-optimal independent sets and vertex covers in large, sparse graphs: Iterated Local Search (ILS) by Andrade et al. [2], a heuristic that uses local search for the MIS problem, and NuMVC by Cai et al. [6], a local search algorithm for the VC problem. As of now, there are no methods to solve these large instances exactly in any reasonable time, so these heuristic algorithms are the best option. In the second part we analyze a series of techniques, some of which lead to a significant speed-up of the ILS algorithm. This is done by removing specific vertices.
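To make the problem definition concrete, here is a small self-contained sketch: an independence check taken straight from the definition, plus a minimum-degree greedy baseline. This is not ILS or NuMVC — just the kind of simple construction heuristic that local search methods typically start from and improve upon:

```python
def is_independent(adj, s):
    """The defining property: no edge has both endpoints in s."""
    return all(w not in s for v in s for w in adj[v])

def greedy_mis(adj):
    """Minimum-degree greedy baseline: repeatedly pick a vertex of
    minimum residual degree, add it to the set, and delete it together
    with its neighbourhood from the graph."""
    remaining = {v: set(adj[v]) for v in adj}
    s = set()
    while remaining:
        v = min(remaining, key=lambda x: len(remaining[x]))
        s.add(v)
        removed = remaining.pop(v) | {v}
        for w in list(remaining):
            if w in removed:
                del remaining[w]
            else:
                remaining[w] -= removed
    return s
```

On a path a-b-c this finds the optimum {a, c}; the ILS-style techniques analyzed in the thesis then perturb and locally improve such solutions, e.g. by swapping one vertex out for two non-adjacent neighbours.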