Scaling up Group Closeness Maximization
Closeness is a widely-used centrality measure in social network analysis. For a node it indicates the inverse average shortest-path distance to the other nodes of the network. While the identification of the k nodes with highest closeness received significant attention, many applications are actually interested in finding a group of nodes that is central as a whole. For this problem, only recently a greedy algorithm with approximation ratio (1−1/e) has been proposed [Chen et al., ADC 2016]. Since this algorithm’s running time is still expensive for large networks, a heuristic without approximation guarantee has also been proposed in the same paper.
In the present paper we develop new techniques to speed up the greedy algorithm without losing its theoretical guarantee. Compared to a straightforward implementation, our approach is orders of magnitude faster and, compared to the heuristic proposed by Chen et al., we always find a solution with better quality in a comparable running time in our experiments.
Our method Greedy++ allows us to approximate the group with maximum closeness on networks with up to hundreds of millions of edges in minutes or at most a few hours. To have the same theoretical guarantee, the greedy approach by [Chen et al., ADC 2016] would take several days already on networks with hundreds of thousands of edges.
In a comparison with the optimum, our experiments show that the solution found by Greedy++ is actually much better than the theoretical guarantee. Over all tested networks, the empirical approximation ratio is never lower than 0.97.
Finally, we study for the first time the correlation between the top-k nodes with highest closeness and an approximation of the most central group in large complex networks, and show that the overlap between the two is relatively small.
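To make the greedy idea concrete, here is a minimal sketch of a plain greedy selection on a small, connected, unweighted graph. It is not Greedy++ and includes none of the speed-ups described above; the function names, the adjacency-dict input format, and the use of total distance to the group as the objective are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def greedy_group(adj, k):
    """Pick k nodes greedily, each time minimizing the total distance
    from all nodes to their nearest group member.

    Assumes a connected graph; precomputes all-pairs distances, so it is
    only suitable for small inputs.
    """
    dist = {u: bfs_distances(adj, u) for u in adj}
    nearest = {v: float("inf") for v in adj}  # distance from v to the group
    group = []
    for _ in range(k):
        def total_with(candidate):
            return sum(min(nearest[v], dist[candidate][v]) for v in adj)

        choice = min((c for c in adj if c not in group), key=total_with)
        group.append(choice)
        for v in adj:
            nearest[v] = min(nearest[v], dist[choice][v])
    return group, sum(nearest.values())


# Path graph 0 - 1 - 2 - 3 - 4 (hypothetical example data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
group, total = greedy_group(path, 2)
```

On the path graph the first pick is the middle node, since it alone minimizes the total distance; the second pick is a tie among the remaining nodes.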
Towards Scalable Network Delay Minimization
Reduction of end-to-end network delays is an optimization task with
applications in multiple domains. Low delays enable improved information flow
in social networks, quick spread of ideas in collaboration networks, low travel
times for vehicles on road networks and increased rate of packets in the case
of communication networks. Delay reduction can be achieved by both improving
the propagation capabilities of individual nodes and adding additional edges in
the network. One of the main challenges in such design problems is that the
effects of local changes are not independent, and as a consequence, there is a
combinatorial search-space of possible improvements. Thus, minimizing the
cumulative propagation delay requires novel scalable and data-driven
approaches.
In this paper, we consider the problem of network delay minimization via node
upgrades. Although the problem is NP-hard, we show that probabilistic
approximation for a restricted version can be obtained. We design scalable and
high-quality techniques for the general setting based on sampling and targeted
to different models of delay distribution. Our methods scale almost linearly
with the graph size and consistently outperform competitors in quality.
Fast Shortest Path Distance Estimation in Large Networks
We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well studied problem, but exact algorithms do not scale to huge graphs encountered on the web, social networks, and other applications.
In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks.
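The index-then-query pattern described above can be sketched as follows for an undirected, unweighted, connected graph. The way the two precomputed distances are combined here (triangle-inequality upper and lower bounds) is one standard choice and is an assumption, as are the function names; the landmarks are passed in explicitly rather than selected by any particular strategy.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_index(adj, landmarks):
    """Offline step: one BFS per landmark."""
    return {l: bfs_distances(adj, l) for l in landmarks}


def estimate_distance(index, u, v):
    """Online step: bounds on d(u, v) from the triangle inequality.

    upper = min over landmarks of d(u, l) + d(l, v)
    lower = max over landmarks of |d(u, l) - d(l, v)|
    """
    upper = min(d[u] + d[v] for d in index.values())
    lower = max(abs(d[u] - d[v]) for d in index.values())
    return lower, upper


# Path graph 0 - 1 - 2 - 3 - 4 with a single landmark in the middle
# (hypothetical example data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
index = build_index(path, [2])
bounds_far = estimate_distance(index, 0, 4)
bounds_near = estimate_distance(index, 0, 1)
```

The upper bound is exact when some landmark lies on a shortest path between the two nodes, as for the pair (0, 4) here, and loose otherwise, as for the pair (0, 1).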
We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature which considers selecting landmarks at random.
Finally, we study applications of our method in two problems arising naturally in large-scale networks, namely, social search and community detection. Yahoo! Research (internship
Efficient Exact and Approximate Algorithms for Computing Betweenness Centrality in Directed Graphs
Graphs are an important tool to model data in different domains, including
social networks, bioinformatics and the world wide web. Most of the networks
formed in these domains are directed graphs, where all the edges have a
direction and they are not symmetric. Betweenness centrality is an important
index widely used to analyze networks. In this paper, given a directed
network and a vertex, we first propose a new exact algorithm to
compute the betweenness score of that vertex. Our algorithm pre-computes a set
[…], which is used to prune a large amount of computation that does
not contribute to the betweenness score of the vertex. The time complexity of our exact
algorithm depends on […] and is respectively
[…] and […]
for unweighted graphs and weighted graphs with positive weights.
[…] is bounded from above by […] and in most cases it
is a small constant. Then, for the cases where […] is large, we
present a simple randomized algorithm that samples from […] and
performs computations for only the sampled elements. We show that this
algorithm provides an […]-approximation of the betweenness
score of the vertex. Finally, we perform extensive experiments over several real-world
datasets from different domains for several randomly chosen vertices as well as
for the vertices with the highest betweenness scores. Our experiments reveal
that in most cases, our algorithm significantly outperforms the most efficient
existing randomized algorithms, in terms of both running time and accuracy. Our
experiments also show that our proposed algorithm computes betweenness scores
of all vertices in the sets of sizes 5, 10 and 15, much faster and more
accurately than the most efficient existing algorithms. Comment: arXiv admin note: text overlap with arXiv:1704.0735
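For reference, the quantity being computed can be illustrated with the standard Brandes algorithm for exact betweenness on a directed, unweighted graph. This is a baseline sketch only, not the pruning-based or randomized algorithm of the abstract above; the function name and the adjacency-dict format (every vertex must appear as a key) are assumptions.

```python
from collections import deque


def betweenness(adj):
    """Exact betweenness of every vertex in a directed, unweighted graph
    (Brandes' algorithm). adj maps each vertex to its out-neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # vertices in non-decreasing distance
        preds = {v: [] for v in adj}     # shortest-path predecessors
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:          # first time w is reached
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Two equally short directed routes from 0 to 2 (hypothetical example data).
diamond = {0: [1, 3], 1: [2], 3: [2], 2: []}
scores = betweenness(diamond)
```

In the diamond graph the only pair with an intermediate vertex is (0, 2), and its two shortest paths split the credit evenly between vertices 1 and 3.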
Fully-dynamic Approximation of Betweenness Centrality
Betweenness is a well-known centrality measure that ranks the nodes of a
network according to their participation in shortest paths. Since an exact
computation is prohibitive in large networks, several approximation algorithms
have been proposed. Besides that, recent years have seen the publication of
dynamic algorithms for efficient recomputation of betweenness in evolving
networks. In previous work we proposed the first semi-dynamic algorithms that
recompute an approximation of betweenness in connected graphs after batches of
edge insertions.
In this paper we propose the first fully-dynamic approximation algorithms
(for weighted and unweighted undirected graphs that need not be connected)
with a provable guarantee on the maximum approximation error. The transfer to
fully-dynamic and disconnected graphs implies additional algorithmic problems
that could be of independent interest. In particular, we propose a new upper
bound on the vertex diameter for weighted undirected graphs. For both weighted
and unweighted graphs, we also propose the first fully-dynamic algorithms that
keep track of such an upper bound. In addition, we extend our former algorithm for
semi-dynamic BFS to batches of both edge insertions and deletions.
Using approximation, our algorithms are the first to make in-memory
computation of betweenness in fully-dynamic networks with millions of edges
feasible. Our experiments show that they can achieve substantial speedups
compared to recomputation, up to several orders of magnitude.
The Minimum Wiener Connector
The Wiener index of a graph is the sum of all pairwise shortest-path
distances between its vertices. In this paper we study the novel problem of
finding a minimum Wiener connector: given a connected graph and a set
of query vertices, find a subgraph of the graph that connects all
query vertices and has minimum Wiener index.
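The objective can be illustrated with a short sketch that evaluates the Wiener index of the subgraph induced by a candidate vertex set in an undirected, unweighted graph. It only scores a given candidate and does not search for the minimum connector; the function name, the input format, and the convention of returning infinity for a disconnected candidate are assumptions.

```python
from collections import deque


def wiener_index(adj, vertices):
    """Sum of pairwise hop distances inside the subgraph induced by
    `vertices`; float('inf') if that subgraph is disconnected."""
    vs = set(vertices)
    total = 0
    for s in vs:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in vs and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != len(vs):
            return float("inf")
        total += sum(dist.values())
    return total // 2  # every unordered pair was counted twice


# Path 0 - 1 - 2 - 3 - 4 plus a hub 5 adjacent to every path vertex
# (hypothetical example data). The query vertices are 0 and 4.
graph = {
    0: [1, 5], 1: [0, 2, 5], 2: [1, 3, 5],
    3: [2, 4, 5], 4: [3, 5], 5: [0, 1, 2, 3, 4],
}
via_hub = wiener_index(graph, [0, 5, 4])
via_path = wiener_index(graph, [0, 1, 2, 3, 4])
```

Connecting the two query vertices through the hub gives a much smaller Wiener index than connecting them along the path, which matches the observation that good connectors add a few highly central vertices.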
We show that the Minimum Wiener Connector problem admits a polynomial-time (albeit
impractical) exact algorithm for the special case where the number of query
vertices is bounded. We show that in general the problem is NP-hard, and has no
PTAS unless P = NP. Our main contribution is a
constant-factor approximation algorithm running in time
[…].
Thorough experimentation on a large variety of real-world graphs confirms
that our method returns smaller and denser solutions than other methods, and
does so by adding to the query set a small number of important vertices
(i.e., vertices with high centrality). Comment: Published in Proceedings of the 2015 ACM SIGMOD International
Conference on Management of Dat