1,222 research outputs found

    A Divide-and-Conquer Algorithm for Betweenness Centrality

    The problem of efficiently computing the betweenness centrality of nodes has been researched extensively. To date, the best known exact and centralized algorithm for this task is the one proposed by Brandes in 2001. The contribution of our paper is Brandes++, an algorithm for exact and efficient computation of betweenness centrality. The crux of our algorithm is that we create a sketch of the graph, which we call the skeleton, by replacing subgraphs with simpler graph structures. Depending on the underlying graph structure, by using this skeleton and keeping appropriate summaries, Brandes++ can achieve significantly lower running times. Extensive experimental evaluation on real-life datasets demonstrates the efficacy of our algorithm for different types of graphs. We release our code for the benefit of the research community. Comment: A shorter version of this paper appeared in SIAM Data Mining 201
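
For orientation, the baseline the abstract refers to (Brandes' 2001 algorithm, not Brandes++ itself) can be sketched as follows. This is a minimal illustration that assumes an unweighted, undirected graph stored as an adjacency dictionary; the function name `brandes_betweenness` and the sample graph are hypothetical and not taken from the paper's released code.

```python
from collections import deque


def brandes_betweenness(adj):
    """Exact betweenness for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbours (assumed symmetric).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma)
        # and recording shortest-path predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical example: a path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = brandes_betweenness(path)
```

On the path graph, the two inner nodes each lie on the shortest paths of two pairs, while the endpoints lie on none.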

    Discriminative Distance-Based Network Indices with Application to Link Prediction

    In large networks, using the length of shortest paths as the distance measure has shortcomings. A well-studied shortcoming is that extending it to disconnected graphs and directed graphs is controversial. The second shortcoming is that a huge number of vertices may have exactly the same score. The third shortcoming is that in many applications, the distance between two vertices depends not only on the length of shortest paths, but also on the number of shortest paths. In this paper, we first develop a new distance measure between vertices of a graph that yields discriminative distance-based centrality indices. This measure is proportional to the length of shortest paths and inversely proportional to the number of shortest paths. We present algorithms for exact computation of the proposed discriminative indices. Second, we develop randomized algorithms that precisely estimate average discriminative path length and average discriminative eccentricity and show that they give (ϵ,δ)-approximations of these indices. Third, we perform extensive experiments over several real-world networks from different domains. In our experiments, we first show that, compared to the traditional indices, discriminative indices usually have much greater discriminability. Then, we show that our randomized algorithms can very precisely estimate average discriminative path length and average discriminative eccentricity, using only a few samples. Then, we show that real-world networks usually have a tiny average discriminative path length, bounded by a constant (e.g., 2). Fourth, in order to better motivate the usefulness of our proposed distance measure, we present a novel link prediction method that uses discriminative distance to decide which vertices are more likely to form a link in the future, and show its superior performance compared to well-known existing measures.
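
The abstract does not spell out the formula, so the sketch below assumes one simple reading of "proportional to the length of shortest paths and inversely proportional to the number of shortest paths": the shortest-path length d(u, v) divided by the number of shortest paths σ(u, v). The function name `discriminative_distances` and the sample graph are hypothetical, and the paper's exact definition may differ.

```python
from collections import deque


def discriminative_distances(adj, source):
    """Return {v: d(source, v) / sigma(source, v)} for reachable v != source.

    Assumption: the measure is taken to be shortest-path length divided by
    the number of shortest paths; this is an illustrative reading only.
    """
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return {v: dist[v] / sigma[v] for v in dist if v != source}


# Hypothetical example: a "diamond" 0-1-3, 0-2-3 with a tail 3-4.
diamond = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
dd = discriminative_distances(diamond, 0)
```

Here node 3 is two hops from node 0 but reachable along two shortest paths, so under this reading it scores the same as a direct neighbour.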

    The Parameterized Complexity of Centrality Improvement in Networks

    The centrality of a vertex v in a network intuitively captures how important v is for communication in the network. The task of improving the centrality of a vertex has many applications, as a higher centrality often implies a larger impact on the network or lower transportation or administration costs. In this work we study the parameterized complexity of the NP-complete problems Closeness Improvement and Betweenness Improvement, in which we ask to improve a given vertex's closeness or betweenness centrality by a given amount through adding a given number of edges to the network. Herein, the closeness of a vertex v sums the multiplicative inverses of the distances of other vertices to v, and the betweenness sums, for each pair of vertices, the fraction of shortest paths going through v. Unfortunately, for the natural parameter "number of edges to add" we obtain hardness results, even in rather restricted cases. On the positive side, we also give an island of tractability for the parameter measuring the vertex deletion distance to cluster graphs.
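
The closeness definition above (sum of inverse distances) and the effect of adding a single edge can be illustrated with a small sketch. It assumes an unweighted, undirected graph; the helper names `bfs_distances`, `closeness`, and `add_edge` are hypothetical and only serve the example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """Sum of 1 / d(u, v) over all other vertices u reachable from v."""
    dist = bfs_distances(adj, v)
    return sum(1.0 / d for u, d in dist.items() if u != v)


def add_edge(adj, a, b):
    """Return a copy of the graph with the undirected edge {a, b} added."""
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    new_adj[a].add(b)
    new_adj[b].add(a)
    return new_adj


# Hypothetical example: a path 0 - 1 - 2 - 3, improved for vertex 0.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
before = closeness(path, 0)                  # 1/1 + 1/2 + 1/3
after = closeness(add_edge(path, 0, 3), 0)   # 1/1 + 1/2 + 1/1
```

Connecting vertex 0 to the far end of the path raises its closeness from 11/6 to 5/2. The improvement problems ask whether a target increase is reachable within a budget of such edge additions.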

    Fast Shortest Path Distance Estimation in Large Networks

    We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well-studied problem, but exact algorithms do not scale to the huge graphs encountered on the web, in social networks, and in other applications. In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real-world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature, which considers selecting landmarks at random. Finally, we study applications of our method in two problems arising naturally in large-scale networks, namely, social search and community detection. Yahoo! Research (internship
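
The offline/online split described above can be sketched as follows. This is a minimal illustration that assumes a connected, unweighted, undirected graph and combines landmark distances through the triangle inequality; the names `build_index` and `estimate_bounds` and the hand-picked landmarks are hypothetical, and the paper's selection strategies are not reproduced here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def build_index(adj, landmarks):
    """Offline step: one BFS per landmark."""
    return {l: bfs_distances(adj, l) for l in landmarks}


def estimate_bounds(index, u, v):
    """Online step: triangle-inequality bounds on d(u, v).

    upper: best detour through any landmark, d(u, l) + d(l, v)
    lower: largest gap |d(u, l) - d(v, l)| seen from any landmark
    """
    upper = min(d[u] + d[v] for d in index.values())
    lower = max(abs(d[u] - d[v]) for d in index.values())
    return lower, upper


# Hypothetical example: a 6-cycle with two landmarks chosen by hand.
n = 6
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
index = build_index(cycle, landmarks=[0, 3])
bounds = estimate_bounds(index, 2, 4)
```

The true distance between nodes 2 and 4 is 2. Landmark 3 lies on a shortest path between them, so the upper bound is tight, which hints at why the choice of landmarks matters so much for accuracy.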

    Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the results for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly-available as the Graph-Based Benchmark Suite (GBBS). Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 201

    Betweenness versus Linerank


    Considerations about multistep community detection

    The problem and implications of community detection in networks have attracted considerable attention because of their important applications in both the natural and social sciences. A number of algorithms have been developed to solve this problem, addressing either speed optimization or the quality of the partitions calculated. In this paper we propose a multi-step procedure bridging the fastest but less accurate algorithms (coarse clustering) with the slowest, most effective ones (refinement). By adopting a heuristic ranking of the nodes and classifying a fraction of them as 'critical', a refinement step can be restricted to this subset of the network, thus saving computational time. Preliminary numerical results are discussed, showing improvement of the final partition. Comment: 12 page

    Boosting Local Search for the Maximum Independent Set Problem

    An independent set of a graph G = (V, E) with vertices V and edges E is a subset S ⊆ V such that the subgraph induced by S does not contain any edges. The goal of the maximum independent set problem (MIS problem) is to find an independent set of maximum size. It is equivalent to the well-known vertex cover problem (VC problem) and the maximum clique problem. This thesis consists of two main parts. In the first part, we compare the currently best algorithms for finding near-optimal independent sets and vertex covers in large, sparse graphs. They are Iterated Local Search (ILS) by Andrade et al. [2], a heuristic that uses local search for the MIS problem, and NuMVC by Cai et al. [6], a local search algorithm for the VC problem. As of now, there are no methods to solve these large instances exactly in any reasonable time. Therefore, these heuristic algorithms are the best option. In the second part, we analyze a series of techniques, some of which lead to a significant speedup of the ILS algorithm. This is done by removing specific ver
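
The definition of an independent set, and its relation to vertex covers mentioned in the abstract, can be checked with a short sketch. It assumes an undirected graph given as an edge list; the function names and the sample graph are hypothetical, and the sketch only verifies candidate sets rather than implementing ILS or NuMVC.

```python
def is_independent_set(edges, s):
    """True if no edge has both endpoints inside s."""
    s = set(s)
    return not any(u in s and v in s for u, v in edges)


def is_vertex_cover(edges, c):
    """True if every edge has at least one endpoint inside c."""
    c = set(c)
    return all(u in c or v in c for u, v in edges)


# Hypothetical example: a path 0 - 1 - 2 - 3 - 4.
vertices = {0, 1, 2, 3, 4}
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
independent = {0, 2, 4}
cover = vertices - independent  # the complement of an independent set
```

In this example the complement `{1, 3}` of the independent set `{0, 2, 4}` touches every edge, which illustrates why the two problems can be treated as equivalent.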