
    Fast exact computation of betweenness centrality in social networks

    Social networks have proven over the last few years to be a powerful and flexible concept for representing and analyzing data. They borrow some basic concepts from sociology in order to model how people (or data items) establish relationships with each other. The study of these relationships can provide a deeper understanding of many emergent global phenomena. The amount of data available in the form of social networks is growing by the day, and this poses many challenging computational problems for their analysis. In fact, many tools suitable for analyzing small to medium-sized networks are inefficient for large social networks. In this paper we present a novel approach for computing betweenness centrality that considerably speeds up Brandes' algorithm in the context of social networking. Our algorithm exploits the natural sparsity of the data to algebraically (and efficiently) determine the betweenness of those nodes organized as trees embedded in the social network. Moreover, for the residual network, which is often much smaller, we modify Brandes' algorithm so that we can remove the nodes already processed and compute shortest paths only for the remaining nodes. We tested our algorithm on a set of 18 large, sparse, real-world social networks provided by Sistemi Territoriali, an Italian ICT company specialized in Business Intelligence. Our tests show that our algorithm consistently runs more than an order of magnitude faster than Brandes' procedure on such sparse networks.
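    For reference, the baseline that this work accelerates is Brandes' algorithm. The sketch below is a minimal version for unweighted, undirected graphs, assuming an adjacency-dict input; the function name is hypothetical, and it does NOT include the authors' tree-based algebraic shortcut or their residual-network modification. Each source runs one BFS that counts shortest paths, then dependencies are accumulated in reverse BFS order.

```python
from collections import deque

def brandes_betweenness(adj):
    """Exact betweenness centrality for an unweighted, undirected graph.

    adj maps every node to a list of its neighbours (hypothetical input format).
    Each unordered pair (s, t) is visited from both ends, hence the final halving.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source BFS: visit order, shortest-path counts, predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes from s first.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: score / 2.0 for v, score in bc.items()}
```

    On a star with centre 0 and leaves 1, 2, 3, the centre scores 3.0 and every leaf scores 0.0, as any degree-1 node must, since no shortest path passes through it as an interior vertex.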

    Efficient Exact and Approximate Algorithms for Computing Betweenness Centrality in Directed Graphs

    Graphs are an important tool for modeling data in different domains, including social networks, bioinformatics and the World Wide Web. Most of the networks formed in these domains are directed graphs, in which every edge has a direction and the edge relation is not symmetric. Betweenness centrality is an important index widely used to analyze networks. In this paper, first, given a directed network $G$ and a vertex $r \in V(G)$, we propose a new exact algorithm to compute the betweenness score of $r$. Our algorithm pre-computes a set $\mathcal{RV}(r)$, which is used to prune a huge amount of computation that does not contribute to the betweenness score of $r$. The time complexity of our exact algorithm depends on $|\mathcal{RV}(r)|$: it is $\Theta(|\mathcal{RV}(r)|\cdot|E(G)|)$ for unweighted graphs and $\Theta(|\mathcal{RV}(r)|\cdot|E(G)|+|\mathcal{RV}(r)|\cdot|V(G)|\log |V(G)|)$ for weighted graphs with positive weights. $|\mathcal{RV}(r)|$ is bounded from above by $|V(G)|-1$, and in most cases it is a small constant. Then, for the cases where $\mathcal{RV}(r)$ is large, we present a simple randomized algorithm that samples from $\mathcal{RV}(r)$ and performs computations only for the sampled elements. We show that this algorithm provides an $(\epsilon,\delta)$-approximation of the betweenness score of $r$. Finally, we perform extensive experiments over several real-world datasets from different domains, for several randomly chosen vertices as well as for the vertices with the highest betweenness scores. Our experiments reveal that in most cases our algorithm significantly outperforms the most efficient existing randomized algorithms in terms of both running time and accuracy. Our experiments also show that our proposed algorithm computes the betweenness scores of all vertices in sets of size 5, 10 and 15 much faster and more accurately than the most efficient existing algorithms.
    Comment: arXiv admin note: text overlap with arXiv:1704.0735
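    To make "the betweenness score of a single vertex $r$" concrete, the sketch below computes it as a sum of per-source dependencies in an unweighted digraph, plus a plain uniform source-sampling estimator scaled by $n/k$. This is an assumption-laden illustration: it iterates over all sources (or a uniform sample of them) rather than the paper's pruned set $\mathcal{RV}(r)$, whose construction the abstract does not describe, and all function names are hypothetical.

```python
import random
from collections import deque

def dependency_on(adj, s, r):
    """Dependency of source s on vertex r in an unweighted digraph.

    adj maps every node (sinks included) to a list of out-neighbours.
    """
    order, preds = [], {v: [] for v in adj}
    sigma = {v: 0 for v in adj}
    dist = {v: -1 for v in adj}
    sigma[s], dist[s] = 1, 0
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    # Brandes-style back-propagation restricted to the DAG rooted at s.
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    return 0.0 if r == s else delta[r]

def betweenness_of(adj, r):
    """Exact betweenness of r: sum of dependencies over every source."""
    return sum(dependency_on(adj, s, r) for s in adj)

def estimate_betweenness_of(adj, r, k, seed=0):
    """Unbiased estimate from k uniformly sampled sources (not RV(r) sampling)."""
    rng = random.Random(seed)
    nodes = list(adj)
    samples = [rng.choice(nodes) for _ in range(k)]
    return len(nodes) / k * sum(dependency_on(adj, s, r) for s in samples)
```

    In the diamond digraph 0→1, 0→2, 1→3, 2→3, the two shortest 0→3 paths split evenly, so `betweenness_of` returns 0.5 for vertex 1. Restricting the outer loop to a small set of relevant sources is where savings like those described above would come from.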

    Computing Vertex Centrality Measures in Massive Real Networks with a Neural Learning Model

    Vertex centrality measures are a multi-purpose analysis tool, commonly used in many application environments to retrieve information and unveil knowledge from graphs and their structural properties. However, the algorithms that compute such metrics are expensive in terms of computational resources for real-time applications or massive real-world networks. Thus, approximation techniques have been developed and used to compute the measures in such scenarios. In this paper, we demonstrate and analyze the use of neural network learning algorithms to tackle this task and compare their performance, in terms of solution quality and computation time, with other techniques from the literature. Our work offers several contributions. We highlight both the pros and cons of approximating centralities through neural learning. By empirical means and statistics, we then show that the regression model generated by a feedforward neural network trained with the Levenberg-Marquardt algorithm is not only the best option considering computational resources, but also achieves the best solution quality for relevant applications and large-scale networks.
    Keywords: Vertex Centrality Measures, Neural Networks, Complex Network Models, Machine Learning, Regression Model
    Comment: 8 pages, 5 tables, 2 figures, version accepted at IJCNN 2018. arXiv admin note: text overlap with arXiv:1810.1176
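    The Levenberg-Marquardt training mentioned above can be illustrated with a small NumPy sketch. The abstract does not give the network size, input features, or training settings, so everything here is an assumption: one tanh hidden layer, a scalar linear output, a forward-difference Jacobian, and the simple damping rule (divide the damping factor by 10 on an accepted step, multiply by 10 on a rejected one). The helper names are hypothetical.

```python
import numpy as np

def unpack(theta, n_in, n_hidden):
    """Split a flat parameter vector into the weights of a one-hidden-layer network."""
    i = n_in * n_hidden
    W1 = theta[:i].reshape(n_in, n_hidden)
    b1 = theta[i:i + n_hidden]
    w2 = theta[i + n_hidden:i + 2 * n_hidden]
    b2 = theta[-1]
    return W1, b1, w2, b2

def forward(theta, X, n_hidden):
    """Feedforward regression: tanh hidden layer, linear scalar output."""
    W1, b1, w2, b2 = unpack(theta, X.shape[1], n_hidden)
    return np.tanh(X @ W1 + b1) @ w2 + b2

def train_lm(X, y, n_hidden=4, iters=50, lam=1e-2, seed=0, eps=1e-6):
    """Fit the network to y with Levenberg-Marquardt on the sum of squared residuals."""
    rng = np.random.default_rng(seed)
    n_params = X.shape[1] * n_hidden + 2 * n_hidden + 1
    theta = rng.normal(scale=0.5, size=n_params)

    def residuals(t):
        return forward(t, X, n_hidden) - y

    r = residuals(theta)
    losses = [float(r @ r)]
    for _ in range(iters):
        # Forward-difference Jacobian of the residuals w.r.t. each parameter.
        J = np.empty((len(y), n_params))
        for j in range(n_params):
            step = np.zeros(n_params)
            step[j] = eps
            J[:, j] = (residuals(theta + step) - r) / eps
        # Damped Gauss-Newton step: (J^T J + lam * I) delta = -J^T r.
        delta = np.linalg.solve(J.T @ J + lam * np.eye(n_params), -J.T @ r)
        r_new = residuals(theta + delta)
        if r_new @ r_new < losses[-1]:
            # Accept: behave more like Gauss-Newton on the next step.
            theta, r = theta + delta, r_new
            lam = max(lam / 10.0, 1e-12)
        else:
            # Reject: behave more like small-step gradient descent.
            lam *= 10.0
        losses.append(float(r @ r))
    return theta, losses
```

    In a centrality setting one would call `train_lm(X, y)` with each row of `X` holding hypothetical cheap per-vertex features (for instance, degree) and `y` holding exact centrality values from a training graph. Because only loss-reducing steps are accepted, the recorded loss never increases.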

    Fast Shortest Path Distance Estimation in Large Networks

    We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well-studied problem, but exact algorithms do not scale to the huge graphs encountered on the web, in social networks, and in other applications. In this paper we focus on approximate methods for distance estimation, in particular landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a memory budget for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. We therefore develop and experimentally compare a number of simple methods that scale well to large graphs. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally on five different real-world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature, which selects landmarks at random. Finally, we study applications of our method to two problems arising naturally in large-scale networks, namely social search and community detection.
    Yahoo! Research (internship
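    The landmark scheme described above can be sketched for an unweighted, undirected graph as follows. Two assumptions to flag: landmarks are chosen by highest degree (one simple notion of "central"), and the two precomputed distances are combined through the triangle inequality, giving an upper bound $\min_l d(s,l)+d(l,t)$ and a lower bound $\max_l |d(s,l)-d(l,t)|$. The abstract does not state which combination or centrality the paper uses, and the function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def build_index(adj, num_landmarks):
    """Offline step: pick the highest-degree nodes as landmarks and store BFS distances."""
    landmarks = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:num_landmarks]
    return {l: bfs_distances(adj, l) for l in landmarks}

def estimate_distance(index, s, t):
    """Query step: (lower, upper) bounds on d(s, t) from landmark distances.

    Returns (0, inf) if no landmark reaches both s and t.
    """
    lower, upper = 0, float("inf")
    for dist in index.values():
        if s in dist and t in dist:
            upper = min(upper, dist[s] + dist[t])
            lower = max(lower, abs(dist[s] - dist[t]))
    return lower, upper
```

    On the path 0-1-2-3-4 with a single landmark, the degree tie-break picks node 1, and the query for (0, 4) returns the bounds (2, 4); the upper bound is exact here because the landmark lies on the shortest path. A query costs one pass over the landmarks, which is why the memory budget translates directly into the number of landmarks.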

    Discriminative Distance-Based Network Indices with Application to Link Prediction

    In large networks, using the length of shortest paths as the distance measure has shortcomings. A well-studied shortcoming is that extending it to disconnected graphs and directed graphs is controversial. The second shortcoming is that a huge number of vertices may have exactly the same score. The third shortcoming is that in many applications, the distance between two vertices depends not only on the length of shortest paths but also on the number of shortest paths. In this paper, first we develop a new distance measure between vertices of a graph that yields discriminative distance-based centrality indices. This measure is proportional to the length of shortest paths and inversely proportional to the number of shortest paths. We present algorithms for the exact computation of the proposed discriminative indices. Second, we develop randomized algorithms that precisely estimate average discriminative path length and average discriminative eccentricity, and show that they give $(\epsilon,\delta)$-approximations of these indices. Third, we perform extensive experiments over several real-world networks from different domains. In our experiments, we first show that, compared to the traditional indices, discriminative indices usually have much higher discriminability. Then, we show that our randomized algorithms can very precisely estimate average discriminative path length and average discriminative eccentricity using only a few samples. Then, we show that real-world networks usually have a tiny average discriminative path length, bounded by a constant (e.g., 2). Fourth, in order to better motivate the usefulness of our proposed distance measure, we present a novel link prediction method that uses discriminative distance to decide which vertices are more likely to form a link in the future, and show its superior performance compared to well-known existing measures.
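    The abstract only says the discriminative distance is proportional to shortest-path length and inversely proportional to the number of shortest paths. The sketch below assumes the simplest such form, $d(u,v)/\sigma(u,v)$, computes it exactly with one path-counting BFS per source, and averages it over ordered pairs of distinct, mutually reachable vertices. The paper's exact definition, including its handling of disconnected pairs, may differ, and the function names are hypothetical.

```python
from collections import deque

def bfs_counts(adj, s):
    """BFS from s returning hop distances and numbers of shortest paths."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma

def avg_discriminative_path_length(adj):
    """Average of d(u, v) / sigma(u, v) over ordered reachable pairs u != v.

    Assumed definition; unreachable pairs are simply skipped.
    """
    total, pairs = 0.0, 0
    for u in adj:
        dist, sigma = bfs_counts(adj, u)
        for v in dist:
            if v != u:
                total += dist[v] / sigma[v]
                pairs += 1
    return total / pairs if pairs else 0.0
```

    On the 4-cycle 0-1-2-3-0, opposite vertices are 2 hops apart but joined by 2 shortest paths, so every pair gets discriminative distance 1 and the average is 1.0, whereas the ordinary average path length is 4/3. This shows how multiple shortest paths pull vertices "closer" under such a measure.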