
    A Divide-and-Conquer Algorithm for Betweenness Centrality

    The problem of efficiently computing the betweenness centrality of nodes has been researched extensively. To date, the best known exact and centralized algorithm for this task is an algorithm proposed in 2001 by Brandes. The contribution of our paper is Brandes++, an algorithm for efficient exact computation of betweenness centrality. The crux of our algorithm is that we create a sketch of the graph, which we call the skeleton, by replacing subgraphs with simpler graph structures. Depending on the underlying graph structure, Brandes++ can achieve significantly lower running times by operating on this skeleton and keeping appropriate summaries. Extensive experimental evaluation on real-life datasets demonstrates the efficacy of our algorithm for different types of graphs. We release our code for the benefit of the research community. Comment: A shorter version of this paper appeared in SIAM Data Mining 201
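    For context, the sketch below shows the Brandes (2001) baseline that Brandes++ improves on, restricted to unweighted graphs; it is a minimal reference implementation of the standard algorithm, not the skeleton construction of Brandes++, and the adjacency-dict input format is an illustrative assumption.

```python
from collections import deque

def brandes_betweenness(adj):
    """Brandes' O(|V||E|) exact betweenness for an unweighted graph.

    adj: dict mapping each node to an iterable of its neighbours.
    Returns unnormalised scores over ordered source-target pairs.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        dist, sigma = {s: 0}, {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:                # first visit to w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:       # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Example: on the path a-b-c, only b is interior to shortest paths.
print(brandes_betweenness({"a": ["b"], "b": ["a", "c"], "c": ["b"]}))
```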

    Centrality Heuristics for Exact Model Counting

    Model counting is the archetypical #P-complete problem of determining the number of satisfying truth assignments of a given propositional formula. In this short paper, we empirically investigate the potential of employing graph centrality measures as a basis for search heuristics in the context of exact model counting. In particular, we integrate centrality-based heuristics into the search-based exact model counter sharpSAT. Our experiments show that employing centrality information significantly improves the empirical performance of sharpSAT, and also allows for simplifying the search heuristics compared to the current default heuristics of the model counter. In particular, we show that the VSIDS heuristic, an integral search heuristic employed in essentially all state-of-the-art conflict-driven clause learning Boolean satisfiability solvers, appears to be of very limited use in the context of model counting. Peer reviewed
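    As a toy illustration of the idea (not sharpSAT's engine, which adds component decomposition and caching), one can order the branching variables of a naive model counter by a centrality measure of the formula's primal graph; the clause encoding and helper names below are assumptions made for the sketch.

```python
import networkx as nx

def primal_graph(clauses):
    """Primal graph of a CNF: variables are nodes, and an edge joins
    two variables whenever they occur together in a clause."""
    g = nx.Graph()
    for clause in clauses:
        vs = [abs(l) for l in clause]
        g.add_nodes_from(vs)
        g.add_edges_from((u, v) for i, u in enumerate(vs) for v in vs[i + 1:])
    return g

def simplify(clauses, lit):
    """Assign lit = True: drop satisfied clauses, shrink the rest.
    Returns None when some clause is falsified."""
    out = []
    for c in clauses:
        if lit in c:
            continue
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None
        out.append(reduced)
    return out

def count_models(clauses, variables):
    """Naive #SAT: branch on variables in the given order."""
    if clauses is None:
        return 0
    if not clauses:                     # all clauses satisfied
        return 2 ** len(variables)      # remaining variables are free
    v, rest = variables[0], variables[1:]
    return (count_models(simplify(clauses, v), rest)
            + count_models(simplify(clauses, -v), rest))

clauses = [[1, 2], [-1, 3]]             # (x1 or x2) and (not x1 or x3)
centrality = nx.betweenness_centrality(primal_graph(clauses))
order = sorted(centrality, key=centrality.get, reverse=True)
print(count_models(clauses, order))     # 4 satisfying assignments
```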

    KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation

    We present KADABRA, a new algorithm to approximate betweenness centrality in directed and undirected graphs, which significantly outperforms all previous approaches on real-world complex networks. The efficiency of the new algorithm relies on two new theoretical contributions of independent interest. The first contribution focuses on sampling shortest paths, a subroutine used by most algorithms that approximate betweenness centrality. We show that, on realistic random graph models, we can perform this task in time $|E|^{\frac{1}{2}+o(1)}$ with high probability, obtaining a significant speedup with respect to the $\Theta(|E|)$ worst-case performance. We experimentally show that this new technique achieves similar speedups on real-world complex networks as well. The second contribution is a new rigorous application of the adaptive sampling technique. This approach decreases the total number of shortest paths that need to be sampled to compute all betweenness centralities with a given absolute error, and it also handles more general problems, such as computing the $k$ most central nodes. Furthermore, our analysis is general, and it might be extended to other settings. Comment: Some typos corrected
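    The basic pair-sampling estimator underlying this line of work can be sketched as follows; KADABRA's actual contributions (the balanced bidirectional BFS and the adaptive stopping rule) are not reproduced here, and the code assumes an unweighted, mostly connected graph given as an adjacency dict.

```python
import random
from collections import deque

def sampled_betweenness(adj, n_samples, rng=random):
    """Estimate normalised betweenness by sampling random node pairs and,
    for each pair, one uniformly random shortest path between them."""
    nodes = list(adj)
    est = {v: 0.0 for v in nodes}
    done = 0
    while done < n_samples:
        s, t = rng.sample(nodes, 2)
        # BFS from s, counting shortest paths (sigma) to every node.
        dist, sigma = {s: 0}, {s: 1}
        preds = {v: [] for v in nodes}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w] = dist[v] + 1, 0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        if t not in dist:               # unreachable pair: resample
            continue
        # Walk back from t, choosing predecessors proportionally to sigma;
        # this draws a uniformly random shortest s-t path.
        w = t
        while w != s:
            v = rng.choices(preds[w], weights=[sigma[p] for p in preds[w]])[0]
            if v != s:
                est[v] += 1.0 / n_samples   # v is interior to the sampled path
            w = v
        done += 1
    return est  # est[v] approximates the fraction of shortest paths through v
```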

    Exact Distributed Load Centrality Computation: Algorithms, Convergence, and Applications to Distance Vector Routing

    Many optimization techniques for networking protocols take advantage of topological information to improve performance. Often, the topological information at the core of these techniques is a centrality metric such as the Betweenness Centrality (BC) index. BC is, in fact, a centrality metric with many well-known successful applications documented in the literature, from resource allocation to routing. To compute BC, however, each node must run a centralized algorithm and needs global topological knowledge; such requirements limit the feasibility of optimization procedures based on BC. To overcome restrictions of this kind, we present a novel distributed algorithm that requires only local information to compute an alternative, similar metric called Load Centrality (LC). We present the new algorithm together with a proof of its convergence and an analysis of its time complexity. The proposed algorithm is general enough to be integrated with any distance vector (DV) routing protocol. In support of this claim, we provide an implementation on top of Babel, a real-world DV protocol. We use this implementation in an emulation framework to show how LC can be exploited to reduce Babel's convergence time upon node failure, without increasing control overhead. As a key step towards the adoption of centrality-based optimization for routing, we study how the algorithm can be incrementally introduced in a network running a DV routing protocol. We show that even when only a small fraction of nodes participate in the protocol, the algorithm accurately ranks nodes according to their centrality.
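    The distributed algorithm itself piggybacks on DV message exchanges and cannot be condensed here, but the relationship it exploits, namely that LC closely tracks BC, can be checked centrally; the snippet below is only such a sanity check, using networkx implementations rather than the paper's algorithm, with an illustrative random topology.

```python
import networkx as nx

G = nx.erdos_renyi_graph(50, 0.1, seed=1)   # stand-in for a network topology

lc = nx.load_centrality(G)          # Load Centrality
bc = nx.betweenness_centrality(G)   # Betweenness Centrality

# On many topologies the two metrics produce very similar rankings,
# which is why a distributed LC computation can substitute for BC
# in routing optimizations.
print(sorted(G, key=lc.get, reverse=True)[:5])
print(sorted(G, key=bc.get, reverse=True)[:5])
```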

    Centrality measures and analyzing dot-product graphs

    In this thesis we investigate two topics in data mining on graphs: in the first part we investigate the notion of centrality in graphs, and in the second part we look at reconstructing graphs from aggregate information. In many graph-related problems the goal is to rank nodes based on an importance score. This score is in general referred to as node centrality. In Part I we start by giving a novel and more efficient algorithm for computing betweenness centrality. In many applications not an individual node but rather a set of nodes is chosen to perform some task. We generalize the notion of centrality to groups of nodes. While group centrality was first formally defined by Everett and Borgatti (1999), we are the first to pose it as a combinatorial optimization problem: find a group of k nodes with the largest centrality. We give an algorithm for solving this optimization problem for a general notion of centrality that subsumes various path-based instantiations of centrality. We prove that this problem is NP-hard for specific centrality definitions, and we provide a universal algorithm for this problem that can be modified to optimize the specific measures. We also investigate the problem of increasing node centrality by adding or deleting edges in the graph. We conclude this part by solving the optimization problem for two specific applications: one for minimizing redundancy in information propagation networks and one for optimizing the expected number of interceptions of a group in a random navigational network. In the second part of the thesis we investigate what we can infer about a bipartite graph if only some aggregate information -- the number of common neighbors among each pair of nodes -- is given. First, we observe that the given data is equivalent to the dot-products of the adjacency vectors of the nodes. Based on this observation we develop an algorithm based on SVD decomposition that is capable of almost perfectly reconstructing graphs from such neighborhood data. We investigate two versions of this problem, in which the dot-products of nodes with themselves, i.e. the node degrees, are either known or hidden.
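    The reconstruction idea in the second part can be sketched in a few lines: the observed co-neighbor counts form the Gram matrix B = AA^T of the hidden adjacency matrix A, so a rank-d eigendecomposition recovers A up to rotation. The dimensions and random instance below are illustrative; recovering the exact 0/1 structure requires the further steps developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 3
A = rng.integers(0, 2, size=(n, d)).astype(float)  # hidden bipartite adjacency
B = A @ A.T                # observed: common-neighbour counts (dot products)

# Keep the top-d eigenpairs of the Gram matrix.
vals, vecs = np.linalg.eigh(B)
top = vals.argsort()[::-1][:d]
A_hat = vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))

# A_hat reproduces the observed data exactly, but matches A only up to a
# d-dimensional rotation -- the remaining ambiguity the thesis resolves.
print(np.allclose(A_hat @ A_hat.T, B))   # True
```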

    Boosting Local Search for the Maximum Independent Set Problem

    An independent set of a graph G = (V, E) with vertices V and edges E is a subset S ⊆ V such that the subgraph induced by S does not contain any edges. The goal of the maximum independent set problem (MIS problem) is to find an independent set of maximum size. It is equivalent to the well-known vertex cover problem (VC problem) and the maximum clique problem. This thesis consists of two main parts. In the first one we compare the currently best algorithms for finding near-optimal independent sets and vertex covers in large, sparse graphs. They are Iterated Local Search (ILS) by Andrade et al. [2], a heuristic that uses local search for the MIS problem, and NuMVC by Cai et al. [6], a local search algorithm for the VC problem. As of now, there are no methods to solve these large instances exactly in any reasonable time, so these heuristic algorithms are the best option. In the second part we analyze a series of techniques, some of which lead to a significant speed-up of the ILS algorithm. This is done by removing specific vertices
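    A minimal sketch of the kind of local search involved follows: a greedy start plus (1,2)-swaps, in the spirit of ILS but without its perturbation steps; the helper names are illustrative, not taken from the thesis.

```python
import random

def greedy_mis(adj, rng):
    """Build a maximal independent set by inserting vertices in random order."""
    order = list(adj)
    rng.shuffle(order)
    s = set()
    for v in order:
        if all(u not in s for u in adj[v]):
            s.add(v)
    return s

def two_improve(adj, s):
    """(1,2)-swap: remove one vertex of s if two non-adjacent outside
    vertices can replace it, for a net gain of one vertex."""
    for v in list(s):
        # Outside vertices whose only neighbour inside s is v itself.
        cands = [u for u in adj[v] if sum(w in s for w in adj[u]) == 1]
        for i, a in enumerate(cands):
            for b in cands[i + 1:]:
                if b not in adj[a]:
                    s.remove(v); s.add(a); s.add(b)
                    return True
    return False

def add_free_vertices(adj, s):
    """Re-maximalise: insert every vertex with no neighbour in s."""
    changed = False
    for v in adj:
        if v not in s and all(u not in s for u in adj[v]):
            s.add(v)
            changed = True
    return changed

rng = random.Random(0)
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}   # star: the 3 leaves are optimal
s = greedy_mis(adj, rng)
while two_improve(adj, s) or add_free_vertices(adj, s):
    pass
print(s)   # {1, 2, 3}
```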

    Graph manipulations for fast centrality computation

    The betweenness and closeness metrics have always been intriguing and used in many analyses. Yet, they are expensive to compute. For that reason, making the betweenness and closeness centrality computations faster is an important and well-studied problem. In this work, we propose the framework BADIOS, which manipulates the graph by compressing it and splitting it into pieces so that the centrality computation can be handled independently for each piece. Although BADIOS is designed and fine-tuned for exact betweenness and closeness centrality, it can easily be adapted for approximate solutions as well. Experimental results show that the proposed techniques offer a powerful arsenal for reducing the centrality computation time for networks of various types and sizes. In particular, BADIOS reduces the betweenness centrality computation time of a graph with 4.6 million edges from more than 5 days to less than 16 hours. For the same graph, we decrease the closeness computation time from more than 3 days to 6 hours (a 12.7x speedup).
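    One of the simplest manipulations of this kind, peeling degree-1 vertices, can be sketched as below; note that computing exact centrality on the reduced graph requires the correction terms developed in the paper, which this sketch omits.

```python
def strip_degree_one(adj):
    """Iteratively remove degree-1 vertices, recording each vertex and its
    attachment point so centrality contributions can be accounted for later."""
    adj = {v: set(ns) for v, ns in adj.items()}   # defensive copy
    peeled = []
    queue = [v for v, ns in adj.items() if len(ns) == 1]
    while queue:
        v = queue.pop()
        if v not in adj or len(adj[v]) != 1:      # degree changed meanwhile
            continue
        (u,) = adj[v]                             # v's single neighbour
        peeled.append((v, u))
        adj[u].discard(v)
        del adj[v]
        if len(adj[u]) == 1:
            queue.append(u)
    return adj, peeled

# Example: repeatedly peeling a path's endpoints shrinks it to a single node.
core, peeled = strip_degree_one({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})
print(core, peeled)
```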

    Finding important entities in graphs

    Graphs are established as one of the most prominent means of data representation. They are composed of simple entities -- nodes and edges -- and reflect the relationships between them. Their impact extends to a broad variety of domains, e.g., biology, sociology and the Web. In these settings, much of the data's value can be captured by a simple question: how can we evaluate the importance of these entities? The aim of this dissertation is to explore novel importance measures that are meaningful and can be computed efficiently on large datasets. First, we focus on the spanning edge centrality, an edge importance measure recently introduced to evaluate phylogenetic trees. We propose very efficient methods that approximate this measure in near-linear time and apply them to large graphs with millions of nodes. We demonstrate that this centrality measure is a useful tool for the analysis of networks outside its original application domain. Next, we turn to importance measures for nodes and propose the absorbing random walk centrality. This measure evaluates a group of nodes in a graph according to how central they are with respect to a set of query nodes. Specifically, given a query set and a candidate group of nodes, we start random walks from the queries and measure their length until they reach one of the candidates. The most central group of nodes will collectively minimize the expected length of these random walks. We prove several computational properties of this measure and provide an algorithm whose solutions offer an approximation guarantee. Additionally, we develop efficient heuristics that allow us to use this importance measure on large datasets. Finally, we consider graphs in which each node is assigned a set of attributes. We define an important connected subgraph to be one for which the total weight of its edges is small, while the number of attributes covered by its nodes is large. To select such an important subgraph, we develop an efficient approximation algorithm based on the primal-dual schema.
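    For the absorbing random walk centrality, the expected walk length has a closed form via the fundamental matrix of the absorbing chain, which the following sketch evaluates; the dense-matrix formulation is illustrative and assumes every non-candidate node can reach the candidate set.

```python
import numpy as np

def absorption_time(adj_matrix, candidates, queries):
    """Expected number of steps for a random walk started uniformly from
    `queries` to first hit a node in `candidates` (the absorbing set)."""
    n = adj_matrix.shape[0]
    P = adj_matrix / adj_matrix.sum(axis=1, keepdims=True)  # row-stochastic
    transient = [v for v in range(n) if v not in set(candidates)]
    Q = P[np.ix_(transient, transient)]   # transitions among transient nodes
    # Fundamental-matrix identity: (I - Q) t = 1 gives the expected
    # steps-to-absorption t from each transient node.
    t = np.linalg.solve(np.eye(len(transient)) - Q, np.ones(len(transient)))
    steps = dict(zip(transient, t))
    return np.mean([steps.get(q, 0.0) for q in queries])  # candidates take 0

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path 0-1-2
print(absorption_time(A, candidates=[2], queries=[0]))        # 4.0
```

    The most central candidate group is then the one minimizing this quantity, which the dissertation shows how to approximate efficiently.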

    Domino D5.3 Final tool and model description, and case studies results

    This deliverable presents the final results obtained from the Domino project. It presents the corresponding metrics, the model, and a detailed analysis of two case studies. The main modifications to the model with respect to the previous version are highlighted, including curfew management. The calibration of the model, which is similar to that of the previous version, is presented, with more in-depth analyses and further effort dedicated to the calibration process. Two case studies are defined in this deliverable, based on the previous definitions of the three base mechanisms: 4D trajectory adjustments, flight prioritisation, and flight arrival coordination. The case studies are defined to give a focused insight into the efficiency of the mechanisms in specific environments. The two case studies are run by the model and analysed using the metrics defined previously, including centrality and causality metrics. The results show different levels of efficiency for the three mechanisms, highlight the degree of robustness to the propagation of negative effects (such as delay) in the system, demonstrate various trade-offs between the indicators, and support a discussion of the limits of the mechanisms.

    09491 Abstracts Collection -- Graph Search Engineering

    From the 29th of November to the 4th of December 2009, the Dagstuhl Seminar 09491 "Graph Search Engineering" was held in Schloss Dagstuhl, Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.