1,112 research outputs found

    A Faster Method to Estimate Closeness Centrality Ranking

    Closeness centrality is one way of measuring how central a node is in a given network. It assigns a centrality value to each node based on its accessibility to the whole network. In real-life applications, we are mainly interested in ranking nodes by their centrality values. The classical method for computing the rank of a node first computes the closeness centrality of all nodes and then compares them to obtain its rank. Its time complexity is $O(n \cdot m + n)$, where $n$ is the total number of nodes and $m$ is the total number of edges in the network. In the present work, we propose a heuristic method to quickly estimate the closeness rank of a node in $O(\alpha \cdot m)$ time, where $\alpha = 3$. We also propose an extended, improved method that uses uniform sampling. This method estimates the rank more accurately and has time complexity $O(\alpha \cdot m)$, where $\alpha \approx 10$ to $100$. This is a substantial improvement over the classical centrality ranking method. The efficiency of the proposed methods is verified on real-world scale-free social networks using absolute and weighted error functions.
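The classical ranking method described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes an unweighted, undirected, connected graph stored as an adjacency dictionary, and the function names (`bfs_distances`, `closeness`, `closeness_rank`) are hypothetical. One breadth-first search per node costs $O(m)$, which gives the $O(n \cdot m)$ term, and the final comparison pass gives the $+n$ term.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, node):
    """Closeness = (reachable nodes - 1) / (sum of distances to them)."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def closeness_rank(adj, target):
    """Classical method: score every node, then count strictly better ones."""
    scores = {v: closeness(adj, v) for v in adj}  # n BFS runs, O(n * m)
    return 1 + sum(1 for v in adj if scores[v] > scores[target])  # O(n)


# Path graph 0 - 1 - 2 - 3 - 4: the middle node is the most central.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(closeness_rank(path, 2))  # rank of the middle node
print(closeness_rank(path, 0))  # rank of an end node
```

Ties share a rank here because only strictly larger scores are counted; a different tie-breaking rule would be an equally reasonable choice.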


    My dissertation focuses on developing scalable algorithms for analyzing large complex networks and on evaluating how the results change when the network changes. Network analysis has become a ubiquitous and very effective tool in big data analysis, particularly for understanding the mechanisms of complex systems that arise in diverse disciplines such as cybersecurity [83], biology [15], sociology [5], and epidemiology [7]. However, data from real-world systems are inherently noisy because they are influenced by fluctuations in experiments, subjective interpretation of data, and limitations of computing resources. The corresponding networks are therefore also approximate. This research addresses the problem of efficiently obtaining accurate results from large noisy networks. My dissertation has four main components. The first component consists of developing efficient and scalable algorithms for centrality computations that produce reliable results on noisy networks. My two novel contributions in this area are a group testing [16] based algorithm for identifying high-centrality vertices, which is much faster than current methods, and an algorithm for computing the betweenness centrality of a specific vertex. The second component consists of developing quantitative metrics to measure how different noise models affect the analysis results. We implemented a uniform perturbation model based on random addition and deletion of edges in a network. To quantify the stability of a network, we investigated the effect that perturbations have on the top-k ranked vertices and on the local structural properties of those vertices. The third component consists of developing efficient software for network analysis. I have been part of the development of a software package, ESSENS (Extensible, Scalable Software for Evolving NetworkS) [76], that effectively supports our algorithms on large networks. The fourth component is a literature review of the various noise models that researchers have applied to networks and of the methods they have used to quantify the stability, sensitivity, robustness, and reliability of networks. Together, these four aspects will lead to efficient, accurate, and highly scalable algorithms for analyzing noisy networks.
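The uniform perturbation experiment mentioned in this abstract might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the abstract does not say how many edges are perturbed, which centrality is used for ranking, or which stability metric is reported. The sketch ranks vertices by degree and compares top-k sets with Jaccard similarity, and the helper names (`perturb_edges`, `top_k_by_degree`, `jaccard`) are hypothetical.

```python
import random
from itertools import combinations


def perturb_edges(nodes, edges, n_delete, n_add, seed=None):
    """Delete n_delete existing edges and add n_add absent ones, uniformly at random."""
    rng = random.Random(seed)
    present = sorted({tuple(sorted(e)) for e in edges})
    present_set = set(present)
    # Candidate additions: every node pair that is not already an edge.
    absent = [p for p in combinations(sorted(nodes), 2) if p not in present_set]
    removed = set(rng.sample(present, n_delete))
    added = set(rng.sample(absent, n_add))
    return (present_set - removed) | added


def top_k_by_degree(nodes, edges, k):
    """Top-k vertices by degree (assumed stand-in for a centrality ranking)."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(nodes, key=lambda v: (-degree[v], v))  # ties broken by node id
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two sets; 1.0 means the top-k set is unchanged."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


nodes = list(range(6))
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
noisy = perturb_edges(nodes, edges, n_delete=1, n_add=1, seed=0)
before = top_k_by_degree(nodes, edges, 3)
after = top_k_by_degree(nodes, noisy, 3)
print(jaccard(before, after))  # stability of the top-3 set under this perturbation
```

Averaging the similarity over many seeds would give a less noisy stability estimate than a single run.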

    Distance-generalized Core Decomposition

    The $k$-core of a graph is defined as the maximal subgraph in which every vertex is connected to at least $k$ other vertices within that subgraph. In this work we introduce a distance-based generalization of the notion of $k$-core, which we refer to as the $(k,h)$-core, i.e., the maximal subgraph in which every vertex has at least $k$ other vertices at distance $\leq h$ within that subgraph. We study the properties of the $(k,h)$-core, showing that it preserves many of the nice features of the classic core decomposition (e.g., its connection with the notion of distance-generalized chromatic number) and that it remains useful for speeding up or approximating distance-generalized notions of dense structures, such as the $h$-club. Computing the distance-generalized core decomposition over large networks is intrinsically complex. However, by exploiting clever upper and lower bounds we can partition the computation into a set of totally independent subcomputations, opening the door to top-down exploration and to multithreading, and thus achieving an efficient algorithm.
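The $(k,h)$-core definition above can be made concrete with a naive peeling procedure: repeatedly delete any vertex that has fewer than $k$ other vertices within distance $h$ in the surviving subgraph. This is only a sketch of the definition, assuming an undirected graph stored as an adjacency dictionary; it is not the paper's bound-based algorithm, and the function names (`h_degree`, `kh_core`) are hypothetical.

```python
from collections import deque


def h_degree(adj, alive, source, h):
    """Count vertices other than source within distance h, using only alive vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue  # do not expand beyond distance h
        for v in adj[u]:
            if v in alive and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist) - 1


def kh_core(adj, k, h):
    """Naive peeling: drop vertices with h-degree below k until none remain."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if h_degree(adj, alive, v, h) < k:
                alive.discard(v)
                changed = True
    return alive


# A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(kh_core(graph, k=2, h=1))  # h = 1 reduces to the classic 2-core
print(kh_core(graph, k=3, h=2))  # with h = 2 the pendant vertex can survive
```

With $h = 1$ the procedure reduces to the classic $k$-core peeling. Each pass recomputes every bounded BFS from scratch, which is exactly the cost that makes the problem hard on large networks.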

    Closeness Centrality Algorithms For Multilayer Networks

    Centrality measures for simple graphs are well defined, and several main-memory algorithms exist for each. Simple graphs are not adequate for modeling complex data sets with multiple entities and relationships. Multilayer networks (MLNs) have been shown to be better suited, but there are very few algorithms for computing centrality directly on MLNs. Instead, MLNs are converted (aggregated or collapsed) to simple graphs using Boolean AND or OR operators before centrality is computed, which is not only inefficient but also incurs a loss of structure and semantics. In this paper, we propose algorithms that compute closeness centrality on an MLN directly using a novel decoupling-based approach. Results computed individually on the layers (simple graphs) of an MLN are combined by a composition function to obtain the centrality for the MLN. The challenge is to do this accurately and efficiently: because these algorithms do not have complete information about the MLN, computing a global measure such as closeness centrality is difficult. Hence, these algorithms rely on heuristics derived from intuition. The advantage is that this approach lends itself to parallelism and is more efficient than the traditional approach. We present two heuristics for composition and experimentally validate their accuracy and efficiency on a large number of synthetic and real-world graphs with diverse characteristics.
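The traditional aggregation step that this abstract contrasts itself with can be sketched as below: layers are collapsed into one simple graph by taking the union (Boolean OR) or intersection (Boolean AND) of their edge sets. The sketch assumes undirected layers over a shared node set, each given as a list of edges. The function name `aggregate_layers` is hypothetical, and the paper's decoupling heuristics are not reproduced here because the abstract does not specify them.

```python
def aggregate_layers(layers, mode="or"):
    """Collapse multilayer edge lists into one edge set with Boolean OR or AND."""
    # Normalize each undirected edge so (u, v) and (v, u) compare equal.
    edge_sets = [{tuple(sorted(e)) for e in layer} for layer in layers]
    if not edge_sets:
        return set()
    if mode == "or":
        return set().union(*edge_sets)  # edge present in at least one layer
    if mode == "and":
        return set.intersection(*edge_sets)  # edge present in every layer
    raise ValueError("mode must be 'or' or 'and'")


# Two hypothetical layers over the same three nodes.
layer_a = [(1, 2), (2, 3)]
layer_b = [(2, 1), (1, 3)]
print(sorted(aggregate_layers([layer_a, layer_b], mode="or")))
print(sorted(aggregate_layers([layer_a, layer_b], mode="and")))
```

Either aggregate discards the information about which layer an edge came from, which is the loss of structure and semantics the abstract refers to.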

    Analysis of Protein-Protein Interaction Networks Using High Performance Scalable Tools

    Protein-protein interaction (PPI) research currently generates an extraordinary number of publications and a great deal of interest among computer scientists and biologists alike because of the potential of the source material that researchers can work with. PPI networks are networks of protein complexes formed by biochemical events or electrostatic forces that serve a biological function [1]. As the analysis of protein networks grows, we are gaining more information about proteins, genomes, and their influence on life. Today, PPI networks are used to study diseases, improve drugs, and understand other processes in medicine and health. Although PPI network research is considered extremely important, the field has a problem: too few people have interdisciplinary knowledge of both biology and computer science, and this limits the rate of progress. Most biologists who are not expert coders need a way to calculate graph metrics that help them analyze the graphs without having to manipulate the data themselves. In this research, I test several ways of obtaining results using available frameworks and algorithms, present the results, and compare each method's efficacy. My analysis is performed on very large datasets, for which I calculate several centralities and other graph metrics, and I also visualize them to gain further insight. I also note the effect of MPI and multithreading on the results, which suggests that building scalable tools will improve the analysis immensely.