183 research outputs found

    Scalable Online Betweenness Centrality in Evolving Graphs

    Full text link
    Betweenness centrality is a classic measure that quantifies the importance of a graph element (vertex or edge) according to the fraction of shortest paths passing through it. This measure is notoriously expensive to compute, and the best known algorithm runs in O(nm) time. The problems of efficiency and scalability are exacerbated in a dynamic setting, where the input is an evolving graph seen edge by edge, and the goal is to keep the betweenness centrality up to date. In this paper we propose the first truly scalable algorithm for online computation of betweenness centrality of both vertices and edges in an evolving graph where new edges are added and existing edges are removed. Our algorithm is carefully engineered with out-of-core techniques and tailored for modern parallel stream processing engines that run on clusters of shared-nothing commodity hardware. Hence, it is amenable to real-world deployment. We experiment on graphs that are two orders of magnitude larger than previous studies. Our method is able to keep the betweenness centrality measures up to date online, i.e., the time to update the measures is smaller than the inter-arrival time between two consecutive updates.Comment: 15 pages, 9 Figures, accepted for publication in IEEE Transactions on Knowledge and Data Engineerin

    Algorithms and Software for the Analysis of Large Complex Networks

    Get PDF
    The work presented intersects three main areas, namely graph algorithmics, network science and applied software engineering. Each computational method discussed relates to one of the main tasks of data analysis: to extract structural features from network data, such as methods for community detection; or to transform network data, such as methods to sparsify a network and reduce its size while keeping essential properties; or to realistically model networks through generative models

    Novel Edge and Density Metrics for Link Cohesion

    Full text link
    We present a new metric of link cohesion for measuring the strength of edges in complex, highly connected graphs. Link cohesion accounts for local small hop connections and associated node degrees and can be used to support edge scoring and graph simplification. We also present a novel graph density measure to estimate the average cohesion across nodes. Link cohesion and the density measure are employed to demonstrate community detection through graph sparsification by maximizing graph density. Link cohesion is also shown to be loosely correlated with edge betweenness centrality

    Domain-specific Architectures for Data-intensive Applications

    Full text link
    Graphs' versatile ability to represent diverse relationships, make them effective for a wide range of applications. For instance, search engines use graph-based applications to provide high-quality search results. Medical centers use them to aid in patient diagnosis. Most recently, graphs are also being employed to support the management of viral pandemics. Looking forward, they are showing promise of being critical in unlocking several other opportunities, including combating the spread of fake content in social networks, detecting and preventing fraudulent online transactions in a timely fashion, and in ensuring collision avoidance in autonomous vehicle navigation, to name a few. Unfortunately, all these applications require more computational power than what can be provided by conventional computing systems. The key reason is that graph applications present large working sets that fail to fit in the small on-chip storage of existing computing systems, while at the same time they access data in seemingly unpredictable patterns, thus cannot draw benefit from traditional on-chip storage. In this dissertation, we set out to address the performance limitations of existing computing systems so to enable emerging graph applications like those described above. To achieve this, we identified three key strategies: 1) specializing memory architecture, 2) processing data near its storage, and 3) message coalescing in the network. Based on these strategies, this dissertation develops several solutions: OMEGA, which employs specialized on-chip storage units, with co-located specialized compute engines to accelerate the computation; MessageFusion, which coalesces messages in the interconnect; and Centaur, providing an architecture that optimizes the processing of infrequently-accessed data. Overall, these solutions provide 2x in performance improvements, with negligible hardware overheads, across a wide range of applications. Finally, we demonstrate the applicability of our strategies to other data-intensive domains, by exploring an acceleration solution for MapReduce applications, which achieves a 4x performance speedup, also with negligible area and power overheads.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/163186/1/abrahad_1.pd

    HIGH PERFORMANCE DECENTRALISED COMMUNITY DETECTION ALGORITHMS FOR BIG DATA FROM SMART COMMUNICATION APPLICATIONS

    Get PDF
    Many systems in the world can be represented as models of complex networks and subsequently be analysed fruitfully. One fundamental property of the real-world networks is that they usually exhibit inhomogeneity in which the network tends to organise according to an underlying modular structure, commonly referred to as community structure or clustering. Analysing such communities in large networks can help people better understand the structural makeup of the networks. For example, it can be used in mobile ad-hoc and sensor networks to improve the energy consumption and communication tasks. Thus, community detection in networks has become an important research area within many application fields such as computer science, physical sciences, mathematics and biology. Driven by the recent emergence of big data, clustering of real-world networks using traditional methods and algorithms is almost impossible to be processed in a single machine. The existing methods are limited by their computational requirements and most of them cannot be directly parallelised. Furthermore, in many cases the data set is very big and does not fit into the main memory of a single machine, therefore needs to be distributed among several machines. The main topic of this thesis is about network community detection within these big data networks. More specifically, in this thesis, a novel approach, namely Decentralized Iterative Community Clustering Approach (DICCA) for clustering large and undirected networks is introduced. An important property of this approach is its ability to cluster the entire network without the global knowledge of the network topology. Moreover, an extension of the DICCA called Parallel Decentralized Iterative Community Clustering approach (PDICCA) is proposed for efficiently processing data distributed across several machines. PDICCA is based on MapReduce computing platform to work efficiently in distributed and parallel fashion. In addition, the real-world networks are usually noisy and imperfect with missing and false edges. These imperfections are often difficult to eliminate and highly affect the quality and accuracy of conventional methods used to find the community structure in the network. However, in real-world networks, node attribute information is also available in addition to topology information. Considering more than one source of information for community detection could produce meaningful clusters and improve the robustness of the network. Therefore, a pre-processing approach that considers attribute information, shared neighbours and connectivity information aspects of the network for community detection is presented in this thesis as part of my research. Finally, a set of real-world mobile phone usage data obtained from Cambridge Laboratories (Device Analyzer) has been analysed as an exploratory step for viability to apply the algorithms developed in this thesis. All the proposed approaches have been evaluated and verified for feasibility using real-world large data set. The evaluation results of these experimentations prove very promising for the type of large data networks considered
    corecore