13,950 research outputs found

    Scalable Online Betweenness Centrality in Evolving Graphs

    Full text link
    Betweenness centrality is a classic measure that quantifies the importance of a graph element (vertex or edge) according to the fraction of shortest paths passing through it. This measure is notoriously expensive to compute, and the best known algorithm runs in O(nm) time. The problems of efficiency and scalability are exacerbated in a dynamic setting, where the input is an evolving graph seen edge by edge, and the goal is to keep the betweenness centrality up to date. In this paper we propose the first truly scalable algorithm for online computation of betweenness centrality of both vertices and edges in an evolving graph where new edges are added and existing edges are removed. Our algorithm is carefully engineered with out-of-core techniques and tailored for modern parallel stream processing engines that run on clusters of shared-nothing commodity hardware. Hence, it is amenable to real-world deployment. We experiment on graphs that are two orders of magnitude larger than previous studies. Our method is able to keep the betweenness centrality measures up to date online, i.e., the time to update the measures is smaller than the inter-arrival time between two consecutive updates.Comment: 15 pages, 9 Figures, accepted for publication in IEEE Transactions on Knowledge and Data Engineerin

    Profiling relational data: a survey

    Get PDF
    Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use-cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases
    • …
    corecore