454 research outputs found

Graph Summarization

Authors: Bonifati Angela; Dumbrava Stefania; Kondylakis Haridimos
Publication date: 01/04/2020

The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for the development of adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. It denotes a series of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. Our chapter pinpoints the main graph summarization methods, with particular emphasis on the most recent approaches and novel research trends on this topic that previous surveys have not yet covered. Comment: To appear in the Encyclopedia of Big Data Technologies

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

Balancing Summarization and Change Detection in Graph Streams

Authors: Fukushima Shintaro; Yamanishi Kenji
Publication date: 12/12/2023

This study addresses the issue of balancing graph summarization and graph change detection. Graph summarization compresses large-scale graphs into smaller ones. However, the question remains: to what extent should the original graph be compressed? We approach this problem from the perspective of graph change detection, which aims to detect statistically significant changes using a stream of summary graphs. If the compression rate is extremely high, important changes can be missed, whereas if the compression rate is extremely low, false alarms may increase along with memory usage. This implies a trade-off between the compression rate in graph summarization and the accuracy of change detection. We propose a novel quantitative methodology to balance this trade-off so as to realize reliable graph summarization and change detection simultaneously. We introduce the probabilistic structure of a hierarchical latent variable model into a graph, thereby designing a parameterized summary graph on the basis of the minimum description length principle. The parameter specifying the summary graph is then optimized so that the accuracy of change detection is guaranteed, in the sense that the Type I error probability (the probability of raising false alarms) is kept below a given confidence level. First, we provide a theoretical framework for connecting graph summarization with change detection. Then, we empirically demonstrate its effectiveness on synthetic and real datasets. Comment: 6 pages, Accepted to 23rd IEEE International Conference on Data Mining (ICDM2023)

arXiv.org e-Print Archive

Promise and Limitations of Supervised Optimal Transport-Based Graph Summarization via Information Theoretic Measures

Author: Neshatfar Sepideh
Publication venue: DigitalCommons@UMaine
Publication date: 15/12/2023

Graph summarization is a fundamental problem in the field of data analysis, aiming to distill extensive graph datasets into more manageable, yet informative representations. The challenge lies in creating compressed graphs that faithfully retain crucial structural information for downstream tasks. A recent advancement in this domain introduces an optimal transport-based framework that enables the incorporation of a priori knowledge regarding the importance of nodes, edges, and attributes during the graph summarization process. However, the statistical properties of this framework remain largely unexplored. This master's thesis explores the field of graph summarization, with a particular focus on supervised graph summarization. In this context, the goal is not only to reduce the graph size but also to do so while preserving information essential for a specific class label. We employ information theoretic measures to quantify the preservation of such relevant information. To establish a theoretical foundation for supervised graph summarization, we frame the problem as the maximization of Shannon mutual information between the summarized graph and the associated class label. We prove that this problem is NP-hard to approximate, a finding that bounds the expectations for any proposed solution. To address this theoretical challenge, we introduce a summarization method that integrates mutual information estimates. These estimates capture relationships between random variables associated with sample graphs and class labels, and are integrated into the optimal transport compression framework. Through a series of empirical experiments, we demonstrate the practical efficacy of our proposed method. Our results show improvements in classification accuracy and computational efficiency over prior approaches. We validate our findings on both synthetic datasets and certain real-world scenarios. Beyond the empirical evaluations, this thesis provides a theoretical analysis of the limitations of the optimal transport framework in the context of supervised graph summarization. We show that this approach fails to meet a critical information monotonicity property, shedding light on its practical and theoretical constraints. In conclusion, this master's thesis contributes to the field of supervised graph summarization. It offers novel insights into the statistical properties of an emerging optimal transport-based framework and proposes a solution that unifies information theory with optimal transport. The work provides practical enhancements and theoretical perspectives that can be applied across diverse application domains.

University of Maine
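
The thesis abstract above frames supervised summarization as maximizing the Shannon mutual information between the summarized graph and the class label. As a rough illustration of that quantity only, the sketch below computes a plug-in (empirical frequency) estimate of I(X;Y) for two discrete sample sequences. The function name `mutual_information` and the choice of a plug-in estimator are assumptions made for this example; the abstract does not say which estimator the thesis uses.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    Hypothetical helper for illustration; not the estimator from the thesis.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# A feature that determines a balanced binary label carries one bit about it;
# a feature that is empirically independent of the label carries none.
labels = [0, 0, 1, 1]
print(mutual_information([0, 0, 1, 1], labels))
print(mutual_information([0, 1, 0, 1], labels))
```

In a supervised setting, `xs` could stand for some discrete descriptor of each summarized graph and `ys` for its class label; how to choose such a descriptor is outside what the abstract describes.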

Incremental Lossless Graph Summarization

Authors: Ko Jihoon; Kook Yunbum; Shin Kijung
Publication venue: Association for Computing Machinery (ACM)
Publication date: 17/06/2020

Given a fully dynamic graph, represented as a stream of edge insertions and deletions, how can we obtain and incrementally update a lossless summary of its current snapshot? As large-scale graphs are prevalent, representing them concisely is essential for efficient storage and analysis. Lossless graph summarization is an effective graph-compression technique with many desirable properties. It aims to compactly represent the input graph as (a) a summary graph consisting of supernodes (i.e., sets of nodes) and superedges (i.e., edges between supernodes), which provide a rough description, and (b) edge corrections, which fix errors induced by the rough description. While a number of batch algorithms, suited for static graphs, have been developed for rapid and compact graph summarization, they are highly inefficient in terms of time and space for dynamic graphs, which are common in practice. In this work, we propose MoSSo, the first incremental algorithm for lossless summarization of fully dynamic graphs. In response to each change in the input graph, MoSSo updates the output representation by repeatedly moving nodes among supernodes. MoSSo decides which nodes to move, and where to move them, carefully but rapidly based on several novel ideas. Through extensive experiments on 10 real graphs, we show MoSSo is (a) Fast and 'any time': processing each change in near-constant time (less than 0.1 millisecond), up to 7 orders of magnitude faster than running state-of-the-art batch methods, (b) Scalable: summarizing graphs with hundreds of millions of edges, requiring sub-linear memory during the process, and (c) Effective: achieving compression ratios comparable even to state-of-the-art batch methods. Comment: to appear at the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '20)

arXiv.org e-Print Archive

Crossref
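
The output representation described in the abstract above (supernodes, superedges, and edge corrections) can be sketched for a static graph and a fixed grouping of nodes. This is only an illustration of the lossless encoding, not MoSSo itself: the incremental node-moving procedure is not shown. The function names, the dictionary-based `partition` argument, and the "more than half of the possible pairs" rule for emitting a superedge are assumptions made for this example.

```python
from itertools import combinations, product


def _pairs(partition, a, b):
    """All unordered node pairs that a superedge (a, b) would stand for."""
    if a == b:
        return {frozenset(p) for p in combinations(partition[a], 2)}
    return {frozenset(p) for p in product(partition[a], partition[b])}


def summarize(edges, partition):
    """Encode an undirected simple graph as (superedges, c_plus, c_minus).

    partition maps a supernode id to the disjoint list of nodes it contains.
    """
    edge_set = {frozenset(e) for e in edges}
    superedges, c_plus, c_minus = set(), set(), set()
    ids = sorted(partition)
    for i, a in enumerate(ids):
        for b in ids[i:]:  # includes (a, a) for edges inside a supernode
            possible = _pairs(partition, a, b)
            actual = possible & edge_set
            # Assumed rule: a superedge pays off only when more than half
            # of the possible pairs are real edges.
            if len(actual) > (len(possible) + 1) / 2:
                superedges.add((a, b))
                c_minus |= possible - actual  # pairs to delete on decoding
            else:
                c_plus |= actual              # edges to add back on decoding
    return superedges, c_plus, c_minus


def reconstruct(partition, superedges, c_plus, c_minus):
    """Decode the summary back into the exact original edge set."""
    edges = set()
    for a, b in superedges:
        edges |= _pairs(partition, a, b)
    return (edges - c_minus) | c_plus


edges = [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (1, 2)]
partition = {"A": [1, 2, 3], "B": [4, 5]}
superedges, c_plus, c_minus = summarize(edges, partition)
restored = reconstruct(partition, superedges, c_plus, c_minus)
print(restored == {frozenset(e) for e in edges})
```

In this toy input, six edges are stored as one superedge between `A` and `B`, one negative correction for the missing pair (3, 5), and one positive correction for the edge (1, 2) inside `A`.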

Flow-based Influence Graph Visual Summarization

Authors: Lin Chuang; Shi Lei; Tang Jie; Tong Hanghang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 30/10/2014

Visually mining a large influence graph is appealing yet challenging. People are amazed by pictures of newscasting graphs on Twitter and engaged by hidden citation networks in academia, yet they are often troubled by the poor readability of the underlying visualization. Existing summarization methods enhance the graph visualization with blocked views, but have an adverse effect on the latent influence structure. How can we visually summarize a large graph to maximize influence flows? In particular, how can we illustrate the impact of an individual node through the summarization? Can we maintain the appealing graph metaphor while preserving both the overall influence pattern and fine readability? To answer these questions, we first formally define the influence graph summarization problem. Second, we propose an end-to-end framework to solve the new problem. Our method not only highlights the flow-based influence patterns in the visual summarization, but also inherently supports rich graph attributes. Last, we present a theoretical analysis and report our experimental results. Both lines of evidence demonstrate that our framework can effectively approximate the proposed influence graph summarization objective while outperforming previous methods in a typical scenario of visually mining academic citation networks. Comment: to appear in IEEE International Conference on Data Mining (ICDM), Shen Zhen, China, December 201

arXiv.org e-Print Archive

Crossref

A Neighborhood-preserving Graph Summarization

Authors: Baste Julien; Haddad Mohammed; Kiouche Abd Errahmane; Seba Hamida
Publication date: 27/01/2021

In this paper, we introduce a new summarization method for large graphs. Our summarization approach retains only a user-specified proportion of the neighbors of each node in the graph. Our main aim is to simplify large graphs so that they can be analyzed and processed effectively while preserving as many of the node neighborhood properties as possible. Since many graph algorithms are based on the neighborhood information available for each node, the idea is to produce a smaller graph on which these algorithms can handle large inputs and run faster while providing good approximations. Moreover, our compression allows users to control the size of the compressed graph by adjusting the amount of information loss that can be tolerated. The experiments conducted on various real and synthetic graphs show that our compression considerably reduces the size of the graphs. Moreover, we conducted several experiments on the obtained summaries using various graph algorithms and applications, such as node embedding, graph classification, and shortest path approximations. The obtained results show interesting trade-offs between the algorithms' runtime speed-up and the precision loss. Comment: 17 pages, 10 figures

arXiv.org e-Print Archive

HAL

Hal-Diderot
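
The core idea in the abstract above, keeping only a user-specified proportion of each node's neighbors, can be illustrated with a small sketch. The abstract does not say how the retained neighbors are chosen, so the uniform random selection here is purely an assumption, as are the function name `keep_neighbor_fraction`, the rounding-up rule, and the treatment of the result as per-node (directed) neighbor lists.

```python
import math
import random


def keep_neighbor_fraction(adj, keep_ratio, seed=0):
    """Keep ceil(keep_ratio * degree) neighbors of every node.

    adj maps each node to an iterable of its neighbors. Neighbors are
    picked uniformly at random, which is an assumption of this sketch
    and not necessarily the selection rule used in the paper.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed for reproducible summaries
    kept = {}
    for node, neighbors in adj.items():
        candidates = sorted(neighbors)
        k = math.ceil(keep_ratio * len(candidates))
        kept[node] = set(rng.sample(candidates, k))
    return kept


adj = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
    "e": ["a"],
}
summary = keep_neighbor_fraction(adj, keep_ratio=0.5)
for node in sorted(summary):
    print(node, sorted(summary[node]))
```

Lowering `keep_ratio` shrinks the summary at the cost of losing more neighborhood information, which is the size versus loss control the abstract describes.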

Dynamic Discovery of Type Classes and Relations in Semantic Web Data

Authors: Ayvaz Serkan; Aydar Mehmet
Publication date: 31/05/2017

The continuing development of Semantic Web technologies and their increasing adoption in recent years have accelerated progress in incorporating explicit semantics into data on the Web. With the rapid growth of RDF (Resource Description Framework) data on the Semantic Web, processing large semantic graph data has become more challenging. Constructing a summary graph structure from the raw RDF can help obtain semantic type relations and reduce the computational complexity of graph processing. In this paper, we address the problem of graph summarization in RDF graphs and propose an approach for building summary graph structures automatically from RDF graph data. Moreover, we introduce a measure to help discover optimum class dissimilarity thresholds and an effective method to discover the type classes automatically. In future work, we plan to investigate options for further improving the scalability of the proposed method.

arXiv.org e-Print Archive

Biblioteca Digital de la Comunidad de Madrid