
557 research outputs found

Graph Summarization

Author: Bonifati Angela
Dumbrava Stefania
Kondylakis Haridimos
Publication date: 01/04/2020

The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for the development of adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. It denotes a series of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. This chapter pinpoints the main graph summarization methods, with particular emphasis on the most recent approaches and novel research trends not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologies.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot

Challenges in Bridging Social Semantics and Formal Semantics on the Web

Author: A Cooper
A Edwards
E Cabrio
G Erétéo
I Mirbel
L Costabello
LC Freeman
O Corby
P Neveu
PM Dung
R Hasan
RN Raghavan
S Angeletou
Publication date: 04/07/2013

This paper describes several results of Wimmics, a research lab whose name stands for web-instrumented man-machine interactions, communities, and semantics. The approaches introduced here rely on graph-oriented knowledge representation, reasoning, and operationalization to model and support actors, actions, and interactions in web-based epistemic communities. The research results are applied to support and foster interactions in online communities and to manage their resources.

arXiv.org e-Print Archive

Crossref

HAL-UNICE

INRIA a CCSD electronic archive server

HAL Descartes

RDF graph summarization: principles, techniques and applications (tutorial)

Author: Kondylakis Haridimos
Kotzinos Dimitris
Manolescu Ioana
Publication venue: HAL CCSD
Publication date: 26/03/2019

The explosion in the amount of RDF data on the Web has led to the need to explore, query, and understand such data sources. The task is challenging due to the complex and heterogeneous structure of RDF graphs which, unlike relational databases, do not come with a structure-dictating schema. Summarization has been applied to RDF data to facilitate these tasks. Its purpose is to extract concise and meaningful information from RDF knowledge bases, representing their content as faithfully as possible. There is no single concept of RDF summary, and there are many approaches to building such summaries; the summarization goal and the main computational tools employed for summarizing graphs are the main factors behind this diversity. This tutorial presents a structured analysis and comparison of existing works in the area of RDF summarization; it is based upon a recent survey which we co-authored with colleagues [3]. We present the concepts at the core of each approach and outline their main technical aspects and implementation. We conclude by identifying the most pertinent summarization method for different usage scenarios and by discussing areas where future effort is needed.

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Polytechnique

Structural Summarization of Semantic Graphs Using Quotients

Author: Blume Till
Cochez Michael
Rau Jannik
Richerby David
Scherp Ansgar
Publication venue: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
Publication date: 19/12/2023

Graph summarization is the process of computing a compact version of an input graph while preserving chosen features of its structure. We consider semantic graphs where the features include edge labels and label sets associated with a vertex. Graph summaries are typically much smaller than the original graph. Applications that depend on the preserved features can perform their tasks on the summary, but much faster or with less memory overhead, while producing the same outcome as if they were applied on the original graph. In this survey, we focus on structural summaries based on quotients that organize vertices in equivalence classes of shared features. Structural summaries are particularly popular for semantic graphs and have the advantage of defining a precise graph-based output. We consider approaches and algorithms for both static and temporal graphs. A common example of quotient-based structural summaries is bisimulation, and we discuss this in detail. While there exist other surveys on graph summarization, to the best of our knowledge, we are the first to bring in a focused discussion on quotients, bisimulation, and their relation. Furthermore, structural summarization naturally connects well with formal logic due to the discrete structures considered. We complete the survey with a brief description of approaches beyond structural summaries
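The quotient idea in this abstract can be illustrated with a minimal sketch. It assumes one particular equivalence relation chosen here only for illustration (two vertices are equivalent when they share the same vertex label set and the same set of outgoing edge labels); the function name `quotient_summary` and the input layout are hypothetical and are not taken from the survey.

```python
from collections import defaultdict


def quotient_summary(edges, labels):
    """Group vertices into equivalence classes and lift edges to the classes.

    edges  : iterable of (source, edge_label, target) triples
    labels : dict mapping a vertex to its set of vertex labels
    Assumed equivalence (illustrative only): same vertex label set
    and same set of outgoing edge labels.
    """
    edges = list(edges)
    vertices = set(labels)
    out_labels = defaultdict(set)
    for s, p, t in edges:
        out_labels[s].add(p)
        vertices.update((s, t))

    def feature(v):
        # The feature tuple identifies the equivalence class of v.
        return (frozenset(labels.get(v, ())), frozenset(out_labels.get(v, ())))

    classes = defaultdict(set)
    for v in vertices:
        classes[feature(v)].add(v)

    # Each summary edge connects the classes of the original endpoints.
    block_of = {v: f for f, members in classes.items() for v in members}
    summary_edges = {(block_of[s], p, block_of[t]) for s, p, t in edges}
    return dict(classes), summary_edges


example_edges = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("bob", "worksAt", "acme"),
]
example_labels = {
    "alice": {"Person"}, "bob": {"Person"},
    "carol": {"Person"}, "acme": {"Org"},
}
classes, summary_edges = quotient_summary(example_edges, example_labels)
```

In this toy input, `alice` and `carol` fall into the same class, so the two `knows` edges collapse into a single summary edge. Other equivalence relations (for example, ones that also look at incoming edges or at neighbours' classes) would yield different quotients.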

University of Essex Research Repository

Time and Memory Efficient Parallel Algorithm for Structural Graph Summaries and two Extensions to Incremental Summarization and $k$-Bisimulation for Long $k$-Chaining

Author: Blume Till
Rau Jannik
Richerby David
Scherp Ansgar
Publication date: 04/11/2022

We developed a flexible parallel algorithm for graph summarization based on vertex-centric programming and parameterized message passing. The base algorithm supports infinitely many structural graph summary models defined in a formal language. An extension of the parallel base algorithm allows incremental graph summarization. In this paper, we prove that the incremental algorithm is correct and show that updates are performed in time $\mathcal{O}(\Delta \cdot d^k)$, where $\Delta$ is the number of additions, deletions, and modifications to the input graph, $d$ the maximum degree, and $k$ is the maximum distance in the subgraphs considered. Although the iterative algorithm supports values of $k>1$, it requires nested data structures for the message passing that are memory-inefficient. Thus, we extended the base summarization algorithm by a hash-based messaging mechanism to support a scalable iterative computation of graph summarizations based on $k$-bisimulation for arbitrary $k$. We empirically evaluate the performance of our algorithms using benchmark and real-world datasets. The incremental algorithm almost always outperforms the batch computation. We observe in our experiments that the incremental algorithm is faster even in cases when $50\%$ of the graph database changes from one version to the next. The incremental computation requires a three-layered hash index, which has a low memory overhead of only $8\%$ ($\pm 1\%$). Finally, the incremental summarization algorithm outperforms the batch algorithm even with fewer cores. The iterative parallel $k$-bisimulation algorithm computes summaries on graphs with over $10$M edges within seconds. We show that the algorithm processes graphs of $100+$M edges within a few minutes while having a moderate memory consumption of $<150$ GB. For the largest BSBM1B dataset with 1 billion edges, it computes $k=10$ bisimulation in under an hour.
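To make the notion of iterative $k$-bisimulation concrete, here is a minimal sequential sketch. It is not the parallel, vertex-centric algorithm of the paper; it assumes a forward bisimulation over outgoing labelled edges, starts from a single block, and uses a dictionary to map each signature to a compact block id, which only loosely mirrors the idea of exchanging small hashed identifiers instead of nested structures. The function name `k_bisimulation` and its parameters are hypothetical.

```python
def k_bisimulation(edges, vertices, k):
    """Partition vertices by forward k-bisimulation (sequential sketch).

    edges    : iterable of (source, edge_label, target) triples
    vertices : iterable of vertex identifiers
    k        : number of refinement rounds
    Returns a dict mapping each vertex to an integer block id.
    """
    out = {v: [] for v in vertices}
    for s, p, t in edges:
        out.setdefault(s, []).append((p, t))
        out.setdefault(t, [])

    # Round 0: every vertex is in the same block.
    block = {v: 0 for v in out}
    for _ in range(k):
        ids = {}        # signature -> compact block id for this round
        new_block = {}
        for v, neighbours in out.items():
            # A signature combines the vertex's previous block with the
            # (edge label, previous block of target) pairs it can reach.
            signature = (block[v],
                         frozenset((p, block[t]) for p, t in neighbours))
            new_block[v] = ids.setdefault(signature, len(ids))
        # Refinement never merges blocks, so an unchanged block count
        # means the partition has stabilised.
        stable = len(ids) == len(set(block.values()))
        block = new_block
        if stable:
            break
    return block


chain = [("a", "p", "b"), ("b", "p", "c"), ("c", "p", "d")]
partition = k_bisimulation(chain, "abcd", 2)
```

On the four-vertex chain, each extra round separates one more vertex from the rest until every vertex sits in its own block, after which larger values of `k` change nothing.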

arXiv.org e-Print Archive

Supporting Explainable AI on Semantic Constraint Validation

Author: Gercke Julian Alexander
Publication venue: Hannover : Gottfried Wilhelm Leibniz Universität
Publication date: 14/06/2022

A rising number of knowledge graphs are published through various sources. The enormous amount of linked data strives to give entities a semantic context. Using SHACL, the entities can be validated with respect to their context. On the other hand, the increasing usage of AI models in productive systems comes with a great responsibility in various areas. Predictive models such as linear regression, logistic regression, and tree-based models are still frequently used because their simple structure allows for interpretability. However, explaining models includes verifying whether the model makes predictions based on human constraints or scientific facts. This work proposes to use the semantic context of the entities in knowledge graphs to validate predictive models with respect to user-defined constraints, thereby providing a theoretical framework for a model-agnostic validation engine based on SHACL. In a second step, the model validation results are summarized in the case of a decision tree and visualized model-coherently. Finally, the performance of the framework is evaluated based on a Python implementation.

Institutionelles Repositorium der Leibniz Universität Hannover

Instance-Based Lossless Summarization of Knowledge Graph With Optimized Triples and Corrections (IBA-OTC)

Author: Algarni Asaad
Cheema Muhammad Faisal
Javed Hafiz Tayyeb
Khan Kifayat Ullah
Park Jeongmin
Publication venue: IEEE
Publication date: 07/12/2023

Knowledge graph (KG) summarization facilitates efficient information retrieval when exploring complex structural data. Fast retrieval requires processing redundant data, yet the summary graph must remain informationally complete. Summarization also saves computational time during data retrieval, reduces storage space, supports in-memory visualization, and should preserve structure. State-of-the-art approaches summarize a given KG by preserving its structure at the cost of information loss. Approaches that do not preserve the underlying structure compromise the summarization ratio by focusing only on the compression of specific regions. As a result, these approaches either fail to preserve the original facts or wrongly predict inferred information. To solve these problems, we present a novel framework for generating a lossless summary by preserving the structure through super signatures and their corresponding corrections. The proposed approach summarizes only the naturally overlapped instances while maintaining their information and preserving the underlying Resource Description Framework (RDF) graph. The resultant summary is composed of triples with positive, negative, and star corrections that are optimized by the smart calling of two novel functions, namely merge and disperse. To evaluate the effectiveness of our proposed approach, we perform experiments on nine publicly available real-world knowledge graphs and obtain a better summarization ratio than state-of-the-art approaches by a margin of 10% to 30% while achieving completeness, correctness, and compactness. In this way, the retrieval of common events and groups by queries is accelerated in the resultant graph.
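The general idea of a lossless summary built from shared triples plus corrections can be sketched as follows. This is not the IBA-OTC method: super signatures, star corrections, and the merge and disperse functions are not modelled, and the grouping of instances is assumed to be given. The sketch only shows a generic scheme in which a group-level triple is emitted when that is cheaper than listing member triples individually, with negative corrections for members that lack the triple. All names (`summarize`, `reconstruct`, `groups`) are hypothetical.

```python
def summarize(triples, groups):
    """Encode triples as group-level triples plus corrections.

    triples : set of (subject, predicate, object)
    groups  : dict mapping a group id to the set of subjects it contains
              (the grouping itself is assumed to be given)
    Returns (super_triples, positive, negative).
    """
    triples = set(triples)
    grouped = {s for members in groups.values() for s in members}
    # Triples of ungrouped subjects are kept verbatim as positive corrections.
    positive = {t for t in triples if t[0] not in grouped}
    super_triples, negative = set(), set()
    for g, members in groups.items():
        pairs = {(p, o) for s, p, o in triples if s in members}
        for p, o in pairs:
            have = {s for s in members if (s, p, o) in triples}
            missing = members - have
            # One group-level triple plus negative corrections, if cheaper
            # than listing each member's triple individually.
            if 1 + len(missing) < len(have):
                super_triples.add((g, p, o))
                negative |= {(s, p, o) for s in missing}
            else:
                positive |= {(s, p, o) for s in have}
    return super_triples, positive, negative


def reconstruct(super_triples, positive, negative, groups):
    """Expand group-level triples and apply corrections."""
    expanded = {(s, p, o) for g, p, o in super_triples for s in groups[g]}
    return (expanded - negative) | positive


example_groups = {"G1": {"a", "b", "c", "d"}}
example_triples = {
    ("a", "type", "Person"), ("b", "type", "Person"),
    ("c", "type", "Person"), ("d", "type", "Person"),
    ("a", "livesIn", "Paris"), ("b", "livesIn", "Paris"),
    ("c", "livesIn", "Paris"), ("a", "age", "30"),
}
summary = summarize(example_triples, example_groups)
restored = reconstruct(*summary, example_groups)
```

Here the eight original triples are stored as two group-level triples, one negative correction, and one positive correction, and `reconstruct` returns exactly the original set, which is what "lossless" means in this sketch.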

Directory of Open Access Journals

BCU Open Access


Geographic Knowledge Graph Summarization

Author: Yan Bo
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Geographic knowledge graphs play a significant role in the geospatial semantics paradigm for fulfilling the interoperability, accessibility, and conceptualization demands in geographic information science. However, due to the immense quantity of information they carry and their enormous diversity, many challenges hinder the applicability and mass adoption of such useful structured knowledge. To tackle these challenges, this dissertation focuses on devising ways in which geographic knowledge graphs can be digested and summarized. Such a summarization task, on the one hand, lifts the burden of information overload for end users and, on the other hand, reduces data storage, speeds up queries, and helps eliminate noise. The main contribution of this dissertation is that it introduces the general concept of geospatial inductive bias and explains different ways this idea can be used in the geographic knowledge graph summarization task. Decomposing the task into separate but related components, this dissertation is based upon three peer-reviewed articles which focus on the hierarchical place type structure, multimedia leaf nodes, and general relation and entity components, respectively. A spatial knowledge map interface that illustrates the effectiveness of summarizing geographic knowledge graphs is presented. Throughout the dissertation, top-down knowledge engineering and bottom-up knowledge learning methods are integrated. We hope this dissertation will promote awareness of this fascinating area and motivate researchers to investigate related questions.

eScholarship - University of California

Named Entity Resolution in Personal Knowledge Graphs

Author: Kejriwal Mayank
Publication date: 22/07/2023

Entity Resolution (ER) is the problem of determining when two entities refer to the same underlying entity. The problem has been studied for over 50 years and has most recently taken on new importance in an era of large, heterogeneous 'knowledge graphs' published on the Web and used widely in domains as wide-ranging as social media, e-commerce, and search. This chapter will discuss the specific problem of named ER in the context of personal knowledge graphs (PKGs). We begin with a formal definition of the problem and the components necessary for doing high-quality and efficient ER. We also discuss some challenges that are expected to arise for Web-scale data. Next, we provide a brief literature review, with a special focus on how existing techniques can potentially apply to PKGs. We conclude the chapter by covering some applications, as well as promising directions for future research. Comment: To appear as a book chapter by the same name in an upcoming (Oct. 2023) book 'Personal Knowledge Graphs (PKGs): Methodology, tools and applications' edited by Tiwari et al.

arXiv.org e-Print Archive