Structural Summarization of Semantic Graphs Using Quotients
Graph summarization is the process of computing a compact version of an input graph while preserving chosen features of its structure. We consider semantic graphs, where the features include edge labels and the label sets associated with vertices. Graph summaries are typically much smaller than the original graph. Applications that depend on the preserved features can perform their tasks on the summary much faster or with less memory overhead, while producing the same outcome as if they were applied to the original graph. In this survey, we focus on structural summaries based on quotients, which organize vertices into equivalence classes of shared features. Structural summaries are particularly popular for semantic graphs and have the advantage of defining a precise graph-based output. We consider approaches and algorithms for both static and temporal graphs. A common example of quotient-based structural summarization is bisimulation, and we discuss it in detail. While other surveys on graph summarization exist, to the best of our knowledge, we are the first to offer a focused discussion of quotients, bisimulation, and their relation. Furthermore, structural summarization connects naturally with formal logic due to the discrete structures considered. We complete the survey with a brief description of approaches beyond structural summaries.
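A quotient summary such as bisimulation can be pictured as a partition-refinement fixpoint: vertices are grouped until two vertices share a block exactly when they have the same label set and reach the same blocks over the same edge labels. The following minimal sketch uses an invented toy graph and function names, not anything from the survey itself:

```python
def bisimulation_blocks(nodes, edges, labels):
    """Refine vertex signatures until they stabilise: two vertices end up
    in the same block iff they carry the same label set and reach the
    same blocks via the same edge labels."""
    def partition(assign):
        # Group vertices by their current block identifier.
        inv = {}
        for v, b in assign.items():
            inv.setdefault(b, set()).add(v)
        return frozenset(frozenset(s) for s in inv.values())

    assign = {v: labels[v] for v in nodes}  # initial blocks: shared label set
    while True:
        new = {v: (labels[v],
                   frozenset((lab, assign[w]) for (u, lab, w) in edges if u == v))
               for v in nodes}
        if partition(new) == partition(assign):  # fixpoint reached
            return new
        assign = new

# Toy semantic graph: b and c are structurally indistinguishable.
nodes = ["a", "b", "c"]
edges = [("a", "knows", "b"), ("a", "knows", "c")]
labels = {v: frozenset({"Person"}) for v in nodes}

blocks = bisimulation_blocks(nodes, edges, labels)
# b and c fall into one equivalence class; a is in its own class.
```

Each resulting block becomes a single vertex of the quotient summary, so an application that only depends on labels and outgoing edge labels can run on the smaller quotient graph instead of the original.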
Assessing the quality of Wikidata referencing
Wikidata is a versatile and broad-based Knowledge Graph (KG) that leverages the
power of collaborative contributions via an open wiki, augmented by bot accounts,
to curate the content. Wikidata represents over 102 million interlinked data entities,
accompanied by over 1.4 billion statements about the items, accessible to the public
via a SPARQL endpoint and diverse dump formats. The Wikidata data model enables assigning references to every single statement. While the quality of Wikidata
statements has been assessed, the quality of references in this knowledge graph is
not well covered in the literature. To cover the gap, we develop and implement
a comprehensive referencing quality assessment framework based on Linked Data
quality dimensions and criteria. We implement the objective metrics of the assessment framework as the Referencing Quality Scoring System - RQSS. RQSS provides
quantified scores by which the referencing quality can be analyzed and compared.
Due to the scale of Wikidata, we develop a subsetting approach to creating
a comparison platform that systematically samples Wikidata. We have used both
well-defined subsets and random samples to evaluate the quality of references in
Wikidata using RQSS. Based on RQSS, the overall referencing quality in Wikidata
subsets is 0.58 out of 1. Random subsets (representative of Wikidata) have higher
overall scores than topical subsets by 0.05, with Gene Wiki having the highest scores
amongst topical subsets. Regarding referencing quality dimensions, all subsets have
high scores in accuracy, availability, security, and understandability, but have weaker
scores in completeness, verifiability, objectivity, and versatility. RQSS scripts can
be reused to monitor the referencing quality over time. The evaluation shows that
RQSS is practical and provides valuable information, which can be used by Wikidata contributors and WikiProject owners to identify the referencing quality gaps.
Although RQSS is developed based on the Wikidata RDF model, its referencing
quality assessment framework can be generalized to any RDF KG.
James Watt Scholarship funding
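For intuition only: RQSS's concrete metrics and weights are defined in the thesis, but an overall referencing-quality score on a 0-to-1 scale, like the 0.58 reported above, can be pictured as an aggregate of per-dimension scores. The dimension names below match those in the abstract, while the numbers and the equal-weight aggregation are purely hypothetical:

```python
def overall_score(dimension_scores):
    """Aggregate per-dimension scores (each in [0, 1]) into one figure.
    Equal weighting is an assumption made for this sketch."""
    return sum(dimension_scores.values()) / len(dimension_scores)

# Hypothetical dimension scores for one Wikidata subset.
scores = {
    "accuracy": 0.90,        # strong dimensions per the evaluation
    "availability": 0.85,
    "completeness": 0.30,    # weaker dimensions per the evaluation
    "verifiability": 0.35,
}
print(round(overall_score(scores), 2))  # 0.6
```

A setup like this also makes longitudinal monitoring cheap: rerunning the same scripts on a later dump yields directly comparable scores.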
Exploiting general-purpose background knowledge for automated schema matching
The schema matching task is an integral part of the data integration process and is usually its first step. Schema matching is typically very complex and time-consuming; it is therefore carried out largely by humans. One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process.
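To illustrate why missing background knowledge limits automation: a matcher that compares attribute names alone cannot link synonymous schema elements, whereas even a tiny external lexicon can. The dictionary and attribute names below are toy stand-ins for a large general-purpose resource, not anything from the dissertation:

```python
# Toy background-knowledge resource standing in for a general-purpose
# lexicon (e.g. a synonym dictionary). Contents are invented.
SYNONYMS = {"zip": "postcode", "surname": "last_name"}

def match(attr_a, attr_b):
    """Match two schema attributes: exact name equality first,
    then a synonym lookup in the background resource."""
    a, b = attr_a.lower(), attr_b.lower()
    if a == b:
        return True
    return SYNONYMS.get(a) == b or SYNONYMS.get(b) == a

# "Zip" and "postcode" share no characters, so string similarity fails;
# only the background resource bridges them.
```

Real systems replace the toy dictionary with large resources and combine many such signals, but the gap the sketch shows is exactly the one background knowledge fills.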
In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in-depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources.
A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced here allows for simple and modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems.
Among the largest structured sources of general-purpose background knowledge are knowledge graphs, which have grown significantly in size in recent years. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared. Multiple improvements to existing approaches are presented.
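As a hedged illustration of what such embeddings look like (not the thesis's specific models): in the widely used TransE approach, a triple (h, r, t) is considered plausible when the head vector plus the relation vector lands near the tail vector. The toy two-dimensional vectors below are hand-picked for the example:

```python
import math

def transe_score(h, r, t):
    """TransE-style score: negative L2 distance between h + r and t.
    Higher (closer to zero) means more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hand-picked toy embeddings; a real system learns these from the graph.
emb = {
    "Berlin": [0.0, 0.0],
    "Germany": [1.0, 1.0],
    "France": [3.0, 0.0],
    "capitalOf": [1.0, 1.0],
}

good = transe_score(emb["Berlin"], emb["capitalOf"], emb["Germany"])
bad = transe_score(emb["Berlin"], emb["capitalOf"], emb["France"])
# The true triple scores higher than the corrupted one.
```

Such geometric scores are one way a matcher can judge whether two schema elements refer to related entities in the background graph.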
In Part IV, numerous concrete matching systems that exploit general-purpose background knowledge are presented. Furthermore, exploitation strategies and resources are analyzed and compared. The dissertation closes with a perspective on real-world applications.