9 research outputs found

    Structural Summarization of Semantic Graphs Using Quotients

    Graph summarization is the process of computing a compact version of an input graph while preserving chosen features of its structure. We consider semantic graphs, where the features include edge labels and the label sets associated with vertices. Graph summaries are typically much smaller than the original graph. Applications that depend on the preserved features can perform their tasks on the summary faster or with less memory overhead, while producing the same outcome as if they were applied to the original graph. In this survey, we focus on structural summaries based on quotients, which organize vertices into equivalence classes of shared features. Structural summaries are particularly popular for semantic graphs and have the advantage of defining a precise, graph-based output. We consider approaches and algorithms for both static and temporal graphs. A common example of a quotient-based structural summary is bisimulation, which we discuss in detail. While other surveys on graph summarization exist, to the best of our knowledge we are the first to offer a focused discussion of quotients, bisimulation, and their relation. Furthermore, structural summarization connects naturally with formal logic due to the discrete structures considered. We complete the survey with a brief description of approaches beyond structural summaries.
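    To make the quotient idea concrete, a bisimulation-style partition refinement can be sketched in a few lines; the graph, labels, and function names below are illustrative and are not drawn from the survey itself.

```python
# Minimal sketch: summarize a labeled graph by a bisimulation-style quotient.
# Two vertices end up in the same equivalence class if they share a label and
# their successors fall into the same classes (iterated to a fixpoint).
def quotient_summary(vertices, edges, label):
    succ = {v: [w for (u, w) in edges if u == v] for v in vertices}
    cls = dict(label)  # start from the label partition
    while True:
        # Signature: own class plus the set of successor classes.
        sig = {v: (cls[v], frozenset(cls[w] for w in succ[v]))
               for v in vertices}
        ids = {}
        new_cls = {v: ids.setdefault(sig[v], len(ids)) for v in vertices}
        # Refinement only splits classes, so a stable class count is a fixpoint.
        if len(set(new_cls.values())) == len(set(cls.values())):
            break
        cls = new_cls
    classes = {}
    for v in vertices:
        classes.setdefault(cls[v], set()).add(v)
    return list(classes.values())
```

    On a toy graph with two "Person" vertices each linked to a "City" vertex, the two persons collapse into one summary class and the two cities into another, giving a two-vertex summary of a four-vertex graph.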

    Emergent relational schemas for RDF


    Distributed pattern mining and data publication in life sciences using big data technologies


    Assessing the quality of Wikidata referencing

    Wikidata is a versatile and broad-based Knowledge Graph (KG) that leverages the power of collaborative contributions via an open wiki, augmented by bot accounts, to curate its content. Wikidata represents over 102 million interlinked data entities, accompanied by over 1.4 billion statements about those items, accessible to the public via a SPARQL endpoint and diverse dump formats. The Wikidata data model enables assigning references to every single statement. While the quality of Wikidata statements has been assessed, the quality of references in this knowledge graph is not well covered in the literature. To fill this gap, we develop and implement a comprehensive referencing quality assessment framework based on Linked Data quality dimensions and criteria. We implement the objective metrics of the assessment framework as the Referencing Quality Scoring System (RQSS). RQSS provides quantified scores by which referencing quality can be analyzed and compared. Due to the scale of Wikidata, we developed a subsetting approach to create a comparison platform that systematically samples Wikidata. We used both well-defined subsets and random samples to evaluate the quality of references in Wikidata using RQSS. Based on RQSS, the overall referencing quality across the Wikidata subsets is 0.58 out of 1. Random subsets (representative of Wikidata) score 0.05 higher overall than topical subsets, with Gene Wiki having the highest scores among the topical subsets. Regarding the referencing quality dimensions, all subsets score highly on accuracy, availability, security, and understandability, but more weakly on completeness, verifiability, objectivity, and versatility. The RQSS scripts can be reused to monitor referencing quality over time. The evaluation shows that RQSS is practical and provides valuable information that Wikidata contributors and WikiProject owners can use to identify referencing quality gaps.
    Although RQSS is developed based on the Wikidata RDF model, its referencing quality assessment framework can be generalized to any RDF KG.
    James Watt Scholarship funding.
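    To give the flavor of such metrics, one completeness-style measurement (the share of statements carrying at least one reference) can be sketched as below; the statement structure is a hypothetical simplification for illustration, not the Wikidata dump format, and the function is not part of RQSS.

```python
# Toy illustration of a completeness-style referencing metric: the fraction
# of statements that carry at least one reference. The dict layout below is
# a made-up simplification of Wikidata statements, not a real dump format.
def reference_completeness(statements):
    if not statements:
        return 0.0
    referenced = sum(1 for s in statements if s.get("references"))
    return referenced / len(statements)

toy = [
    {"subject": "Q42", "property": "P31", "value": "Q5",
     "references": [{"P248": "Q36578"}]},  # referenced: stated in (P248)
    {"subject": "Q42", "property": "P19", "value": "Q350",
     "references": []},                    # unreferenced statement
]
score = reference_completeness(toy)  # 1 of 2 statements referenced -> 0.5
```

    A full framework like RQSS combines many such dimension-specific metrics (accuracy, verifiability, versatility, and so on) rather than this single ratio.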

    Exploiting general-purpose background knowledge for automated schema matching

    The schema matching task is an integral part of the data integration process; it is usually the first step in integrating data. Schema matching is typically very complex and time-consuming and is therefore largely carried out by humans. One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process. In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources, since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources. A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced there allows for simple, modularized matcher development (with background knowledge sources) and for extensive evaluation of matching systems. Among the largest structured sources of general-purpose background knowledge are knowledge graphs, which have grown significantly in size in recent years. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared, and multiple improvements to existing approaches are presented. In Part IV, numerous concrete matching systems that exploit general-purpose background knowledge are presented, and exploitation strategies and resources are analyzed and compared. This dissertation closes with a perspective on real-world applications.
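    As a toy illustration of how background knowledge enters matching, the sketch below pairs schema elements by normalized labels, using a tiny synonym table as a stand-in for a general-purpose background resource; the schemas, synonyms, and function names are invented for the example and are not from the thesis.

```python
# Minimal sketch of label-based schema matching. The SYNONYMS table plays the
# role of an external background-knowledge resource: it bridges label gaps
# that pure string comparison cannot (e.g. "zip" vs. "postcode").
SYNONYMS = {"zip": "postcode", "surname": "last_name"}

def normalize(label):
    label = label.lower().replace(" ", "_")
    return SYNONYMS.get(label, label)

def match_elements(schema_a, schema_b):
    """Return pairs of elements whose normalized labels agree."""
    index = {normalize(b): b for b in schema_b}
    return [(a, index[normalize(a)]) for a in schema_a
            if normalize(a) in index]

pairs = match_elements(["Surname", "Zip", "City"],
                       ["last_name", "postcode", "town"])
# -> [('Surname', 'last_name'), ('Zip', 'postcode')]
```

    Real matchers replace the hand-written synonym table with large resources such as knowledge graph embeddings, where similarity is computed in a vector space instead of by exact lookup.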

    On the HDT with the Tree Representation for Large RDFs on GPU
