    Towards supporting multiple semantics of named graphs using N3 rules

    Semantic Web applications often require partitioning triples into subgraphs and then associating those subgraphs with useful metadata (e.g., provenance). This led to the introduction of RDF datasets, with each RDF dataset comprising a default graph and zero or more named graphs. However, due to differences in RDF implementations, no consensus could be reached on a standard semantics, and a range of different dataset semantics are currently assumed. For an RDF system not to be limited to only a subset of online RDF datasets, it would need to be extended to support different dataset semantics, which is exactly the problem that eluded consensus before. In this paper, we transpose this problem to Notation3 Logic, an RDF-based rule language that similarly allows citing graphs within RDF documents. We propose a solution where an N3 author can directly indicate the intended semantics of a cited graph, possibly combining multiple semantics within a single document. We supply an initial set of companion N3 rules that implement a number of RDF dataset semantics and allow an N3-compliant system to easily support multiple different semantics.
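
    To make the ambiguity concrete, the following is a minimal sketch, not the paper's N3 rules. It models a dataset as a default graph plus named graphs and shows how two possible readings change what counts as asserted. The class `Dataset` and the labels `"isolated"` and `"union"` are hypothetical names chosen for illustration only.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class Dataset:
    """A toy RDF dataset: one default graph plus zero or more named graphs."""

    def __init__(self, default: Set[Triple], named: Dict[str, Set[Triple]]):
        self.default = default
        self.named = named

    def asserted(self, semantics: str) -> Set[Triple]:
        """Return the triples treated as asserted under a (hypothetical) semantics label."""
        if semantics == "isolated":
            # Named graphs are only cited; their content is not asserted.
            return set(self.default)
        if semantics == "union":
            # The content of every named graph is asserted as well.
            merged = set(self.default)
            for triples in self.named.values():
                merged |= triples
            return merged
        raise ValueError(f"unknown semantics label: {semantics}")


ds = Dataset(
    default={(":g1", ":source", ":alice")},
    named={":g1": {(":bob", ":age", "42")}},
)
print(sorted(ds.asserted("isolated")))
print(sorted(ds.asserted("union")))
```

    Under the first reading a consumer learns only who published `:g1`; under the second it also accepts the statement about `:bob`. A system that assumes one reading can therefore misinterpret data published under the other.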

    A Category Theoretic Model of RDF Ontology

    Benchmarking graph database backends : what works well with Wikidata?

    Knowledge bases often use graphs as their logical model. RDF-based knowledge bases (KBs) are prime examples, as RDF (Resource Description Framework) uses a graph as its logical model. Graph databases are an emerging breed of NoSQL-type databases, offering graph operations to process and manipulate data. Although there are specialized databases for storing RDF data, the so-called triple stores, graph databases can also be promising candidates for storing knowledge. In this paper, we benchmark different graph database implementations loaded with Wikidata, a real-life, large-scale knowledge base. Graph databases come in all shapes and sizes and offer different APIs and graph models. Hence, we used a measurement system that can abstract away the API differences. For the modeling aspect, we made measurements with different graph encodings previously suggested in the literature, in order to observe the impact of the encoding on overall performance.

    Provenance-aware knowledge representation: A survey of data models and contextualized knowledge graphs

    Expressing machine-interpretable statements in the form of subject-predicate-object triples is a well-established practice for capturing the semantics of structured data. However, the standard used for representing these triples, RDF, inherently lacks a mechanism to attach provenance data, which would be crucial to make automatically generated and/or processed data authoritative. This paper is a critical review of data models, annotation frameworks, knowledge organization systems, serialization syntaxes, and algebras that enable provenance-aware RDF statements. The various approaches are assessed in terms of standard compliance, formal semantics, tuple type, vocabulary term usage, blank nodes, provenance granularity, and scalability. This assessment can be used to advance existing solutions and to help implementers select the most suitable approach (or a combination of approaches) for their applications. Moreover, the analysis of the mechanisms and their limitations highlighted in this paper can serve as the basis for novel approaches in RDF-powered applications with increasing provenance needs.
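
    As one illustration of the kind of mechanism such a survey compares, the sketch below applies standard RDF reification to a single triple and attaches a provenance statement to the resulting statement node. The helper names `reify` and `with_provenance`, and the example identifiers, are hypothetical; the `rdf:` and `prov:` terms are the standard vocabulary IRIs.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"


def reify(triple, statement_id):
    """Describe `triple` with the four standard RDF reification triples."""
    s, p, o = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]


def with_provenance(triple, statement_id, source):
    """Reify `triple` and record where it came from (illustrative helper)."""
    return reify(triple, statement_id) + [
        (statement_id, PROV + "wasDerivedFrom", source),
    ]


# Hypothetical example data.
fact = ("ex:Berlin", "ex:population", "3600000")
for t in with_provenance(fact, "ex:stmt1", "ex:census2020"):
    print(t)
```

    One fact with one provenance annotation already costs five triples here, and the reified description does not by itself assert the original triple. Overheads of this kind are the sort of trade-off behind criteria such as provenance granularity and scalability.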

    Implicit quantification made explicit : how to interpret blank nodes and universal variables in Notation3 Logic

    Since the invention of Notation3 Logic, several years have passed in which the theory has been refined and applied in different reasoning engines like Cwm, EYE, and FuXi. But despite these developments, a clear formal definition of Notation3’s semantics is still missing. This not only forms an obstacle for the formal investigation of that logic and its relations to other formalisms, it also has practical consequences: in many cases the interpretations of the same formula differ between reasoning engines. In this paper we tackle one of the main sources of that problem, namely the uncertainty about implicit quantification. This refers to Notation3’s ability to use bound variables for which the universal or existential quantifiers are not explicitly stated, but implicitly assumed. We provide a tool for clarification through the definition of a core logic for Notation3 that only supports explicit quantification. We specify an attribute grammar which maps Notation3 formulas to that logic according to the different interpretations and thereby define the semantics of Notation3. This grammar is then implemented and used to test the impact of the differences between interpretations on practical cases. Our dataset includes Notation3 implementations from former research projects and test cases developed for the reasoner EYE. We find that 31% of these files are understood differently by different reasoners. We further analyse these cases and categorize them into different classes, of which we consider one most harmful: if a file is manually written by a user and no specific built-in predicates are used (13% of our critical files), it is unlikely that this user is aware of possible differences. We therefore argue for the need to come to an agreement on implicit quantification, and discuss the different possibilities.

    Scalable Data Integration for Linked Data

    Linked Data describes an extensive set of structured but heterogeneous data sources where entities are connected by formal semantic descriptions. In the vision of the Semantic Web, these semantic links are extended towards the World Wide Web to provide as much machine-readable data as possible for search queries. The resulting connections allow an automatic evaluation to find new insights into the data. Identifying these semantic connections between two data sources with automatic approaches is called link discovery. We derive common requirements and a generic link discovery workflow based on similarities between entity properties and associated properties of ontology concepts. Most of the existing link discovery approaches disregard the fact that in times of Big Data, an increasing volume of data sources poses new demands on link discovery. In particular, the problem of complex and time-consuming link determination escalates with an increasing number of intersecting data sources. To overcome the restriction of pairwise linking of entities, holistic clustering approaches are needed to link equivalent entities of multiple data sources to construct integrated knowledge bases. In this context, the focus on efficiency and scalability is essential. For example, reusing existing links or background information can help to avoid redundant calculations. However, when dealing with multiple data sources, additional data quality problems must also be dealt with. This dissertation addresses these comprehensive challenges by designing holistic linking and clustering approaches that enable reuse of existing links. Unlike previous systems, we execute the complete data integration workflow via a distributed processing system. At first, the LinkLion portal will be introduced to provide existing links for new applications. These links act as a basis for a physical data integration process to create a unified representation for equivalent entities from many data sources. We then propose a holistic clustering approach to form consolidated clusters for the same real-world entities from many different sources. At the same time, we exploit the semantic type of entities to improve the quality of the result. The process identifies errors in existing links and can find numerous additional links. Additionally, the entity clustering has to react to the high dynamics of the data. In particular, this requires scalable approaches for continuously growing data sources with many entities as well as additional new sources. Previous entity clustering approaches are mostly static, focusing on the one-time linking and clustering of entities from few sources. Therefore, we propose and evaluate new approaches for incremental entity clustering that support the continuous addition of new entities and data sources. To cope with the ever-increasing number of Linked Data sources, efficient and scalable methods based on distributed processing systems are required. Thus we propose distributed holistic approaches to link many data sources based on a clustering of entities that represent the same real-world object. The implementation is realized on Apache Flink. In contrast to previous approaches, we utilize efficiency-enhancing optimizations for both distributed static and dynamic clustering. An extensive comparative evaluation of the proposed approaches with various distributed clustering strategies shows high effectiveness for datasets from multiple domains as well as scalability on a multi-machine Apache Flink cluster.
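
    The step from pairwise links to multi-source clusters can be illustrated with a simple baseline: treat existing links as edges and group entities by connected components using union-find. This is a minimal single-machine sketch under that assumption, not the dissertation's holistic or distributed algorithm. The function name `cluster_entities` and the entity identifiers are hypothetical.

```python
def cluster_entities(links):
    """Group entities into clusters via connected components over equivalence links."""
    parent = {}

    def find(x):
        # Register unseen entities as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for entity in list(parent):
        clusters.setdefault(find(entity), set()).add(entity)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical links between three sources A, B, and C.
links = [
    ("A:Berlin", "B:Berlin"),
    ("B:Berlin", "C:Berlin"),
    ("A:Paris", "C:Paris"),
]
print(cluster_entities(links))
```

    Because new links only ever merge clusters, the same structure accepts links that arrive later. A single wrong link, however, fuses two clusters permanently, which is one reason a real approach needs additional evidence, such as the semantic type of entities, to detect erroneous links.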

    Valid Time RDF

    The Semantic Web aims at building a foundation of semantic-based data models and languages for not only manipulating data and knowledge, but also supporting decision making by machines. Naturally, time-varying data and knowledge are required in Semantic Web applications to incorporate time and further reason about it. However, the original specifications of Resource Description Framework (RDF) and Web Ontology Language (OWL) do not include constructs for handling time-varying data and knowledge. For simplicity, the RDF model is confined to binary predicates, hence some form of reification is needed to represent higher-arity predicates. To date, there are many proposals extending RDF and OWL for handling temporal data and knowledge. They all focus on valid time. Some of these proposals stay within the standards whereas others add new constructs to RDF and its query language, SPARQL. We first study these models in a comparative framework and develop a taxonomy for classifying them. On this basis, we propose a new temporal data model, Valid Time RDF, or VTRDF, that incorporates valid time explicitly into RDF. We define valid time resources as the building blocks of VTRDF. Our approach treats all resources in VTRDF uniformly, which is significant in that the need for RDF reification is eliminated. In particular, using VTRDF to handle temporal data and knowledge requires no additional triples or objects. We formally define valid time triples and graphs, which are subject to the Temporal Triple Integrity, and the formal semantics for the layered sets of VTRDF vocabularies. To query VTRDF triple databases, we design a query language, VT-SPARQL, that extends the standard SPARQL to handle valid time resources, time intervals, and temporal reasoning. We have also shown the space and time complexity of VTRDF, and the time complexity of evaluating VT-SPARQL queries.
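
    The general idea of valid time can be sketched by attaching a validity interval to each statement and asking which statements hold at a given instant. This is only an illustration of the concept, assuming half-open integer intervals; it does not reproduce VTRDF's valid time resources, its integrity constraint, or VT-SPARQL. The names `ValidTriple` and `snapshot` and the example data are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class ValidTriple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    start: int  # first instant at which the statement holds (inclusive)
    end: int    # first instant at which it no longer holds (exclusive)


def snapshot(triples: List[ValidTriple], at: int) -> List[Tuple[str, str, str]]:
    """Return the plain triples that are valid at instant `at`."""
    return [
        (t.subject, t.predicate, t.obj)
        for t in triples
        if t.start <= at < t.end
    ]


# Hypothetical employment history, with years as time instants.
history = [
    ValidTriple("ex:alice", "ex:worksFor", "ex:AcmeCorp", 2010, 2015),
    ValidTriple("ex:alice", "ex:worksFor", "ex:Initech", 2015, 2020),
]
print(snapshot(history, 2012))
print(snapshot(history, 2015))
```

    Keeping the interval alongside the statement means no extra reification triples are needed to say when a fact holds, which is the kind of saving the abstract attributes to its approach.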

