Complexity and Expressiveness of ShEx for RDF
We study the expressiveness and complexity of Shape Expression Schema (ShEx), a novel schema formalism for RDF currently under development by W3C. ShEx assigns types to the nodes of an RDF graph and allows constraining the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alternative semantics, multi- and single-type, depending on whether or not a node may have more than one type. We study the expressive power of ShEx and the complexity of the validation problem. We show that the single-type semantics is strictly more expressive than the multi-type semantics, that single-type validation is generally intractable, and that multi-type validation is feasible for a small (yet practical) subclass of RBEs. To curb the high computational complexity of validation, we propose a natural notion of determinism and show that multi-type validation for the class of deterministic schemas using single-occurrence regular bag expressions (SORBEs) is tractable.
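As a rough illustration of what constraining a node's neighborhood with a bag expression means, the sketch below checks the bag of outgoing edge labels of a node against a very restricted kind of expression: an unordered concatenation in which each label appears once with an occurrence interval. This covers only a simplified fragment of SORBEs (no disjunction, no nesting), and the function name `satisfies_intervals` and the labels `name` and `email` are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import inf

def satisfies_intervals(edge_labels, expr):
    """Check a bag of edge labels against per-label occurrence intervals.

    expr maps each allowed label to a (min, max) pair; labels that do not
    appear in expr are not allowed at all (closed neighborhood).
    """
    bag = Counter(edge_labels)
    # A label that the expression does not mention makes the bag invalid.
    if any(label not in expr for label in bag):
        return False
    # Each label must occur a number of times that lies within its interval.
    return all(lo <= bag[label] <= hi for label, (lo, hi) in expr.items())

# Hypothetical shape: exactly one name, any number of emails.
person = {"name": (1, 1), "email": (0, inf)}

print(satisfies_intervals(["name", "email", "email"], person))  # True
print(satisfies_intervals(["email"], person))                   # False (no name)
```

Order is irrelevant here because the edges of a node form a bag, not a sequence, which is why counting occurrences suffices for this fragment.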
Semantics and Validation of Shapes Schemas for RDF
We present a formal semantics and proof of soundness for shapes schemas, an
expressive schema language for RDF graphs that is the foundation of Shape
Expressions Language 2.0. It can be used to describe the vocabulary and the
structure of an RDF graph, and to constrain the admissible properties and
values for nodes in that graph. The language defines a typing mechanism called
shapes against which nodes of the graph can be checked. It includes an
algebraic grouping operator, a choice operator and cardinality constraints for
the number of allowed occurrences of a property. Shapes can be combined using
Boolean operators, and can use possibly recursive references to other shapes.
We describe the syntax of the language and define its semantics. The
semantics is proven to be well-defined for schemas that satisfy a reasonable
syntactic restriction, namely stratified use of negation and recursion. We
present two algorithms for the validation of an RDF graph against a shapes
schema. The first algorithm is a direct implementation of the semantics,
whereas the second is a non-trivial improvement. We also briefly give
implementation guidelines.
Shape Expressions Schemas
We present Shape Expressions (ShEx), an expressive schema language for RDF
designed to provide a high-level, user-friendly syntax with intuitive
semantics. ShEx allows one to describe the vocabulary and the structure of an RDF
graph, and to constrain the allowed values for the properties of a node. It
includes an algebraic grouping operator, a choice operator, cardinality
constraints for the number of allowed occurrences of a property, and negation.
We define the semantics of the language and illustrate it with examples. We
then present a validation algorithm that, given a node in an RDF graph and a
constraint defined by the ShEx schema, checks whether the node
satisfies that constraint. The algorithm outputs a proof that contains
trivially verifiable associations of nodes and the constraints that they
satisfy. The structure can be used for complex post-processing tasks, such as
transforming the RDF graph to other graph or tree structures, verifying more
complex constraints, or debugging (w.r.t. the schema). We also show the
inherent difficulty of error identification in ShEx.
Using Shape Expressions (ShEx) to share RDF data models and to guide curation with rigorous validation
International Conference, European Semantic Web Conference, ESWC (16th. 2019. Portorož, Slovenia)
Relational to RDF Data Exchange in Presence of a Shape Expression Schema
We study the relational to RDF data exchange problem, where the target constraints are specified using a Shape Expression schema (ShEx). We investigate two fundamental problems: 1) consistency, which is checking for a given data exchange setting whether a solution exists for every source instance, and 2) constructing a universal solution, which is a solution that represents the space of all solutions. We propose to use typed IRI constructors in source-to-target tuple generating dependencies to create the IRIs of the RDF graph from the values in the relational instance, and we translate ShEx into a set of target dependencies. We also identify data exchange settings that are key covered, a property that is decidable and guarantees consistency. Furthermore, we show that this property is a sufficient and necessary condition for the existence of universal solutions for a practical subclass of weakly-recursive ShEx.
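To make the idea of building IRIs from relational values more concrete, here is a minimal sketch of a single source-to-target rule. Everything in it is assumed for illustration: the relation `Person(id, name)`, the constructor `person_iri`, and the `http://example.org/` namespace are hypothetical and do not reproduce the paper's formal syntax.

```python
def person_iri(pid):
    """Hypothetical typed IRI constructor: maps a key value to a person IRI."""
    return f"http://example.org/person/{pid}"

def exchange(person_rows):
    """Apply one illustrative rule:
    Person(id, name) -> (person_iri(id), ex:name, name).

    person_rows is an iterable of (id, name) tuples; the result is a set
    of RDF-like triples (subject, predicate, object).
    """
    triples = set()
    for pid, name in person_rows:
        triples.add((person_iri(pid), "http://example.org/name", name))
    return triples

rows = [(1, "Ann"), (2, "Bob")]
for triple in sorted(exchange(rows)):
    print(triple)
```

Because the constructor is a function of the key value, two rows with the same `id` but different names would yield two name triples on the same node; under a shape that allows exactly one name, such a setting would admit no solution, which is the kind of situation the consistency problem is concerned with.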
Containment of Shape Expression Schemas for RDF
We study the problem of containment for shape expression schemas (ShEx) for
RDF graphs. We identify a subclass of ShEx that has a natural graphical
representation in the form of shape graphs and their semantics is captured with
a tractable notion of embedding of an RDF graph in a shape graph. When applied
to pairs of shape graphs, an embedding is a sufficient condition for
containment, and for a practical subclass of deterministic shape graphs, it is
also a necessary one, thus yielding a subclass with tractable containment.
While for general shape graphs a minimal counter-example, i.e., an instance
proving non-containment, might be of exponential size, we show that containment
is EXP-hard and in coNEXP. Finally, we show that containment for arbitrary ShEx
is coNEXP-hard and in coTwoNEXP^NP.
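The flavor of an embedding can be conveyed with plain graph simulation, computed as a greatest fixpoint. The sketch below is a simplification made for illustration: it ignores the occurrence constraints that shape graphs attach to edges, so it only approximates the notion of embedding used in the paper. The function name and the example graphs are hypothetical.

```python
def simulation(g_nodes, g_edges, h_nodes, h_edges):
    """Compute the largest simulation of graph G by graph H.

    Edges are (source, label, target) triples. A pair (n, t) survives if
    every outgoing edge of n can be matched by an edge of t with the same
    label whose targets are again related.
    """
    rel = {(n, t) for n in g_nodes for t in h_nodes}
    changed = True
    while changed:
        changed = False
        for (n, t) in list(rel):
            for (s, label, o) in g_edges:
                if s != n:
                    continue
                matched = any(
                    s2 == t and l2 == label and (o, o2) in rel
                    for (s2, l2, o2) in h_edges
                )
                if not matched:
                    # n has an edge that t cannot mimic: drop the pair.
                    rel.discard((n, t))
                    changed = True
                    break
    return rel

# Hypothetical data graph G and shape-like graph H.
g_nodes = {"a", "b", "x", "y"}
g_edges = {("a", "name", "x"), ("a", "knows", "b"), ("b", "name", "y")}
h_nodes = {"Person", "Literal"}
h_edges = {("Person", "name", "Literal"), ("Person", "knows", "Person")}

rel = simulation(g_nodes, g_edges, h_nodes, h_edges)
# G is simulated by H if every node of G is related to some node of H.
print(all(any((n, t) in rel for t in h_nodes) for n in g_nodes))
```

The loop only ever removes pairs, so it terminates after at most |G| x |H| removals, which reflects why this kind of check is polynomial.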
Comparative expressiveness of ShEx and SHACL (Early working draft)
Contributions
• We propose a simple formal language for graph shapes that subsumes both ShEx and SHACL. The semantics of the language is based on the semantics of Datalog, and also equivalently defined in terms of Monadic Second Order Logic with Presburger constraints.
• We propose a formal semantics of SHACL as a translation to this language. Thanks to this translation, we show that SHACL can be extended with well-defined stratified recursion.
• We show how ShEx can be translated to this language.
• We explore the necessary restrictions on ShEx so that it can be translated to SHACL, and also the possible modifications of SHACL so that it can capture a bigger fragment of ShEx.
Semantic units: organizing knowledge graphs into semantically meaningful units of representation
Background
In today’s landscape of data management, the importance of knowledge graphs and ontologies is escalating as critical mechanisms aligned with the FAIR Guiding Principles—ensuring data and metadata are Findable, Accessible, Interoperable, and Reusable. We discuss three challenges that may hinder the effective exploitation of the full potential of FAIR knowledge graphs.
Results
We introduce “semantic units” as a conceptual solution, although currently exemplified only in a limited prototype. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs by adding another layer of triples on top of the conventional data layer. Semantic units and their subgraphs are represented by their own resource that instantiates a corresponding semantic unit class. We distinguish statement and compound units as basic categories of semantic units. A statement unit is the smallest, independent proposition that is semantically meaningful for a human reader. Depending on the relation of its underlying proposition, it consists of one or more triples. Organizing a knowledge graph into statement units results in a partition of the graph, with each triple belonging to exactly one statement unit. A compound unit, on the other hand, is a semantically meaningful collection of statement and compound units that form larger subgraphs. Some semantic units organize the graph into different levels of representational granularity, others orthogonally into different types of granularity trees or different frames of reference, structuring and organizing the knowledge graph into partially overlapping, partially enclosed subgraphs, each of which can be referenced by its own resource.
Conclusions
Semantic units, applicable in RDF/OWL and labeled property graphs, offer support for making statements about statements and facilitate graph alignment, subgraph matching, knowledge graph profiling, and the management of access restrictions to sensitive data. Additionally, we argue that organizing the graph into semantic units promotes the differentiation of ontological and discursive information, and that it also supports the differentiation of multiple frames of reference within the graph.
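The partition property of statement units described under Results (each triple belongs to exactly one statement unit) can be checked mechanically. The following is a minimal sketch under assumed data structures: triples are plain tuples, and `units` maps a hypothetical unit resource name to the set of triples in its subgraph. None of these names come from the paper or its prototype.

```python
def is_partition(triples, units):
    """Return True if the statement units partition the set of triples.

    triples: a set of (subject, predicate, object) tuples (the data layer).
    units:   a dict mapping a unit resource to the set of triples it holds.
    """
    seen = set()
    for members in units.values():
        for triple in members:
            # A triple outside the graph, or one already claimed by another
            # unit, violates the partition property.
            if triple not in triples or triple in seen:
                return False
            seen.add(triple)
    # Every triple of the graph must be covered by some unit.
    return seen == set(triples)

# Hypothetical data layer and two statement units.
t1 = ("ex:apple1", "ex:hasWeight", "ex:w1")
t2 = ("ex:w1", "ex:hasValue", "204")
t3 = ("ex:apple1", "rdf:type", "ex:Apple")
graph = {t1, t2, t3}
units = {"ex:unit1": {t1, t2}, "ex:unit2": {t3}}

print(is_partition(graph, units))
```

Here `ex:unit1` groups two triples that together express one proposition, matching the remark that a statement unit may consist of one or more triples.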
PG-Schema: Schemas for Property Graphs
Property graphs have reached a high level of maturity, witnessed by multiple
robust graph database systems as well as the ongoing ISO standardization effort
aiming at creating a new standard Graph Query Language (GQL). Yet, despite
documented demand, schema support is limited both in existing systems and in
the first version of the GQL Standard. It is anticipated that the second
version of the GQL Standard will include a rich DDL. Aiming to inspire the
development of GQL and enhance the capabilities of graph database systems, we
propose PG-Schema, a simple yet powerful formalism for specifying property
graph schemas. It features PG-Types with flexible type definitions supporting
multi-inheritance, as well as expressive constraints based on the recently
proposed PG-Keys formalism. We provide the formal syntax and semantics of
PG-Schema, which meet principled design requirements grounded in contemporary
property graph management scenarios, and offer a detailed comparison of its
features with those of existing schema languages and graph database systems.