13 research outputs found
Shape Expressions Schemas
We present Shape Expressions (ShEx), an expressive schema language for RDF
designed to provide a high-level, user friendly syntax with intuitive
semantics. ShEx allows to describe the vocabulary and the structure of an RDF
graph, and to constrain the allowed values for the properties of a node. It
includes an algebraic grouping operator, a choice operator, cardinalitiy
constraints for the number of allowed occurrences of a property, and negation.
We define the semantics of the language and illustrate it with examples. We
then present a validation algorithm that, given a node in an RDF graph and a
constraint defined by the ShEx schema, allows to check whether the node
satisfies that constraint. The algorithm outputs a proof that contains
trivially verifiable associations of nodes and the constraints that they
satisfy. The structure can be used for complex post-processing tasks, such as
transforming the RDF graph to other graph or tree structures, verifying more
complex constraints, or debugging (w.r.t. the schema). We also show the
inherent difficulty of error identification of ShEx
Relational to RDF Data Exchange in Presence of a Shape Expression Schema
International audienceWe study the relational to RDF data exchange problem, where the target constraints are specified using Shape Expression schema (ShEx). We investigate two fundamental problems: 1) consistency which is checking for a given data exchange setting whether there always exists a solution for any source instance, and 2) constructing a universal solution which is a solution that represents the space of all solutions. We propose to use typed IRI constructors in source-to-target tuple generating dependencies to create the IRIs of the RDF graph from the values in the relational instance, and we translate ShEx into a set of target dependencies. We also identify data exchange settings that are key covered, a property that is decidable and guarantees consistency. Furthermore, we show that this property is a sufficient and necessary condition for the existence of universal solutions for a practical subclass of weakly-recursive ShEx
Complexity and Expressiveness of ShEx for RDF
International audienceWe study the expressiveness and complexity of Shape Expression Schema (ShEx), a novel schema formalism for RDF currently under development by W3C. ShEx assigns types to the nodes of an RDF graph and allows to constrain the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alternative semantics, multi-and single-type, depending on whether or not a node may have more than one type. We study the expressive power of ShEx and study the complexity of the validation problem. We show that the single-type semantics is strictly more expressive than the multi-type semantics, single-type validation is generally intractable and multi-type validation is feasible for a small (yet practical) subclass of RBEs. To curb the high computational complexity of validation, we propose a natural notion of determinism and show that multi-type validation for the class of deterministic schemas using single-occurrence regular bag expressions (SORBEs) is tractable
: Méthodes d'Inférence Symbolique pour les Bases de Données
This dissertation is a summary of a line of research, that I wasactively involved in, on learning in databases from examples. Thisresearch focused on traditional as well as novel database models andlanguages for querying, transforming, and describing the schema of adatabase. In case of schemas our contributions involve proposing anoriginal languages for the emerging data models of Unordered XML andRDF. We have studied learning from examples of schemas for UnorderedXML, schemas for RDF, twig queries for XML, join queries forrelational databases, and XML transformations defined with a novelmodel of tree-to-word transducers.Investigating learnability of the proposed languages required us toexamine closely a number of their fundamental properties, often ofindependent interest, including normal forms, minimization,containment and equivalence, consistency of a set of examples, andfinite characterizability. Good understanding of these propertiesallowed us to devise learning algorithms that explore a possibly largesearch space with the help of a diligently designed set ofgeneralization operations in search of an appropriate solution.Learning (or inference) is a problem that has two parameters: theprecise class of languages we wish to infer and the type of input thatthe user can provide. We focused on the setting where the user inputconsists of positive examples i.e., elements that belong to the goallanguage, and negative examples i.e., elements that do not belong tothe goal language. In general using both negative and positiveexamples allows to learn richer classes of goal languages than usingpositive examples alone. However, using negative examples is oftendifficult because together with positive examples they may cause thesearch space to take a very complex shape and its exploration may turnout to be computationally challenging.Ce mémoire est une courte présentation d’une direction de recherche, à laquelle j’ai activementparticipé, sur l’apprentissage pour les bases de données à partir d’exemples. Cette recherches’est concentrée sur les modèles et les langages, aussi bien traditionnels qu’émergents, pourl’interrogation, la transformation et la description du schéma d’une base de données. Concernantles schémas, nos contributions consistent en plusieurs langages de schémas pour les nouveaumodèles de bases de données que sont XML non-ordonné et RDF. Nous avons ainsi étudiél’apprentissage à partir d’exemples des schémas pour XML non-ordonné, des schémas pour RDF,des requêtes twig pour XML, les requêtes de jointure pour bases de données relationnelles et lestransformations XML définies par un nouveau modèle de transducteurs arbre-à -mot.Pour explorer si les langages proposés peuvent être appris, nous avons été obligés d’examinerde près un certain nombre de leurs propriétés fondamentales, souvent souvent intéressantespar elles-mêmes, y compris les formes normales, la minimisation, l’inclusion et l’équivalence, lacohérence d’un ensemble d’exemples et la caractérisation finie. Une bonne compréhension de cespropriétés nous a permis de concevoir des algorithmes d’apprentissage qui explorent un espace derecherche potentiellement très vaste grâce à un ensemble d’opérations de généralisation adapté à la recherche d’une solution appropriée.L’apprentissage (ou l’inférence) est un problème à deux paramètres : la classe précise delangage que nous souhaitons inférer et le type d’informations que l’utilisateur peut fournir. Nousnous sommes placés dans le cas où l’utilisateur fournit des exemples positifs, c’est-à -dire deséléments qui appartiennent au langage cible, ainsi que des exemples négatifs, c’est-à -dire qui n’enfont pas partie. En général l’utilisation à la fois d’exemples positifs et négatifs permet d’apprendredes classes de langages plus riches que l’utilisation uniquement d’exemples positifs. Toutefois,l’utilisation des exemples négatifs est souvent difficile parce que les exemples positifs et négatifspeuvent rendre la forme de l’espace de recherche très complexe, et par conséquent, son explorationinfaisable
Containment of Shape Expression Schemas for RDF
We study the problem of containment for shape expression schemas (ShEx) for
RDF graphs. We identify a subclass of ShEx that has a natural graphical
representation in the form of shape graphs and their semantics is captured with
a tractable notion of embedding of an RDF graph in a shape graph. When applied
to pairs of shape graphs, an embedding is a sufficient condition for
containment, and for a practical subclass of deterministic shape graphs, it is
also a necessary one, thus yielding a subclass with tractable containment.
While for general shape graphs a minimal counter-example i.e., an instance
proving non-containment, might be of exponential size, we show that containment
is EXP-hard and in coNEXP. Finally, we show that containment for arbitrary ShEx
is coNEXP-hard and in coTwoNEXP^NP
Inference of Shape Graphs for Graph Databases
We investigate the problem of constructing a shape graph that describes the structure of a given graph database. We employ the framework of grammatical inference, where the objective is to find an inference algorithm that is both sound, i.e., always producing a schema that validates the input graph, and complete, i.e., able to produce any schema, within a given class of schemas, provided that a sufficiently informative input graph is presented. We identify a number of fundamental limitations that preclude feasible inference. We present inference algorithms based on natural approaches that allow to infer schemas that we argue to be of practical importance
Comparative expressiveness of ShEx and SHACL (Early working draft)
Contributions • We propose a simple formal language for graph shapes that subsumes both ShEx and SHACL. The semantics of the language is based on the semantics of Datalog, and also equivalently defined in terms of Monadic Second Order Logic with Presburger constraints. • We propose a formal semantics of SHACL as a translation to this language. Thanks to this translation, we show that SHACL can be extended with well-defined stratified recursion. • We show how ShEx can be translated to this language. • We explore the necessary restrictions on ShEx so that it can be translated to SHACL, and also the possible modifications of SHACL so that it can capture a bigger fragment of ShEx
Semi Automatic Construction of ShEx and SHACL Schemas
We present a method for the construction of SHACL or ShEx constraints for an existing RDF dataset. It has two components that are used conjointly: an algorithm for automatic schema construction, and an interactive workflow for editing the schema. The schema construction algorithm takes as input sets of sample nodes and constructs a shape constraint for every sample set. It can be parametrized by a schema pattern that defines structural requirements for the schema to be constructed. Schema patterns are also used to feed the algorithm with relevant information about the dataset coming from a domain expert or from some ontology. The interactive workflow provides useful information about the dataset, shows validation results w.r.t. the schema under construction, and offers schema editing operations that combined with the schema construction algorithm allow to build a complex ShEx or SHACL schema