Shape Expressions Schemas
We present Shape Expressions (ShEx), an expressive schema language for RDF
designed to provide a high-level, user-friendly syntax with intuitive
semantics. ShEx allows one to describe the vocabulary and the structure of an
RDF graph, and to constrain the allowed values for the properties of a node.
It includes an algebraic grouping operator, a choice operator, cardinality
constraints on the number of allowed occurrences of a property, and negation.
We define the semantics of the language and illustrate it with examples. We
then present a validation algorithm that, given a node in an RDF graph and a
constraint defined by the ShEx schema, checks whether the node satisfies that
constraint. The algorithm outputs a proof that contains trivially verifiable
associations of nodes and the constraints that they satisfy. This structure
can be used for complex post-processing tasks, such as transforming the RDF
graph to other graph or tree structures, verifying more complex constraints,
or debugging (w.r.t. the schema). We also show the inherent difficulty of
error identification in ShEx.
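For illustration, a small schema sketch combining the operators mentioned above might look as follows in the ShEx compact syntax (all names are illustrative, not taken from the paper; the max-cardinality-0 idiom in the last line is one common way to express a negative constraint):

```shex
PREFIX ex:  <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ex:PersonShape {
  ex:name xsd:string ;                            # grouping: name AND contacts required
  ex:email xsd:string * ;                         # cardinality: zero or more emails
  ( ex:phone xsd:string | ex:fax xsd:string ) ;   # choice: a phone or a fax number
  ex:password . {0}                               # cardinality 0: no ex:password allowed
}
```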
Complexity and Expressiveness of ShEx for RDF
We study the expressiveness and complexity of Shape Expression Schemas (ShEx), a novel schema formalism for RDF currently under development by the W3C. ShEx assigns types to the nodes of an RDF graph and allows one to constrain the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alternative semantics, multi- and single-type, depending on whether or not a node may have more than one type. We study the expressive power of ShEx and the complexity of the validation problem. We show that the single-type semantics is strictly more expressive than the multi-type semantics, that single-type validation is generally intractable, and that multi-type validation is feasible for a small (yet practical) subclass of RBEs. To curb the high computational complexity of validation, we propose a natural notion of determinism and show that multi-type validation for the class of deterministic schemas using single-occurrence regular bag expressions (SORBEs) is tractable.
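The single-occurrence restriction can be illustrated with a sketch (names are illustrative, not from the paper): in the first shape every property appears exactly once in the expression, so it is single-occurrence; the second mentions ex:id twice and is therefore not.

```shex
PREFIX ex:  <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Single-occurrence: each property occurs once in the expression
ex:UserShape {
  ex:name xsd:string ;
  ex:follows @ex:UserShape *
}

# Not single-occurrence: ex:id occurs twice (once per branch of the choice)
ex:AmbiguousShape {
  ( ex:id xsd:string | ex:id xsd:integer )
}
```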
Linked open drug data for pharmaceutical research and development
There is an abundance of information about drugs available on the Web. Data sources range from medicinal chemistry results, over the impact of drugs on gene expression, to the outcomes of drugs in clinical trials. These data are typically not connected together, which reduces the ease with which insights can be gained. Linking Open Drug Data (LODD) is a task force within the World Wide Web Consortium's (W3C) Health Care and Life Sciences Interest Group (HCLS IG). LODD has surveyed publicly available data about drugs, created Linked Data representations of the data sets, and identified interesting scientific and business questions that can be answered once the data sets are connected. The task force provides recommendations for the best practices of exposing data in a Linked Data representation. In this paper, we present past and ongoing work of LODD and discuss the growing importance of Linked Data as a foundation for pharmaceutical R&D data sharing
FHIR RDF Data Transformation and Validation Framework and Clinical Knowledge Graphs: Towards Explainable AI in Healthcare
HL7 Fast Healthcare Interoperability Resources (FHIR) is rapidly becoming the
standards framework for the exchange of electronic health record (EHR) data. By
leveraging FHIR's resource-oriented architecture, FHIR RDF stands to become
the first mainstream clinical data standard to incorporate the Semantic Web
vision. The combination of FHIR, knowledge graphs and the Semantic Web
enables a new paradigm to build classification and explainable artificial
intelligence (AI) applications in healthcare. The objective of the tutorial is to
introduce the FHIR RDF data transformation and validation framework, show
how to build clinical knowledge graphs (cKG) in FHIR RDF, and provide the
audience with hands-on opportunities on FHIR RDF and cKG tooling.
Specifically, topics regarding the FHIR RDF data transformation and validation
framework include:
1. FHIR and its representations, FHIR JSON and FHIR RDF;
2. Conversion of FHIR JSON to FHIR RDF (via JSON-LD), use of the FHIR RDF
playground, command-line tools, and HAPI-FHIR's RDF (Turtle) support;
3. The Shape Expressions (ShEx) schema for FHIR and its use for validating
FHIR data;
4. FHIR structure definitions and their expression as JSON-LD contexts and
ShEx schemas;
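To make items 2 and 3 concrete, a Patient instance in FHIR RDF might look roughly as follows; the exact property modelling varies across FHIR releases, so this is a simplified sketch rather than the normative serialization:

```turtle
@prefix fhir: <http://hl7.org/fhir/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Simplified sketch of a FHIR RDF Patient resource (R4-style modelling)
<http://example.org/Patient/1> a fhir:Patient ;
    fhir:nodeRole fhir:treeRoot ;
    fhir:Patient.gender    [ fhir:value "female" ] ;
    fhir:Patient.birthDate [ fhir:value "1970-01-01"^^xsd:date ] .
```

A ShEx schema generated from the corresponding FHIR structure definition would then constrain which of these properties may appear on a Patient node and what their value types must be.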
In addition, the FHIR-Ontop-OMOP tool exposes the Observational Medical
Outcomes Partnership (OMOP) data as a queryable Knowledge Graph compliant
with the HL7 FHIR standard using the Ontop Virtual Knowledge Graph engine. In
this tutorial, we demonstrate how to set up Ontop over a working connection to
the OMOP PostgreSQL database using a mapping language. Thanks to the
virtual approach, the FHIR RDF triples populated by declarative mapping do not
need to be materialized. Instead, Ontop translates the SPARQL query over the
FHIR RDF model to a SQL query over the OMOP database. We will illustrate the
query translation process with some representative phenotype queries.
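As a flavor of such a query, the following sketch asks for patients with a condition carrying a given SNOMED code; the property paths and code are illustrative rather than the exact FHIR RDF model, and Ontop would rewrite this SPARQL into SQL over the OMOP tables:

```sparql
PREFIX fhir: <http://hl7.org/fhir/>

# Illustrative phenotype query: patients with a condition coded 44054006
# (SNOMED CT: type 2 diabetes mellitus)
SELECT ?patient WHERE {
  ?condition a fhir:Condition ;
      fhir:Condition.subject [ fhir:link ?patient ] ;
      fhir:Condition.code [
        fhir:CodeableConcept.coding [
          fhir:Coding.code [ fhir:value "44054006" ]
        ]
      ] .
}
```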
Acknowledgements:
This tutorial session is supported in part by the NIH FHIRCat R01 grant (R01
EB030529).
Validating RDF Data
RDF and Linked Data have broad applicability across many fields, from aircraft manufacturing to zoology. Requirements for detecting bad data differ across communities, fields, and tasks, but nearly all involve some form of data validation. This book introduces data validation and describes its practical use in day-to-day data exchange. The Semantic Web offers a bold, new take on how to organize, distribute, index, and share data. Using Web addresses (URIs) as identifiers for data elements enables the construction of distributed databases on a global scale. Like the Web, the Semantic Web is heralded as an information revolution, and also like the Web, it is encumbered by data quality issues. The quality of Semantic Web data is compromised by the lack of resources for data curation, for maintenance, and for developing globally applicable data models. At the enterprise scale, these problems have conventional solutions. Master data management provides an enterprise-wide vocabulary, while constraint languages capture and enforce data structures. Filling a need long recognized by Semantic Web users, shapes languages provide models and vocabularies for expressing such structural constraints. This book describes two technologies for RDF validation: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL), the rationales for their designs, a comparison of the two, and some example applications. Table of Contents: Preface / Foreword by Phil Archer / Foreword by Tom Baker / Foreword by Dan Brickley and Libby Miller / Acknowledgments / Introduction / The RDF Ecosystem / Data Quality / Shape Expressions / SHACL / Applications / Comparing ShEx and SHACL / Bibliography / Authors' Biographies / Index
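As a flavor of the ShEx/SHACL comparison the book develops, consider the constraint "a person has exactly one foaf:name, a string". In ShEx compact syntax this is roughly `<PersonShape> { foaf:name xsd:string }`; the SHACL equivalent (a sketch using standard SHACL vocabulary, with an illustrative example namespace) is considerably more verbose:

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass foaf:Person ;       # applies to all instances of foaf:Person
    sh:property [
        sh:path foaf:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;                # exactly one name:
        sh:maxCount 1 ;                # at least one and at most one
    ] .
```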
Creating, maintaining and updating Shape Expressions as EntitySchemas in the Wikimedia ecosystem
Shape Expressions are formal, machine-readable descriptions of data shapes/schemas. They provide the means to validate expectations held by both data providers and use-case providers. In 2019, Wikidata introduced the EntitySchema namespace, which allows storing Shape Expressions in Wikidata and in Wikibase installations. Next to Wikidata, this EntitySchema namespace is also available to local Wikibase installs and to cloud installations such as wbstack.com. In this tutorial we will briefly introduce Shape Expressions, after which we will guide the audience through the EntitySchema namespace in both Wikidata and Wikibase. We will also introduce Wikishape (https://wikishape.weso.es/), a Shape Expressions platform provided by WESO. After this tutorial the participants will be able to write simple Shape Expressions and maintain them on either Wikidata or a local Wikibase.
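A minimal sketch of the kind of Shape Expression one might store as an EntitySchema, checking that an item is an instance of (P31) human (Q5) and optionally records a date of birth (P569); the prefixes follow Wikidata conventions, and the shape itself is illustrative:

```shex
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

start = @<human>

<human> {
  wdt:P31 [ wd:Q5 ] ;   # instance of: human
  wdt:P569 . ?          # optional: date of birth
}
```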
Frugal FAIR Data Point: A Document-Based Implementation for Enhanced Interoperability and Accessibility
This paper presents a novel approach to publishing metadata in a FAIR manner, according to the FAIR Data Point (FDP) specifications. The approach focuses on a document-based methodology that aligns with stringent infrastructure policies and caters to institutions with limited IT resources. While adhering to the standard FDP specifications, this implementation offers a lean solution that serves metadata directly from the filesystem through a lightweight web server, ensuring broad accessibility and interoperability.
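Since the FDP specifications build on DCAT, the static documents served this way are essentially Turtle records like the following sketch (URIs and titles are illustrative, not from the paper):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Illustrative catalog record of the kind an FDP serves as a plain file
<https://example.org/fdp/catalog/1> a dcat:Catalog ;
    dct:title "Example institutional catalog"@en ;
    dct:publisher <https://example.org/org> ;
    dcat:dataset <https://example.org/fdp/dataset/1> .
```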