Data quality is commonly defined as fitness for use. Many data consumers face
the problem of assessing the quality of data, while data publishers often lack
the means to identify quality problems in their own data. To make this task
easier for both stakeholders, we have developed the Dataset Quality
Ontology (daQ). daQ is a core vocabulary for representing the results of
quality benchmarking of a linked dataset. It represents quality metadata as
multi-dimensional and statistical observations using the Data Cube vocabulary.
Quality metadata are organised as a self-contained graph, which can, e.g., be
embedded into linked open datasets. We discuss the design considerations, give
examples of extending daQ with custom quality metrics, and present use cases
such as analysing data versions, browsing datasets by quality, and link
identification. Finally, we discuss how data cube visualisation tools enable
data publishers and consumers to better analyse the quality of their data.

Comment: Preprint of a paper submitted to the forthcoming SEMANTiCS 2014, 4-5
September 2014, Leipzig, Germany