88 research outputs found
Luzzu - A Framework for Linked Data Quality Assessment
With the increasing adoption and growth of the Linked Open Data cloud [9],
with RDFa, Microformats and other ways of embedding data into ordinary Web
pages, and with initiatives such as schema.org, the Web is currently being
complemented with a Web of Data. Thus, the Web of Data shares many
characteristics with the original Web of Documents, which also varies in
quality. This heterogeneity makes it challenging to determine the quality of
the data published on the Web and to subsequently make this information
explicit to data consumers. The main contribution of this article is LUZZU, a
quality assessment framework for Linked Open Data. Apart from providing quality
metadata and quality problem reports that can be used for data cleaning, LUZZU
is extensible: third-party metrics can easily be plugged into the framework. The
framework does not rely on SPARQL endpoints, and is thus free of all the
problems that come with them, such as query timeouts. Another advantage over
SPARQL-based quality assessment frameworks is that metrics implemented in
LUZZU can have more complex functionality than triple matching. Using the
framework, we performed a quality assessment of a number of statistical linked
datasets that are available on the LOD cloud. For this evaluation, 25 metrics
from ten different dimensions were implemented.
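The point that a metric can do more than triple matching can be illustrated with a minimal sketch. The class name, method names and `assess` helper below are hypothetical and are not Luzzu's actual API; the sketch only assumes that triples are streamed to the metric one at a time, with no SPARQL endpoint involved.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TypedSubjectRatio:
    """Hypothetical metric: share of subjects that carry at least one rdf:type.

    The metric keeps state across triples, which a single triple pattern
    match cannot express on its own.
    """

    def __init__(self) -> None:
        self.subjects: Set[str] = set()
        self.typed: Set[str] = set()

    def compute(self, triple: Triple) -> None:
        # Called once per streamed triple.
        subject, predicate, _ = triple
        self.subjects.add(subject)
        if predicate == RDF_TYPE:
            self.typed.add(subject)

    def metric_value(self) -> float:
        # Ratio in [0, 1]; an empty dataset is scored 0.0 by convention here.
        if not self.subjects:
            return 0.0
        return len(self.typed) / len(self.subjects)


def assess(triples: Iterable[Triple], metric: TypedSubjectRatio) -> float:
    # Stream every triple through the metric, then read the final value.
    for triple in triples:
        metric.compute(triple)
    return metric.metric_value()
```

A dataset with two subjects, only one of which has an rdf:type statement, would score 0.5 under this sketch.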
Quality Assessment of Linked Datasets using Probabilistic Approximation
With the increasing application of Linked Open Data, assessing the quality of
datasets by computing quality metrics becomes an issue of crucial importance.
For large and evolving datasets, an exact, deterministic computation of the
quality metrics is too time consuming or expensive. We employ probabilistic
techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient
estimation for implementing a broad set of data quality metrics in an
approximate but sufficiently accurate way. Our implementation is integrated in
the comprehensive data quality assessment framework Luzzu. We evaluated its
performance and accuracy on Linked Open Datasets of broad relevance. Comment: 15 pages, 2 figures, to appear in ESWC 2015 proceedings
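As an illustration of one technique named in this abstract, the following is a minimal sketch of reservoir sampling (the classic "Algorithm R"). It is not the authors' implementation; the function name and parameters are assumptions chosen for the example. It draws a uniform sample of `k` items from a stream of unknown length in a single pass, which lets a metric be estimated on the sample instead of the full dataset.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from stream."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the item survives with probability k/(i+1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Memory use is bounded by `k` regardless of stream size, at the cost of the metric value being an estimate.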
Assessing the quality of geospatial linked data – experiences from Ordnance Survey Ireland (OSi)
Ordnance Survey Ireland (OSi) is Ireland’s national mapping agency
that is responsible for the digitisation of the island’s infrastructure in terms of
mapping. Generating data from various sensors (e.g. spatial sensors), OSi builds
its knowledge in the Prime2 framework, a subset of which is transformed into
geo-Linked Data. In this paper we discuss how the quality of the generated
semantic data fares against datasets in the LOD cloud. We set up Luzzu, a scalable
Linked Data quality assessment framework, in the OSi pipeline to continuously
assess produced data in order to tackle any quality problems prior to publishing.
An intelligent linked data quality dashboard
This paper describes a new intelligent, data-driven dashboard for linked data quality assessment. The development goal was to help data quality engineers interpret the data quality problems found when evaluating a dataset using a metrics-based data quality assessment. This required construction of a graph linking the problematic things identified in the data, the assessment metrics and the source data. This context and the supporting user interfaces help the user to understand data quality problems. An analysis widget also helped the user identify the root cause of multiple problems. This supported the user in identifying and prioritizing the problems that need to be fixed to improve data quality. The dashboard was shown to be useful for users cleaning data. A user evaluation was performed with both expert and novice data quality engineers.
Quality metrics to measure the standards conformance of geospatial linked data
This paper describes three new Geospatial Linked Data
(GLD) quality metrics that help evaluate conformance to standards.
Standards conformance is a key quality criterion, for example for FAIR
data. The metrics were implemented in the open source Luzzu quality assessment framework and used to evaluate four public geospatial datasets
that showed a wide variation in standards conformance. This is the first
set of Linked Data quality metrics developed specifically for GLD.
LinkedDataOps: linked data operations based on quality process cycle
Representing Dataset Quality Metadata using Multi-Dimensional Views
Data quality is commonly defined as fitness for use. The problem of
identifying quality of data is faced by many data consumers. Data publishers
often do not have the means to identify quality problems in their data. To make
the task for both stakeholders easier, we have developed the Dataset Quality
Ontology (daQ). daQ is a core vocabulary for representing the results of
quality benchmarking of a linked dataset. It represents quality metadata as
multi-dimensional and statistical observations using the Data Cube vocabulary.
Quality metadata are organised as a self-contained graph, which can, e.g., be
embedded into linked open datasets. We discuss the design considerations, give
examples for extending daQ by custom quality metrics, and present use cases
such as analysing data versions, browsing datasets by quality, and link
identification. We finally discuss how data cube visualisation tools enable
data publishers and consumers to better analyse the quality of their data. Comment: Preprint of a paper submitted to the forthcoming SEMANTiCS 2014, 4-5
September 2014, Leipzig, Germany