74 research outputs found
An intelligent linked data quality dashboard
This paper describes a new intelligent, data-driven dashboard for linked data quality assessment. The development goal was to assist data quality engineers in interpreting data quality problems found when evaluating a dataset using a metrics-based data quality assessment. This required construction of a graph linking the problematic things identified in the data, the assessment metrics and the source data. This context and supporting user interfaces help the user to understand data quality problems. An analysis widget also helped the user identify the root cause of multiple problems. This supported the user in identifying and prioritising the problems that need to be fixed in order to improve data quality. A user evaluation performed with both expert and novice data quality engineers showed the dashboard to be useful for cleaning data.
Scalable Quality Assessment of Linked Data
In a world where the information economy is booming, poor data quality can lead to adverse consequences, including social and economic problems such as decreased revenue. Furthermore, data-driven industries are not just relying on their own (proprietary) data silos, but are also continuously aggregating data from different sources. This aggregation could then be re-distributed back to "data lakes". However, this data (including Linked Data) is not necessarily checked for its quality prior to its use. Large volumes of data are being exchanged in a standard and interoperable format between organisations and published as Linked Data to facilitate their re-use. Some organisations, such as government institutions, take a step further and open their data. The Linked Open Data Cloud is witness to this. However, similar to data in data lakes, it is challenging to determine the quality of this heterogeneous data, and subsequently to make this information explicit to data consumers. Despite the availability of a number of tools and frameworks to assess Linked Data quality, current solutions do not offer a holistic approach that both enables the assessment of datasets and provides consumers with quality results that can then be used to find, compare and rank datasets' fitness for use. In this thesis we investigate methods to assess the quality of (possibly large) linked datasets with the intent that data consumers can then use the assessment results to find datasets that are fit for use, that is, the right dataset for the task at hand.
Moreover, the benefits of quality assessment are two-fold: (1) data consumers do not need to rely blindly on subjective measures to choose a dataset, but can base their choice on multiple factors such as the intrinsic structure of the dataset, thereby fostering trust and reputation between publishers and consumers on more objective foundations; and (2) data publishers can be encouraged to improve their datasets so that they can be re-used more. Furthermore, our approach scales for large datasets. In this regard, we also look into improving the efficiency of quality metrics using various approximation techniques. The trade-off is that consumers will not get the exact quality value, but a very close estimate, which nevertheless provides the required guidance towards fitness for use. Although the central point of this thesis is not data quality improvement, we still need to understand what data quality means to the consumers who are searching for potential datasets. This thesis looks into the challenges of detecting quality problems in linked datasets and presenting quality results in a standardised, machine-readable and interoperable format that agents can make sense of in order to help human consumers identify datasets fit for use. Our proposed approach is consumer-centric: it looks into (1) making the assessment of quality as easy as possible, that is, allowing stakeholders, possibly non-experts, to identify and easily define quality metrics and to initiate the assessment; and (2) making results (quality metadata and quality reports) easy for stakeholders to understand, or at least interoperable with other systems to facilitate a possible data quality pipeline. Finally, our framework is used to assess the quality of a number of heterogeneous (large) linked datasets, where each assessment returns a quality metadata graph that can be consumed by agents as Linked Data.
In turn, these agents can intelligently interpret a dataset's quality with regard to multiple dimensions and observations, and thus provide further insight to consumers regarding its fitness for use.
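To make the metrics-based assessment described above concrete, the sketch below computes one simple intrinsic quality metric, the fraction of distinct subjects carrying an rdf:type, together with a sampling-based approximation in the spirit of the thesis's trade-off between exactness and scalability. The triples, names and sampling scheme are all illustrative assumptions; a real framework such as Luzzu defines metrics over full RDF graphs, not toy tuples.

```python
import random

# Toy triple store: (subject, predicate, object) tuples. Purely
# illustrative; a real assessment would run over an RDF graph.
TYPE = "rdf:type"

triples = [
    ("ex:a", TYPE, "ex:Place"),
    ("ex:a", "ex:name", "Dublin"),
    ("ex:b", "ex:name", "Cork"),      # ex:b has no rdf:type
    ("ex:c", TYPE, "ex:Place"),
    ("ex:c", "ex:pop", "125000"),
]

def typed_subject_ratio(ts):
    """Exact intrinsic metric: fraction of distinct subjects with an rdf:type."""
    subjects = {s for s, _, _ in ts}
    typed = {s for s, p, _ in ts if p == TYPE}
    return len(typed) / len(subjects) if subjects else 1.0

def approx_typed_subject_ratio(ts, k, seed=0):
    """Sampling-based estimate: trades exactness for speed on large data."""
    subjects = list({s for s, _, _ in ts})
    typed = {s for s, p, _ in ts if p == TYPE}
    sample = random.Random(seed).sample(subjects, min(k, len(subjects)))
    return sum(s in typed for s in sample) / len(sample)

print(typed_subject_ratio(triples))  # exact value for the toy data: 2/3
```

The exact metric visits every subject once; the approximate variant inspects only a fixed-size sample, which is the kind of estimate the thesis accepts in exchange for scalability.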
Assessing the quality of geospatial linked data – experiences from Ordnance Survey Ireland (OSi)
Ordnance Survey Ireland (OSi) is Ireland's national mapping agency, responsible for the digitisation of the island's infrastructure in terms of mapping. Generating data from various sensors (e.g. spatial sensors), OSi builds its knowledge in the Prime2 framework, a subset of which is transformed into geo-Linked Data. In this paper we discuss how the quality of the generated semantic data fares against datasets in the LOD cloud. We set up Luzzu, a scalable Linked Data quality assessment framework, in the OSi pipeline to continuously assess produced data in order to tackle any quality problems prior to publishing.
(Linked) Data Quality Assessment: An Ontological Approach
The effective functioning of data-intensive applications usually requires that datasets be of high quality. Quality depends on the task the data will be used for. However, it is possible to identify task-independent data quality dimensions which relate solely to the data themselves and can be extracted with the help of rule mining/pattern mining. In order to assess and improve data quality, we propose an ontological approach to report triples that violate data quality rules. Our goal is to provide data stakeholders with a set of methods and techniques to guide them in assessing and improving data quality.
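The idea of reporting violating triples can be sketched as follows: a task-independent rule is checked against the data, and each violation is emitted as a structured problem record that could later be serialised against a quality-report ontology. The rule, the triples and the `qpro:` problem-type term are hypothetical placeholders, not the paper's actual vocabulary.

```python
# Toy triples; the second object violates a "must be numeric" rule.
triples = [
    ("ex:a", "ex:population", "125000"),
    ("ex:b", "ex:population", "unknown"),
]

def check_numeric(ts, predicate):
    """Task-independent rule: objects of `predicate` must parse as integers.

    Each violation becomes a structured record, in the spirit of an
    ontology-based quality report (term names are illustrative).
    """
    report = []
    for s, p, o in ts:
        if p == predicate and not o.isdigit():
            report.append({
                "problemType": "qpro:MalformedDatatype",  # hypothetical term
                "subject": s, "predicate": p, "object": o,
            })
    return report

print(check_numeric(triples, "ex:population"))  # one record, for ex:b
```

Because each problem is a record rather than a log line, downstream tools can aggregate, filter, and prioritise violations per rule or per resource.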
LinkedDataOps: linked data operations based on quality process cycle
This paper describes three new Geospatial Linked Data (GLD) quality metrics that help evaluate conformance to standards. Standards conformance is a key quality criterion, for example for FAIR data. The metrics were implemented in the open source Luzzu quality assessment framework and used to evaluate four public geospatial datasets that showed a wide variation in standards conformance. This is the first set of Linked Data quality metrics developed specifically for GLD.
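A standards-conformance metric of this kind can be sketched as a syntactic check over geometry literals: what fraction of them are well-formed with respect to the expected serialisation? The regex below covers only a simplified WKT POINT form, not the full WKT grammar, and the sample literals are invented for illustration.

```python
import re

# Simplified pattern for a WKT POINT literal: "POINT(x y)" with
# decimal coordinates. Deliberately not the full WKT grammar.
WKT_POINT = re.compile(r"^POINT\s*\(\s*-?\d+(\.\d+)?\s+-?\d+(\.\d+)?\s*\)$")

literals = [
    "POINT(-6.2603 53.3498)",   # well-formed (Dublin)
    "POINT(53.3498)",           # malformed: missing a coordinate
]

def wkt_point_conformance(lits):
    """Metric value: fraction of literals matching the simplified pattern."""
    return sum(bool(WKT_POINT.match(l)) for l in lits) / len(lits)

print(wkt_point_conformance(literals))  # 0.5 for the sample above
```

A Luzzu-style metric would fold such a check over every geometry literal in the dataset and report the resulting ratio as the dataset's conformance score.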
Exploiting Context-Dependent Quality Metadata for Linked Data Source Selection
The traditional Web is evolving into the Web of Data, which consists of huge collections of structured data over poorly controlled distributed data sources. Live queries are needed to get current information out of this global data space. In live query processing, source selection deserves attention since it allows us to identify the sources which are likely to contain the relevant data. The thesis proposes a source selection technique in the context of live query processing on Linked Open Data, which takes into account the context of the request and the quality of the data contained in the sources to enhance the relevance (since the context enables a better interpretation of the request) and the quality of the answers (which will be obtained by processing the request on the selected sources). Specifically, the thesis proposes an extension of the QTree indexing structure, which had been proposed as a data summary to support source selection based on source content, to take into account quality and contextual information. With reference to a specific case study, the thesis also contributes an approach, relying on the Luzzu framework, to assess the quality of a source with respect to a given context (according to different quality dimensions). An experimental evaluation of the proposed techniques is also provided.
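The combination of content-based and quality-based selection can be illustrated as a two-step filter-and-rank: keep sources whose summary matches the query, then rank them by a quality score weighted by context-dependent dimension weights. The source records, dimensions and weights below are invented for the sketch; the thesis's actual structure is a QTree extension, not a flat dictionary.

```python
# Hypothetical source summaries: content-match flag plus per-dimension
# quality scores (all values illustrative).
sources = {
    "src1": {"matches_query": True,  "quality": {"timeliness": 0.9, "accuracy": 0.6}},
    "src2": {"matches_query": True,  "quality": {"timeliness": 0.3, "accuracy": 0.9}},
    "src3": {"matches_query": False, "quality": {"timeliness": 1.0, "accuracy": 1.0}},
}

def select(sources, context_weights):
    """Keep content-matching sources, ranked by context-weighted quality."""
    def score(q):
        return sum(context_weights[d] * q[d] for d in context_weights)
    candidates = [(name, score(s["quality"]))
                  for name, s in sources.items() if s["matches_query"]]
    return sorted(candidates, key=lambda kv: kv[1], reverse=True)

# A context where fresh data matters more than accuracy:
print(select(sources, {"timeliness": 0.7, "accuracy": 0.3}))
```

Changing the weights changes the ranking, which is exactly the sense in which the same source can be more or less suitable depending on the context of the request.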
A Fuzzy Approach for Data Quality Assessment of Linked Datasets
For several applications, an integrated view of linked data, denoted a linked data mashup, is a critical requirement. Nonetheless, the quality of linked data mashups highly depends on the quality of the data sources. In this sense, it is essential to analyze data source quality and to make this information explicit to consumers of such data. This paper introduces a fuzzy ontology to represent the quality of linked data sources. Furthermore, the paper shows the applicability of the fuzzy ontology in evaluating the quality of data sources used to build linked data mashups.
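The fuzzy part of such an approach can be sketched with standard triangular membership functions: a crisp quality score in [0, 1] is mapped to degrees of membership in linguistic terms such as "poor" or "good". The particular sets and breakpoints below are illustrative assumptions, not the paper's ontology.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative fuzzy sets over a crisp [0, 1] quality score.
fuzzy_sets = {
    "poor":   lambda x: triangular(x, -0.01, 0.0, 0.5),
    "medium": lambda x: triangular(x, 0.0, 0.5, 1.0),
    "good":   lambda x: triangular(x, 0.5, 1.0, 1.01),
}

def fuzzify(score):
    """Map a crisp score to membership degrees in each linguistic term."""
    return {term: round(mu(score), 3) for term, mu in fuzzy_sets.items()}

print(fuzzify(0.8))  # {'poor': 0.0, 'medium': 0.4, 'good': 0.6}
```

A score of 0.8 is thus partly "medium" and mostly "good", which is the kind of graded statement a fuzzy ontology can attach to a data source instead of a single crisp label.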
A Method to Screen, Assess, and Prepare Open Data for Use
Open data's value-creating capabilities and innovation potential are widely recognized, resulting in a notable increase in the number of published open data sources. A crucial challenge for companies intending to leverage open data is to identify suitable open datasets that support specific business scenarios and prepare these datasets for use. Researchers have developed several open data assessment techniques, but those are restricted in scope, do not consider the use context, and are not embedded in the complete set of activities required for open data consumption in enterprises. Therefore, our research aims to develop prescriptive knowledge in the form of a meaningful method to screen, assess, and prepare open data for use in an enterprise setting. Our findings complement existing open data assessment techniques by providing methodological guidance to prepare open data of uncertain quality for use in a value-adding and demand-oriented manner, enabled by knowledge graphs and linked data concepts. From an academic perspective, our research conceptualizes open data preparation as a purposeful and value-creating process
- …