Open Data Quality
The research discusses how (open) data quality can be described, what should be considered when developing a data quality management solution, and how such a solution can be applied to open data to check its quality. The proposed approach focuses on the development of a data quality specification that can be executed to obtain data quality evaluation results and to find errors in the data and potential problems that must be solved. The proposed approach is applied to several open data sets to evaluate their quality. Open data is very popular and freely available to every stakeholder, and it is often used to make business decisions. It is important to be sure that this data is trustworthy and error-free, as its quality problems can lead to huge losses.
Comment: 10 pages, 3 figures, 13th International Baltic Conference on Databases and Information Systems & The Baltic DB&IS 2018 Doctoral Consortium (Baltic DB&IS 2018), Trakai, Lithuania, Volume 2158. arXiv admin note: substantial text overlap with arXiv:2007.0469
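A minimal sketch of what an executable data quality specification could look like, assuming a rule-per-field design in Python; the paper does not publish its specification language, so the rule names, the input file, and the fields below are hypothetical:

    import csv

    # Hypothetical "data quality specification": named rules, each a
    # predicate over a single record of the dataset.
    SPEC = {
        "id_present":   lambda row: bool(row.get("id", "").strip()),
        "year_is_int":  lambda row: row.get("year", "").isdigit(),
        "email_has_at": lambda row: "@" in row.get("email", ""),
    }

    def evaluate(path):
        """Execute the specification against a CSV file; report violation rates."""
        violations = {name: 0 for name in SPEC}
        total = 0
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                total += 1
                for name, rule in SPEC.items():
                    if not rule(row):
                        violations[name] += 1
        return {name: n / total for name, n in violations.items()} if total else {}

    print(evaluate("open_dataset.csv"))  # hypothetical file name

Executing the specification yields a per-rule violation rate, which is the kind of evaluation result the abstract refers to: it quantifies quality and shows which checks fail and how often.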
Panel data and open-ended questions: Understanding perceptions of quality of life
This paper describes the burgeoning interest in quality of life studies and suggests that, as well as expert definitions, we need to consider people's own perceptions of what matters. Using open-ended questions from the 1997 and 2002 waves of the British Household Panel Survey, we analyse both quantitatively and qualitatively how perceptions of quality of life differ for men and women across the life course. Qualitative analysis reveals that key domains such as health, family and finances often refer not to self but to others. Longitudinal analysis demonstrates that people's perceptions of quality of life change over time, particularly before and after important life transitions. Thus our findings challenge overly individualistic and static conceptions of quality of life and reveal quality of life as a process, not a fixed state.
Quality Assessment of Linked Datasets using Probabilistic Approximation
With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time-consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation to implement a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated in the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance.
Comment: 15 pages, 2 figures, to appear in ESWC 2015 proceedings
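As a rough illustration of the probabilistic idea, here is a reservoir sampling sketch in Python that estimates a quality metric from a fixed-size uniform sample of a large triple stream instead of an exact pass; this is not Luzzu's actual API, and the metric and toy data are invented:

    import random

    def reservoir_sample(stream, k):
        """Keep a uniform random sample of k items from a stream of unknown length."""
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)
            else:
                j = random.randint(0, i)  # classic Algorithm R replacement step
                if j < k:
                    reservoir[j] = item
        return reservoir

    def estimated_typed_literal_ratio(triples, k=1000):
        """Approximate the share of triples whose object is a typed literal."""
        sample = reservoir_sample(triples, k)
        return sum(1 for _, _, o in sample if "^^" in o) / len(sample)

    # Toy stream: 90% typed-literal objects, 10% IRI objects.
    triples = ((f"<s{i}>", "<p>", f'"{i}"^^<xsd:integer>' if i % 10 else "<o>")
               for i in range(100_000))
    print(estimated_typed_literal_ratio(triples))  # ~0.9

The accuracy/cost trade-off is governed by k: memory stays constant regardless of dataset size, which is the property that makes such techniques attractive for large, evolving datasets.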
Open Data Quality Measurement Framework: Definition and Application to Open Government Data
The diffusion of Open Government Data (OGD) in recent years has kept a very fast pace. However, evidence from practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse and negatively affect civic participation. Current approaches to the problem in the literature lack a comprehensive theoretical framework. Moreover, most evaluations concentrate on open data platforms rather than on datasets.
In this work, we address these two limitations and set up a framework of indicators to measure the quality of Open Government Data along a series of data quality dimensions at the most granular level of measurement. We validated the evaluation framework by applying it to compare two cases of Italian OGD datasets: an internationally recognized good example of OGD, with centralized disclosure and extensive data quality controls, and samples of OGD from decentralized data disclosure (at the municipal level), without the extensive quality controls possible in the former case and hence with presumably lower quality.
Starting from measurements based on the quality framework, we were able to verify the difference in quality: the measures showed a few common acquired good practices and weaknesses, and a set of discriminating factors that pertain to the type of datasets and the overall approach. On the basis of this evaluation, we also provide technical and policy guidelines to overcome the weaknesses observed in the decentralized release policy, addressing specific quality aspects.
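A hedged sketch of one such indicator, cell-level completeness aggregated per column and per dataset; the paper's actual indicator definitions and dimension weights are not reproduced here, and the file names are invented:

    import csv

    def completeness(path):
        """Share of non-empty cells per column, plus an unweighted dataset score."""
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            filled = {col: 0 for col in reader.fieldnames}
            total = 0
            for row in reader:
                total += 1
                for col in filled:
                    if row[col] and row[col].strip():
                        filled[col] += 1
        per_column = {col: n / total for col, n in filled.items()} if total else {}
        score = sum(per_column.values()) / len(per_column) if per_column else 0.0
        return per_column, score

    # Comparing two releases, as in the paper's centralized-vs-decentralized
    # case study (hypothetical file names):
    # completeness("central_portal.csv") vs completeness("municipal_portal.csv")

Measuring at the cell level rather than at the portal level is what lets such a framework discriminate between two publishers of nominally the same data.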
Open Data Quality Evaluation: A Comparative Analysis of Open Data in Latvia
Nowadays open data is entering the mainstream: it is freely available to every stakeholder and is often used in business decision-making. It is important to be sure the data is trustworthy and error-free, as its quality problems can lead to huge losses. The research discusses how (open) data quality can be assessed. It also covers the main points that should be considered when developing a data quality management solution. One specific approach is applied to several Latvian open data sets. The research provides a step-by-step open data set analysis guide and summarizes its results. It is also shown that data quality can differ depending on the data supplier (centralized versus decentralized data releases) and that, unfortunately, a trusted data supplier cannot guarantee the absence of data quality problems. The research also highlights common data quality problems detected not only in Latvian open data but also in the open data of three European countries.
Comment: 24 pages, 2 tables, 3 figures, Baltic J. Modern Computing
Challenges of Open Data Quality: More Than Just License, Format, and Customer Support
The research described here was supported by the award made by the RCUK Digital Economy programme to the dot.rural Digital Economy Hub, award reference: EP/G066051/1; and by the Innovate UK award reference: 102615. Peer reviewed. Postprint.
Preliminary results on Ontology-based Open Data Publishing
Despite the current interest in Open Data publishing, a formal and comprehensive methodology that supports an organization in deciding which data to publish and in carrying out precise procedures for publishing high-quality data is still missing. In this paper we argue that the Ontology-based Data Management paradigm can provide a formal basis for a principled approach to publishing high-quality, semantically annotated Open Data. We describe two main approaches to using an ontology for this endeavor, and then we present some technical results on one of the approaches, called bottom-up, in which the specification of the data to be published is given in terms of the sources, and specific techniques allow deriving suitable annotations for interpreting the published data in the light of the ontology.
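A toy sketch of the bottom-up direction as described: the data to publish is specified in terms of the source, and a source-to-ontology mapping derives the semantic annotations for the published records. The ontology IRIs and the source schema here are invented for illustration, not taken from the paper:

    # Source-level specification: rows as they exist in the source system.
    SOURCE_ROWS = [
        {"emp_id": "17", "emp_name": "Ada", "dept": "R&D"},
    ]

    # Mapping from source columns to ontology properties: the annotations
    # that let consumers interpret the published data via the ontology.
    MAPPING = {
        "emp_id":   "http://example.org/onto#employeeId",
        "emp_name": "http://example.org/onto#name",
        "dept":     "http://example.org/onto#memberOfDepartment",
    }
    SUBJECT_TEMPLATE = "http://example.org/data/employee/{emp_id}"

    def publish(rows):
        """Yield (subject, property, value) triples derived from the mapping."""
        for row in rows:
            subject = SUBJECT_TEMPLATE.format(**row)
            for column, prop in MAPPING.items():
                yield (subject, prop, row[column])

    for triple in publish(SOURCE_ROWS):
        print(triple)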
Quality of metadata in open data portals
During the last decade, numerous governmental, educational and cultural institutions have launched Open Data initiatives that have facilitated access to large volumes of datasets on the web. The main way to disseminate this availability of data has been the deployment of Open Data catalogs exposing metadata of these datasets, which are easily indexed by web search engines. Open Source platforms have greatly eased the work of institutions involved in Open Data initiatives, making the setup of Open Data portals an almost trivial task. However, few approaches have analyzed how precisely metadata describes the associated datasets. Taking into account the existing approaches for analyzing the quality of metadata in the Open Data context and other related domains, this work contributes to the state of the art by extending an ISO 19157-based method for checking the quality of geographic metadata to the context of Open Data metadata. Focusing on metadata models compliant with the Data Catalog Vocabulary (DCAT) proposed by the W3C, the extended method has been applied to evaluate the Open Data catalog of the Spanish Government. The results have also been compared with those obtained by the Metadata Quality Assessment methodology proposed at the European Data Portal.
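As a rough sketch of the kind of check involved, the snippet below scores the completeness of DCAT-style catalog records in Python; the property names are real DCAT/Dublin Core terms, but the flat-dict record structure and the unweighted scoring are simplifications, not the ISO 19157-based method the paper extends:

    # Core properties a DCAT-compliant record is expected to carry.
    DCAT_FIELDS = ["dct:title", "dct:description", "dct:license",
                   "dct:issued", "dcat:keyword", "dcat:distribution"]

    def metadata_completeness(record):
        """Share of core DCAT properties present and non-empty in one record."""
        return sum(1 for f in DCAT_FIELDS if record.get(f)) / len(DCAT_FIELDS)

    catalog = [  # toy catalog entries
        {"dct:title": "Air quality 2023", "dct:license": "CC-BY-4.0",
         "dcat:distribution": ["https://example.org/air.csv"]},
        {"dct:title": "Budget 2024", "dct:description": "Annual budget",
         "dct:issued": "2024-01-01", "dcat:keyword": ["finance"],
         "dct:license": "CC0-1.0",
         "dcat:distribution": ["https://example.org/budget.csv"]},
    ]
    scores = [metadata_completeness(r) for r in catalog]
    print(scores, sum(scores) / len(scores))  # per-record scores, catalog average

Scoring each record rather than the portal as a whole mirrors the granularity argument made across several of the works listed above.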