
    Open Data Quality

    The research discusses how (open) data quality can be described, what should be considered when developing a data quality management solution, and how such a solution can be applied to open data to check its quality. The proposed approach centres on the development of a data quality specification that can be executed to obtain data quality evaluation results and to find errors and potential problems in the data that must be resolved. The approach is applied to several open data sets to evaluate their quality. Open data is very popular and freely available to every stakeholder, and it is often used to make business decisions. It is therefore important to be sure that this data is trustworthy and error-free, as its quality problems can lead to huge losses.
    Comment: 10 pages, 3 figures, 13th International Baltic Conference on Databases and Information Systems & The Baltic DB&IS 2018 Doctoral Consortium (Baltic DB&IS 2018), Trakai, Lithuania, Volume 2158. arXiv admin note: substantial text overlap with arXiv:2007.0469
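    As a concrete illustration of what an executable data quality specification might look like (the abstract does not give the paper's actual specification language, so the checks, column names, and file name below are assumptions), here is a minimal Python sketch that declares row-level checks and runs them against a CSV dataset to collect errors:

```python
import csv

# A "specification" here is just a list of (description, row predicate) pairs;
# a row passes a check when the predicate returns True.
QUALITY_SPEC = [
    ("missing value in 'name'", lambda row: (row.get("name") or "").strip() != ""),
    ("non-numeric 'population'", lambda row: (row.get("population") or "").isdigit()),
]

def evaluate(path, spec):
    """Execute every check in the spec against every row; return the problems found."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        # Start counting at 2: line 1 of the file is the header row.
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            for description, check in spec:
                if not check(row):
                    problems.append((line_no, description))
    return problems

if __name__ == "__main__":
    # "open_dataset.csv" is a placeholder for any open data set under evaluation.
    for line_no, description in evaluate("open_dataset.csv", QUALITY_SPEC):
        print(f"line {line_no}: {description}")
```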

    Quality Assessment of Linked Datasets using Probabilistic Approximation

    With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time-consuming or too expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters, and Clustering Coefficient estimation to implement a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated into the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance.
    Comment: 15 pages, 2 figures, to appear in the ESWC 2015 proceedings
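    Of the probabilistic techniques the abstract names, Reservoir Sampling is the easiest to sketch. The following minimal Python example is not the Luzzu implementation (the helpers `iter_triples` and `is_literal` in the usage comment are hypothetical); it shows how a fixed-size uniform sample of a triple stream lets a metric be estimated without materializing the whole dataset:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return k items drawn uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace a slot with probability k/(i+1); this keeps every item's
            # inclusion probability at exactly k/n after n items have streamed by.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Hypothetical usage: estimate the share of triples with a literal object
# (a toy stand-in for a real quality metric) from a 1,000-triple sample.
# sample = reservoir_sample(iter_triples("dataset.nt"), k=1000)
# estimate = sum(is_literal(o) for s, p, o in sample) / len(sample)
```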

    Open Data Quality Measurement Framework: Definition and Application to Open Government Data

    The diffusion of Open Government Data (OGD) has kept a very fast pace in recent years. However, evidence from practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse and negatively affect civic participation. Current approaches to the problem in the literature lack a comprehensive theoretical framework, and most evaluations concentrate on open data platforms rather than on the datasets themselves. In this work, we address these two limitations and set up a framework of indicators to measure the quality of Open Government Data on a series of data quality dimensions at the most granular level of measurement. We validated the evaluation framework by applying it to compare two cases of Italian OGD datasets: an internationally recognized good example of OGD, with centralized disclosure and extensive data quality controls, and samples of OGD from decentralized disclosure (at the municipality level), where extensive quality controls of the former kind are not possible and quality is therefore presumed lower. Starting from measurements based on the quality framework, we were able to verify the difference in quality: the measures showed a few common acquired good practices and weaknesses, and a set of discriminating factors that pertain to the type of datasets and the overall approach. On the basis of this evaluation, we also provided technical and policy guidelines to overcome the weaknesses observed in the decentralized release policy, addressing specific quality aspects.
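    The abstract does not enumerate the framework's indicators, but a cell-level completeness measure is one plausible example of an indicator "at the most granular level of measurement". A minimal sketch, assuming a tabular CSV dataset (the file names in the usage comment are hypothetical):

```python
import csv

def cell_completeness(path):
    """Share of cells carrying a non-empty value, over all rows and columns."""
    filled = total = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            for cell in row:
                total += 1
                filled += bool(cell.strip())  # True counts as 1, False as 0
    return filled / total if total else 0.0

# Hypothetical usage, comparing a centralized and a decentralized release:
# print(cell_completeness("central_ogd.csv"), cell_completeness("municipal_ogd.csv"))
```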

    Open Data Quality Evaluation: A Comparative Analysis of Open Data in Latvia

    Nowadays open data is entering the mainstream: it is freely available to every stakeholder and is often used in business decision-making. It is important to be sure the data is trustworthy and error-free, as its quality problems can lead to huge losses. The research discusses how (open) data quality can be assessed and covers the main points that should be considered when developing a data quality management solution. One specific approach is applied to several Latvian open data sets. The research provides a step-by-step open data set analysis guide and summarizes its results. It also shows that data quality can differ depending on the data supplier (centralized versus decentralized data releases) and that, unfortunately, even a trusted data supplier cannot guarantee the absence of data quality problems. The paper also highlights common data quality problems detected not only in Latvian open data but also in the open data of three European countries.
    Comment: 24 pages, 2 tables, 3 figures, Baltic J. Modern Computing

    Challenges of Open Data Quality: More Than Just License, Format, and Customer Support

    The research described here was supported by the award made by the RCUK Digital Economy programme to the dot.rural Digital Economy Hub, award reference: EP/G066051/1; and by the Innovate UK award, reference: 102615. Peer reviewed. Postprint.

    Preliminary results on Ontology-based Open Data Publishing

    Despite the current interest in Open Data publishing, a formal and comprehensive methodology is still missing that supports an organization in deciding which data to publish and in carrying out precise procedures for publishing high-quality data. In this paper we argue that the Ontology-based Data Management paradigm can provide a formal basis for a principled approach to publishing high-quality, semantically annotated Open Data. We describe two main approaches to using an ontology for this endeavor, and then present some technical results on one of them, called bottom-up, where the specification of the data to be published is given in terms of the sources, and specific techniques allow suitable annotations to be derived for interpreting the published data in the light of the ontology.

    Quality of metadata in open data portals

    During the last decade, numerous governmental, educational, and cultural institutions have launched Open Data initiatives that have facilitated access to large volumes of datasets on the web. The main way to disseminate this availability of data has been the deployment of Open Data catalogs exposing metadata about these datasets, which are easily indexed by web search engines. Open Source platforms have greatly eased the work of institutions involved in Open Data initiatives, making the setup of an Open Data portal an almost trivial task. However, few approaches have analyzed how precisely metadata describes the associated datasets. Taking into account the existing approaches for analyzing the quality of metadata in the Open Data context and other related domains, this work contributes to the state of the art by extending an ISO 19157 based method for checking the quality of geographic metadata to the context of Open Data metadata. Focusing on metadata models compliant with the Data Catalog Vocabulary (DCAT) proposed by the W3C, the extended method has been applied to evaluate the Open Data catalog of the Spanish Government. The results have also been compared with those obtained by the Metadata Quality Assessment methodology proposed at the European Data Portal.
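    To make the idea concrete: one step such a method plausibly involves is checking that every dcat:Dataset in a catalog carries a core set of metadata properties. The Python sketch below, using rdflib, is an assumption about how such a completeness check could look; the property set and equal weighting are illustrative, not taken from the ISO 19157 based method itself:

```python
from rdflib import Graph, Namespace, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")

# Assumed core property set; a real method would define and weight its own.
REQUIRED = [DCT.title, DCT.description, DCT.license, DCT.publisher, DCAT.distribution]

def metadata_completeness(catalog_source):
    """Average share of required properties present per dcat:Dataset in a catalog."""
    g = Graph()
    g.parse(catalog_source)  # rdflib infers the RDF serialization
    scores = []
    for dataset in g.subjects(RDF.type, DCAT.Dataset):
        present = sum(1 for p in REQUIRED if (dataset, p, None) in g)
        scores.append(present / len(REQUIRED))
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical usage against a local catalog dump:
# print(metadata_completeness("catalog.ttl"))
```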