    A systematic literature review of open data quality in practice

    Context: The main objective of open data initiatives is to make information freely available through easily accessible mechanisms and to facilitate its exploitation. In practice, openness should be accompanied by a certain level of trustworthiness or guarantees about the quality of the data. Traditional data quality is a thoroughly researched field, with several benchmarks and frameworks for capturing its dimensions. However, assessing open data quality is complicated because it involves the stakeholders, the evaluation of the datasets, and the publishing platform.
Objective: In this work, we aim to identify and synthesize various features of open data quality approaches in practice. We applied thematic synthesis to identify the most relevant research problems and quality assessment methodologies.
Method: We undertook a systematic literature review to summarize the state of the art on open data quality. The review process started with the development of a review protocol specifying all steps, research questions, inclusion and exclusion criteria, and analysis procedures. The search strategy retrieved 9323 publications from four scientific digital libraries, published between 2005 and 2015. Finally, through discussion among the authors, 63 papers were included in the final set of selected papers.
Results: Open data quality is, in general, a broad concept that can apply to multiple areas. Many quality issues hinder the actual use of open data in real-world applications. The main ones are unstructured metadata, heterogeneity of data formats, lack of accuracy, incompleteness, and lack of validation techniques. Furthermore, we collected the existing quality methodologies from the selected papers and synthesized them under a unifying classification schema. We also report a list of quality dimensions and metrics drawn from the selected papers.
Conclusion: In this research, we provided an overview of the methods related to open data quality, using a systematic literature review as the instrument. Open data quality methodologies vary depending on the application domain. Moreover, the majority of studies focus on satisfying specific quality criteria. With metrics based on generalized data attributes, a platform could be created to evaluate any open dataset. Finally, the lack of methodology validation remains a major problem, and future studies should focus on validation techniques.
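The idea of metrics built on generalized data attributes can be illustrated with completeness, since incompleteness is one of the main issues the review reports. The Python sketch below is a minimal, hypothetical example, not the authors' method: the function names `completeness` and `metadata_completeness`, the rule that `None` or a blank string counts as missing, and the toy data are all assumptions. Because it relies only on field names, it can score any tabular dataset (a list of dictionaries) and any metadata record regardless of domain.

```python
from typing import Any, Mapping, Sequence

# Hypothetical sketch of two domain-independent completeness metrics.
# Assumption: a value is missing if it is None or an empty/whitespace-only string.


def is_missing(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def completeness(records: Sequence[Mapping[str, Any]], fields: Sequence[str]) -> float:
    """Fraction of (record, field) cells holding a non-missing value, in [0, 1]."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0  # assumption: an empty dataset scores zero
    filled = sum(1 for rec in records for f in fields if not is_missing(rec.get(f)))
    return filled / total


def metadata_completeness(metadata: Mapping[str, Any], required: Sequence[str]) -> float:
    """Fraction of required metadata keys that are present and non-empty."""
    if not required:
        return 1.0  # nothing is required, so nothing is missing
    present = sum(1 for key in required if not is_missing(metadata.get(key)))
    return present / len(required)


if __name__ == "__main__":
    # Toy records and metadata, invented purely for illustration.
    rows = [
        {"city": "A", "year": 2017, "budget": None},
        {"city": "", "year": 2016, "budget": 1200.0},
    ]
    print(completeness(rows, ["city", "year", "budget"]))  # 4 of 6 cells filled
    meta = {"title": "Municipal budget", "license": "", "publisher": "City Hall"}
    print(metadata_completeness(meta, ["title", "license", "publisher", "modified"]))  # 0.5
```

Completeness needs no domain knowledge, which is why it generalizes; dimensions such as accuracy would additionally require reference data or domain rules and are not covered by this sketch.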

    OGDPub: an ontology for publishing open government data

    Master's dissertation, Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia e Gestão do Conhecimento (Graduate Program in Knowledge Engineering and Management), Florianópolis, 2017.
Abstract: Although a significant number of government agencies have joined the Open Data Movement, many municipalities in Brazil face serious difficulties in doing so. Most municipalities do not provide their raw data, only reports containing already-processed information. It is also considerably difficult to find these data on the Web and, when they are found, understanding them is not a simple task. Data that cannot be found or understood end up underused. To address this, the present research proposes a domain ontology (OGDPub) that supports the publication of open government data by Brazilian municipalities. The proposed ontology provides a metadata framework for describing datasets, allows the organizational structure of the municipality to be represented, and proposes a classification of datasets in language that citizens can understand. The expectation is that the data will thus be easier to find on the Web and simpler to understand, and that the datasets will carry provenance information. OGDPub was verified in two stages: (1) instantiation of real datasets from a Brazilian city in the ontology and (2) execution of SPARQL queries simulating searches performed by users. Finally, OGDPub is believed to help make government data from Brazilian municipalities available to the public in an open format and easier to use.
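The second verification stage (running SPARQL queries over instantiated datasets) can be pictured with a small, self-contained Python sketch. The abstract does not give OGDPub's vocabulary, so everything here is an assumption: the `ex:` prefix, the `ex:category` property and both datasets are hypothetical, and `dcat:Dataset`, `dct:title` and `dct:publisher` are borrowed from the DCAT and Dublin Core vocabularies without any claim that OGDPub uses them. The code implements only SPARQL's basic graph pattern matching (no FILTER, OPTIONAL, or prefix resolution); a real verification would load the ontology into an RDF store and use a full SPARQL engine.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical sketch: a toy triple store with SPARQL-style basic graph
# pattern matching. Terms beginning with "?" are variables. All ex: terms
# and data are invented; dcat:/dct: terms are assumed, not taken from OGDPub.

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

GRAPH: List[Triple] = [
    ("ex:ds1", "rdf:type", "dcat:Dataset"),
    ("ex:ds1", "dct:title", "School enrollment 2016"),
    ("ex:ds1", "dct:publisher", "ex:EducationSecretariat"),
    ("ex:ds1", "ex:category", "Education"),
    ("ex:ds2", "rdf:type", "dcat:Dataset"),
    ("ex:ds2", "dct:title", "Bus routes"),
    ("ex:ds2", "dct:publisher", "ex:MobilitySecretariat"),
    ("ex:ds2", "ex:category", "Transport"),
]

# Roughly equivalent SPARQL (prefix declarations omitted):
# SELECT ?ds ?title ?pub WHERE {
#   ?ds rdf:type dcat:Dataset ; ex:category "Education" ;
#       dct:title ?title ; dct:publisher ?pub . }
EDUCATION_QUERY: List[Triple] = [
    ("?ds", "rdf:type", "dcat:Dataset"),
    ("?ds", "ex:category", "Education"),
    ("?ds", "dct:title", "?title"),
    ("?ds", "dct:publisher", "?pub"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals triple, or return None on conflict."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(graph: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate patterns left to right, joining on shared variables."""
    triples = list(graph)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    for row in query(GRAPH, EDUCATION_QUERY):
        print(row["?title"], "published by", row["?pub"])
```

The shared variable `?ds` ties the four patterns to the same dataset, so bindings that conflict are dropped at each step and only the education dataset survives, which is the join behaviour a SPARQL engine applies to a basic graph pattern.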