8 research outputs found

    Towards Cleaning-up Open Data Portals: A Metadata Reconciliation Approach

    This paper presents an approach to metadata reconciliation, curation, and linking for Open Governmental Data Portals (ODPs). ODPs have lately become the standard solution for governments that want to make their public data available to society. Portal managers use several types of metadata to organize the datasets, one of the most important being tags. However, the tagging process is subject to many problems, such as synonymy, ambiguity, and incoherence, among others. As our empirical analysis of ODPs shows, these issues are currently prevalent in most ODPs and effectively hinder the reuse of Open Data. To address these problems, we develop and implement an approach for tag reconciliation in Open Data Portals, encompassing local actions related to individual portals and global actions for adding a semantic metadata layer above individual portals. The local part aims to enhance the quality of tags in a single portal, and the global part is meant to interlink ODPs by establishing relations between tags.
    Comment: 8 pages, 10 figures. Under revision for ICSC201
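
    To make the local reconciliation step concrete, below is a minimal sketch of how spelling and punctuation variants of tags might be merged within a single portal. The normalisation rules, the similarity threshold, and the use of difflib are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of a local tag-reconciliation pass for one portal.
# Rules and threshold are illustrative assumptions.
import re
from difflib import SequenceMatcher

def normalize(tag: str) -> str:
    """Lowercase, trim, and collapse punctuation/whitespace variants."""
    tag = tag.strip().lower()
    tag = re.sub(r"[-_/]+", " ", tag)   # 'open-data' ~ 'open_data'
    return re.sub(r"\s+", " ", tag)

def reconcile(tags: list[str], threshold: float = 0.9) -> dict[str, str]:
    """Map each raw tag to a canonical form, merging near-duplicates."""
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for raw in tags:
        norm = normalize(raw)
        # Reuse an existing canonical tag if it is sufficiently similar.
        match = next((c for c in canonical
                      if SequenceMatcher(None, norm, c).ratio() >= threshold),
                     None)
        if match is None:
            canonical.append(norm)
            match = norm
        mapping[raw] = match
    return mapping

print(reconcile(["Open-Data", "open data", "opendata", "budget"]))
```

    Here "opendata" merges with "open data" because their similarity ratio exceeds the threshold; a real pipeline would also need the global, cross-portal linking step the abstract describes.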

    Generating Knowledge Structures from Open Datasets' Tags - An Approach Based on Formal Concept Analysis

    Under the influence of data transparency initiatives, a variety of institutions have published a significant number of datasets. In most cases, data publishers take advantage of open data portals (ODPs) to make their datasets publicly available. To improve the datasets' discoverability, ODPs group open datasets into categories using various criteria such as publishers, institutions, formats, and descriptions. For these purposes, portals take advantage of the metadata accompanying datasets. However, part of the metadata may be missing, incomplete, or redundant. Each of these situations makes it difficult for users to find appropriate datasets and obtain the desired information, and the problem becomes more noticeable as the number of available datasets grows. This paper focuses on a first step towards mitigating this problem by implementing knowledge structures to be used in situations where part of a dataset's metadata is missing. In particular, we focus on developing knowledge structures capable of suggesting the best-matching category for an uncategorized dataset. Our approach relies on dataset descriptions provided by users within dataset tags. We take advantage of formal concept analysis to reveal the shared conceptualization originating from tag usage by developing a concept lattice for each category of open datasets. Since tags are free-text metadata entered by users, we present a method for optimizing their usage by means of semantic similarity measures based on natural language processing. Finally, we demonstrate the advantage of our proposal by comparing concept lattices generated using formal concept analysis before and after the optimization process. The main experimental results show that our approach is capable of reducing the number of nodes within a lattice by more than 40%.
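
    As a rough illustration of the lattice-building step, the sketch below derives the concept intents of a toy dataset-tag context by closing the object intents under intersection. The datasets and tags are invented, and the semantic-similarity optimisation described in the abstract is not reproduced here.

```python
# A toy formal context: open datasets (objects) described by tags (attributes).
from itertools import combinations

context = {
    "air-quality":   {"environment", "sensors"},
    "water-quality": {"environment", "sensors", "health"},
    "hospitals":     {"health", "locations"},
}

def concept_intents(ctx: dict[str, set[str]]) -> set[frozenset]:
    """All concept intents = object intents closed under intersection,
    plus the full attribute set (intent of the bottom concept)."""
    found = {frozenset(set().union(*ctx.values()))}
    found |= {frozenset(tags) for tags in ctx.values()}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(found), 2):
            if (a & b) not in found:
                found.add(a & b)
                changed = True
    return found

def extent(ctx: dict[str, set[str]], intent: frozenset) -> set[str]:
    """Objects whose tag set contains every attribute of the intent."""
    return {obj for obj, tags in ctx.items() if intent <= tags}

for intent in sorted(concept_intents(context), key=len):
    print(sorted(extent(context, intent)), "<->", sorted(intent))
```

    Each printed (extent, intent) pair is one node of the lattice; merging near-synonymous tags before this step is what shrinks the node count in the paper's experiments.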

    Enriching Linked Data with Semantics from Domain-Specific Diagrammatic Models

    One key driver of the Linked Data paradigm is the ability to lift data graphs from legacy systems by employing various adapters and RDFizers (e.g., D2RQ for relational databases, XLWrap for spreadsheets). Such approaches aim to remove the boundaries of enterprise data silos by opening them to cross-organizational linking within a “Web of Data”. An insufficiently tapped source of machine-readable semantics is the underlying graph nature of diagrammatic conceptual models – information that is richer than what is typically lifted from table schemata, especially when a domain-specific modeling language is employed. The paper advocates an approach to Linked Data enrichment based on a diagrammatic model RDFizer originally developed in the context of the ComVantage FP7 research project. A minimal but illustrative example is provided, from which arguments are generalized, leading to a proposed vision of “conceptual model”-aware information systems.
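
    As a hint of what lifting a diagram's graph structure into RDF can look like, here is a minimal sketch using the rdflib Python library. The toy model, its node and edge types, and the http://example.org namespace are all invented; this is not the ComVantage RDFizer itself.

```python
# Minimal sketch: nodes and typed edges of a conceptual model become RDF
# triples. Requires rdflib (pip install rdflib).
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/model#")  # invented namespace

# A toy domain-specific diagram: (node, type) and (source, relation, target).
nodes = [("Order", "BusinessObject"), ("Approve", "Activity")]
edges = [("Approve", "processes", "Order")]

g = Graph()
g.bind("ex", EX)
for name, kind in nodes:
    g.add((EX[name], RDF.type, EX[kind]))      # node type from the metamodel
    g.add((EX[name], RDFS.label, Literal(name)))
for src, rel, dst in edges:
    g.add((EX[src], EX[rel], EX[dst]))         # typed edge as a predicate

print(g.serialize(format="turtle"))
```

    The point of the example is that the modeling language's own node and relation types survive the lifting, which is precisely the semantics a table schema would not carry.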

    Usability of the G7 open government data portals and lessons learned

    Recent advances in technology have made truly open and accessible government significantly more realisable. One way in which governments are using this technology is the implementation of online portals that allow open (i.e., public and unrestricted) access to and use of data. Such portals can be used by citizens and professionals to support better decision-making across a wide range of areas, from car-parking to promoting entrepreneurialism. However, the existence of portals per se is not enough: to maximise their potential, users must also feel that they are both accessible and usable. To gain insights into the current state of usability of open government data (OGD) portals for professionals working in data-related areas, a comparative study of the portals of the G7 group was carried out using a mixed methodology. This is the first specific comparison of these portals for such users, as well as the first study to add a user-centred qualitative dimension to the research. The study's findings showed that the G7 countries are not maximising the potential of their portals or collaborating effectively. Addressing these issues, and building better cross-national consistency, would help to improve the value delivered by investment in OGD portals. The study also further supported the application of an existing user-centred, heuristic evaluation framework to a more specific user group, as well as more generally.

    A systematic literature review of open data quality in practice

    Context: The main objective of open data initiatives is to make information freely available through easily accessible mechanisms and to facilitate its exploitation. In practice, openness should be accompanied by a certain level of trustworthiness, or guarantees about the quality of the data. Traditional data quality is a thoroughly researched field with several benchmarks and frameworks for grasping its dimensions. However, quality assessment in open data is a complicated process, as it involves stakeholders, the evaluation of datasets, and the publishing platform.
    Objective: In this work, we aim to identify and synthesize various features of open data quality approaches in practice. We applied thematic synthesis to identify the most relevant research problems and quality assessment methodologies.
    Method: We undertook a systematic literature review to summarize the state of the art on open data quality. The review process started with the development of a review protocol covering all steps, research questions, inclusion and exclusion criteria, and analysis procedures. The search strategy retrieved 9323 publications from four scientific digital libraries. The selected papers were published between 2005 and 2015. Finally, through discussion between the authors, 63 papers were included in the final set.
    Results: Open data quality, in general, is a broad concept that can apply to multiple areas. Many quality issues concerning open data hinder its actual usage in real-world applications; the main ones are unstructured metadata, heterogeneity of data formats, lack of accuracy, incompleteness, and lack of validation techniques. Furthermore, we collected the existing quality methodologies from the selected papers and synthesized them under a unifying classification schema. A list of quality dimensions and metrics from the selected papers is also reported.
    Conclusion: In this research, we provided an overview of methods related to open data quality, using the instrument of a systematic literature review. Open data quality methodologies vary depending on the application domain, and the majority of studies focus on satisfying specific quality criteria. With metrics based on generalized data attributes, a platform can be created to evaluate any open dataset. The lack of methodology validation also remains a major problem; studies should focus on validation techniques.
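
    To illustrate the kind of generalized, attribute-based metric the review points towards, below is a minimal sketch of a metadata-completeness score. The list of required fields is an illustrative assumption, not a metric taken from the surveyed papers.

```python
# Minimal sketch: metadata completeness as the share of required fields
# that are populated. REQUIRED is an illustrative assumption.
REQUIRED = ("title", "description", "license", "format", "publisher")

def completeness(metadata: dict) -> float:
    """Fraction of required metadata fields that are present and non-empty."""
    filled = sum(1 for f in REQUIRED if str(metadata.get(f, "")).strip())
    return filled / len(REQUIRED)

record = {"title": "Air quality 2023", "format": "CSV", "license": ""}
print(f"completeness = {completeness(record):.2f}")  # -> completeness = 0.40
```

    Because the metric depends only on generic attributes rather than domain knowledge, the same score can be computed for any dataset on any portal, which is the property the review's conclusion calls for.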

    Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project

    Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Service

    Analysis and Concept Development for Integrating Visualizations within an Open Data Portal, Using the Example of Data.Europa.eu

    The disclosure of administrative data is an important component of the increasingly significant Open Government initiative, which aims to enhance transparency, openness, and opportunities for participation in governmental and administrative actions. Open government data is published in open data portals for free reuse in order to strengthen trust between political institutions and citizens, media, academia, and business. Assuming that the mere provision of open data does not automatically result in its frequent use, this work takes up the thesis, based on the current state of research on the user-friendliness of open data portals, that the provision of visualizations increases the reusability of open data. The study therefore investigates how visualizations can be integrated into an open data portal, using the example of the Data.Europa.eu portal. To answer this question, the portal's existing implementation is analysed and three best-practice portals are examined with regard to how they implement visualizations. Building upon this analysis, a concept for integrating visualizations is developed and then evaluated for user-friendliness through a usability test. The results indicate that the integration of visualizations supports the understanding of the available data and thereby enhances its reusability.

    Strategies and Approaches for Exploiting the Value of Open Data

    Data is increasingly permeating all dimensions of our society and has become an indispensable commodity that serves as a basis for many products and services. Traditional sectors, such as health, transport, and retail, are all benefiting from digital developments. In recent years, governments have also started to participate in the open data venture, usually with the motivation of increasing transparency. In fact, governments are among the largest producers and collectors of data in many different domains. As the increasing number of open data and open government data initiatives shows, it is becoming more and more vital to identify the means and methods of exploiting the value of this data, which ultimately affects various dimensions. In this thesis we therefore focus on researching how open data can be exploited to its highest value potential, and how we can enable stakeholders to create value upon data accordingly. Despite the radical advances in technology enabling data and knowledge sharing, and the lowering of barriers to information access, raw data was given only recently the attention and relevance it merits. Moreover, even though data is being published at an enormously fast rate, many challenges hinder its exploitation and consumption: technical issues hamper the re-use of data, whilst policy, economic, organisational, and cultural issues deter entities from participating or collaborating in open data initiatives. Our focus is thus to contribute to the topic by researching current approaches towards the use of open data. We explore methods for creating value upon open (government) data, and identify the strengths and weaknesses that subsequently influence the success of an open data initiative. This research then acts as a baseline for the value creation guidelines, methodologies, and approaches that we propose. Our contribution is based on the premise that if stakeholders are provided with adequate means and models to follow, they will be encouraged to create value and exploit data products. Our subsequent contribution in this thesis therefore enables stakeholders to easily access and consume open data as the first step towards creating value. Thereafter we identify and model the various value creation processes through the definition of a Data Value Network, and provide a concrete implementation that allows stakeholders to create value. Ultimately, by creating value on data products, stakeholders participate in the global data economy and have an impact not only on the economic dimension but also on other dimensions, including the technical, societal, and political.
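
    As a loose illustration of the idea, the sketch below models a value-creation pipeline in which each process adds value to an open dataset as it flows through the network. The stage names and data are invented and do not reproduce the thesis's actual Data Value Network.

```python
# Minimal sketch of a chain of value-creation processes over open data.
# Stages and sample rows are illustrative assumptions.
from typing import Callable

Stage = Callable[[list[dict]], list[dict]]

def acquire() -> list[dict]:
    """Raw open data as published, warts and all."""
    return [{"city": "Valletta", "pm10": "21"}, {"city": "Mdina", "pm10": ""}]

def curate(rows: list[dict]) -> list[dict]:
    """Drop incomplete records (a quality-improving process)."""
    return [r for r in rows if r["pm10"]]

def enrich(rows: list[dict]) -> list[dict]:
    """Convert raw strings into typed values ready for analysis."""
    return [{**r, "pm10": float(r["pm10"])} for r in rows]

def network(stages: list[Stage], rows: list[dict]) -> list[dict]:
    for stage in stages:   # each hop in the network adds value
        rows = stage(rows)
    return rows

print(network([curate, enrich], acquire()))
```

    The design point is that each stage is a replaceable node: different stakeholders can contribute different processes, and the composed network is what turns a raw dataset into a data product.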