10 research outputs found

    ProDataMarket: A data marketplace for monetizing linked data

    Linked data has emerged as an interesting technology for publishing structured data on the Web and as a powerful mechanism for integrating disparate data sources. Various tools and approaches have been developed in the Semantic Web community to produce and consume linked data; however, little attention has been paid to the monetization of linked data. In this paper we introduce a data marketplace – proDataMarket – that enables data providers to generate, advertise, and sell linked data, and data consumers to purchase linked data on the marketplace. The marketplace was originally designed with a focus on geospatial linked data (targeting property-related data providers and consumers), but its capabilities are generic and can be used for data in various domains. This demo will highlight the capabilities offered to the providers and consumers of the data made available on the marketplace.

    Tabular Data Cleaning and Linked Data Generation with Grafterizer

    The volume of data being published on the Web and made available as Open Data has significantly increased over the last several years. However, data published by independent publishers are sliced and fragmented. Creating descriptive connections across datasets may considerably enrich data and extend their value. One way to standardize, describe and interconnect the information from heterogeneous data sources is to use Linked Data as a publishing technology. The majority of published open datasets are in a tabular format, and the process of generating valid Linked Data from them requires powerful and flexible methods for data cleaning, preparation, and transformation. Most of the time and effort of data workers and data developers is concentrated on data cleaning. Despite the number of available platforms for tabular data cleaning and preparation, no solution is focused on Linked Data generation. This thesis explores approaches for data cleaning and transformation in the context of Linked Data generation and identifies their challenges. This includes reviewing and categorizing typical tabular data quality issues found in the literature and in practical use cases, in order to derive the requirements for a solution in the form of a set of data cleaning and transformation operations. Furthermore, the thesis introduces the Grafterizer software framework, developed to assist data workers and data developers in preparing and converting raw tabular data to Linked Data by simplifying and partially automating this process. The Grafterizer framework is evaluated against existing relevant tools and systems for data cleaning. The contribution of the thesis also includes extending and evaluating a reference software system to implement the needed data cleaning and transformation operations. This resulted in a powerful framework for addressing typical data quality issues, with a wide range of supported data cleaning and transformation operations.

    Tabular Data Anomaly Patterns

    One essential and challenging task in data science is data cleaning - the process of identifying and eliminating data anomalies. Different data types, data domains, data acquisition methods, and final purposes of data cleaning have resulted in different approaches to defining data anomalies in the literature. This paper proposes and describes a set of basic data anomalies in the form of anomaly patterns commonly encountered in tabular data, independently of the data domain, data acquisition technique, or the purpose of data cleaning. This set of anomalies can serve as a valuable basis for developing and enhancing software products that provide general-purpose data cleaning facilities, and can provide a basis for comparing different tools that aim to support tabular data cleaning. Furthermore, this paper introduces a set of corresponding data operations suitable for addressing the identified anomaly patterns and introduces Grafterizer - a software framework that implements those data operations.

    Norwegian State of estate report as linked open data

    This paper presents the Norwegian State of Estate (SoE) dataset containing data about real estate owned by the central government in Norway. The dataset is produced by integrating cross-domain government datasets including data from sources such as the Norwegian business entity register, cadastral system, building accessibility register and the previous SoE report. The dataset is made available as Linked Data. The Linked Data generation process includes data acquisition, cleaning, transformation, annotation, publishing, augmentation and interlinking of the annotated data, as well as quality assessment of the interlinked datasets. The dataset is published under the Norwegian License for Open Government Data (NLOD) and serves as a reference point for applications using data on central government real estate, such as generation of the SoE report, searching for properties suitable for asylum reception centres, risk assessment for state-owned buildings, or a public building application for visitors.

    Linked data for common agriculture policy: Enabling semantic querying over sentinel-2 and LiDAR data

    The amount of open and free satellite earth observation data combined with available data from other sectors (e.g., biodiversity, landscape elements, cadaster data) has the potential to enhance decision-making processes in various domains. An example of such a domain is agriculture, where the ability to objectively and automatically identify different types of agricultural features (e.g., irrigation patterns and landscape elements) can lead to more effective agriculture management. In this paper we show the possibility to publish and integrate multi-sectoral data from several sources into an existing data-intensive service targeting better and fairer Common Agriculture Policy (CAP) funds assignments to farmers and land owners. We show an end-to-end approach for integrating multi-sectoral data and publishing the result as Linked Data with the help of the DataGraft platform. To demonstrate the use of the resulting dataset, we developed a visualization system prototype showing various information about agricultural parcel features.

    The InfraRisk ontology: enabling semantic interoperability for critical infrastructures at risk from natural hazards

    Earthquakes, landslides, and other natural hazard events have severe negative socio-economic impacts. Among other consequences, those events can cause damage to infrastructure networks such as roads and railways. Novel methodologies and tools are needed to analyse the potential impacts of extreme natural hazard events and aid in the decision-making process regarding the protection of existing critical road and rail infrastructure as well as the development of new infrastructure. Enabling uniform, integrated, and reliable access to data on historical failures of critical transport infrastructure can help infrastructure managers and scientists from various related areas to better understand, prevent, and mitigate the impact of natural hazards on critical infrastructures. This paper describes the construction of the InfraRisk ontology for representing relevant information about natural hazard events and their impact on infrastructure components. Furthermore, we present a software prototype that visualizes data published using the proposed ontology.

    Linked data for the Norwegian state of estate reporting service

    The Norwegian State of Estate (SoE) report includes information about all Norwegian state-owned properties and buildings in the public sector and aims to assist government decision makers in allocating resources more effectively. A Linked Data based approach is presented here to increase transparency in the government administration and to improve both the report generation process and the report quality. Cross-domain government data originating from the business entity register, the cadastral system, the building accessibility register and the old SoE report are acquired, prepared, cleaned, transformed to Linked Data format and published. The source datasets are then integrated, augmented and interlinked before the results are published as a SPARQL endpoint, used for data visualization and report generation.

    Interacting with subterranean infrastructure linked data using augmented reality

    Damage to subterranean infrastructure caused by excavation works of all kinds is costly and potentially dangerous for workers. Such damage is often caused by poor subterranean data or inappropriate use of the existing data. We aim to provide solutions and services that reduce obstacles related to the use of subterranean infrastructure data, to ensure less damage and less time spent on finding and integrating data about subterranean infrastructure. The result of the work reported in this paper is an augmented reality application that gives users the ability to see what subterranean infrastructure is located at a given physical location. In this paper we demonstrate a method to create such an application using Linked Data technologies.

    DataGraft beta v2: New features and capabilities

    In this demonstrator, we will introduce the latest features and capabilities added to DataGraft – a Data-as-a-Service platform for data preparation and knowledge graph generation. DataGraft provides data transformation, publishing and hosting capabilities that aim to simplify the data publishing lifecycle for data workers (i.e., Open Data publishers, Linked Data developers, data scientists). This demonstrator highlights the recent features added to DataGraft by exemplifying data publication of statistical data – going from the raw data published at a public portal to published and accessible Linked Data with the help of the tools and features of the platform.

    Publishing socio-economic territory indices as linked data and their visualization for real estate valuation

    The correct estimation of real estate value facilitates decision making in various sectors, such as public administration or the real estate market. In this paper we demonstrate a method to manage territory scores and property valuation estimations as Linked Data with the help of the proDataMarket technical framework. The demo illustrates how the proDataMarket technical framework can be used to generate, maintain and serve territory and property valuation estimation data with the help of semantic technologies.