14 research outputs found

    Dataset search: a survey

    Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently released a beta search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user's data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods, and highlight open problems. We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to resolve these open problems as well as immediate next steps that will take the field forward. Comment: 20 pages, 153 references

    Strategies and Approaches for Exploiting the Value of Open Data

    Data is increasingly permeating all dimensions of our society and has become an indispensable commodity that serves as a basis for many products and services. Traditional sectors, such as health, transport and retail, are all benefiting from digital developments. In recent years, governments have also started to participate in the open data venture, usually with the motivation of increasing transparency. In fact, governments are among the largest producers and collectors of data in many different domains. As the increasing number of open data and open government data initiatives shows, it is becoming more and more vital to identify the means and methods of exploiting the value of this data, which ultimately affects various dimensions. In this thesis we therefore focus on researching how open data can be exploited to its highest value potential, and how we can enable stakeholders to create value upon data accordingly. Despite the radical advances in technology enabling data and knowledge sharing, and the lowering of barriers to information access, raw data has only recently been given the attention and relevance it merits. Moreover, even though the publishing of data is increasing at an enormously fast rate, there are many challenges that hinder its exploitation and consumption. Technical issues hinder the re-use of data, whilst policy, economic, organisational and cultural issues hinder entities from participating or collaborating in open data initiatives. Our focus is thus to contribute to the topic by researching current approaches towards the use of open data. We explore methods for creating value upon open (government) data, and identify the strengths and weaknesses that subsequently influence the success of an open data initiative. This research then acts as a baseline for the value creation guidelines, methodologies, and approaches that we propose.
Our contribution is based on the premise that if stakeholders are provided with adequate means and models to follow, then they will be encouraged to create value and exploit data products. Our subsequent contribution in this thesis therefore enables stakeholders to easily access and consume open data, as the first step towards creating value. Thereafter we proceed to identify and model the various value creation processes through the definition of a Data Value Network, and also provide a concrete implementation that allows stakeholders to create value. Ultimately, by creating value on data products, stakeholders participate in the global data economy and impact not only the economic dimension, but also other dimensions including the technical, societal and political.

    Publishing transport data for maximum reuse


    CLOUD-BASED SOLUTIONS IMPROVING TRANSPARENCY, OPENNESS AND EFFICIENCY OF OPEN GOVERNMENT DATA

    A central pillar of open government programs is the disclosure of data held by public agencies using Information and Communication Technologies (ICT). This disclosure relies on the creation of open data portals (e.g. Data.gov) and has subsequently been associated with the expression Open Government Data (OGD). The overall goal of these governmental initiatives is not limited to enhancing the transparency of the public sector but also aims to raise awareness of how released data can be put to use in order to enable the creation of new products and services by the private sector. Despite the usage of technological platforms to facilitate access to government data, open data portals continue to be organized in order to serve the goals of public agencies without opening the doors to public accountability, information transparency, public scrutiny, etc. This thesis considers the basic aspects of OGD including the definition of technical models for organizing such complex contexts, the identification of techniques for combining data from several portals and the proposal of user interfaces that focus on citizen-centred usability. In order to deal with the above issues, this thesis presents a holistic approach to OGD that aims to go beyond problems inherent in their simple disclosure by providing a tentative answer to the following questions: 1) To what extent do OGD-based applications contribute towards the creation of innovative, value-added services? 2) What technical solutions could increase the strength of this contribution? 3) Can Web 2.0 and Cloud technologies favour the development of OGD apps? 4) How should a common framework for developing OGD apps that rely on multiple OGD portals and external web resources be designed? In particular, this thesis is focused on devising computational environments that leverage the content of OGD portals (supporting the initial phase of data disclosure) for the creation of new services that add value to the original data.
The thesis is organized as follows. In order to offer a general view of OGD, some important aspects of open data initiatives are presented, including their state of the art, the existing approaches for publishing and consuming OGD across web resources, and the factors shaping the value generated through government data portals. Then, an architectural framework is proposed that gathers OGD from multiple sites and supports the development of cloud-based apps that leverage these data according to potentially different exploitation roots ranging from traditional business to specialized supports for citizens. The proposed framework is validated by two cloud-based apps, namely ODMap (Open Data Mapping) and NESSIE (A Network-based Environment Supporting Spatial Information Exploration). In particular, ODMap supports citizens in searching and accessing OGD from several web sites. NESSIE organizes data captured from real estate agencies and public agencies (i.e. municipalities, cadastral offices and chambers of commerce) in order to provide citizens with a geographic representation of real estate offers and relevant statistics about the price trend.

    Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project

    Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Service

    Merging and enriching DCAT feeds to improve discoverability of datasets

    Data Catalog Vocabulary (DCAT) is a W3C specification to describe datasets published on the Web. However, these catalogs are not easily discoverable based on a user's needs. In this paper, we introduce the Node.js module 'dcat-merger', which allows a user agent to download and semantically merge different DCAT feeds from the Web into one DCAT feed, which can be republished. Merging the input feeds is followed by enriching them. Besides determining the subjects of the datasets using DBpedia Spotlight, two extensions were built: one categorizes the datasets according to a taxonomy, and the other adds spatial properties to the datasets. These extensions require the use of information available in DBpedia's SPARQL endpoint. Because public SPARQL endpoints often suffer from low availability, its Triple Pattern Fragments alternative is used instead. Finally, the need for DCAT Merger sparks the discussion for more high-level functionality to improve a catalog's discoverability.

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as a European Research Infrastructure Consortium.

    CLARIN. The infrastructure for language resources

    CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)
