9 research outputs found

    ARDI: automatic generation of RDFS models from heterogeneous data sources

    The current wealth of information, typically known as Big Data, generates a large amount of available data for organisations. Data Integration provides foundations to query disparate data sources as if they were integrated into a single source. However, current data integration tools are far from being useful for most organisations due to the heterogeneous nature of data sources, which represents a challenge for current frameworks. To enable data integration of highly heterogeneous and disparate data sources, this paper proposes a method to extract the schema from semi-structured (such as JSON and XML) and structured (such as relational) data sources, and generate an equivalent RDFS representation. The output of our method complements current frameworks and reduces the manual workload required to represent the input data sources in terms of the integration canonical data model. Our approach consists of production rules at the meta-model level that guarantee the correctness of the model translations. Finally, a tool for implementing our approach has been developed. Peer reviewed. Postprint (author's final draft).
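To make the idea of deriving an RDFS representation from a semi-structured source more concrete, the following is a minimal sketch in Python. It is not the paper's production rules or its tool: the `EX` namespace, the function names and the "one class per JSON object, one property per key" convention are all assumptions chosen for illustration.

```python
import json

# Standard vocabulary namespaces; EX is a hypothetical namespace for this sketch.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/schema#"


def xsd_type(value):
    """Pick an XSD datatype for a JSON scalar (bool is checked before int)."""
    if isinstance(value, bool):
        return XSD + "boolean"
    if isinstance(value, int):
        return XSD + "integer"
    if isinstance(value, float):
        return XSD + "double"
    return XSD + "string"


def json_to_rdfs(name, obj, triples=None):
    """Assumed convention: each JSON object becomes an rdfs:Class,
    each key becomes an rdf:Property with a domain and a range."""
    if triples is None:
        triples = set()
    cls = EX + name.capitalize()
    triples.add((cls, RDF + "type", RDFS + "Class"))
    for key, value in obj.items():
        prop = EX + key
        triples.add((prop, RDF + "type", RDF + "Property"))
        triples.add((prop, RDFS + "domain", cls))
        # For arrays, the first element is taken as representative (a simplification).
        if isinstance(value, list) and value:
            value = value[0]
        if isinstance(value, dict):
            triples.add((prop, RDFS + "range", EX + key.capitalize()))
            json_to_rdfs(key, value, triples)
        else:
            triples.add((prop, RDFS + "range", xsd_type(value)))
    return triples


doc = json.loads('{"title": "example", "pages": 12, "author": {"name": "A. Author"}}')
triples = json_to_rdfs("paper", doc)
for triple in sorted(triples):
    print(triple)
```

A real translation would also need to handle XML and relational sources, heterogeneous arrays and name clashes, none of which this sketch attempts.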

    Linked Conservation Data: Driving Change in Documentation Practice

    Documentation is a core task for conservators, allowing evaluation of past choices and providing an evidence base for reasoned decision-making for future practice. However, much of the documentation created is not shared with other conservators or broader audiences. During the Linked Conservation Data (LCD) project, we explored the potential of documentation practices known as Linked Data for conservation, inspired by practices in other domains including medical science and biology, as well as various openGLAM initiatives. As part of the project, we developed: guidelines for harmonising disparate conservation terminologies; proposals for encoding different types of conservation data; a template for articulating policy in relation to conservation data; and a Linked Data pilot demonstrating the value of the approach. This work encourages institutions to begin sharing conservation records routinely, for use and re-use by other conservators. Adopting such a practice at large scale will provide an invaluable resource of conservation-related information that can be used for decision-making and enable data analysis and statistical work with large samples in conservation. We present conclusions and lessons learned from the LCD pilot, including: the importance of structured records; the role of documentation of conservation vocabularies; the foundational work still needed for sharing records as Linked Data; and the practicalities of implementing a Linked Data system for sharing conservation records. We conclude by outlining the role and responsibilities that professional bodies need to adopt towards this effort.

    RDF graph summarization: principles, techniques and applications (tutorial)

    The explosion in the amount of RDF data on the Web has led to the need to explore, query and understand such data sources. The task is challenging due to the complex and heterogeneous structure of RDF graphs which, unlike relational databases, do not come with a structure-dictating schema. Summarization has been applied to RDF data to facilitate these tasks. Its purpose is to extract concise and meaningful information from RDF knowledge bases, representing their content as faithfully as possible. There is no single concept of RDF summary, and not a single but many approaches to build such summaries; the summarization goal, and the main computational tools employed for summarizing graphs, are the main factors behind this diversity. This tutorial presents a structured analysis and comparison of existing works in the area of RDF summarization; it is based upon a recent survey which we co-authored with colleagues [3]. We present the concepts at the core of each approach, and outline their main technical aspects and implementation. We conclude by identifying the most pertinent summarization method for different usage scenarios, and discussing areas where future effort is needed.
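As a rough illustration of what an RDF summary can look like, the sketch below groups nodes by their declared `rdf:type` and collapses edges between individual nodes into edges between type groups. This is only one simple, type-based flavour chosen for illustration, not a method attributed to the tutorial; the prefixed names, the sample data and the `"Untyped"` label are assumptions.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def type_summary(triples):
    """Collapse a graph into edges between sets of rdf:type values."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    def label(node):
        # Nodes without a declared type (including literals) share one bucket.
        if node in types:
            return tuple(sorted(types[node]))
        return ("Untyped",)

    summary = set()
    for s, p, o in triples:
        if p != RDF_TYPE:
            summary.add((label(s), p, label(o)))
    return summary


# Hypothetical sample graph: 7 triples over 4 typed nodes.
graph = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:p1", "rdf:type", "ex:Paper"),
    ("ex:p2", "rdf:type", "ex:Paper"),
    ("ex:alice", "ex:wrote", "ex:p1"),
    ("ex:bob", "ex:wrote", "ex:p2"),
    ("ex:alice", "ex:knows", "ex:bob"),
]
summary = type_summary(graph)
for edge in sorted(summary):
    print(edge)
```

Here the three non-type edges collapse into two summary edges, since both `ex:wrote` edges connect a person to a paper. Other families of approaches, for example structural or statistical ones, would produce different summaries of the same graph.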

    Building an Integrated Enhanced Virtual Research Environment Metadata Catalogue

    Purpose: The purpose of this paper is to boost multidisciplinary research by building an integrated catalogue of research assets metadata. Such an integrated catalogue should enable researchers to solve problems or analyse phenomena that require a view across several scientific domains. Design/methodology/approach: There are two main approaches for integrating metadata catalogues provided by different e-science research infrastructures (e-RIs): centralised and distributed. The authors decided to implement a central metadata catalogue that describes, provides access to and records actions on the assets of a number of e-RIs participating in the system. The authors chose the CERIF data model for description of assets available via the integrated catalogue. Analysis of popular metadata formats used in e-RIs has been conducted, and mappings between popular formats and the CERIF data model have been defined using an XML-based tool for description and automatic execution of mappings. Findings: An integrated catalogue of research assets metadata has been created. Metadata from e-RIs supporting Dublin Core, ISO 19139, DCAT-AP, EPOS-DCAT-AP, OIL-E and CKAN formats can be integrated into the catalogue. Metadata are stored in CERIF RDF in the integrated catalogue. A web portal for searching this catalogue has been implemented. Research limitations/implications: Only five formats are supported at this moment. However, description of mappings between other source formats and the target CERIF format can be defined in the future using the 3M tool, an XML-based tool for describing X3ML mappings that can then be automatically executed on XML metadata records. The approach and best practices described in this paper can thus be applied in future mappings between other metadata formats. Practical implications: The integrated catalogue is a part of the eVRE prototype, which is a result of the VRE4EIC H2020 project. Social implications: The integrated catalogue should boost the performance of multi-disciplinary research; thus it has the potential to enhance the practice of data science and so contribute to an increasingly knowledge-based society. Originality/value: A novel approach for creation of the integrated catalogue has been defined and implemented. The approach includes definition of mappings between various formats. Defined mappings are effective and shareable.

    Achievements of the ARIADNE initiative for archaeological data sharing and research

    The overall objective of the ARIADNE initiative is to help archaeological research and data management communities in Europe and beyond to share and use more effectively the data dispersed across many institutions and projects. The initiative developed Research Infrastructure services that enable the aggregation, integration, search and visualisation of data records that describe and link to data collections and items available in providers' repositories and databases. Funded under the Research Infrastructures strand of the European Union's Framework Programme for Research and Innovation, the ARIADNE projects implemented and improved the ARIADNE Research Infrastructure and mobilised a growing community of institutions and collaborative projects interested in sharing data through the e-Infrastructure. In the ARIADNEplus project, almost 4 million data records were integrated into the ARIADNE Portal. After a brief introduction to the ARIADNE initiative, this paper presents selected achievements of the initiative with the ARIADNEplus project. It addresses the extension and support of the ARIADNE community, activities promoting FAIR data in archaeology, and the standardisation of datasets based on the CIDOC CRM and the domain vocabularies Getty AAT and PeriodO. It considers the ARIADNE Portal as an effective tool for data access and research, and the development of Virtual Research Environments as an innovative new approach. The concluding remarks highlight that the ARIADNE initiative provides incentives for institutions and projects to share their data and make them useful through the ARIADNE Portal, which enhances the value of the providers' repositories and databases. In addition, the paper notes the ways in which ARIADNE has fostered fruitful interdisciplinary work, for example between academics and technology developers of research services.

    X3ML mapping framework for information integration in cultural heritage and beyond

    The aggregation of heterogeneous data from different institutions in cultural heritage and e-science has the potential to create rich data resources useful for a range of different purposes, from research to education and public interests. In this paper, we present the X3ML framework, a framework for information integration that effectively and efficiently handles the steps involved in schema mapping, uniform resource identifier (URI) definition and generation, data transformation, provision and aggregation. The framework is based on the X3ML mapping definition language for describing both schema mappings and URI generation policies, and offers several advantages over other relevant frameworks. We describe the architecture of the framework as well as details on the various available components. Usability aspects are discussed and performance metrics are demonstrated. The high impact of our work is verified via the increasing number of international projects that adopt and use this framework.
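The separation between schema mapping and URI generation policy described above can be illustrated with a small Python stand-in. This is not X3ML syntax and does not use any X3ML component: the `BASE` namespace, the `MAPPING` table, the function names and the sample record are all hypothetical, and only the two Dublin Core property URIs are real identifiers.

```python
from urllib.parse import quote

# Hypothetical base namespace for minted resource URIs.
BASE = "http://example.org/resource/"

# Schema mapping: source field name -> target property URI (Dublin Core terms).
MAPPING = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}


def make_uri(kind, identifier):
    """URI generation policy: base + resource kind + percent-encoded local id."""
    return f"{BASE}{kind}/{quote(str(identifier), safe='')}"


def transform(record, kind="object"):
    """Apply the mapping to one source record; unmapped fields are dropped."""
    subject = make_uri(kind, record["id"])
    return [
        (subject, MAPPING[field], value)
        for field, value in record.items()
        if field in MAPPING
    ]


record = {
    "id": "vase 42",
    "title": "Red-figure vase",
    "creator": "Unknown",
    "internal_note": "not mapped",
}
triples = transform(record)
for triple in triples:
    print(triple)
```

Keeping the mapping table and the URI policy as separate pieces means either can be revised without touching the other, which is the design point this sketch is meant to show.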

    Towards Interoperable Research Infrastructures for Environmental and Earth Sciences

    This open access book summarises the latest developments on data management in the EU H2020 ENVRIplus project, which brought together more than 20 environmental and Earth science research infrastructures into a single community. It provides readers with a systematic overview of the common challenges faced by research infrastructures and how a ‘reference model guided’ engineering approach can be used to achieve greater interoperability among such infrastructures in the environmental and Earth sciences. The 20 contributions in this book are structured in 5 parts on the design, development, deployment, operation and use of research infrastructures. Part one provides an overview of the state of the art of research infrastructure and relevant e-Infrastructure technologies; part two discusses the reference model guided engineering approach; part three presents the software and tools developed for common data management challenges; part four demonstrates the software via several use cases; and part five discusses sustainability and future directions.