621 research outputs found

    Local Radiance

    Recent years have seen a proliferation of web applications based on content management systems (CMS). Using a CMS, non-technical content authors are able to define custom content types to support their needs. These content type names and the attribute names in each content type are typically domain-specific and meaningful to the content authors. The ability of a CMS to support a multitude of content types allows for endless creation and customization but also leads to a large amount of heterogeneity within a single application. While this meaningful heterogeneity is beneficial, it introduces the problem of how to write reusable functionality (e.g., general-purpose widgets) that can work across all the different types. Traditional information integration can solve the problem of schema heterogeneity by defining a single global schema that captures the shared semantics of the heterogeneous (local) schemas. Functionality and queries can then be written against the global schema and return data from local sources in the form of the global schema, but the meaningful local semantics (such as type and attribute names) are not returned. Mappings are also complex and require skilled developers to create. Here we propose a system that we call local radiance (LR) that captures both global shared semantics and local, beneficial heterogeneity. We provide a formal definition of our system that includes domain structures (small, global schema fragments that represent shared domain-specific semantics) and canonical structures (domain-independent global schema fragments used to build generic global widgets). We define mappings between local, domain, and canonical levels. Our query language extends the relational algebra to support queries that radiate local semantics to the domain and canonical levels, and to support inserting and updating heterogeneous local data from generic global widgets.
    We characterize the expressive power of our mapping language and show how it can be used to perform complex data and metadata transformations. Through a user study, we evaluate the ability of non-technical users to perform mapping tasks and find that it is both understandable and usable. We report on the ongoing development (in CMSs and a relational database) of LR systems, demonstrate how widgets can be built using local radiance, and show how LR is being used in a number of online public educational repositories.

    Towards Interoperable Research Infrastructures for Environmental and Earth Sciences

    This open access book summarises the latest developments on data management in the EU H2020 ENVRIplus project, which brought together more than 20 environmental and Earth science research infrastructures into a single community. It provides readers with a systematic overview of the common challenges faced by research infrastructures and how a ‘reference model guided’ engineering approach can be used to achieve greater interoperability among such infrastructures in the environmental and Earth sciences. The 20 contributions in this book are structured in 5 parts on the design, development, deployment, operation and use of research infrastructures. Part one provides an overview of the state of the art of research infrastructure and relevant e-Infrastructure technologies; part two discusses the reference model guided engineering approach; part three presents the software and tools developed for common data management challenges; part four demonstrates the software via several use cases; and part five discusses sustainability and future directions.

    Operationalizing and automating data governance

    The ability to combine data from multiple sources represents a competitive advantage for organizations. Yet the governance of the data lifecycle, from data sources to valuable insights, is largely performed in an ad hoc or manual manner. This is especially concerning in scenarios where tens or hundreds of continuously evolving data sources produce semi-structured data. To overcome this challenge, we develop a framework for operationalizing and automating data governance. For operationalization, we propose a zoned data lake architecture and a set of data governance processes that allow the systematic ingestion, transformation and integration of data from heterogeneous sources, in order to make them readily available for business users. For automation, we propose a set of metadata artifacts that allow the automatic execution of data governance processes, addressing a wide range of data management challenges. We showcase the usefulness of the proposed approach using a real-world use case, stemming from a collaborative project with the World Health Organization for the management and analysis of data about Neglected Tropical Diseases. Overall, this work helps organizations adopt data-driven strategies through a cohesive framework for operationalizing and automating data governance. This work was partly supported by the DOGO4ML project, funded by the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00/AEI/10.13039/501100011033. Sergi Nadal is partly supported by the Spanish Ministerio de Ciencia e Innovación, as well as the European Union - NextGenerationEU, under project FJC2020-045809-I/AEI/10.13039/501100011033. Peer reviewed. Postprint (published version).

    A Kaleidoscope of Digital American Literature

    The word kaleidoscope comes from a Greek phrase meaning to view a beautiful form, and this report makes the leap of faith that all scholarship is beautiful (Ayers 2005b). This review is divided into three major sections. Part one offers a sampling of the types of digital resources currently available or under development in support of American literature and identifies the prevailing concerns of specialists in the field as expressed during interviews conducted between July 2004 and May 2005. Part two consolidates the results of these interviews with an exploration of resources currently available to illustrate, on the one hand, a kaleidoscope of differing attitudes and assessments, and, on the other, an underlying design that gives shape to the parts. Part three examines six categories of digital work in progress: (1) quality-controlled subject gateways, (2) author studies, (3) public domain e-book collections and alternative publishing models, (4) proprietary reference resources and full-text primary source collections, (5) collections by design, and (6) teaching applications. This survey is informed by a selective review of the recent literature, focusing especially on contributions from scholars that have appeared in discipline-based journals.

    The CENDARI White Book of Archives

    Over the course of its four-year project timeline, the CENDARI project has collected archival descriptions and metadata in various formats from a broad range of cultural heritage institutions. These data were drawn together in a single repository and are being stored there. The repository contains curated data that were manually established by the CENDARI team as well as data acquired from small, ‘hidden’ archives in spreadsheet format or from big aggregators with advanced data exchange tools in place. While the acquisition and curation of heterogeneous data in a single repository presents a technical challenge in itself, the ingestion of data into the CENDARI repository also opens up the possibility to process and index them through data extraction, entity recognition, semantic enhancement and other transformations. In this way the CENDARI project was able to act as a bridge between cultural heritage institutions and historical researchers, insofar as it drew together holdings from a broad range of institutions and enabled the browsing of this heterogeneous content within a single search space. This paper describes a broad range of ways in which the CENDARI project acquired data from cultural heritage institutions as well as the necessary technical background. In exemplifying diverse data creation or acquisition strategies, multiple formats and technical solutions, and the assets and drawbacks of a repository, this “White Book” aims at providing guidance and advice as well as best practices for archivists and cultural heritage institutions collaborating or planning to collaborate with infrastructure projects. http://www.cendari.eu/thematic-research-guides/white-book-archives The CENDARI White Book of Archives. Available from: http://hdl.handle.net/2262/7568

    Ontologies and datasets for energy measurement and validation interoperability

    This document presents a final report of the work carried out as part of work package 3 of the READY4SmartCities project, whose goal is to identify the knowledge and data resources that support interoperability for energy measurement and validation. The document is divided into two parts

    Ontologies and datasets for energy management system interoperability

    This document presents a final report of the work carried out as part of work package 2 of the READY4SmartCities project (R4SC), whose goal is to identify the knowledge and data resources that support interoperability for energy management systems. The document is divided into two parts