
    WorldFAIR (D7.2) Population health resource library and training package

    This project, WorldFAIR – Global Cooperation on FAIR Data Policy and Practice, is funded by the European Commission's WIDERA coordination and support programme under Grant Agreement no. 101058393. The project consists of 14 work packages, of which work package 7 (WP07) focuses on Population Health. WP07 is led by the London School of Hygiene and Tropical Medicine, working under the INSPIRE network. The work builds on the delivery of the Observational Medical Outcomes Partnership (OMOP) common data model (CDM), which includes funding by Wellcome (formerly the Wellcome Trust) and IDRC Canada. The objective of WP07 is to develop a suite of methods and standards that provide a framework for applying the Go-FAIR principles to population health data. These standards form the basis of an AI-Ready description of data suitable for use by population health scientists and understandable across domain and institutional boundaries.

    The first deliverable (D7.1) identified the Implementation Guide that could be used for population health data and how it could be developed. This deliverable (D7.2) provides a step-by-step guide to achieving those standards. It is aimed at population health scientists in low-resource settings who know their own data and want to make those data FAIR.

    Population health uses many different tools to collect and manage data. One set of tools comprises the OMOP CDM and an OHDSI data analysis workbench that runs on top of it. The OMOP CDM has been used to harmonise and share data, and previous work has shown the tools needed to make OMOP data FAIR. Beyond the data themselves, the results of analyses conducted on OMOP data can serve as indicators of progress towards development goals, including the United Nations Sustainable Development Goals (SDGs). At each stage, the tools, data, models and activities need to be described in a way that can be understood by other scientists and by computer search algorithms.

    This deliverable provides an introduction to the processes involved in making population health data FAIR in a pipeline that spans data collection through data analysis into an SDMX indicators database, and gives seven tutorials on what is needed at each step of this pipeline. It outlines the need to describe the study and its context, how to use DDI Codebook and DDI Lifecycle with study data, and how to use repositories such as GitHub to make the metadata available. The next tutorials describe the extract-transform-load (ETL) process for putting the data into an OMOP CDM and the role of JSON-LD in preparing the data for machine searching in Schema.org in line with DDI-CDI. Together these tutorials give an overview of the steps in the OMOP processes, which form a pipeline for the data, and how these steps can be performed and documented. Finally, the tutorials show how predictive and causal analyses can be conducted and documented using the OMOP CDM and the OHDSI data analysis workbench, and how the results can be integrated into an SDMX data cube that aligns with UN standards for SDG indicators. The deliverable does not provide detailed training for each step, but rather introduces each topic and clarifies the practical knowledge and skills needed to make this type of health data more FAIR.
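    As an illustration of the kind of machine-readable description the abstract refers to (JSON-LD prepared for Schema.org), the sketch below builds a minimal Schema.org Dataset record in Python using only the standard library. The study name, identifier, organisation and variables are hypothetical placeholders, not drawn from the deliverable; this is a sketch of the general pattern, not the deliverable's prescribed metadata.

import json

# Minimal sketch: a Schema.org "Dataset" description serialised as JSON-LD,
# of the kind a study team might publish alongside OMOP-harmonised data so
# that search engines and other scientists can discover it.
# All names, identifiers and variables below are hypothetical placeholders.
dataset_description = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example population health cohort (placeholder)",
    "description": "Longitudinal survey data harmonised to the OMOP CDM.",
    "identifier": "https://example.org/datasets/example-cohort",  # placeholder URL
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {
        "@type": "Organization",
        "name": "Example Health and Demographic Surveillance Site",
    },
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "age_at_visit", "unitText": "years"},
        {"@type": "PropertyValue", "name": "systolic_blood_pressure", "unitText": "mmHg"},
    ],
}

# Serialise so the JSON-LD can be embedded in a landing page or committed
# to a metadata repository (for example, alongside DDI files in GitHub).
print(json.dumps(dataset_description, indent=2))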

    Scaling Identifiers and their Metadata to Gigascale: An Architecture to Tackle the Challenges of Volume and Variety

    Persistent identifiers are applied to an ever-increasing variety of research objects, including software, samples, models, people, instruments, grants, and projects, and there is a growing need to apply identifiers at finer and finer granularity. Unfortunately, the systems developed over two decades ago to manage identifiers and the metadata describing the identified objects no longer scale. Communities working with physical samples have grappled with the challenges of the increasing volume, variety, and variability of identified objects for many years. To address these challenges, the IGSN 2040 project explored how metadata and catalogues for physical samples could be shared at the scale of billions of samples across an ever-growing variety of users and disciplines. In this paper, we focus on how to scale identifiers and the metadata describing them to billions of objects, and on who the actors involved in this system are. Our analysis of these requirements resulted in the definition of a minimum viable product and the design of an architecture that not only addresses the challenges of increasing volume and variety but, more importantly, is easy to implement because it reuses commonly used Web components. Our solution is based on a Web architectural model that utilises Schema.org, JSON-LD, and sitemaps. Applying these commonly used Web architectural patterns allows us not only to handle increasing variety but also to enable better compliance with the FAIR Guiding Principles.
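    To make the Schema.org, JSON-LD and sitemap pattern mentioned in the abstract concrete, the sketch below shows one plausible shape of the harvesting side in Python: a crawler discovers landing pages from a sitemap and reads the JSON-LD embedded in each page. The sitemap, URLs and landing-page markup are hypothetical stand-ins, not the IGSN 2040 implementation, and no network access is used.

import json
import re
import xml.etree.ElementTree as ET

# Hypothetical sitemap listing sample landing pages; in practice this would
# be fetched from the identifier provider's website.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/samples/IGSN-0001</loc></url>
  <url><loc>https://example.org/samples/IGSN-0002</loc></url>
</urlset>"""

# A landing page would embed its Schema.org metadata in a script tag like this.
LANDING_PAGE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Thing",
 "identifier": "https://example.org/samples/IGSN-0001",
 "name": "Example rock sample (placeholder)"}
</script>
</head><body>...</body></html>"""

def urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.findall(".//sm:loc", ns)]

def jsonld_from_page(html: str) -> dict:
    """Extract and parse the first embedded JSON-LD block from a landing page."""
    match = re.search(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL
    )
    return json.loads(match.group(1)) if match else {}

# A harvester would iterate over the sitemap and fetch each page; here we
# only print the URLs and parse one in-memory example page.
for url in urls_from_sitemap(SITEMAP_XML):
    print("would fetch:", url)

record = jsonld_from_page(LANDING_PAGE_HTML)
print("harvested identifier:", record.get("identifier"))

    Because sitemaps and embedded JSON-LD are standard Web machinery, this style of harvesting needs no bespoke API on the provider side, which is the reuse of common Web components the abstract emphasises.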