A Semantic Web Management Model for Integrative Biomedical Informatics
Data, data everywhere. The diversity and magnitude of the data generated in the Life Sciences defies automated articulation among complementary efforts. The additional need in this field for managing property and access permissions compounds the difficulty very significantly. This is particularly the case when the integration involves multiple domains and disciplines, even more so when it includes clinical and high throughput molecular data. The emergence of Semantic Web technologies brings the promise of meaningful interoperation between data and analysis resources. In this report we identify a core model for biomedical Knowledge Engineering applications and demonstrate how this new technology can be used to weave a management model where multiple intertwined data structures can be hosted and managed by multiple authorities in a distributed management infrastructure. Specifically, the demonstration is performed by linking data sources associated with the Lung Cancer SPORE awarded to The University of Texas MD Anderson Cancer Center at Houston and the Southwestern Medical Center at Dallas. A software prototype, available as open source at www.s3db.org, was developed and its proposed design has been made publicly available as an open source instrument for shared, distributed data management. Semantic Web technologies have the potential to address the need for distributed and evolvable representations that are critical for systems biology and translational biomedical research. As this technology is incorporated into application development we can expect that both general purpose productivity software and domain specific software installed on our personal computers will become increasingly integrated with the relevant remote resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis
GWAS and meta-analysis identifies 49 genetic variants underlying critical COVID-19
Data availability: Downloadable summary data are available through the GenOMICC data site (https://genomicc.org/data). Summary statistics are available, but without the 23andMe summary statistics, except for the 10,000 most significant hits, for which full summary statistics are available. The full GWAS summary statistics for the 23andMe discovery dataset will be made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. For further information and to apply for access to the data, see the 23andMe website (https://research.23andMe.com/dataset-access/). All individual-level genotype and whole-genome sequencing data (for both academic and commercial uses) can be accessed through the UKRI/HDR UK Outbreak Data Analysis Platform (https://odap.ac.uk). A restricted dataset for a subset of GenOMICC participants is also available through the Genomics England data service. Monocyte RNA-seq data are available under the title ‘Monocyte gene expression data’ within the Oxford University Research Archives (https://doi.org/10.5287/ora-ko7q2nq66). Sequencing data will be made freely available to organizations and researchers to conduct research in accordance with the UK Policy Framework for Health and Social Care Research through a data access agreement. Sequencing data have been deposited at the European Genome–Phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGAS00001007111.
Extended data figures and tables are available online at https://www.nature.com/articles/s41586-023-06034-3#Sec21.
Supplementary information is available online at https://www.nature.com/articles/s41586-023-06034-3#Sec22.
Code availability: Code to calculate the imputation of P values on the basis of SNPs in linkage disequilibrium is available at GitHub (https://github.com/baillielab/GenOMICC_GWAS).
Acknowledgements: We thank the members of the Banco Nacional de ADN and the GRA@CE cohort group; and the research participants and employees of 23andMe for making this work possible. A full list of contributors who have provided data that were collated in the HGI project, including previous iterations, is available online (https://www.covid19hg.org/acknowledgements).
Change history: 11 July 2023: A Correction to this paper has been published at: https://doi.org/10.1038/s41586-023-06383-z. In the version of this article initially published, the name of Ana Margarita Baldión-Elorza, of the SCOURGE Consortium, appeared incorrectly (as Ana María Baldion) and has now been amended in the HTML and PDF versions of the article.
Copyright © The Author(s) 2023
Critical illness in COVID-19 is an extreme and clinically homogeneous disease phenotype that we have previously shown [1] to be highly efficient for discovery of genetic associations [2]. Despite the advanced stage of illness at presentation, we have shown that host genetics in patients who are critically ill with COVID-19 can identify immunomodulatory therapies with strong beneficial effects in this group [3]. Here we analyse 24,202 cases of COVID-19 with critical illness comprising a combination of microarray genotype and whole-genome sequencing data from cases of critical illness in the international GenOMICC (11,440 cases) study, combined with other studies recruiting hospitalized patients with a strong focus on severe and critical disease: ISARIC4C (676 cases) and the SCOURGE consortium (5,934 cases). To put these results in the context of existing work, we conduct a meta-analysis of the new GenOMICC genome-wide association study (GWAS) results with previously published data.
We find 49 genome-wide significant associations, of which 16 have not been reported previously. To investigate the therapeutic implications of these findings, we infer the structural consequences of protein-coding variants, and combine our GWAS results with gene expression data using a monocyte transcriptome-wide association study (TWAS) model, as well as gene and protein expression using Mendelian randomization. We identify potentially druggable targets in multiple systems, including inflammatory signalling (JAK1), monocyte–macrophage activation and endothelial permeability (PDE4A), immunometabolism (SLC2A5 and AK5), and host factors required for viral entry and replication (TMPRSS2 and RAB2A).
GenOMICC was funded by Sepsis Research (the Fiona Elizabeth Agnew Trust), the Intensive Care Society, a Wellcome Trust Senior Research Fellowship (to J.K.B., 223164/Z/21/Z), the Department of Health and Social Care (DHSC), Illumina, LifeArc, the Medical Research Council, UKRI, a BBSRC Institute Program Support Grant to the Roslin Institute (BBS/E/D/20002172, BBS/E/D/10002070 and BBS/E/D/30002275) and UKRI grants MC_PC_20004, MC_PC_19025, MC_PC_1905 and MRNO2995X/1. A.D.B. acknowledges funding from the Wellcome PhD training fellowship for clinicians (204979/Z/16/Z), the Edinburgh Clinical Academic Track (ECAT) programme. This research is supported in part by the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (grant MC_PC_20029). Laboratory work was funded by a Wellcome Intermediate Clinical Fellowship to B.F. (201488/Z/16/Z).
We acknowledge the staff at NHS Digital, Public Health England and the Intensive Care National Audit and Research Centre who provided clinical data on the participants; and the National Institute for Healthcare Research Clinical Research Network (NIHR CRN) and the Chief Scientist’s Office (Scotland), who facilitate recruitment into research studies in NHS hospitals, and to the global ISARIC and InFACT consortia. GenOMICC genotype controls were obtained using UK Biobank Resource under project 788 funded by Roslin Institute Strategic Programme Grants from the BBSRC (BBS/E/D/10002070 and BBS/E/D/30002275) and Health Data Research UK (HDR-9004 and HDR-9003). UK Biobank data were used in the GSMR analyses presented here under project 66982. The UK Biobank was established by the Wellcome Trust medical charity, Medical Research Council, Department of Health, Scottish Government and the Northwest Regional Development Agency. It has also had funding from the Welsh Assembly Government, British Heart Foundation and Diabetes UK. The work of L.K. was supported by an RCUK Innovation Fellowship from the National Productivity Investment Fund (MR/R026408/1). J.Y. is supported by the Westlake Education Foundation. SCOURGE is funded by the Instituto de Salud Carlos III (COV20_00622 to A.C., PI20/00876 to C.F.), European Union (ERDF) ‘A way of making Europe’, Fundación Amancio Ortega, Banco de Santander (to A.C.), Cabildo Insular de Tenerife (CGIEU0000219140 ‘Apuestas científicas del ITER para colaborar en la lucha contra la COVID-19’ to C.F.) and Fundación Canaria Instituto de Investigación Sanitaria de Canarias (PIFIISC20/57 to C.F.). We also acknowledge the contribution of the Centro National de Genotipado (CEGEN) and Centro de Supercomputación de Galicia (CESGA) for funding this project by providing supercomputing infrastructures. A.D.L. is a recipient of fellowships from the National Council for Scientific and Technological Development (CNPq)-Brazil (309173/2019-1 and 201527/2020-0)