
    DataSHIELD: resolving a conflict in contemporary bioscience—performing a pooled analysis of individual-level data without sharing the data

    Background Contemporary bioscience sometimes demands vast sample sizes, and there is often then no choice but to synthesize data across several studies and to undertake an appropriate pooled analysis. This same need is also faced in health-services and socio-economic research. When a pooled analysis is required, analytic efficiency and flexibility are often best served by combining the individual-level data from all sources and analysing them as a single large data set. But ethico-legal constraints, including the wording of consent forms and privacy legislation, often prohibit or discourage the sharing of individual-level data, particularly across national or other jurisdictional boundaries. This leads to a fundamental conflict in competing public goods: individual-level analysis is desirable from a scientific perspective, but is prevented by ethico-legal considerations that are entirely valid.

    Better governance, better access: practising responsible data sharing in the METADAC governance infrastructure.

    BACKGROUND: Genomic and biosocial research data about individuals is rapidly proliferating, bringing the potential for novel opportunities for data integration and use. The scale, pace and novelty of these applications raise a number of urgent sociotechnical, ethical and legal questions, including optimal methods of data storage, management and access. Although the open science movement advocates unfettered access to research data, many of the UK's longitudinal cohort studies operate systems of managed data access, in which access is governed by legal and ethical agreements between stewards of research datasets and researchers wishing to make use of them. Amongst other things, these agreements aim to respect the reasonable expectations of the research participants who provided data and samples, as expressed in the consent process. Arguably, responsible data management and governance of data and sample use are foundational to the consent process in longitudinal studies and are an important source of trustworthiness in the eyes of those who contribute data to genomic and biosocial research. METHODS: This paper presents an ethnographic case study exploring the foundational principles of a governance infrastructure for Managing Ethico-social, Technical and Administrative issues in Data ACcess (METADAC), which are operationalised through a committee known as the METADAC Access Committee. METADAC governs access to phenotype, genotype and 'omic' data and samples from five UK longitudinal studies. FINDINGS: Using the example of METADAC, we argue that three key structural features are foundational for practising responsible data sharing: independence and transparency; interdisciplinarity; and participant-centric decision-making. 
We observe that the international research community is proactively working towards optimising the use of research data, integrating/linking these data with routine data generated by health and social care services and other administrative data services to improve the analysis, interpretation and utility of these data. The governance of these new complex data assemblages will require a range of expertise from across a number of domains and disciplines, including that of study participants. Human-mediated decision-making bodies will be central to ensuring achievable, reasoned and responsible decisions about the use of these data; the METADAC model described in this paper provides an example of how this could be realised.

    Systematic Evaluation of Pleiotropy Identifies 6 Further Loci Associated With Coronary Artery Disease

    Background: Genome-wide association studies have so far identified 56 loci associated with risk of coronary artery disease (CAD). Many CAD loci show pleiotropy; that is, they are also associated with other diseases or traits. Objectives: This study sought to systematically test whether genetic variants identified for non-CAD diseases/traits also associate with CAD and to undertake a comprehensive analysis of the extent of pleiotropy of all CAD loci. Methods: In discovery analyses involving 42,335 CAD cases and 78,240 control subjects, we tested the association of 29,383 common (minor allele frequency >5%) single nucleotide polymorphisms available on the exome array, which included a substantial proportion of known or suspected single nucleotide polymorphisms associated with common diseases or traits as of 2011. Suggestive association signals were replicated in an additional 30,533 cases and 42,530 control subjects. To evaluate pleiotropy, we tested CAD loci for association with cardiovascular risk factors (lipid traits, blood pressure phenotypes, body mass index, diabetes, and smoking behavior), as well as with other diseases/traits through interrogation of currently available genome-wide association study catalogs. Results: We identified 6 new loci associated with CAD at genome-wide significance: on 2q37 (KCNJ13-GIGYF2), 6p21 (C2), 11p15 (MRVI1-CTR9), 12q13 (LRP1), 12q24 (SCARB1), and 16q13 (CETP). Risk allele frequencies ranged from 0.15 to 0.86, and odds ratios per copy of the risk allele ranged from 1.04 to 1.09. Of 62 new and known CAD loci, 24 (38.7%) showed statistical association with a traditional cardiovascular risk factor, with some showing multiple associations, and 29 (47%) showed associations at p < 1 × 10⁻⁴ with a range of other diseases/traits. Conclusions: We identified 6 loci associated with CAD at genome-wide significance. Several CAD loci show substantial pleiotropy, which may help us understand the mechanisms by which these loci affect CAD risk.
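To make "odds ratio per copy of the risk allele" concrete, the sketch below computes an allelic odds ratio from a 2×2 table of allele counts in cases and controls, with a Wald confidence interval on the log scale. It assumes a multiplicative (per-allele) model, hypothetical counts, and no covariate adjustment; it is an illustration of the quantity, not the method used to produce the estimates reported above. Under this model, an odds ratio of 1.04 to 1.09 means each additional copy of the risk allele multiplies the odds of CAD by roughly 1.04 to 1.09.

```python
import math
from typing import Tuple


def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int,
                       z: float = 1.96) -> Tuple[float, Tuple[float, float]]:
    """Per-allele odds ratio from allele counts in cases and controls,
    with a Wald confidence interval computed on the log-odds scale."""
    odds_cases = case_risk / case_other      # odds of the risk allele among case alleles
    odds_controls = ctrl_risk / ctrl_other   # same odds among control alleles
    odds_ratio = odds_cases / odds_controls
    # Standard error of log(OR) for a 2x2 table of counts.
    se_log = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, (lower, upper)


# Hypothetical allele counts (each person contributes two alleles).
or_, ci = allelic_odds_ratio(case_risk=600, case_other=400, ctrl_risk=500, ctrl_other=500)
print(f"OR per risk allele = {or_:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

Genome-wide analyses of this size typically estimate the same quantity with regression models that can adjust for covariates; the 2×2 form is shown only because it exposes the arithmetic directly.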

    Discovery of rare variants associated with blood pressure regulation through meta-analysis of 1.3 million individuals

    Correction: Volume 53, Issue 5, Page 762. DOI: 10.1038/s41588-021-00832-z. Published May 2021.
    Genetic studies of blood pressure (BP) to date have mainly analyzed common variants (minor allele frequency > 0.05). In a meta-analysis of up to ~1.3 million participants, we discovered 106 new BP-associated genomic regions and 87 rare (minor allele frequency …
    Peer reviewed

    Trans-ancestry meta-analyses identify rare and common variants associated with blood pressure and hypertension

    High blood pressure is a major risk factor for cardiovascular disease and premature death. However, there is limited knowledge on specific causal genes and pathways. To better understand the genetics of blood pressure, we genotyped 242,296 rare, low-frequency and common genetic variants in up to ~192,000 individuals, and used ~155,063 samples for independent replication. We identified 31 novel blood pressure or hypertension associated genetic regions in the general population, including three rare missense variants in RBM47, COL21A1 and RRAS with larger effects (>1.5 mmHg/allele) than common variants. Multiple rare nonsense and missense variant associations were found in A2ML1, and a low-frequency nonsense variant in ENPEP was identified. Our data extend the spectrum of allelic variation underlying blood pressure traits and hypertension, provide new insights into the pathophysiology of hypertension and indicate new targets for clinical intervention.

    Rare and low-frequency coding variants alter human adult height

    Height is a highly heritable, classic polygenic trait with ~700 common associated variants identified so far through genome-wide association studies. Here, we report 83 height-associated coding variants with lower minor allele frequencies (range of 0.1–4.8%) and effects of up to 2 cm/allele (e.g. in IHH, STC2, AR and CRISPLD2), >10 times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (+1–2 cm/allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes mutated in monogenic growth disorders and highlight new biological candidates (e.g. ADAMTS3, IL11RA, NOX4) and pathways (e.g. proteoglycan/glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate to large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.

    Publisher Correction: Discovery of rare variants associated with blood pressure regulation through meta-analysis of 1.3 million individuals

    DataSHIELD – shared individual-level analysis without sharing the data: a biostatistical perspective

    Very large sample sizes are required for estimating effects that are known to be small and for addressing intricate or complex statistical questions. This is often only achievable by pooling data from multiple studies, especially in genetic epidemiology, where associations between individual genetic variants and phenotypes of interest are generally weak. However, the physical pooling of experimental data across a consortium is frequently prohibited by the ethico-legal constraints that govern agreements and consents for individual studies. Study-level meta-analyses are frequently used so that data from multiple studies need not be pooled to conduct an analysis, though the resulting analysis is necessarily restricted by the available summary statistics. Maintaining data security is also important in other areas, and approaches to carrying out ‘secure analyses’ that do not require data from different sources to be shared have been proposed in the technometrics literature. Crucially, the algorithms for fitting certain statistical models can be manipulated so that an individual-level meta-analysis can essentially be performed without pooling individual-level data, by combining particular summary statistics obtained separately from each study. DataSHIELD (Data Aggregation Through Anonymous Summary-statistics from Harmonised Individual levEL Databases) is a tool to coordinate analyses of data that cannot be pooled. In this paper, we focus on explaining why, in the case of a generalised linear model, a DataSHIELD approach yields results identical to an individual-level meta-analysis while using only summary statistics from each study. It is also an efficient approach to carrying out a study-level meta-analysis when this is appropriate and when the analysis can be pre-planned. We briefly comment on the IT requirements, together with the ethical and legal challenges that must be addressed.
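The generalised linear model equivalence described in this abstract can be illustrated with a minimal sketch. Assumptions, none taken from the paper or from DataSHIELD's own code: a logistic regression with an intercept and one covariate, fitted by Newton-Raphson; the function names `local_summaries` and `fit_federated` and the toy data are hypothetical. Each study returns only its score vector X'(y − μ) and information matrix X'WX at the current coefficients, and a coordinator sums them and takes a Newton step. Because both quantities are sums over individuals, the summed study-level values equal the values a pooled analysis would compute, so the federated fit should match a fit on the concatenated data up to floating-point rounding.

```python
import math
from typing import List, Sequence, Tuple

# A study's locally held data: (covariate values, binary outcomes).
Study = Tuple[Sequence[float], Sequence[int]]


def local_summaries(study: Study, beta: Sequence[float]):
    """Runs inside one study and returns only aggregates, never individual rows:
    the score vector X'(y - mu) and the upper triangle of the information
    matrix X'WX for a logistic model with an intercept and a slope."""
    xs, ys = study
    b0, b1 = beta
    g0 = g1 = 0.0
    h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # fitted probability
        w = mu * (1.0 - mu)                          # working weight
        g0 += y - mu
        g1 += (y - mu) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def fit_federated(studies: List[Study], tol: float = 1e-10, max_iter: int = 50) -> List[float]:
    """Coordinator: sums per-study summaries and takes Newton-Raphson steps."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        G0 = G1 = H00 = H01 = H11 = 0.0
        for study in studies:
            (g0, g1), (h00, h01, h11) = local_summaries(study, beta)
            G0 += g0
            G1 += g1
            H00 += h00
            H01 += h01
            H11 += h11
        # Solve the 2x2 system (X'WX) step = X'(y - mu) in closed form.
        det = H00 * H11 - H01 * H01
        step0 = (H11 * G0 - H01 * G1) / det
        step1 = (H00 * G1 - H01 * G0) / det
        beta = [beta[0] + step0, beta[1] + step1]
        if max(abs(step0), abs(step1)) < tol:
            break
    return beta


# Hypothetical toy data for two studies that cannot share rows.
study_a: Study = ([0.5, 1.2, -0.3, 2.0, 0.0, 1.5], [0, 1, 0, 1, 1, 0])
study_b: Study = ([-1.0, 0.8, 1.1, -0.5, 2.5], [0, 0, 1, 1, 1])

federated = fit_federated([study_a, study_b])
# Reference fit on the concatenated data, i.e. what physical pooling would do.
pooled = fit_federated([(list(study_a[0]) + list(study_b[0]),
                         list(study_a[1]) + list(study_b[1]))])
print(federated, pooled)
```

Each Newton iteration needs a fresh round of summaries from every study, which is why the fitting loop runs at the coordinator rather than inside any single study; only two-element and three-element aggregates ever leave a study in this sketch.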
