34 research outputs found

    Generation Scotland: Linking all the records we can

    Objectives We started a family-based genetic epidemiology study in 2006-11 which recruited ~24,000 adult volunteers from ~7000 families across Scotland with consent for follow-up through medical record linkage and re-contact. In 2022-23 we are recruiting another 20,000, with consent extended to administrative records and the age range now 12+. Methods Original volunteers completed a demographic, health and lifestyle questionnaire, provided biological samples, and underwent detailed clinical assessment. The samples, phenotype and genotype data form a resource for research on the genetics of conditions of public health importance. This has become a longitudinal dataset by linkage to routine NHS hospital, maternity, lab test, prescriptions, dentistry, mortality, imaging, cancer screening, GP data records, Covid-19 testing and vaccinations, as well as follow-up questionnaires. The new wave of recruitment is all online and can be done on a smartphone, with DNA from saliva collected by post. Teenagers aged 12-15 can join with parental consent. Results GWAS has been done on quantitative traits and biomarkers, with DNA methylation data and proteomics available for most of the cohort. Our “CovidLife” surveys collected data on effects of the pandemic. Researchers can find prevalent and incident disease cases and controls, to test research hypotheses on a stratified population. They can also do targeted recruitment of participants to new studies, including recall by genotype. We have established and validated E-HR linkage with the NHS Scotland CHI Register, overcoming technical and governance issues in the process. We contribute to major international consortia, with collaborators from institutions worldwide, both academic and commercial. Recruits are asked to give consent to linkage to other administrative data, and reuse of samples from routine NHS tests for medical research.
    Conclusion We plan to extend the linkage process to include other administrative data from national datasets as and when approvals are obtained. New types of data can also be collected by online questionnaires. The Research Tissue Bank resources are available to academic and commercial researchers through a managed access process.

    The UK Longitudinal Linkage Collaboration: A trusted research environment for the longitudinal research community

    Objectives Our Trusted Research Environment (TRE) provides a centralised infrastructure to pool Longitudinal Population Studies’ (LPS) data and systematically link participants’ routine health, administrative and environmental records. All data are held in a centralised research resource which is now certified by the UK Statistics Authority as meeting the Digital Economy Act (DEA) standard. Approach We have created an unprecedented infrastructure integrating data from interdisciplinary and pan-UK LPS linked to participants’ NHS England records with delegated access responsibilities. Integrated and curated data are made available for pooled analysis within a functionally anonymous DEA and ISO 27001 accredited TRE. We developed a bespoke governance and data curation framework with LPS data managers and public/participant contributors. New data pipelines are being built with partners at ADRUK and the Office for National Statistics to link non-health records. Our design supports long-term sustainability, linkage accuracy and the ability to link data at both an individual and household level. Results This organisation is a collaboration of >24 LPS with ~280,000 participants. Participants' data are linked to NHS records and geo-coded environmental exposures. This resource is now accessible for public benefit research for bona fide UK researchers. Administrative data including tax, work and pensions, and education are being added to the resource. This data flow is enabled by: (1) a model where a trusted third party (TTP) processes participant identifiers for many different data owners; (2) creation of a novel longitudinal data pipeline, enabling linkage, data extraction and update of records over time; (3) an access framework where a Linked Data Access Panel considers applications on behalf of data owners (e.g., the NHS), with review by a Public Panel, and distributes applications to LPS for approval of appropriate data use.
    Conclusion Our organisation provides a strategic research-ready platform for longitudinal research. We are extending linkages of LPS participants to previously inaccessible datasets. The research resource is positioned to allow researchers to investigate cross-cutting themes such as understanding health and social inequalities, health-social-environmental interactions, and managing the COVID-19 recovery.

    UK Longitudinal Linkage Collaboration – and the challenges in creating a new Longitudinal Populations Studies linked data resource.

    Objectives The UK Longitudinal Linkage Collaboration (UK LLC) is a new, unprecedented infrastructure enabling research into the COVID-19 pandemic. The UK LLC integrates data from >20 UK longitudinal studies with systematically linked health, administrative and environmental records to facilitate cross-disciplinary COVID-19 research for accredited UK based researchers. Approach Bringing together all of the key components that form the UK LLC was a huge challenge that may have only been possible in the midst of the pandemic. First, we collaborated with the Longitudinal Population Studies (LPS) to create and agree how data linkage, data provision and applications to access the UK LLC would work. In parallel, public contributors helped to create fair processing materials. Finally, we worked closely with NHS Digital and other key national data providers to organise approvals for all studies to be linked, and for the UK LLC to have delegated decision-making for research applications. Results We faced a myriad of challenges creating the UK LLC including:
    • Short timeframe and short-term funding structure – initial funding for six months with an 18-month extension.
    • Working across >20 different LPS and four nations with different structures for access, consent and data provision.
    • Lack of capacity at various points in the data pipeline due to the volume of COVID-19 research required and underway across the involved organisations.
    • Data processing complexities – split data method means no one can see the entire process therefore catching linkage errors requires working across four different organisations.
    • With such complex data flows it is challenging to find the balance with communications about data to the public – being accurate about what we are doing, but expressing the complexity in lay terms.
    Conclusion Creating the UK LLC required collaboration with LPS, data providers and researchers. An iterative approach to creating the data application and data provision pipelines was crucial in developing these processes. The UK LLC was built quickly, from initial funding in October 2020 to provisioning data to researchers in December 2021.

    Methodological developments in administrative data linkage for cross cutting policy relevant research: Working towards a sustainable data pipeline

    Objectives Develop administrative linkages within a national Trusted Research Environment (TRE) that hosts Longitudinal Population Study (LPS) data for over 20 LPS. We will describe the methodological development carried out to enable linkage to administrative datasets. These linked administrative data will support research for public good, informing policy and practice. Methods The first sets of administrative data under consideration in this Feasibility Study are from the Department for Work and Pensions (DWP), the Department for Education (DfE) and HM Revenue and Customs (HMRC). Working with UK Government departments through a Task & Finish group, we have gathered input from DWP, HMRC, DfE and Office for National Statistics (ONS) data sharing experts. The Task & Finish group identified three pragmatic data linkage and data sharing models that would enable data to be linked via a newly designed secure data pipeline in a legal, secure, and trustworthy manner for all stakeholders. Results To encourage sustainability and acceptability, a model designed to be maintained over a long period is based on the re-use of Departmental Personal Identifiable Information (PII), i.e., name, date of birth, gender and National Insurance number, and attribute data already deposited by the Departments into ONS. For the linkage and extraction of ONS data into the TRE, ONS will develop a system which conducts and quality-assesses the linkage; minimises the Departmental data to only the participants within the TRE and the variables specified in the agreements; and de-identifies the data to their DEA processing standards. The minimised and functionally anonymous data extract will be securely transferred for ingest and integration into the TRE, enabling researchers to address a wider range of questions for public benefit. Conclusion This is a model for efficient and low-burden linkages to inform cross cutting research. It will form part of a responsive UK data science capability which can inform government research needs and be used to meet future crises, e.g. new pandemics, the impacts of climate change or economic shocks.

    Identifying dementia outcomes in UK Biobank: a validation study of primary care, hospital admissions and mortality data.

    Prospective, population-based studies that recruit participants in mid-life are valuable resources for dementia research. Follow-up in these studies is often through linkage to routinely-collected healthcare datasets. We investigated the accuracy of these datasets for dementia case ascertainment in a validation study using data from UK Biobank, an open access, population-based study of >500,000 adults aged 40-69 years at recruitment in 2006-2010. From 17,198 UK Biobank participants recruited in Edinburgh, we identified those with ≥1 dementia code in their linked primary care, hospital admissions or mortality data and compared their coded diagnoses to clinical expert adjudication of their full-text medical record. We calculated the positive predictive value (PPV, the proportion of cases identified that were true positives) for all-cause dementia, Alzheimer's disease and vascular dementia for each dataset alone and in combination, and explored algorithmic code combinations to improve PPV. Among 120 participants, PPVs for all-cause dementia were 86.8%, 87.3% and 80.0% for primary care, hospital admissions and mortality data respectively and 82.5% across all datasets. We identified three algorithms that balanced a high PPV with reasonable case ascertainment. For Alzheimer's disease, PPVs were 74.1% for primary care, 68.2% for hospital admissions, 50.0% for mortality data and 71.4% in combination. PPV for vascular dementia was 43.8% across all sources. UK routinely-collected healthcare data can be used to identify all-cause dementia in prospective studies. PPVs for Alzheimer's disease and vascular dementia are lower. Further research is required to explore the geographic generalisability of these findings.
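
As a rough illustration of the per-dataset PPV calculation described in this abstract, the sketch below tallies expert-confirmed cases by coding source. The record layout, the function names and the example cases are all hypothetical, and the "code in at least two sources" rule is only an assumed example of a code combination; the abstract does not specify the three algorithms the authors identified.

```python
from collections import defaultdict

def ppv_by_source(cases):
    """PPV per dataset.

    `cases` is an iterable of (sources, confirmed) pairs, where `sources`
    is the set of datasets holding a dementia code for the participant and
    `confirmed` is the expert adjudication (True = true positive).
    This layout is an assumption made for illustration.
    """
    coded = defaultdict(int)       # coded cases per source
    confirmed_n = defaultdict(int)  # confirmed cases per source
    for sources, confirmed in cases:
        for source in sources:
            coded[source] += 1
            if confirmed:
                confirmed_n[source] += 1
    return {source: confirmed_n[source] / coded[source] for source in coded}

def ppv_requiring(cases, min_sources):
    """PPV among cases coded in at least `min_sources` datasets (assumed rule)."""
    selected = [confirmed for sources, confirmed in cases
                if len(sources) >= min_sources]
    if not selected:
        return None  # no case meets the rule, so PPV is undefined
    return sum(selected) / len(selected)

# Hypothetical adjudicated cases, NOT data from the study.
example_cases = [
    ({"primary_care"}, True),
    ({"primary_care", "hospital"}, True),
    ({"hospital"}, False),
    ({"mortality", "hospital"}, True),
    ({"primary_care"}, False),
]

print(ppv_by_source(example_cases))
print(ppv_requiring(example_cases, min_sources=2))
```

Stricter rules of this kind usually raise PPV while selecting fewer cases, which is the trade-off between PPV and case ascertainment that the abstract refers to.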

    Research feasibility and ethics in Scottish new-born blood spot archive.

    Objectives There were two objectives to this study: 1) to gauge public opinion on the use of Guthrie card-derived blood samples for epidemiological and biological research; and 2) to evaluate the feasibility of recovering meaningful molecular data from these samples. Approach To address the first objective, a 2-day Citizens’ Jury was conducted in partnership with Ipsos MORI, comprising a diverse adult sample in terms of age, sex, working status and social grade (n=20). Jurors were asked whether research access to Guthrie card blood tests would be in the public interest. To address the second objective, DNA methylation (DNAm) was profiled from samples from 58 Generation Scotland participants, whose Guthrie cards had been stored from birth for between 32 and 38 years. Analyses were performed on Guthrie DNAm samples to determine whether previously-reported associations with perinatal maternal smoking behaviours were detectable. Results The Citizens’ Jury yielded an overall positive response towards data sharing for health research. Concerns were raised about data protection and security, control and oversight, and commercial use. The overall verdict was that access to Guthrie card data would be in the public interest, conditional on the purpose of the research, regulated access procedures, ethical oversight and provision of opportunities for participants to opt out. DNAm detection rates from Guthrie samples were lower than from samples stored in tubes. However, it was possible to confirm linkage to the correct individuals in Generation Scotland using DNAm-derived estimates of genotype and sex. A significant association was observed between a DNAm-based score for smoking and perinatal maternal smoking status derived from the baseline Generation Scotland questionnaire. 
    Conclusion We showed that: 1) public support exists for using Guthrie samples in research, conditional on certain safeguards; 2) DNAm can be profiled from cards stored for up to 38 years and can predict maternal smoking behaviour. Guthrie cards are a potentially valuable resource for epidemiological studies and predicting health outcomes.

    Accuracy of identifying incident stroke cases from linked healthcare data in UK Biobank

    Objective In UK Biobank (UKB), a large population-based prospective study, cases of many diseases are ascertained through linkage to routinely collected, coded national health datasets. We assessed the accuracy of these for identifying incident strokes. Methods In a regional UKB subpopulation (n = 17,249), we identified all participants with ≥1 code signifying a first stroke after recruitment (incident stroke-coded cases) in linked hospital admission, primary care, or death record data. Stroke physicians reviewed their full electronic patient records (EPRs) and generated reference standard diagnoses. We evaluated the number and proportion of cases that were true-positives (i.e., positive predictive value [PPV]) for all codes combined and by code source and type. Results Of 232 incident stroke-coded cases, 97% had EPR information available. Data sources were 30% hospital admission only, 39% primary care only, 28% hospital and primary care, and 3% death records only. While 42% of cases were coded as unspecified stroke type, review of EPRs enabled a pathologic type to be assigned in >99%. PPVs (95% confidence intervals) were 79% (73%–84%) for any stroke (89% for hospital admission codes, 80% for primary care codes) and 83% (74%–90%) for ischemic stroke. PPVs for small numbers of death record and hemorrhagic stroke codes were low but imprecise. Conclusions Stroke and ischemic stroke cases in UKB can be ascertained through linked health datasets with sufficient accuracy for many research studies. Further work is needed to understand the accuracy of death record and hemorrhagic stroke codes and to develop scalable approaches for better identifying stroke types
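
To make the reported "PPV (95% confidence interval)" figures concrete, here is a minimal sketch of a PPV with a Wilson score interval. The abstract does not say which interval method the authors used, so the Wilson method is an assumption, as are the function name and the example counts.

```python
import math

def ppv_with_wilson_ci(true_positives, coded_cases, z=1.96):
    """Return (ppv, lower, upper) using the Wilson score interval.

    `true_positives` is the number of coded cases confirmed by the
    reference standard; `coded_cases` is the number of coded cases
    reviewed. z=1.96 corresponds to a 95% interval.
    """
    if coded_cases <= 0:
        raise ValueError("coded_cases must be positive")
    if not 0 <= true_positives <= coded_cases:
        raise ValueError("true_positives must lie between 0 and coded_cases")
    n = coded_cases
    p = true_positives / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half_width, centre + half_width

# Hypothetical counts for illustration only, NOT the study's data.
ppv, low, high = ppv_with_wilson_ci(true_positives=80, coded_cases=100)
print(f"PPV {ppv:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With small denominators, as for the death record and hemorrhagic stroke codes mentioned in the abstract, the interval becomes wide, which is what "low but imprecise" describes.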

    A public panel reviews applications and questions applicants: Team member and public contributor discuss a transparent and inclusive approach to data access reviews

    Objectives We created a panel of members of the public and longitudinal study participants who review our data access requests. This panel forms an integral part of our data access application process, giving the public a say in who can access the data for research. Methods We advertised our lay member vacancies using social media, newsletters, word of mouth and the internet. We appointed six people to the public panel. Our panel includes study participants, NHS service users, parents, carers, and people with experience of disability, neurodiversity, and long-term health conditions. The Panel Terms of Reference were created with help from stakeholders and study teams involved in longitudinal studies that involve the public in data access applications. This ensured that the purpose of the panel was clear. The panel reviews lay summaries and makes sure that researchers have adequate public involvement in their project. Results Panel members have reviewed 28 applications. Researchers present their research at an online meeting with the panel, then answer questions from the panel members. We publish meeting minutes on our website for transparency. A 6-month review was overwhelmingly positive: all panel members indicated they felt valued. They felt able to challenge and question researchers as part of the data access application process. This provides a level of public scrutiny to our work. “I feel there’s a real value in the panel. You get a real sense that this has got such potential to make a contribution.” (panel member) We are further developing the Panel Terms of Reference with panel members. We will consider additional areas of responsibility, for example, public benefit review. Conclusion We regularly review how to improve public involvement in our work. The panel has proven its value during our application process. Therefore, we are exploring with the panel a new approach to assess the public benefit of applications and what is meant by ‘public benefit research’.