Abstract

This project aimed to address a major bottleneck in conducting research on confidential data: the final stage of Output Statistical Disclosure Control (OSDC). This is the stage at which staff in a Trusted Research Environment (TRE) conduct manual checks to ensure that the outputs a researcher wishes to take out, such as tables, plots, and statistical and/or AI models, do not put any individual's privacy at risk. To tackle this bottleneck, we proposed to:

- Produce a consolidated framework with a rigorous statistical basis that provides guidance for TREs to agree consistent, standard processes to assist in quality assurance.
- Design and implement a semi-automated system for checks on common research outputs, with increasing levels of support for other types such as AI.
- Work with a range of different types of TRE across different sectors and organisations to ensure wide applicability.
- Work with public and patient groups to explore what is needed for public trust, e.g. that any automation acts as "an extra pair of eyes": supporting, not supplanting, TRE staff.

Supported by funding from DARE UK (Data and Analytics Research Environments UK), we met these aims through the production of documentation, open-source code repositories, and a 'Consensus' statement embodying principles organisations should uphold when deploying any form of automated disclosure control.

Looking forward, we are now ready for extensive user testing and refinement of the resources produced. Following a series of presentations to national and international audiences, a range of organisations are in the process of trialling the SACRO toolkits. We are delighted that DARE UK has awarded funding to support a Community of Interest (CoI) group.
This will address ongoing support and the user-led creation of 'soft' resources (such as user guides, 'help desks', and mentoring schemes) to remove blocks to adoption: both for TREs and, crucially, for researchers. There are two other areas where we are now ready to make significant advances: applying SACRO to enable principles-based OSDC for 'conceptual' data spaces (e.g. via data pooling or federated analytics), and expanding the scope of risk assessment of AI/machine learning models to more complex models and types of data. This work was funded by UK Research and Innovation [Grant Number MC_PC_23006], as part of Phase 1 of the DARE UK (Data and Analytics Research Environments UK) programme, delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK).