15 research outputs found

    Survey: Leakage and Privacy at Inference Time

    Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance as commercial and government applications of ML can draw on multiple sources of data, potentially including users' and clients' sensitive data. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage which is natural to ML models, potential malevolent leakage which is caused by privacy attacks, and currently available defence mechanisms. We focus on inference-time leakage, as the most likely scenario for publicly available models. We first discuss what leakage is in the context of different data, tasks, and model architectures. We then propose a taxonomy across involuntary and malevolent leakage, available defences, followed by the currently available assessment metrics and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research.

    Final Report of the Independent Expert Group for the Unlocking the Value of Data Programme

    This report is the final output of the Independent Expert Group on the Unlocking the Value of Data programme, to the Scottish Government. This report is a Ministerial commission, and was originally commissioned by the former Minister for Business, Trade, Tourism and Enterprise.

    Use of “Hidden in Plain Sight” de-identification methodology in electronic healthcare data provides minimal risk of misidentification: Results from the iCAIRD Safe Haven Artificial Intelligence Platform.

    Objectives: To determine the risk of misidentification when using a “Hidden In Plain Sight (HIPS)” Named Entity Recognition (NER) de-identification methodology applied to Scottish healthcare data within The Industrial Centre for Artificial Intelligence Research in Digital Diagnostics (iCAIRD) Safe Haven Artificial Intelligence Platform (SHAIP). Approach: Rather than the traditional redaction of potential identifiable information in routinely collected healthcare data, our HIPS methodology utilises an NER “find and replace” approach to de-identification that keeps the structure of text intact. This ensures that context is maintained, key to the interpretation of free text information and potential Artificial Intelligence applications. To our knowledge these methods have been previously untested on Scottish healthcare data. We therefore performed assessment of this approach in terms of potential risk of misidentification using HIPS on structured Scottish data deployed in SHAIP as part of the iCAIRD programme. Results: Five individual cohorts, with a total of 169,964 patients were included. For each cohort the HIPS approach was applied, and then compared to actual patient information from within the same region, in order to determine the risk of misidentification. The following fields were included: Forename, Surname, Previous Name, Gender, Date of Birth (DOB), and Postcode. Across the five cohorts and varying combinations of identifiable data fields there were a total of 94 instances of potential misidentification (0.06%). 85/94 (90.4%) of these were for the combination of Gender, Date of Birth and Postcode. Across the five cohorts there were only 3 instances (0.002%) of Forename/Surname/DOB, and 5 instances (0.003%) of Forename/Surname/Postcode potential misidentification amongst the 169,964 patients. Conclusions: The iCAIRD NER HIPS Methodology provides an acceptably low misidentification rate. Further work is now required to determine the recall and precision rates. Benefits of this approach include retaining the structure of free text, as well as reducing the ability to detect any potential leaked identifiable data.

    Barriers and facilitators of cross-sectoral data linkage to inform healthy public policy and practice: lessons from three case study projects in Scotland.

    Objectives: We sought to describe barriers and facilitators faced by three research projects aiming to link routinely-collected data across various sectors, to produce evidence to inform healthy public policy. We conducted these case studies as a part of a wider research project on cross-sectoral sharing and linkage of secondary data. Approach: We selected the case studies to cover a range of target populations and datasets. The chosen projects investigated (1) the health of care-experienced children; (2) the intersection of homelessness, justice involvement, drug use, and severe mental illness; (3) multi-morbidity among adults receiving social care. Information about timelines and governance processes was collected from lead investigators, including specific barriers and facilitators encountered, using a standardised pro forma and follow-up interviews. Thematic analysis was carried out by the research team, informed by themes identified in a parallel scoping review of existing literature on evidence use for healthy public policy and practice. Results: Each project involved between 6 and 11 agencies, with co-ordination across multiple institutions and geographies proving challenging. Due to challenges encountered, all projects had to amend their original geographical or demographic scope. Forty-four barriers and facilitators to sharing and linkage of cross-sectoral routinely-collected data for public health research were identified. These included but were not limited to: integration of current data in an ever-changing linkage landscape; the need for timely feedback in undertaking the study; standardisation of information governance processes; highlighting the resourcing and funding issues for data linkage projects; the need for data controllers to recognise the value of such projects; and issues relating to staff turnover and workload pressures. Conclusion: The interconnected nature of barriers and facilitators identified by the case studies suggests the importance of a whole-systems approach to cross-sectoral linkage. While literature offers relatively few case studies of cross-sectoral linkage for health research, the value of their insight into the linkage landscape derived from real-life experience is substantial.

    A National Network of Safe Havens: A Scottish Perspective

    For over a decade, Scotland has implemented and operationalized a system of Safe Havens, which provides secure analytics platforms for researchers to access linked, deidentified electronic health records (EHRs) while managing the risk of unauthorized reidentification. In this paper, a perspective is provided on the state-of-the-art Scottish Safe Haven network, including its evolution, to define the key activities required to scale the Scottish Safe Haven network’s capability to facilitate research and health care improvement initiatives. A set of processes related to EHR data and their delivery in Scotland have been discussed. An interview with each Safe Haven was conducted to understand their services in detail, as well as their commonalities. The results show how Safe Havens in Scotland have protected privacy while facilitating the reuse of the EHR data. This study provides a common definition of a Safe Haven and promotes a consistent understanding among the Scottish Safe Haven network and the clinical and academic research community. We conclude by identifying areas where efficiencies across the network can be made to meet the needs of population-level studies at scale.

    Masses, radii, and orbits of small Kepler planets: The transition from gaseous to rocky planets

    We report on the masses, sizes, and orbits of the planets orbiting 22 Kepler stars. There are 49 planet candidates around these stars, including 42 detected through transits and 7 revealed by precise Doppler measurements of the host stars. Based on an analysis of the Kepler brightness measurements, along with high-resolution imaging and spectroscopy, Doppler spectroscopy, and (for 11 stars) asteroseismology, we establish low false-positive probabilities (FPPs) for all of the transiting planets (41 of 42 have an FPP under 1%), and we constrain their sizes and masses. Most of the transiting planets are smaller than three times the size of Earth. For 16 planets, the Doppler signal was securely detected, providing a direct measurement of the planet's mass. For the other 26 planets we provide either marginal mass measurements or upper limits to their masses and densities; in many cases we can rule out a rocky composition. We identify six planets with densities above 5 g cm⁻³, suggesting a mostly rocky interior for them. Indeed, the only planets that are compatible with a purely rocky composition are smaller than 2 R⊕. Larger planets evidently contain a larger fraction of low-density material (H, He, and H₂O).

    SteatoSITE: an Integrated Gene-to-Outcome Data Commons for Precision Medicine Research in NAFLD

    Nonalcoholic fatty liver disease (NAFLD) is the commonest cause of chronic liver disease worldwide and a growing healthcare burden. The pathobiology of NAFLD is complex, disease progression is variable and unpredictable, and there are no qualified prognostic biomarkers or licensed pharmacotherapies that can improve clinical outcomes; it represents an unmet precision medicine challenge. We established a retrospective multicentre national cohort of 940 patients, across the complete NAFLD spectrum, integrating quantitative digital pathology, hepatic RNA-sequencing and 5.67 million days of longitudinal electronic health record follow-up into a secure, searchable, open resource (SteatoSITE) to inform rational biomarker and drug development and facilitate personalised medicine approaches for NAFLD. A complementary web-based gene browser was also developed. Here, our initial analysis uncovers disease stage-specific gene expression signatures, pathogenic hepatic cell subpopulations and master regulator networks associated with disease progression in NAFLD. Additionally, we construct novel transcriptional risk prediction tools for the development of future hepatic decompensation events.

    An integrated gene-to-outcome multimodal database for metabolic dysfunction-associated steatotic liver disease

    Metabolic dysfunction-associated steatotic liver disease (MASLD) is the commonest cause of chronic liver disease worldwide and represents an unmet precision medicine challenge. We established a retrospective national cohort of 940 histologically defined patients (55.4% men, 44.6% women; median body mass index 31.3; 32% with type 2 diabetes) covering the complete MASLD severity spectrum, and created a secure, searchable, open resource (SteatoSITE). In 668 cases and 39 controls, we generated hepatic bulk RNA sequencing data and performed differential gene expression and pathway analysis, including exploration of gender-specific differences. A web-based gene browser was also developed. We integrated histopathological assessments, transcriptomic data and 5.67 million days of time-stamped longitudinal electronic health record data to define disease-stage-specific gene expression signatures, pathogenic hepatic cell subpopulations and master regulator networks associated with adverse outcomes in MASLD. We constructed a 15-gene transcriptional risk score to predict future hepatic decompensation events (area under the receiver operating characteristic curve 0.86, 0.81 and 0.83 for 1-, 3- and 5-year risk, respectively). Additionally, thyroid hormone receptor beta regulon activity was identified as a critical suppressor of disease progression. SteatoSITE supports rational biomarker and drug development and facilitates precision medicine approaches for patients with MASLD.