289 research outputs found

    Anonymizing datasets with demographics and diagnosis codes in the presence of utility constraints

    Publishing data about patients that contain both demographics and diagnosis codes is essential to perform large-scale, low-cost medical studies. However, preserving the privacy and utility of such data is challenging, because it requires: (i) guarding against identity disclosure (re-identification) attacks based on both demographics and diagnosis codes, (ii) ensuring that the anonymized data remain useful in intended analysis tasks, and (iii) minimizing the information loss incurred by anonymization, to preserve the utility of general analysis tasks that are difficult to determine before data publishing. Existing anonymization approaches are not suitable for this setting, because they cannot satisfy all three requirements. Therefore, in this work, we propose a new approach to deal with this problem. We enforce requirement (i) by applying (k, k^m)-anonymity, a privacy principle that prevents re-identification by attackers who know the demographics of a patient and up to m of their diagnosis codes, where k and m are tunable parameters. To capture requirement (ii), we propose the concept of utility constraints for both demographics and diagnosis codes. Utility constraints limit the amount of generalization and are specified by data owners (e.g., the healthcare institution that performs anonymization). We also capture requirement (iii) by employing well-established information loss measures for demographics and for diagnosis codes. To realize our approach, we develop an algorithm that enforces (k, k^m)-anonymity on a dataset containing both demographics and diagnosis codes, in a way that satisfies the specified utility constraints with minimal information loss according to these measures. Our experiments with a large dataset containing more than 200,000 electronic health records show the effectiveness and efficiency of our algorithm.
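
    The privacy principle described in this abstract can be illustrated with a small checker. This is only a sketch under one reading of the definition, not the authors' algorithm: it assumes a dataset is (k, k^m)-anonymous when, for every record, at least k records share its demographics and also contain any combination of up to m of its diagnosis codes. The function name `is_k_km_anonymous`, the record layout, and the sample values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_k_km_anonymous(records, k, m):
    """Check an assumed reading of (k, k^m)-anonymity.

    records: iterable of (demographics, codes) pairs, where demographics is a
    hashable tuple of (already generalized) values and codes is a set of
    diagnosis codes. Hypothetical layout, chosen for illustration only.
    """
    support = Counter()
    for demographics, codes in records:
        ordered = sorted(set(codes))
        # Size 0 covers an attacker who knows the demographics alone;
        # sizes 1..m cover knowledge of up to m diagnosis codes as well.
        for size in range(min(m, len(ordered)) + 1):
            for combo in combinations(ordered, size):
                support[(demographics, combo)] += 1
    # Each counter entry is the number of records matching that attacker
    # knowledge, so every entry must reach k.
    return all(count >= k for count in support.values())


# Made-up sample data: three patients in the same generalized group.
sample = [
    (("30-40", "F"), {"250.00", "401.9"}),
    (("30-40", "F"), {"250.00", "401.9"}),
    (("30-40", "F"), {"250.00"}),
]
print(is_k_km_anonymous(sample, k=2, m=2))
print(is_k_km_anonymous(sample, k=3, m=1))
```

    The check only verifies the property; it does not generalize values, honor utility constraints, or measure information loss, which are the parts the abstract attributes to the proposed algorithm.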

    Publishing data from electronic health records while preserving privacy: a survey of algorithms

    The dissemination of Electronic Health Records (EHRs) can be highly beneficial for a range of medical studies, spanning from clinical trials to epidemic control studies, but it must be performed in a way that preserves patients’ privacy. This is not straightforward, because the disseminated data need to be protected against several privacy threats while remaining useful for subsequent analysis tasks. In this work, we present a survey of algorithms that have been proposed for publishing structured patient data in a privacy-preserving way. We review more than 45 algorithms, derive insights on their operation, and highlight their advantages and disadvantages. We also discuss some promising directions for future research in this area.

    Anonymization procedures for tabular data: an explanatory technical and legal synthesis

    In the European Union, Data Controllers and Data Processors who work with personal data have to comply with the General Data Protection Regulation and other applicable laws. This affects the storing and processing of personal data. But some data processing in data mining or statistical analyses does not require any personal reference to the data. Thus, personal context can be removed. For these use cases, to comply with applicable laws, any existing personal information has to be removed by applying so-called anonymization. However, anonymization should maintain data utility. Therefore, the concept of anonymization is a double-edged sword with an intrinsic trade-off: privacy enforcement vs. utility preservation. The former might not be entirely guaranteed when anonymized data are published as Open Data. In theory and practice, there exist diverse approaches to conduct and score anonymization. This explanatory synthesis discusses the technical perspectives on the anonymization of tabular data with a special emphasis on the European Union’s legal base. The studied methods for conducting anonymization, and for scoring the anonymization procedure and the resulting anonymity, are explained in unifying terminology. The examined methods and scores cover both categorical and numerical data. The examined scores involve data utility, information preservation, and privacy models. In practice-relevant examples, methods and scores are experimentally tested on records from the UCI Machine Learning Repository’s “Census Income (Adult)” dataset.

    Brain Tumor Growth Modelling

    Predicting the growth of glioblastoma tumors is a hard task owing to the lack of medical data, which stems mostly from patients’ privacy, the cost of collecting a large medical dataset, and the limited availability of related annotations by experts. In this thesis, we study and propose a Synthetic Medical Image Generator (SMIG) that generates synthetic data with a Generative Adversarial Network in order to provide anonymized data. In addition, to predict glioblastoma multiforme (GBM) tumor growth, we developed a Tumor Growth Predictor (TGP) based on an end-to-end convolutional neural network architecture that allows training on a public dataset from The Cancer Imaging Archive (TCIA), combined with the generated synthetic data. We also highlight the impact of using synthetic data generated by SMIG as a data augmentation tool. Despite the small size of the TCIA dataset, the obtained results demonstrate valuable tumor growth prediction accuracy.

    Uterine fibroid embolization for symptomatic fibroids: study at a teaching hospital in Kenya

    Objective: Characterization of magnetic resonance imaging (MRI) features in women undergoing uterine fibroid embolization (UFE) and identification of clinical correlates in an African population. Materials and Methods: Patients with symptomatic fibroids who were selected to undergo UFE at the hospital formed the study population. The baseline MRI features, baseline symptom score, short-term imaging outcome, and mid-term symptom scores were analyzed for interval changes. Potential associations between short-term imaging features and mid-term symptom scores were also assessed. Results: UFE resulted in statistically significant reductions (P < 0.001) of dominant fibroid volume, uterine volume, and symptom severity scores, which were 43.7%, 40.1%, and 37.8%, respectively. Also, 59% of respondents had more than 10 fibroids. The predominant location of the dominant fibroid was intramural. No statistically significant association was found between clinical and radiological outcome. Conclusion: The response of uterine fibroids to embolization in the African population is not different from the findings reported in other studies from the West. The presence of multiple and large fibroids in this study is consistent with the case mix described in other studies of African-American populations. Patient counseling should emphasize the independence of volume reduction and symptom improvement. Though volume changes are relevant for the radiologist in understanding the evolution of the condition and identifying potential technical treatment failures, they should not be the main basis for evaluating treatment success.

    Anonymising Clinical Data for Secondary Use

    Secondary use of data already collected in clinical studies has become increasingly popular in recent years, with the commitment of the pharmaceutical industry and many academic institutions in Europe and the US to provide access to their clinical trial data. Whilst this clearly provides societal benefit in helping to progress medical research, it has to be balanced against protection of subjects' privacy. There are two main scenarios for sharing subject data: within Clinical Study Reports and as Individual Patient Level Data. These scenarios have different associated risks and generally require different approaches. In any data sharing scenario, there is a trade-off between data utility and the risk of subject re-identification, and achieving this balance is key. Quantitative metrics can guide the amount of de-identification required, and new technologies may also start to provide alternative ways to achieve the risk-utility balance. Comment: 25 pages