
    Federated Learning for Protecting Medical Data Privacy

    Deep learning is one of the most advanced machine learning techniques, and its prominence has increased in recent years. Language processing, prediction in medical research, and pattern recognition are a few of the numerous fields in which it is widely utilized. Numerous modern medical applications benefit greatly from machine learning (ML) models, which have driven disruptive innovation across the modern health care system. ML is extensively used for constructing accurate and robust statistical models from large volumes of medical data collected from a variety of sources in contemporary healthcare systems [1]. Despite their immense potential benefits, however, deep learning techniques have yet to fully exploit medical data, because privacy concerns restrict access to it. Many data proprietors are unable to benefit from large-scale deep learning due to the privacy and confidentiality concerns associated with data sharing. Yet without access to sufficient data, deep learning cannot realize its maximum potential when transitioning from the research phase to clinical practice [2]. This project addresses the problem by applying Federated Learning and encrypted computation techniques, such as Secure Multi-Party Computation, to text data. It uses SyferText, a Python library for privacy-preserving Natural Language Processing that leverages PySyft to conduct Federated Learning.
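The federated learning setup this abstract describes can be sketched in plain NumPy. This is an illustrative stand-in, not the project's actual code (which uses PySyft and SyferText); the linear model, learning rate, and synthetic hospital datasets below are all hypothetical:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient-descent steps on a
    simple linear model (an illustrative stand-in for a deep network)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_average(global_w, client_data, rounds=10):
    """Federated averaging: each round, every client trains locally on its
    private data, and only the model weights (never the raw records) are
    sent back and averaged by the server."""
    w = global_w
    for _ in range(rounds):
        client_ws = [local_update(w, X, y) for X, y in client_data]
        w = np.mean(client_ws, axis=0)  # server aggregates the updates
    return w

# Two hypothetical hospitals with private data from the same underlying model
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(2):
    X = rng.normal(size=(50, 2))
    y = X @ true_w
    clients.append((X, y))

w = federated_average(np.zeros(2), clients)
```

Only locally trained weights leave each client; the raw medical records never do, which is the privacy property federated learning is built on.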

    A Systematic Review of Re-Identification Attacks on Health Data

    Privacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. Should this be the case, it would have significant and important implications for how health information is disclosed, including: (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to: (a) characterize known re-identification attacks on health data and contrast them with re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these demonstrate weaknesses in current de-identification methods. Searches were conducted in IEEE Xplore, ACM Digital Library, and PubMed. After screening, fourteen eligible articles representing distinct attacks were identified. On average, approximately a quarter of the records were re-identified across all studies (0.26 with 95% CI 0.046-0.478), and 0.34 for attacks on health data (95% CI 0-0.744). There was considerable uncertainty around the proportions, as evidenced by the wide confidence intervals, and the mean proportion of records re-identified was sensitive to unpublished studies. Two of the fourteen attacks were performed on data that was de-identified using existing standards. Only one of these attacks was on health data, and it achieved a success rate of 0.00013. The current evidence shows a high re-identification rate but is dominated by small-scale studies on data that was not de-identified according to existing standards. This evidence is insufficient to draw conclusions about the efficacy of de-identification methods.
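The pooled estimate this review reports (a mean re-identification proportion with a 95% confidence interval) can be reproduced in miniature with a normal-approximation interval over study-level proportions. The per-study values below are hypothetical, chosen only to illustrate the calculation; the review's actual study-level data are not given in the abstract:

```python
import math

# Hypothetical per-study re-identification proportions (illustrative only)
props = [0.02, 0.05, 0.10, 0.25, 0.40, 0.75]

n = len(props)
mean = sum(props) / n
# Sample standard deviation, then standard error of the mean
sd = math.sqrt(sum((p - mean) ** 2 for p in props) / (n - 1))
se = sd / math.sqrt(n)
# 95% CI via the normal approximation (z = 1.96)
lo, hi = mean - 1.96 * se, mean + 1.96 * se
```

With study-level proportions this dispersed, the interval is wide, which mirrors the review's observation that the wide confidence intervals reflect considerable uncertainty in the pooled estimate.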

    Public Health Research Ethics: Clinical Registries and Informed Consent

    Epidemiologic studies using data collected through disease surveillance or clinical registries improve public health practice. Principles of human research ethics, such as the Belmont Report and the Declaration of Helsinki, were developed to prevent harm from medical experiments. Those who prepared these principles may not have imagined that the day would arrive when information technology would be so widely available and endemic chronic diseases would become one of the major interests of public health. Some key questions that have now become growing areas of interest include: How should we deal with epidemiologic studies that impose minimal risk but require access to medical records or personal information? How should we balance the public good that will result from large epidemiologic studies against the protection of privacy? In this master's paper, I reviewed the historical development of research ethics, informed consent, and the protection of privacy related to health information, and how they affect the conduct of epidemiologic studies. I discussed the application of research ethics principles and proposed better ways to resolve the ethical dilemma between protecting privacy and pursuing the public good through epidemiologic studies, especially those using data from medical records and clinical registries. As a result of this review, and in consideration of the dilemmas regarding the protection of patient privacy and the need for efficient access to data, I developed a set of eight proposals for the ethical use of existing data in medical records or clinical registries in epidemiological and other public health studies. Master of Public Health.

    A Secure Protocol to Distribute Unlinkable Health Data

    Health data that appears anonymous, such as DNA records, can be re-identified to named patients via location visit patterns, or trails. This is a realistic privacy concern that persists because data holders do not collaborate prior to making disclosures. In this paper, we present STRANON, a novel computational protocol that enables data holders to work together to determine which records can be disclosed while satisfying a formal privacy protection model. STRANON incorporates a secure encrypted environment, so no data holder reveals information until the trails of disclosed records are provably unlinkable. We evaluate STRANON on real-world datasets with known susceptibilities and demonstrate that data holders can release significant quantities of data with zero trail re-identifiability.
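The unlinkability property STRANON targets can be illustrated with a simplified check: a disclosed record is considered safe only if at least k records share its exact trail of location visits, so that no visit pattern singles out one individual. This k-anonymity-style test is a sketch of the general idea, not STRANON's actual formal model, which the abstract does not specify:

```python
from collections import Counter

def trails_unlinkable(trails, k=2):
    """Return True if every record's trail (the set of locations where it
    appears) is shared by at least k records, so no trail is unique enough
    to link a record back to a named patient."""
    counts = Counter(frozenset(t) for t in trails)
    return all(counts[frozenset(t)] >= k for t in trails)

# Two patients who visited the same hospitals are indistinguishable by
# trail; a patient with a unique visit pattern is linkable.
safe = [["A", "B"], ["A", "B"], ["C"], ["C"]]
risky = [["A", "B"], ["A"], ["C"], ["C"]]
```

In the `risky` set, the record with trail {"A"} is the only one with that pattern, so disclosing it would let an attacker who knows the visit pattern re-identify the patient.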