
    Building up the “Accountable Ulysses” model. The impact of GDPR and national implementations, ethics, and health-data research: Comparative remarks.

    The paper illustrates the obligations emerging under Articles 9 and 89 of EU Regulation 2016/679 (General Data Protection Regulation, hereinafter "GDPR") for the processing of health-related data for research purposes. Furthermore, through a comparative analysis of national implementations of the GDPR on the topic, the paper highlights a few practical issues that researchers might face while fulfilling the GDPR obligations and the other ethical requirements. The result of the analyses makes it possible to build up a model for achieving an acceptable standard of accountability in health-related data research. The legal remarks are framed within the myth of Ulysses.

    Privacy- and Utility-Preserving NLP with Anonymized Data: A case study of Pseudonymization

    This work investigates the effectiveness of different pseudonymization techniques, ranging from rule-based substitutions to pre-trained Large Language Models (LLMs), on a variety of datasets and models for two widely used NLP tasks: text classification and summarization. Our work provides crucial insights into the gaps between original and anonymized data (focusing on the pseudonymization technique) and model quality, and fosters future research into higher-quality anonymization techniques to better balance the trade-offs between data protection and utility preservation. We make our code, pseudonymized datasets, and downstream models publicly available. (10 pages; accepted for the TrustNLP workshop at ACL 2023.)
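    The abstract does not reproduce the authors' code, so the sketch below is only a rough illustration of what a rule-based pseudonymization baseline of the kind compared in the paper might look like; the use of spaCy and its en_core_web_sm NER model is an assumption, not part of the work.

```python
# Minimal sketch of a rule-based pseudonymization baseline (an assumption,
# not the paper's implementation): detected entities are replaced by
# category placeholders so downstream classification or summarization
# models never see the original identifiers.
import spacy

nlp = spacy.load("en_core_web_sm")  # pretrained NER model, assumed installed

def pseudonymize(text: str) -> str:
    doc = nlp(text)
    out, last = [], 0
    for ent in doc.ents:
        if ent.label_ in {"PERSON", "ORG", "GPE", "LOC"}:
            out.append(text[last:ent.start_char])
            out.append(f"[{ent.label_}]")  # rule-based substitution
            last = ent.end_char
    out.append(text[last:])
    return "".join(out)

print(pseudonymize("Alice Smith met the Acme Corp team in Berlin."))
# Expected along the lines of: "[PERSON] met the [ORG] team in [GPE]."
# (exact output depends on the NER model's predictions)
```

    An LLM-based variant would replace the placeholder step with generated surrogate names, which is where the utility/privacy trade-off studied in the paper becomes interesting.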

    Balancing Privacy and Progress in Artificial Intelligence: Anonymization in Histopathology for Biomedical Research and Education

    The advancement of biomedical research heavily relies on access to large amounts of medical data. In the case of histopathology, Whole Slide Images (WSI) and clinicopathological information are valuable for developing Artificial Intelligence (AI) algorithms for Digital Pathology (DP). Transferring medical data "as open as possible" enhances the usability of the data for secondary purposes but poses a risk to patient privacy. At the same time, existing regulations push towards keeping medical data "as closed as necessary" to avoid re-identification risks. Generally, these legal regulations require the removal of sensitive data but do not consider the possibility of data linkage attacks due to modern image-matching algorithms. In addition, the lack of standardization in DP makes it harder to establish a single solution for all formats of WSIs. These challenges raise problems for bioinformatics researchers in balancing privacy and progress while developing AI algorithms. This paper explores the legal regulations and terminologies for medical data sharing. We review existing approaches and highlight challenges from the histopathological perspective. We also present a data-sharing guideline for histological data to foster multidisciplinary research and education. (Accepted at FAIEMA 2023.)

    Future of Data Analytics in the Era of the General Data Protection Regulation in Europe

    The development of evidence to demonstrate ‘value for money’ is regarded as an important step in facilitating the search for the optimal allocation of limited resources and has become an essential component of healthcare decision making. Real-world evidence collected from de-identified individuals throughout the continuum of healthcare represents the most valuable source in technology evaluation. However, in the European Union, value assessment based on real-world data has become challenging, as individuals have recently been given the right to have their personal data erased in the case of consent withdrawal or when the data are regarded as no longer necessary. This may limit the usefulness of the data in the future, as it may introduce information bias. Among healthcare stakeholders, this has become an important topic of discussion because it relates to the importance of data on one side and to the need for personal data protection on the other.

    Using Maximum Entropy to Extend a Consent Privacy Impact Quantification

    Due to the progress of digitization in the medical sector, digital consent is becoming more and more common. While digital consent has many benefits for researchers, it can raise many questions for the individual giving it. One of those questions is what impact consenting to share data with a research project has on the individual’s privacy. The Consent Privacy Impact Quantification (CPIQ) provides a quantification that helps the user make a consent decision based on the potential data-sharing risk and their individual acceptance preferences for a research project. While this quantification provides a good first estimate, it has some limitations, especially in the method by which the re-identification risk is calculated for a member of a dataset. This paper presents a method using the Maximum Entropy principle, which provides a way to derive the maximally unbiased distribution from limited background knowledge, here provided by epidemiological data. This distribution can then be used to see how much higher the re-identification risk based on a sensitive attribute is compared to the uniform distribution. In addition, the first promising results of the method are shown based on an experimental setting.
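    The abstract gives no formulas, so purely as an illustration of the maximum-entropy idea it describes, the sketch below (not the CPIQ implementation) fits the least biased distribution over a hypothetical sensitive attribute to a single assumed epidemiological statistic and compares the implied per-value risk with the uniform baseline; the attribute encoding, the known mean, and the use of SciPy are all assumptions.

```python
# Hedged sketch (not the paper's code): maximum-entropy distribution over a
# sensitive attribute consistent with limited background knowledge, compared
# against the uniform assumption.
import numpy as np
from scipy.optimize import minimize

# Hypothetical background knowledge: mean of an encoded sensitive attribute
# (e.g. disease stage 0..3) reported in an epidemiological study.
values = np.array([0.0, 1.0, 2.0, 3.0])
known_mean = 0.6  # assumed published statistic

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(np.sum(p * np.log(p)))  # minimizing this maximizes entropy

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},                 # probabilities sum to 1
    {"type": "eq", "fun": lambda p: np.dot(p, values) - known_mean},  # match the known mean
]
res = minimize(neg_entropy, x0=np.full(4, 0.25), bounds=[(0.0, 1.0)] * 4,
               constraints=constraints, method="SLSQP")
p_maxent = res.x
uniform = np.full(4, 0.25)

# A ratio > 1 means that attribute value is more predictable, i.e. the
# re-identification risk is higher than the uniform baseline suggests.
print("max-entropy distribution:", np.round(p_maxent, 3))
print("risk ratio vs. uniform:  ", np.round(p_maxent / uniform, 2))
```

    With richer epidemiological knowledge (e.g. several marginals or conditional prevalences), further equality constraints of the same form would be added.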

    A Pseudonymization Prototype for Hungarian

    In this paper, we present a pseudonymization prototype for Hungarian, an agglutinating language with complex morphology, implemented as a web service. The service provides the following functions: entity identification and extraction; automatic generation and selection of replacement candidates; and automatic, consistent replacement and reinflection of entities in the final pseudonymized document. The named entity recognition model applied handles names of persons well, and it has decent performance on other entity types as well. However, ID-like entities need to be handled separately to achieve proper performance (this is not handled in the current prototype version). For automatic replacement candidate generation, a simple entity embedding model is used. We discuss the performance and limitations of the prototype in detail.
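    The prototype's code is not part of this abstract, so the sketch below only illustrates the "consistent replacement" idea it names: every occurrence of the same original entity is mapped to the same replacement candidate across the document. The names, entity types, and candidate lists are invented examples, and the Hungarian-specific reinflection step is deliberately left out.

```python
# Minimal sketch of consistent entity replacement (an assumption, not the
# prototype's code). Reinflection of the replacement for Hungarian
# morphology is omitted here.
import random

def consistent_replace(entities, candidates_by_type, seed=0):
    """entities: list of (surface_form, entity_type) pairs in document order."""
    rng = random.Random(seed)
    mapping = {}
    for surface, ent_type in entities:
        if surface not in mapping:
            # Pick a same-type replacement candidate once, reuse it for
            # every later occurrence of this entity.
            mapping[surface] = rng.choice(candidates_by_type[ent_type])
    return mapping

entities = [("Kovács Anna", "PER"), ("Budapest", "LOC"), ("Kovács Anna", "PER")]
candidates = {"PER": ["Nagy Eszter", "Szabó Péter"], "LOC": ["Szeged", "Debrecen"]}
print(consistent_replace(entities, candidates))
# e.g. {'Kovács Anna': 'Nagy Eszter', 'Budapest': 'Szeged'}
```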