5 research outputs found

    Deidentification of free-text medical records using pre-trained bidirectional transformers

    No full text

    Automatic privacy and utility evaluation of anonymized documents via deep learning

    Get PDF
    Text anonymization methods are evaluated by comparing their outputs with human-based anonymizations through standard information retrieval (IR) metrics. On the one hand, the residual disclosure risk is quantified with the recall metric, which gives the proportion of re-identifying terms successfully detected by the anonymization algorithm. On the other hand, the preserved utility is measured with the precision metric, which accounts the proportion of masked terms that were also annotated by the human experts. Nevertheless, because these evaluation metrics were meant for information retrieval rather than privacy-oriented tasks, they suffer from several drawbacks. First, they assume a unique ground truth, and this does not hold for text anonymization, where several masking choices could be equally valid to prevent re-identification. Second, annotation-based evaluation relies on human judgements, which are inherently subjective and may be prone to errors. Finally, both metrics weight terms uniformly, thereby ignoring the fact that the influence on the disclosure risk or on utility preservation of some terms may be much larger than of others. To overcome these drawbacks, in this thesis we propose two novel methods to evaluate both the disclosure risk and the utility preserved in anonymized texts. Our approach leverages deep learning methods to perform this evaluation automatically, thereby not requiring human annotations. For assessing disclosure risks, we propose using a re-identification attack, which we define as a multi-class classification task built on top of state-of-the art language models. To make it feasible, the attack has been designed to capture the means and computational resources expected to be available at the attacker's end. For utility assessment, we propose a method that measures the information loss incurred during the anonymization process, which relies on a neural masked language modeling. We illustrate the effectiveness of our methods by evaluating the disclosure risk and retained utility of several well-known techniques and tools for text anonymization on a common dataset. Empirical results show significant privacy risks for all of them (including manual anonymization) and consistently proportional utility preservation

    Linguistic and ontological challenges of multiple domains contributing to transformed health ecosystems

    Get PDF
    This paper provides an overview of current linguistic and ontological challenges which have to be met in order to provide full support to the transformation of health ecosystems in order to meet precision medicine (5 PM) standards. It highlights both standardization and interoperability aspects regarding formal, controlled representations of clinical and research data, requirements for smart support to produce and encode content in a way that humans and machines can understand and process it. Starting from the current text-centered communication practices in healthcare and biomedical research, it addresses the state of the art in information extraction using natural language processing (NLP). An important aspect of the language-centered perspective of managing health data is the integration of heterogeneous data sources, employing different natural languages and different terminologies. This is where biomedical ontologies, in the sense of formal, interchangeable representations of types of domain entities come into play. The paper discusses the state of the art of biomedical ontologies, addresses their importance for standardization and interoperability and sheds light to current misconceptions and shortcomings. Finally, the paper points out next steps and possible synergies of both the field of NLP and the area of Applied Ontology and Semantic Web to foster data interoperability for 5 P

    Using machine learning for automated de-identification and clinical coding of free text data in electronic medical records

    Full text link
    The widespread adoption of Electronic Medical Records (EMRs) in hospitals continues to increase the amount of patient data that are digitally stored. Although the primary use of the EMR is to support patient care by making all relevant information accessible, governments and health organisations are looking for ways to unleash the potential of these data for secondary purposes, including clinical research, disease surveillance and automation of healthcare processes and workflows. EMRs include large quantities of free text documents that contain valuable information. The greatest challenges in using the free text data in EMRs include the removal of personally identifiable information and the extraction of relevant information for specific tasks such as clinical coding. Machine learning-based automated approaches can potentially address these challenges. This thesis aims to explore and improve the performance of machine learning models for automated de-identification and clinical coding of free text data in EMRs, as captured in hospital discharge summaries, and facilitate the applications of these approaches in real-world use cases. It does so by 1) implementing an end-to-end de-identification framework using an ensemble of deep learning models; 2) developing a web-based system for de-identification of free text (DEFT) with an interactive learning loop; 3) proposing and implementing a hierarchical label-wise attention transformer model (HiLAT) for explainable International Classification of Diseases (ICD) coding; and 4) investigating the use of extreme multi-label long text transformer-based models for automated ICD coding. The key findings include: 1) An end-to-end framework using an ensemble of deep learning base-models achieved excellent performance on the de-identification task. 2) A new web-based de-identification software system (DEFT) can be readily and easily adopted by data custodians and researchers to perform de-identification of free text in EMRs. 3) A novel domain-specific transformer-based model (HiLAT) achieved state-of-the-art (SOTA) results for predicting ICD codes on a Medical Information Mart for Intensive Care (MIMIC-III) dataset comprising the discharge summaries (n=12,808) that are coded with at least one of the most 50 frequent diagnosis and procedure codes. In addition, the label-wise attention scores for the tokens in the discharge summary presented a potential explainability tool for checking the face validity of ICD code predictions. 4) An optimised transformer-based model, PLM-ICD, achieved the latest SOTA results for ICD coding on all the discharge summaries of the MIMIC-III dataset (n=59,652). The segmentation method, which split the long text consecutively into multiple small chunks, addressed the problem of applying transformer-based models to long text datasets. However, using transformer-based models on extremely large label sets needs further research. These findings demonstrate that the de-identification and clinical coding tasks can benefit from the application of machine learning approaches, present practical tools for implementing these approaches, and highlight priorities for further research

    Managing healthcare transformation towards P5 medicine (Published in Frontiers in Medicine)

    Get PDF
    Health and social care systems around the world are facing radical organizational, methodological and technological paradigm changes to meet the requirements for improving quality and safety of care as well as efficiency and efficacy of care processes. In this they’re trying to manage the challenges of ongoing demographic changes towards aging, multi-diseased societies, development of human resources, a health and social services consumerism, medical and biomedical progress, and exploding costs for health-related R&D as well as health services delivery. Furthermore, they intend to achieve sustainability of global health systems by transforming them towards intelligent, adaptive and proactive systems focusing on health and wellness with optimized quality and safety outcomes. The outcome is a transformed health and wellness ecosystem combining the approaches of translational medicine, 5P medicine (personalized, preventive, predictive, participative precision medicine) and digital health towards ubiquitous personalized health services realized independent of time and location. It considers individual health status, conditions, genetic and genomic dispositions in personal social, occupational, environmental and behavioural context, thus turning health and social care from reactive to proactive. This requires the advancement communication and cooperation among the business actors from different domains (disciplines) with different methodologies, terminologies/ontologies, education, skills and experiences from data level (data sharing) to concept/knowledge level (knowledge sharing). The challenge here is the understanding and the formal as well as consistent representation of the world of sciences and practices, i.e. of multidisciplinary and dynamic systems in variable context, for enabling mapping between the different disciplines, methodologies, perspectives, intentions, languages, etc. Based on a framework for dynamically, use-case-specifically and context aware representing multi-domain ecosystems including their development process, systems, models and artefacts can be consistently represented, harmonized and integrated. The response to that problem is the formal representation of health and social care ecosystems through an system-oriented, architecture-centric, ontology-based and policy-driven model and framework, addressing all domains and development process views contributing to the system and context in question. Accordingly, this Research Topic would like to address this change towards 5P medicine. Specifically, areas of interest include, but are not limited: • A multidisciplinary approach to the transformation of health and social systems • Success factors for sustainable P5 ecosystems • AI and robotics in transformed health ecosystems • Transformed health ecosystems challenges for security, privacy and trust • Modelling digital health systems • Ethical challenges of personalized digital health • Knowledge representation and management of transformed health ecosystems Table of Contents: 04 Editorial: Managing healthcare transformation towards P5 medicine Bernd Blobel and Dipak Kalra 06 Transformation of Health and Social Care Systems—An Interdisciplinary Approach Toward a Foundational Architecture Bernd Blobel, Frank Oemig, Pekka Ruotsalainen and Diego M. Lopez 26 Transformed Health Ecosystems—Challenges for Security, Privacy, and Trust Pekka Ruotsalainen and Bernd Blobel 36 Success Factors for Scaling Up the Adoption of Digital Therapeutics Towards the Realization of P5 Medicine Alexandra Prodan, Lucas Deimel, Johannes Ahlqvist, Strahil Birov, Rainer Thiel, Meeri Toivanen, Zoi Kolitsi and Dipak Kalra 49 EU-Funded Telemedicine Projects – Assessment of, and Lessons Learned From, in the Light of the SARS-CoV-2 Pandemic Laura Paleari, Virginia Malini, Gabriella Paoli, Stefano Scillieri, Claudia Bighin, Bernd Blobel and Mauro Giacomini 60 A Review of Artificial Intelligence and Robotics in Transformed Health Ecosystems Kerstin Denecke and Claude R. Baudoin 73 Modeling digital health systems to foster interoperability Frank Oemig and Bernd Blobel 89 Challenges and solutions for transforming health ecosystems in low- and middle-income countries through artificial intelligence Diego M. López, Carolina Rico-Olarte, Bernd Blobel and Carol Hullin 111 Linguistic and ontological challenges of multiple domains contributing to transformed health ecosystems Markus Kreuzthaler, Mathias Brochhausen, Cilia Zayas, Bernd Blobel and Stefan Schulz 126 The ethical challenges of personalized digital health Els Maeckelberghe, Kinga Zdunek, Sara Marceglia, Bobbie Farsides and Michael Rigb
    corecore