Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models

Author: Melamud Oren
Shivade Chaitanya
Publication date: 01/01/2019

Large-scale clinical data is invaluable to driving many computational scientific advances today. However, understandable concerns regarding patient privacy hinder the open dissemination of such data and give rise to suboptimal siloed research. De-identification methods attempt to address these concerns but were shown to be susceptible to adversarial attacks. In this work, we focus on the vast amounts of unstructured natural language data stored in clinical notes and propose to automatically generate synthetic clinical notes that are more amenable to sharing using generative models trained on real de-identified records. To evaluate the merit of such notes, we measure both their privacy preservation properties as well as utility in training clinical NLP models. Experiments using neural language models yield notes whose utility is close to that of the real ones in some clinical NLP tasks, yet leave ample room for future improvements.

Comment: Clinical NLP Workshop 201

arXiv.org e-Print Archive

Crossref

Is artificial data useful for biomedical Natural Language Processing algorithms?

Author: Ive Julia
Specia Lucia
Velupillai Sumithra
Wang Zixu
Publication date: 01/01/2019

A major obstacle to the development of Natural Language Processing (NLP) methods in the biomedical domain is data accessibility. This problem can be addressed by generating medical data artificially. Most previous studies have focused on the generation of short clinical text, and evaluation of the data utility has been limited. We propose a generic methodology to guide the generation of clinical text with key phrases. We use the artificial data as additional training data in two key biomedical NLP tasks: text classification and temporal relation extraction. We show that artificially generated training data used in conjunction with real training data can lead to performance boosts for data-greedy neural network algorithms. We also demonstrate the usefulness of the generated data for NLP setups where it fully replaces real training data.

Comment: BioNLP 201

arXiv.org e-Print Archive

Crossref

Generation and evaluation of artificial mental health records for Natural Language Processing

Author: Cardinal Rudolf N.
Ive Julia
Kam Joyce
Puntis Stephen
Roberts Angus
Stewart Robert
Velupillai Sumithra
Verma Somain
Viani Natalia
Yin Lucia
Publication venue: 'Organisation for Economic Co-Operation and Development (OECD)'
Publication date: 17/02/2021

Funder: EPSRC Healtex Feasibility Funding (grant EP/N027280/1): "Towards Shareable Data in Clinical Natural Language Processing: Generating Synthetic Electronic Health Records"
Funder: National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London
Funder: National Institute for Health Research Post Doctoral Fellowship award (grant number PDF-2017-10-029)
Funder: Health Data Research UK
Funder: Swedish Research Council (2015-00359)/the Marie Sklodowska Curie Actions

Abstract: A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.

Apollo (Cambridge)

Generation and evaluation of artificial mental health records for Natural Language Processing

Author: Cardinal Rudolf N.
Ive Julia
Kam Joyce
Puntis Stephen
Roberts Angus
Stewart Robert
Velupillai Sumithra
Verma Somain
Viani Natalia
Yin Lucia
Publication venue: npj Digital Medicine
Publication date: 01/01/2020

Oxford University Research Archive

Apollo (Cambridge)

How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing

Author: Roman Kern
Samuel Sousa
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 16/05/2022

Deep learning (DL) models for natural language processing (NLP) tasks often handle private data, demanding protection against breaches and disclosures. Data protection laws, such as the European Union's General Data Protection Regulation (GDPR), thereby enforce the need for privacy. Although many privacy-preserving NLP methods have been proposed in recent years, no categories to organize them have been introduced yet, making it hard to follow the progress of the literature. To close this gap, this article systematically reviews over sixty DL methods for privacy-preserving NLP published between 2016 and 2020, covering theoretical foundations, privacy-enhancing technologies, and analysis of their suitability for real-world scenarios. First, we introduce a novel taxonomy for classifying the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods. Second, we present an extensive summary of privacy threats, datasets for applications, and metrics for privacy evaluation. Third, throughout the review, we describe privacy issues in the NLP pipeline in a holistic view. Further, we discuss open challenges in privacy-preserving NLP regarding data traceability, computation overhead, dataset size, the prevalence of human biases in embeddings, and the privacy-utility tradeoff. Finally, this review presents future research directions to guide successive research and development of privacy-preserving NLP models.

Cryptology ePrint Archive

Knowledge Graph and Deep Learning-based Text-to-GQL Model for Intelligent Medical Consultation Chatbot

Author: Chang V
Guan S
Ni P
Okhrati R
Publication date: 06/07/2022

Text-to-GQL (Text2GQL) is a task that converts the user's questions into GQL (Graph Query Language) when a graph database is given. It is a semantic parsing task that transforms natural language problems into logical expressions, which will bring more efficient direct communication between humans and machines. The existing related work mainly focuses on Text-to-SQL tasks, and there is no available semantic parsing method and data set for the graph database. In order to fill the gaps in this field to serve the medical Human–Robot Interactions (HRI) better, we propose this task and a pipeline solution for the Text2GQL task. This solution uses the Adapter pre-trained by "the linking of GQL schemas and the corresponding utterances" as an external knowledge introduction plug-in. By inserting the Adapter into the language model, the mapping between logical language and natural language can be introduced faster and more directly to better realize the end-to-end human–machine language translation task. In the study, the proposed Text2GQL task model is mainly constructed based on an improved pipeline composed of a Language Model, Pre-trained Adapter plug-in, and Pointer Network. This enables the model to copy objects' tokens from utterances, generate corresponding GQL statements for graph database retrieval, and build an adjustment mechanism to improve the final output. The experiments have shown that our proposed method is competitive on the counterpart datasets (Spider, ATIS, GeoQuery, and 39.net) converted from the Text2SQL task, and the proposed method is also practical in medical scenarios.

UCL Discovery

Using Clinical Natural Language Processing for Health Outcomes Research: Overview and Actionable Suggestions for Future Advances

Author: Chapman W
Downs J
Dutta R
Hayes J
Liakata M
Morley K
Osborn D
Roberts A
Shah AD
Stewart R
Suominen H
Velupillai S
Publication date: 01/12/2018

The importance of incorporating Natural Language Processing (NLP) methods in clinical informatics research has been increasingly recognized over the past years, and has led to transformative advances. Typically, clinical NLP systems are developed and evaluated on word, sentence, or document level annotations that model specific attributes and features, such as document content (e.g., patient status, or report type), document section types (e.g., current medications, past medical history, or discharge summary), named entities and concepts (e.g., diagnoses, symptoms, or treatments) or semantic attributes (e.g., negation, severity, or temporality). From a clinical perspective, on the other hand, research studies are typically modelled and evaluated on a patient- or population-level, such as predicting how a patient group might respond to specific treatments or patient monitoring over time. While some NLP tasks consider predictions at the individual or group user level, these tasks still constitute a minority. Owing to the discrepancy between scientific objectives of each field, and because of differences in methodological evaluation priorities, there is no clear alignment between these evaluation approaches. Here we provide a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that is to be used for clinical outcomes research, and vice-versa. A particular focus is placed on mental health research, an area still relatively understudied by the clinical NLP research community, but where NLP methods are of notable relevance. Recent advances in clinical NLP method development have been significant, but we propose more emphasis needs to be placed on rigorous evaluation for the field to advance further. To enable this, we provide actionable suggestions, including a minimal protocol that could be used when reporting clinical NLP method development and its evaluation.

UCL Discovery

A survey on the development status and application prospects of knowledge graph in smart grids

Author: Kou Lei
Ma Chaoqun
Wang Jian
Wang Xi
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/02/2021

With the advent of the electric power big data era, semantic interoperability and interconnection of power data have received extensive attention. Knowledge graph technology is a new method describing the complex relationships between concepts and entities in the objective world, which has attracted wide attention because of its robust knowledge inference ability. Especially with the proliferation of measurement devices and the exponential growth of electric power data, the electric power knowledge graph provides new opportunities to solve the contradictions between the massive power resources and the continuously increasing demands for intelligent applications. In an attempt to fulfil the potential of knowledge graph and deal with the various challenges faced, as well as to obtain insights to achieve business applications of smart grids, this work first presents a holistic study of knowledge-driven intelligent application integration. Specifically, a detailed overview of electric power knowledge mining is provided. Then, the overview of the knowledge graph in smart grids is introduced. Moreover, the architecture of the big knowledge graph platform for smart grids and critical technologies are described. Furthermore, this paper comprehensively elaborates on the application prospects leveraged by knowledge graph oriented to smart grids, power consumer service, decision-making in dispatching, and operation and maintenance of power equipment. Finally, issues and challenges are summarised.

Comment: IET Generation, Transmission & Distributio

arXiv.org e-Print Archive

Directory of Open Access Journals

Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances

Author: Chapman W
Downs J
Dutta R
Hayes J
Liakata M
Morley K
Osborn D
Roberts A
Shah AD
Stewart R
Suominen H
Velupillai S
Publication venue: 'Elsevier BV'
Publication date: 28/10/2022

The importance of incorporating Natural Language Processing (NLP) methods in clinical informatics research has been increasingly recognized over the past years, and has led to transformative advances. Typically, clinical NLP systems are developed and evaluated on word, sentence, or document level annotations that model specific attributes and features, such as document content (e.g., patient status, or report type), document section types (e.g., current medications, past medical history, or discharge summary), named entities and concepts (e.g., diagnoses, symptoms, or treatments) or semantic attributes (e.g., negation, severity, or temporality). From a clinical perspective, on the other hand, research studies are typically modelled and evaluated on a patient- or population-level, such as predicting how a patient group might respond to specific treatments or patient monitoring over time. While some NLP tasks consider predictions at the individual or group user level, these tasks still constitute a minority. Owing to the discrepancy between scientific objectives of each field, and because of differences in methodological evaluation priorities, there is no clear alignment between these evaluation approaches. Here we provide a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that is to be used for clinical outcomes research, and vice versa. A particular focus is placed on mental health research, an area still relatively understudied by the clinical NLP research community, but where NLP methods are of notable relevance. Recent advances in clinical NLP method development have been significant, but we propose more emphasis needs to be placed on rigorous evaluation for the field to advance further. To enable this, we provide actionable suggestions, including a minimal protocol that could be used when reporting clinical NLP method development and its evaluation.

UTUPub