Search CORE

84 research outputs found

The Smart Data Extractor, a Clinician Friendly Solution to Accelerate and Improve the Data Collection During Clinical Trials

Author: Boyer Olivia
Burgun Anita
Douillet Maxime
Friedlander Lisa
Garcelon Nicolas
Neuraz Antoine
Quennelle Sophie
Publication date: 18/05/2023

In medical research, the traditional way to collect data, i.e. browsing patient files, has been shown to introduce bias and errors and to require considerable human labor and cost. We propose a semi-automated system able to extract every type of data, including free-text notes. The Smart Data Extractor pre-populates clinical research forms by following rules. We performed a cross-testing experiment to compare semi-automated and manual data collection: 20 target items had to be collected for 79 patients. The average time to complete one form was 6'81'' for manual data collection and 3'22'' with the Smart Data Extractor. There were also more mistakes during manual data collection (163 for the whole cohort) than with the Smart Data Extractor (46 for the whole cohort). We present an easy-to-use, understandable and agile solution to fill out clinical research forms. It reduces human effort and provides higher-quality data, avoiding data re-entry and fatigue-induced errors.
Comment: IOS Press, 2023, Studies in Health Technology and Informatics
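The abstract says only that the Smart Data Extractor pre-populates forms "by following rules". As a minimal sketch of that idea, assuming each form item has an ordered list of rules that read either a structured field or a regular expression over the notes, pre-population could look like the Python below. `PatientRecord`, `from_structured`, `from_notes`, `prefill` and the form items are hypothetical names for illustration, not the tool's actual API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PatientRecord:
    """Hypothetical patient record: structured values plus free-text notes."""
    structured: dict[str, float] = field(default_factory=dict)
    notes: list[str] = field(default_factory=list)


# A rule proposes a value for one form item, or None if it finds nothing.
Rule = Callable[[PatientRecord], Optional[str]]


def from_structured(key: str) -> Rule:
    """Rule that copies a structured value (e.g. a lab result) if present."""
    def rule(rec: PatientRecord) -> Optional[str]:
        value = rec.structured.get(key)
        return None if value is None else str(value)
    return rule


def from_notes(pattern: str) -> Rule:
    """Rule that returns the first regex capture group found in any note."""
    regex = re.compile(pattern, re.IGNORECASE)
    def rule(rec: PatientRecord) -> Optional[str]:
        for note in rec.notes:
            match = regex.search(note)
            if match:
                return match.group(1).strip()
        return None
    return rule


def prefill(rec: PatientRecord, rules: dict[str, list[Rule]]) -> dict[str, Optional[str]]:
    """Try each item's rules in order; the first non-None value wins.
    Items left as None are completed manually by the clinician."""
    form: dict[str, Optional[str]] = {}
    for item, item_rules in rules.items():
        form[item] = None
        for rule in item_rules:
            value = rule(rec)
            if value is not None:
                form[item] = value
                break
    return form


# Usage with a toy record and three hypothetical form items.
record = PatientRecord(
    structured={"creatinine_umol_l": 84.0},
    notes=["Biopsy performed. Diagnosis: nephrotic syndrome."],
)
rules = {
    "creatinine": [from_structured("creatinine_umol_l")],
    "diagnosis": [from_notes(r"diagnosis:\s*([^.]+)")],
    "weight": [from_structured("weight_kg")],
}
print(prefill(record, rules))
```

Items for which no rule fires stay empty so that a clinician completes them by hand, in line with the semi-automated framing; the first-rule-wins ordering is a design choice of this sketch, not something the abstract specifies.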

arXiv.org e-Print Archive

HAL-Inserm

INRIA a CCSD electronic archive server

The HeKA Project-Team

Author: Allassonniere Stéphanie
Angoulvant François
Burgun Anita
Chen Xiaoyi
Coulet Adrien
Drummond David
Garcelon Nicolas
Jannot Anne-Sophie
Katsahian Sandrine
Neuraz Antoine
Rance Bastien
Sabatier Brigitte
Tsopra Rosy
Ursino Moreno
Zohar Sarah
Publication venue: 'Departmento Expresion Grafica y Cartografia'
Publication date: 21/04/2021

This article describes HeKA, the Inria, Inserm and Université de Paris project team.
International audience
HeKA is a joint research project team of Inria, Inserm and Université de Paris. More precisely, HeKA is attached to the Centre de Recherche des Cordeliers and the Centre Inria de Paris. In addition to two Inria and Inserm researchers, HeKA comprises hospital-university researchers from AP-HP attached to departments of the Hôpital Européen Georges Pompidou, the Hôpital Necker and the Institut Imagine. The team's research topics are medical informatics, biostatistics and applied mathematics for clinical decision support. The name HeKA is both a reference to the Egyptian deity of medicine and an acronym for Health data- and model-driven Knowledge Acquisition. HeKA follows on from team 22 (Information Sciences to support Personalized Medicine), led by Anita Burgun at the Centre de Recherche des Cordeliers (Inserm, Université de Paris). HeKA is headed by Sarah Zohar, with Adrien Coulet as deputy.

INRIA a CCSD electronic archive server

Natural language understanding for the electronic health records : access to information and information extraction

Author: Neuraz Antoine
Publication date: 15/12/2020

In the medical field, natural language plays an important role in communication and information storage. Indeed, in addition to structured data (*e.g.*, results of biological tests), natural language is omnipresent: discharge summaries, clinical follow-up notes, hospitalization reports and radiology reports are examples of this. This natural medical language is complex and difficult to master: it takes future doctors several years to learn to decipher it correctly. Jargon is omnipresent, as are references to implicit knowledge, inconsistent abbreviations, and spelling and typing errors. Despite the difficulty, training machines to understand medical text, either to facilitate access to information or to extract information, is an essential task for improving both access to information and medical knowledge. The first part of this thesis deals with access to information and focuses on natural language understanding in the context of a conversational agent for querying the computerized patient record. We leveraged distant supervision techniques (*i.e.*, generation, paraphrase) to train a language understanding model based on recurrent neural networks in the absence of training data. We also studied the contribution of specialized contextualized word embeddings to medical language understanding tasks. In the second part, we focused on the extraction of drug information from clinical texts. We first developed a corpus of annotated clinical texts and a hybrid extraction model combining expert rules and recurrent neural networks. Subsequently, we showed the value of deploying such systems at a large scale to provide a rapid response to emerging diseases such as COVID-19.
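The abstract does not say how the expert rules and the recurrent network are combined. One plausible scheme, sketched in Python under that assumption, keeps every span matched by a rule and accepts a model-predicted span only where it overlaps no rule span. The dosage regex, the `Span` type and the hard-coded `model_spans` (a stand-in for recurrent network output) are hypothetical.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


# Hypothetical expert rule: dosage expressions such as "500 mg" or "2.5 ml".
DOSAGE = re.compile(r"\b\d+(?:[.,]\d+)?\s?(?:mg|mcg|g|ml)\b", re.IGNORECASE)


def rule_spans(text: str) -> list[Span]:
    """Spans matched by the expert rules."""
    return [Span(m.start(), m.end(), "DOSAGE") for m in DOSAGE.finditer(text)]


def overlaps(a: Span, b: Span) -> bool:
    return a.start < b.end and b.start < a.end


def merge(rule_based: list[Span], model_based: list[Span]) -> list[Span]:
    """Keep every rule span; keep a model span only if it overlaps no rule span."""
    kept = list(rule_based)
    kept += [s for s in model_based if not any(overlaps(s, r) for r in rule_based)]
    return sorted(kept)


text = "Start amoxicillin 500 mg three times daily."
# Stand-in for recurrent network predictions (hypothetical, hard-coded here).
model_spans = [Span(6, 17, "DRUG"), Span(18, 21, "DOSAGE")]
for span in merge(rule_spans(text), model_spans):
    print(span.label, text[span.start:span.end])
```

Here the truncated model span "500" is overridden by the rule span "500 mg", while the model's drug mention is kept; giving rules precedence is an assumption of this sketch, chosen because hand-written patterns tend to be precise on narrow, regular expressions such as dosages.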

Theses.fr

Natural language understanding for the electronic health record: access to information and information extraction

Author: Neuraz Antoine
Publication venue: HAL CCSD
Publication date: 15/12/2020

In the medical field, natural language plays an important role in communication and information storage. Indeed, in addition to structured data (*e.g.*, results of biological tests), natural language is omnipresent: discharge summaries, clinical follow-up notes, hospitalization reports and radiology reports are examples of this. This natural medical language is complex and difficult to master: it takes future doctors several years to learn to decipher it correctly. Jargon is omnipresent, as are references to implicit knowledge, inconsistent abbreviations, and spelling and typing errors. Despite the difficulty, training machines to understand medical text, either to facilitate access to information or to extract information, is an essential task for improving both access to information and medical knowledge. The first part of this thesis deals with access to information and focuses on natural language understanding in the context of a conversational agent for querying the computerized patient record. We leveraged distant supervision techniques (*i.e.*, generation, paraphrase) to train a language understanding model based on recurrent neural networks in the absence of training data. We also studied the contribution of specialized contextualized word embeddings to medical language understanding tasks. In the second part, we focused on the extraction of drug information from clinical texts. We first developed a corpus of annotated clinical texts and a hybrid extraction model combining expert rules and recurrent neural networks. Subsequently, we showed the value of deploying such systems at a large scale to provide a rapid response to emerging diseases such as COVID-19.

Thèses en Ligne

Extracting Diagnosis Pathways from Electronic Health Records Using Deep Reinforcement Learning

Author: Coulet Adrien
Muyama Lillian
Neuraz Antoine
Publication date: 10/05/2023

Clinical diagnosis guidelines aim at specifying the steps that may lead to a diagnosis. Guidelines enable rationalizing and normalizing clinical decisions but suffer from drawbacks: they are built to cover the majority of the population and may fail to guide clinicians to the right diagnosis for patients with uncommon conditions or multiple pathologies. Moreover, their updates are long and expensive, making them ill-suited to emerging practices. Inspired by guidelines, we formulate the task of diagnosis as a sequential decision-making problem and study the use of Deep Reinforcement Learning (DRL) algorithms trained on Electronic Health Records (EHRs) to learn the optimal sequence of observations to perform in order to obtain a correct diagnosis. Because of the variety of DRL algorithms and their sensitivity to context, we considered several approaches and settings, which we compared to each other and to classical classifiers. We experimented on a synthetic but realistic dataset to differentially diagnose anemia and its subtypes, and we particularly evaluated the robustness of the various approaches to noise and missing data, as these are frequent in EHRs. Among the DRL algorithms, Dueling DQN with Prioritized Experience Replay and Dueling Double DQN with Prioritized Experience Replay show the best and most stable performance. In the presence of imperfect data, the DRL algorithms show competitive but less stable performance compared to the classifiers (Random Forest and XGBoost); however, they enable the progressive generation of a pathway to the suggested diagnosis, which can either guide or explain the decision process.
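The two best-performing configurations combine three standard DQN extensions. The Python sketch below shows each one in isolation on plain lists rather than neural networks: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean A(s, ·), the Double DQN target, and proportional prioritized replay sampling. It is a generic illustration of these techniques, not the authors' implementation; in the diagnosis setting one might assume each action either requests an observation (e.g. a blood test) or emits a diagnosis that ends the episode, but the abstract does not specify the reward design or network architecture.

```python
import random
from typing import Optional, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> list[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float], gamma: float = 0.99,
                      done: bool = False) -> float:
    """Double DQN target: the online network selects the next action and
    the target network evaluates it, which reduces over-estimation."""
    if done:  # terminal step, e.g. a diagnosis has been emitted
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


def per_probabilities(td_errors: Sequence[float], alpha: float = 0.6,
                      eps: float = 1e-6) -> list[float]:
    """Proportional prioritized replay: P(i) = p_i**alpha / sum_k p_k**alpha,
    with priority p_i = |TD error_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(td_errors: Sequence[float], batch_size: int, alpha: float = 0.6,
                 rng: Optional[random.Random] = None) -> list[int]:
    """Draw transition indices (with replacement) according to their priorities."""
    rng = rng or random.Random()
    probs = per_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


print(dueling_q(1.0, [0.5, -0.5, 0.0]))
print(double_dqn_target(0.0, [1.0, 3.0, 2.0], [0.5, 0.2, 0.9], gamma=0.9))
print(per_probabilities([1.0, 3.0], alpha=1.0, eps=0.0))
print(sample_batch([0.1, 0.2, 3.0], batch_size=5, rng=random.Random(0)))
```

A complete prioritized replay buffer would also apply importance-sampling weights to correct the bias introduced by non-uniform sampling; they are omitted here to keep the three ideas separate.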

arXiv.org e-Print Archive

Electronic health records for the diagnosis of rare diseases

Author: Burgun Anita
Garcelon Nicolas
Neuraz Antoine
Salomon Rémi
Publication venue: 'Elsevier BV'
Publication date: 01/04/2020

International audience
With the emergence of electronic health records, the reuse of clinical data offers new perspectives for the diagnosis and management of patients with rare diseases. However, there are many obstacles to repurposing clinical data. The development of decision support systems depends on the ability to recruit patients, extract and integrate the patients' data, mine and stratify these data, and integrate the decision support algorithm into patient care. This last step requires electronic health records that can integrate learning health system tools. In this literature review, we examine research that provides solutions to remove these barriers and accelerate translational research: structured electronic health records and free-text search engines to find patients, data warehouses and natural language processing to extract phenotypes, machine learning algorithms to classify patients, and similarity metrics to diagnose patients. Medical informatics faces a pressing demand to develop decision support systems, and this requires ethical considerations for clinicians and patients to ensure appropriate use of health data.
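As a minimal illustration of the last approach, similarity metrics to diagnose patients, the Python sketch below represents each patient as a set of phenotype terms, ranks already-diagnosed patients by Jaccard similarity to a new patient, and lets the k nearest neighbours vote. The metric, the voting scheme and the toy cohort are assumptions chosen for illustration and are not taken from the review.

```python
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_diagnoses(query: set[str], cohort: list[tuple[set[str], str]],
                      k: int = 3) -> list[tuple[str, int]]:
    """Rank diagnosed patients by similarity to the query patient,
    then count the diagnoses of the k most similar ones."""
    ranked = sorted(cohort, key=lambda p: jaccard(query, p[0]), reverse=True)
    votes = Counter(diagnosis for _, diagnosis in ranked[:k])
    return votes.most_common()


# Hypothetical toy cohort: (phenotype terms, confirmed diagnosis).
cohort = [
    ({"proteinuria", "edema", "hypoalbuminemia"}, "nephrotic syndrome"),
    ({"proteinuria", "hematuria", "hypertension"}, "glomerulonephritis"),
    ({"edema", "hypoalbuminemia"}, "nephrotic syndrome"),
    ({"polyuria", "polydipsia"}, "diabetes insipidus"),
]
query = {"proteinuria", "edema"}
print(suggest_diagnoses(query, cohort))
```

Set-based Jaccard ignores how informative each phenotype is; ontology-aware or frequency-weighted similarities are common refinements, but which ones the reviewed studies use is beyond what the abstract states.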

HAL-Inserm

Natural language understanding for task oriented dialog in the biomedical domain in a low resources context, NIPS Workshop

Author: Burgun Anita
Campillos Llanos Leonardo
Neuraz Antoine
Rosset Sophie
Publication venue: HAL CCSD

In the biomedical domain, the lack of shareable datasets often limits the possibility of developing natural language processing systems, especially dialogue applications and natural language understanding (NLU) models. To overcome this issue, we explore data generation using templates and terminologies, as well as data augmentation approaches. Namely, we report our experiments using paraphrasing and word representations learned on a large EHR corpus with FastText and ELMo to learn an NLU model without any available dataset. We evaluate on an NLU task of natural language queries in EHRs, divided into slot-filling and intent classification subtasks. On the slot-filling task, we obtain an F-score of 0.76 with the ELMo representation; on the classification task, a mean F-score of 0.71. Our results show that this method could be used to develop a baseline system.
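Data generation from templates and terminologies can be sketched in Python as follows, assuming each template carries an intent label and braces mark slots filled from terminology lists; every combination of values yields a tokenized utterance with BIO slot tags, the usual input format for joint slot-filling and intent classification. The templates, intents and terminology entries are hypothetical, and the paraphrasing and FastText/ELMo steps are not shown.

```python
import itertools

# Hypothetical (template, intent) pairs; {slot} marks a slot to fill.
TEMPLATES = [
    ("what was the last {lab_test} of the patient", "get_last_value"),
    ("show me the {lab_test} results since {date}", "get_results_since"),
]
# Hypothetical terminology entries for each slot type.
TERMINOLOGY = {
    "lab_test": ["creatinine", "hemoglobin level"],
    "date": ["january", "last week"],
}


def is_slot(token: str) -> bool:
    return token.startswith("{") and token.endswith("}")


def expand(template: str, intent: str):
    """Yield (tokens, BIO tags, intent) for every combination of slot values."""
    parts = template.split()
    slots = [p[1:-1] for p in parts if is_slot(p)]
    for values in itertools.product(*(TERMINOLOGY[s] for s in slots)):
        value_iter = iter(values)
        tokens, tags = [], []
        for part in parts:
            if is_slot(part):
                slot, words = part[1:-1], next(value_iter).split()
                tokens += words
                # First word of a slot value gets B-, following words get I-.
                tags += [f"B-{slot}"] + [f"I-{slot}"] * (len(words) - 1)
            else:
                tokens.append(part)
                tags.append("O")
        yield tokens, tags, intent


def generate():
    return [ex for tpl, intent in TEMPLATES for ex in expand(tpl, intent)]


for tokens, tags, intent in generate():
    print(intent, list(zip(tokens, tags)))
```

The cartesian product grows quickly with the number of slots and terminology entries, so in practice one would likely sample combinations rather than enumerate them all.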