142 research outputs found

Impact of translation on biomedical information extraction from real-life clinical notes

Author: Carrat Fabrice
Gérardin Christel
Tannier Xavier
Wajsbürt Perceval
Xiong Yuhan
Publication venue
Publication date: 03/06/2023
Field of study

The objective of our study is to determine whether using English tools to extract and normalize French medical concepts on translations provides comparable performance to French models trained on a set of annotated French clinical notes. We compare two methods: a method involving French language models and a method involving English language models. For the native French method, the Named Entity Recognition (NER) and normalization steps are performed separately. For the translated English method, after the first translation step, we compare a two-step method and a terminology-oriented method that performs extraction and normalization at the same time. We used French, English and bilingual annotated datasets to evaluate all steps (NER, normalization and translation) of our algorithms. The native French method performs better than the translated English one, with a global F1 score of 0.51 [0.47;0.55] against 0.39 [0.34;0.44] and 0.38 [0.36;0.40] for the two English methods tested. In conclusion, despite the recent improvement of translation models, there is a significant performance difference between the two approaches in favor of the native French method, which is more efficient on French medical texts, even with few annotated documents. Comment: 26 pages, 2 figures, 5 tables

arXiv.org e-Print Archive

A Papillary Thyroid Microcarcinoma Revealed by a Single Bone Lesion with No Poor Prognostic Factors

Author: Bonichon Francoise
Carrat Xavier
Cazeau Anne-Laure
Godbert Yann
Henriques-Figueiredo Benedicte
Soubeyran Isabelle
Stegen Marc
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013
Field of study

Objectives. The incidence of thyroid carcinomas, in particular papillary variants, is increasing. These cancers are generally considered to have an excellent prognosis, and papillary microcarcinomas are usually noninvasive. Many prognostic histopathology factors have been described to guide therapeutic decisions. Most patients are treated with total thyroidectomy without radioiodine treatment or partial surgery. Case Summary. A 65-year-old man with no significant medical history presented with pain in the left chest wall that had been present for several months. A computed tomography (CT) scan found a large tissue mass of 4 cm responsible for lysis of the middle arch of the 4th rib on the left. It was a single lesion, highly hypermetabolic on the 18-FDG PET/CT. The histology analysis of the biopsy and surgical specimen favored an adenocarcinoma with immunostaining positive for TTF1 and thyroglobulin (Tg). The total thyroidectomy carried out subsequently revealed a 4 mm papillary microcarcinoma with vesicular architecture of the right lobe, well delimited and distant from the capsule without vascular embolisms. After two radioiodine treatments, the patient is in complete clinical, biological, and radiological remission. Conclusion. This extremely rare case of a single bone metastasis revealing a papillary thyroid microcarcinoma illustrates the necessity of further research to better characterize the forms of papillary thyroid microcarcinomas with potentially poor prognosis.

Crossref

Directory of Open Access Journals

PubMed Central

Monitoring the proportion of the population infected by SARS-CoV-2 using age-stratified hospitalisation and serological data: a modelling study.

Author: Carrat Fabrice
Cauchemez Simon
de Lamballerie Xavier
Hozé Nathanaël
Lapidus Nathanaël
Lévy-Bruhl Daniel
Paireau Juliette
Salje Henrik
Severi Gianluca
Touvier Mathilde
Tran Kiem Cécile
Zins Marie
Publication venue: Lancet Public Health
Publication date: 01/06/2021
Field of study

BACKGROUND: Regional monitoring of the proportion of the population who have been infected by SARS-CoV-2 is important to guide local management of the epidemic, but is difficult in the absence of regular nationwide serosurveys. We aimed to estimate in near real time the proportion of adults who have been infected by SARS-CoV-2. METHODS: In this modelling study, we developed a method to reconstruct the proportion of adults who have been infected by SARS-CoV-2 and the proportion of infections being detected, using the joint analysis of age-stratified seroprevalence, hospitalisation, and case data, with deconvolution methods. We developed our method on a dataset consisting of seroprevalence estimates from 9782 participants (aged ≥20 years) in the two worst affected regions of France in May, 2020, and applied our approach to the 13 French metropolitan regions over the period March, 2020, to January, 2021. We validated our method externally using data from a national seroprevalence study done between May and June, 2020. FINDINGS: We estimate that 5·7% (95% CI 5·1-6·4) of adults in metropolitan France had been infected with SARS-CoV-2 by May 11, 2020. This proportion remained stable until August, 2020, and increased to 14·9% (13·2-16·9) by Jan 15, 2021. With 26·5% (23·4-29·8) of adult residents having been infected in Île-de-France (Paris region) compared with 5·1% (4·5-5·8) in Brittany by January, 2021, regional variations remained large (coefficient of variation [CV] 0·50) although less so than in May, 2020 (CV 0·74). The proportion infected was twice as high (20·4%, 15·6-26·3) in 20-49-year-olds than in individuals aged 50 years or older (9·7%, 6·9-14·1). 40·2% (34·3-46·3) of infections in adults were detected in June to August, 2020, compared with 49·3% (42·9-55·9) in November, 2020, to January, 2021. Our regional estimates of seroprevalence were strongly correlated with the external validation dataset (coefficient of correlation 0·89). 
INTERPRETATION: Our simple approach to estimate the proportion of adults that have been infected with SARS-CoV-2 can help to characterise the burden of SARS-CoV-2 infection, epidemic dynamics, and the performance of surveillance in different regions. FUNDING: EU RECOVER, Agence Nationale de la Recherche, Fondation pour la Recherche Médicale, Institut National de la Santé et de la Recherche Médicale (Inserm)

University of Melbourne Institutional Repository

Hal-Diderot

HAL UVSQ

Pandemic Influenza Due to pH1N1/2009 Virus: Estimation of Infection Burden in Reunion Island through a Prospective Serosurvey, Austral Winter 2009

BACKGROUND: To date, there is little information that reflects the true extent of spread of the pH1N1/2009v influenza pandemic at the community level, as infection often results in mild or no clinical symptoms. This study aimed to assess, through a prospective study, the attack rate of pH1N1/2009 virus in Reunion Island and the risk factors of infection during the 2009 season. METHODOLOGY/PRINCIPAL FINDINGS: A serosurvey was conducted during the 2009 austral winter, within the framework of a prospective population study. Pairs of sera were collected from 1687 individuals belonging to 772 households, during and after passage of the pandemic wave. Antibodies to pH1N1/2009v were titered using the hemagglutination inhibition assay (HIA), with titers ≥ 1/40 being considered positive. Seroprevalence during the first two weeks of detection of pH1N1/2009v in Reunion Island was 29.8% in people under 20 years of age, 35.6% in adults (20-59 years) and 73.3% in the elderly (≥ 60 years) (P<0.0001). Baseline-corrected cumulative incidence rates were 42.9%, 13.9% and 0% in these age groups respectively (P<0.0001). A significant decline in antibody titers occurred soon after the passage of the epidemic wave. Seroconversion rates to pH1N1/2009 correlated negatively with age: 63.2%, 39.4% and 16.7% in each age group respectively (P<0.0001). Seroconversion occurred in 65.2% of individuals who were seronegative at inclusion compared to 6.8% in those who were initially seropositive. CONCLUSIONS: Seroincidence of pH1N1/2009v infection was three times that estimated from clinical surveillance, indicating that almost two thirds of infections occurring at the community level have escaped medical detection. People under 20 years of age were the most affected group. Pre-epidemic titers ≥ 1/40 prevented seroconversion and are likely protective against infection. A concern was raised about the long-term stability of the antibody responses.

HAL AMU

Directory of Open Access Journals

ResearchOnline at James Cook University

Archive ouverte UNIGE

ResearchOnline@JCU

Public Library of Science (PLOS)

Deakin Research Online

Crossref

PubMed Central

HAL-IRD

Red de Bibliotecas Virtuales de Ciencias Sociales de América Latina y El Caribe

Horizon / Pleins textes

AB1767-HPR DOCUMENT SEARCH IN LARGE RHEUMATOLOGY DATABASES: ADVANCED KEYWORD QUERIES TO SELECT HOMOGENEOUS PHENOTYPES

Author: Carrat F.
Gérardin C.
Mekinian A.
Tannier Xavier
Xiong Y.
Publication venue: HAL CCSD
Publication date: 31/05/2023
Field of study

Background: Natural language processing tools are powerful for mining rheumatology databases, extracting patient information directly from clinical notes. However, these algorithms come with a high computational cost and are often not applicable at the scale of very large databases within the time constraints of clinical practice.
Objectives: The objective of our study is the automatic detection of clinical documents of interest for a specific clinical question, with low computational cost, to be applied on a database of millions of documents. These sets of documents of interest constitute a pre-screening to allow the development of more complex algorithms.
Methods: The task was considered as an information retrieval task in French clinical texts. Two different methods were compared. For the first method, we used several state-of-the-art document vector representations (TF-IDF, doc2vec, docBERT) and tested whether the closest documents are relevant. The second method consists of building a powerful query expansion from an entered key term, its French synonyms from the UMLS, and the synonyms found by similarity with the embeddings of the CODER algorithm. These methods are developed and evaluated on a set of 8 and of 20 phenotypes respectively (e.g. “pericarditis in lupus”). Our database corresponds to 2 million documents from a cohort of patients suffering from four autoimmune diseases (systemic lupus erythematosus, scleroderma, antiphospholipid syndrome, and Takayasu’s disease), coming from the AP-HP’s data warehouse.
Results: Our experiments do not support the vector representation model of clinical notes for searching similar patients. However, searching with an advanced synonym search method can lead to very good results without additional burden for the clinician: we achieved a precision (or positive predictive value) of 0.93 [0.90; 0.96], evaluated manually by a physician, and a recall (or sensitivity) of 0.78 [0.71; 0.85], evaluated on the basis of the ICD-10 codes of the retrieved patients.
Conclusion: We propose a new advanced keyword search method with automatic synonym search, with very good accuracy and recall performance.
References:
[1] Callahan A, Polony V, Posada JD, Banda JM, Gombar S, Shah NH. ACE: the Advanced Cohort Engine for searching longitudinal patient records. Journal of the American Medical Informatics Association. 2021;28(7):1468–1479.
[2] Yuan Z, et al. CODER: Knowledge-infused cross-lingual medical term embedding for term normalization. Journal of Biomedical Informatics. 2022;126:103983.
[3] Gérardin C, Mageau A, Mékinian A, Tannier X, Carrat F. Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study. JMIR Med Inform. 2022;10(12):e42379.
Table 1. Accuracy and recall results for 13 of 20 queries. Accuracy is measured on 50 manually annotated documents per query; recall is measured by comparison with the respective ICD-10 codes.
No.   Query                                    Accuracy            Recall              Number of corresponding documents
1     “Rheumatoid Arthritis”                   0.98                0.73                15189
2     “Takayasu”                               1                   0.94                2459
3     “Pericarditis in lupus”                  0.92                0.93                7490
4     “Kidney transplantation”                 0.92                0.98                10716
5     “Autoimmune hepatitis”                   0.8                 0.85                2797
6     “Dermatomyositis”                        1.0                 0.77                3510
7     “Idiopathic thrombocytopenic purpura”    0.98                0.81                3749
8     “Acute kidney injury”                    0.86                0.81                15775
9     “Raynaud syndrome”                       0.98                0.98                31900
10    “HIV”                                    0.90                0.98                43582
11    “Scleroderma”                            1.0                 0.92                24199
12    “Diabetes”                               0.96                0.96                51224
13 …  “Stroke” …                               0.64                0.63                28162
      Overall                                  0.93 [0.90; 0.96]   0.78 [0.71; 0.85]
Figure 1. Overview of the two methods of searching for documents in our data warehouse. Method 1 is document oriented and method 2 is keyword oriented.
Acknowledgements: The authors would like to thank the AP-HP data warehouse, which provided the data and the computing power to carry out this study under good conditions. We would like to thank all the medical colleges, including internal medicine, rheumatology, dermatology, nephrology, pneumology, hepato-gastroenterology, hematology, endocrinology, gynecology, infectiology, cardiology, oncology, emergency and intensive care units, that gave their agreement for the use of the clinical data.
Disclosure of Interests: None declared.
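
The keyword-oriented method described in this abstract (expand a key term with synonyms, retrieve matching documents, then score precision and recall) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the synonym dictionary is hand-written and stands in for the UMLS and CODER lookups, the five toy documents and their relevance labels are invented, and whole-word regular-expression matching is only one plausible way to run the search. It is not the authors' implementation.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


def expand_query(term: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the key term followed by its known synonyms, lowercased and deduplicated."""
    seen: Set[str] = set()
    expanded: List[str] = []
    for candidate in [term, *synonyms.get(term.lower(), [])]:
        key = candidate.lower()
        if key not in seen:
            seen.add(key)
            expanded.append(key)
    return expanded


def search(documents: Dict[str, str], terms: Iterable[str]) -> Set[str]:
    """Return ids of documents containing at least one term as a whole word or phrase."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    return {doc_id for doc_id, text in documents.items() if pattern.search(text)}


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = TP / retrieved, recall = TP / relevant (0.0 when the denominator is empty)."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical synonym table (stand-in for UMLS / CODER synonyms).
synonyms = {"pericarditis": ["pericardial inflammation", "péricardite"]}

# Invented toy corpus; d5 mentions the term only to rule it out.
documents = {
    "d1": "Lupus flare with acute pericarditis confirmed on echocardiography.",
    "d2": "Péricardite aiguë chez une patiente lupique.",
    "d3": "Renal biopsy planned; no cardiac involvement.",
    "d4": "Pericardial inflammation noted during the lupus flare.",
    "d5": "Pericarditis was ruled out after a normal echocardiogram.",
}
relevant = {"d1", "d2", "d4"}  # invented gold labels

baseline = search(documents, ["pericarditis"])
terms = expand_query("Pericarditis", synonyms)
retrieved = search(documents, terms)
precision, recall = precision_recall(retrieved, relevant)
print(sorted(baseline), sorted(retrieved), precision, recall)
```

On this toy corpus the unexpanded query misses the two documents that use a synonym, while the expanded query finds them but also returns the negated mention in d5. Plain keyword matching cannot tell a negated mention from a positive one.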

HAL-Paris 13

The Authors' Reply

Author: Carrat Fabrice
de Lamballerie Xavier
Desenclos Jean-Claude
Zins Marie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/10/2022
Field of study

No abstract available

HAL UVSQ

Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study

Author: Carrat Fabrice
Gérardin Christel
Mageau Arthur
Mékinian Arsène
Tannier Xavier
Publication venue: 'JMIR Publications Inc.'
Publication date: 19/12/2022
Field of study

Background Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in a language other than English. Objective We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases. Methods Our multistep algorithm includes a named-entity recognition step, a multilabel classification using medical subject headings ontology, and the computation of patient similarity. A selection of cohorts of similar patients on a priori annotated phenotypes was performed. Six phenotypes were selected for their clinical significance: P1, osteoporosis; P2, nephritis in systemic lupus erythematosus; P3, interstitial lung disease in systemic sclerosis; P4, lung infection; P5, obstetric antiphospholipid syndrome; and P6, Takayasu arteritis. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, with annotated phenotypes, both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. We evaluated the precision of the 3 patients closest to the index patient for each phenotype with precision-at-3 and recall and average precision. Results For P1-P4, the precision-at-3 ranged from 0.85 (95% CI 0.75-0.95) to 0.99 (95% CI 0.98-1), the recall ranged from 0.53 (95% CI 0.50-0.55) to 0.83 (95% CI 0.81-0.84), and the average precision ranged from 0.58 (95% CI 0.54-0.62) to 0.88 (95% CI 0.85-0.90). P5-P6 phenotypes could not be analyzed due to the limited number of phenotypes. Conclusions Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm for extracting cohorts of similar patients.
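
The ranking metrics named in this abstract (precision-at-3 and average precision) can be made concrete with a short Python sketch. It uses the standard information-retrieval definitions, which are assumed here because the abstract does not spell out its exact formulas; the ranked list and the count of relevant patients are invented for illustration.

```python
from typing import Sequence


def precision_at_k(ranked_relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (divides by k, not by list length)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance: Sequence[bool], n_relevant: int) -> float:
    """Mean of precision values taken at each rank where a relevant item appears.

    n_relevant is the total number of relevant items, including any that
    were never retrieved, so missed items lower the score.
    """
    if n_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / n_relevant


# Invented example: patients ranked by similarity to an index patient;
# True means the neighbour shares the index patient's phenotype.
ranked = [True, False, True, True, False]
n_relevant = 4  # one matching patient was not retrieved at all

p_at_3 = precision_at_k(ranked, 3)
ap = average_precision(ranked, n_relevant)
recall = sum(ranked) / n_relevant
print(p_at_3, ap, recall)
```

Here two of the three closest neighbours match, so precision-at-3 is 2/3; average precision averages the precision at ranks 1, 3 and 4 over the four relevant patients, and recall is 3/4.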

HAL-Paris 13

Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study

Author: Carrat Fabrice
Gérardin Christel
Mageau Arthur
Mékinian Arsène
Tannier Xavier
Publication venue: 'JMIR Publications Inc.'
Publication date: 01/12/2022
Field of study

Background. Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in a language other than English. Objective. We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases. Methods. Our multistep algorithm includes a named-entity recognition step, a multilabel classification using medical subject headings ontology, and the computation of patient similarity. A selection of cohorts of similar patients on a priori annotated phenotypes was performed. Six phenotypes were selected for their clinical significance: P1, osteoporosis; P2, nephritis in systemic lupus erythematosus; P3, interstitial lung disease in systemic sclerosis; P4, lung infection; P5, obstetric antiphospholipid syndrome; and P6, Takayasu arteritis. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, with annotated phenotypes, both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. We evaluated the precision of the 3 patients closest to the index patient for each phenotype with precision-at-3 and recall and average precision. Results. For P1-P4, the precision-at-3 ranged from 0.85 (95% CI 0.75-0.95) to 0.99 (95% CI 0.98-1), the recall ranged from 0.53 (95% CI 0.50-0.55) to 0.83 (95% CI 0.81-0.84), and the average precision ranged from 0.58 (95% CI 0.54-0.62) to 0.88 (95% CI 0.85-0.90). P5-P6 phenotypes could not be analyzed due to the limited number of phenotypes. Conclusions. Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm for extracting cohorts of similar patients.

HAL-Inserm

Directory of Open Access Journals

HAL-Paris 13