142 research outputs found
Impact of translation on biomedical information extraction from real-life clinical notes
The objective of our study is to determine whether using English tools to
extract and normalize French medical concepts on translations provides
comparable performance to French models trained on a set of annotated French
clinical notes. We compare two methods: a method involving French language
models and a method involving English language models. For the native French
method, the Named Entity Recognition (NER) and normalization steps are
performed separately. For the translated English method, after the first
translation step, we compare a two-step method and a terminology-oriented
method that performs extraction and normalization at the same time. We used
French, English and bilingual annotated datasets to evaluate all steps (NER,
normalization and translation) of our algorithms. The native French method
outperforms the translated English one, with an overall F1 score of 0.51
[0.47; 0.55] against 0.39 [0.34; 0.44] and 0.38 [0.36; 0.40] for the two
English methods tested. In conclusion, despite recent improvements in
translation models, there is a significant performance difference between the
two approaches in favor of the native French method, which is more effective on
French medical texts, even with few annotated documents.
Comment: 26 pages, 2 figures, 5 tables
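For readers unfamiliar with the metric reported above, here is a minimal sketch of an F1 computation for entity extraction. It assumes exact matching between gold and predicted spans; the abstract does not state the matching criterion or how the confidence intervals were obtained, and the `Span` representation and example data are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def f1_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1 between gold and predicted entity spans."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for one note: two of four predictions match the gold spans.
gold = {(0, 8, "DISORDER"), (15, 22, "DRUG"), (30, 41, "DISORDER")}
predicted = {
    (0, 8, "DISORDER"),
    (15, 22, "DISORDER"),  # right span, wrong label: counted as an error
    (30, 41, "DISORDER"),
    (50, 55, "DRUG"),
}
score = f1_score(gold, predicted)
```

With exact matching, a correct span carrying the wrong label counts as both a false positive and a false negative, which is one reason reported F1 values depend heavily on the matching rule chosen.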
A Papillary Thyroid Microcarcinoma Revealed by a Single Bone Lesion with No Poor Prognostic Factors
Objectives. The incidence of thyroid carcinomas, in particular papillary variants, is increasing. These cancers are generally considered to have an excellent prognosis, and papillary microcarcinomas are usually noninvasive. Many histopathological prognostic factors have been described to guide therapeutic decisions. Most patients are treated with total thyroidectomy without radioiodine treatment or with partial surgery. Case Summary. A 65-year-old man with no significant medical history presented with pain in the left chest wall that had been present for several months. Computed tomography (CT) found a large 4 cm tissue mass responsible for lysis of the middle arch of the left 4th rib. It was a single lesion, highly hypermetabolic on 18-FDG PET/CT. Histological analysis of the biopsy and surgical specimen favored an adenocarcinoma with immunostaining positive for TTF1 and thyroglobulin (Tg). The total thyroidectomy carried out subsequently revealed a 4 mm papillary microcarcinoma with vesicular architecture in the right lobe, well delimited and distant from the capsule, without vascular emboli. After two radioiodine treatments, the patient is in complete clinical, biological, and radiological remission. Conclusion. This extremely rare case of a single bone metastasis revealing a papillary thyroid microcarcinoma illustrates the need for further research to better characterize the forms of papillary thyroid microcarcinoma with potentially poor prognosis.
Monitoring the proportion of the population infected by SARS-CoV-2 using age-stratified hospitalisation and serological data: a modelling study.
BACKGROUND: Regional monitoring of the proportion of the population who have been infected by SARS-CoV-2 is important to guide local management of the epidemic, but is difficult in the absence of regular nationwide serosurveys. We aimed to estimate in near real time the proportion of adults who have been infected by SARS-CoV-2. METHODS: In this modelling study, we developed a method to reconstruct the proportion of adults who have been infected by SARS-CoV-2 and the proportion of infections being detected, using the joint analysis of age-stratified seroprevalence, hospitalisation, and case data, with deconvolution methods. We developed our method on a dataset consisting of seroprevalence estimates from 9782 participants (aged ≥20 years) in the two worst affected regions of France in May, 2020, and applied our approach to the 13 French metropolitan regions over the period March, 2020, to January, 2021. We validated our method externally using data from a national seroprevalence study done between May and June, 2020. FINDINGS: We estimate that 5·7% (95% CI 5·1-6·4) of adults in metropolitan France had been infected with SARS-CoV-2 by May 11, 2020. This proportion remained stable until August, 2020, and increased to 14·9% (13·2-16·9) by Jan 15, 2021. With 26·5% (23·4-29·8) of adult residents having been infected in Île-de-France (Paris region) compared with 5·1% (4·5-5·8) in Brittany by January, 2021, regional variations remained large (coefficient of variation [CV] 0·50) although less so than in May, 2020 (CV 0·74). The proportion infected was twice as high (20·4%, 15·6-26·3) in 20-49-year-olds than in individuals aged 50 years or older (9·7%, 6·9-14·1). 40·2% (34·3-46·3) of infections in adults were detected in June to August, 2020, compared with 49·3% (42·9-55·9) in November, 2020, to January, 2021. Our regional estimates of seroprevalence were strongly correlated with the external validation dataset (coefficient of correlation 0·89).
INTERPRETATION: Our simple approach to estimate the proportion of adults that have been infected with SARS-CoV-2 can help to characterise the burden of SARS-CoV-2 infection, epidemic dynamics, and the performance of surveillance in different regions. FUNDING: EU RECOVER, Agence Nationale de la Recherche, Fondation pour la Recherche Médicale, Institut National de la Santé et de la Recherche Médicale (Inserm).
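Two of the summary quantities in this abstract, the coefficient of variation across regions and the proportion of infections detected, are simple ratios. The sketch below illustrates them on made-up numbers. The region names and values are hypothetical, and using the population standard deviation is an assumption, since the abstract does not say which estimator was used. The deconvolution step itself is not reproduced here.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Standard deviation divided by the mean (population SD assumed)."""
    return pstdev(values) / mean(values)


def detected_fraction(reported_cases: float, estimated_infections: float) -> float:
    """Share of estimated infections that were reported as confirmed cases."""
    return reported_cases / estimated_infections


# Hypothetical proportions of adults infected in three regions.
regional_proportions = {"Region A": 0.25, "Region B": 0.05, "Region C": 0.15}
cv = coefficient_of_variation(list(regional_proportions.values()))

# Hypothetical counts: 40 reported cases for 100 estimated infections.
fraction = detected_fraction(40, 100)
```

A lower coefficient of variation means the regional proportions are closer to one another relative to their mean, which is how the abstract's drop from 0·74 to 0·50 should be read.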
Pandemic Influenza Due to pH1N1/2009 Virus: Estimation of Infection Burden in Reunion Island through a Prospective Serosurvey, Austral Winter 2009
BACKGROUND: To date, there is little information that reflects the true extent of spread of the pH1N1/2009v influenza pandemic at the community level, as infection often results in mild or no clinical symptoms. This prospective study aimed to assess the attack rate of pH1N1/2009 virus in Reunion Island and the risk factors for infection during the 2009 season. METHODOLOGY/PRINCIPAL FINDINGS: A serosurvey was conducted during the 2009 austral winter, within the framework of a prospective population study. Pairs of sera were collected from 1687 individuals belonging to 772 households, during and after passage of the pandemic wave. Antibodies to pH1N1/2009v were titered using the hemagglutination inhibition assay (HIA), with titers ≥ 1/40 being considered positive. Seroprevalence during the first two weeks of detection of pH1N1/2009v in Reunion Island was 29.8% in people under 20 years of age, 35.6% in adults (20-59 years) and 73.3% in the elderly (≥ 60 years) (P<0.0001). Baseline-corrected cumulative incidence rates were 42.9%, 13.9% and 0% in these age groups respectively (P<0.0001). A significant decline in antibody titers occurred soon after the passage of the epidemic wave. Seroconversion rates to pH1N1/2009 correlated negatively with age: 63.2%, 39.4% and 16.7% in each age group respectively (P<0.0001). Seroconversion occurred in 65.2% of individuals who were seronegative at inclusion compared to 6.8% in those who were initially seropositive. CONCLUSIONS: Seroincidence of pH1N1/2009v infection was three times that estimated from clinical surveillance, indicating that almost two thirds of infections occurring at the community level escaped medical detection. People under 20 years of age were the most affected group. Pre-epidemic titers ≥ 1/40 prevented seroconversion and are likely protective against infection. A concern was raised about the long-term stability of the antibody responses.
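The "almost two thirds" figure in the conclusions follows directly from the threefold ratio. Write $I_s$ for the seroincidence and $I_c$ for the incidence estimated from clinical surveillance (symbols introduced here for illustration only), and assume that clinically detected infections are a subset of those detected serologically:

```latex
\[
I_s \approx 3\,I_c
\quad\Longrightarrow\quad
\frac{I_s - I_c}{I_s} \;=\; 1 - \frac{I_c}{I_s} \;\approx\; 1 - \frac{1}{3} \;=\; \frac{2}{3}.
\]
```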
AB1767-HPR DOCUMENT SEARCH IN LARGE RHEUMATOLOGY DATABASES: ADVANCED KEYWORD QUERIES TO SELECT HOMOGENEOUS PHENOTYPES
Background. Natural language processing tools are powerful for mining rheumatology databases, extracting patient information directly from clinical notes. However, these algorithms come with a high computational cost and are often not applicable at the scale of very large databases within the time constraints of clinical practice. Objectives. The objective of our study is the automatic detection of clinical documents of interest for a specific clinical question, at low computational cost, to be applied to a database of millions of documents. These sets of documents of interest constitute a pre-screening step that allows the development of more complex algorithms. Methods. The task was treated as an information retrieval task in French clinical texts. Two methods were compared. For the first method, we used several state-of-the-art document vector representations (TF-IDF, doc2vec, docBERT) and tested whether the closest documents were relevant. The second method consists of building a powerful query expansion from an entered key term, its French synonyms from the UMLS, and the synonyms found by similarity with the embeddings of the CODER algorithm. These methods were developed and evaluated on a set of 8 and on 20 phenotypes respectively (e.g. "pericarditis in lupus", etc.). Our database corresponds to 2 million documents from a cohort of patients suffering from four autoimmune diseases: systemic lupus erythematosus, scleroderma, antiphospholipid syndrome, and Takayasu's disease, coming from the AP-HP's data warehouse. Results. Our experience does not support the vector representation model of clinical notes for searching similar patients.
However, searching with an advanced synonym search method can lead to very good results without additional burden for the clinician: we achieved a precision (or positive predictive value) of 0.93 [0.90; 0.96], evaluated manually by a physician, and a recall (or sensitivity) of 0.78 [0.71; 0.85], evaluated on the basis of the ICD-10 codes of the retrieved patients. Conclusion. We propose a new advanced keyword search method with automatic synonym search, with very good accuracy and recall performance.

References
[1] Callahan A, Polony V, Posada JD, Banda JM, Gombar S, Shah NH. ACE: the Advanced Cohort Engine for searching longitudinal patient records. Journal of the American Medical Informatics Association 2021;28(7):1468–1479.
[2] Yuan Z, et al. CODER: Knowledge-infused cross-lingual medical term embedding for term normalization. Journal of Biomedical Informatics 2022;126:103983.
[3] Gérardin C, Mageau A, Mékinian A, Tannier X, Carrat F. Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study. JMIR Med Inform 2022;10(12):e42379.

Table 1. Accuracy and recall results for 13 of the 20 queries. Accuracy was measured on 50 manually annotated documents per query; recall by comparison with the respective ICD-10 (CIM-10) codes.

No.  Query                                    Accuracy           Recall             Number of corresponding documents
1    "Rheumatoid Arthritis"                   0.98               0.73               15189
2    "Takayasu"                               1                  0.94               2459
3    "Pericarditis in lupus"                  0.92               0.93               7490
4    "Kidney transplantation"                 0.92               0.98               10716
5    "Autoimmune hepatitis"                   0.8                0.85               2797
6    "Dermatomyositis"                        1.0                0.77               3510
7    "Idiopathic thrombocytopenic purpura"    0.98               0.81               3749
8    "Acute kidney injury"                    0.86               0.81               15775
9    "Raynaud syndrome"                       0.98               0.98               31900
10   "HIV"                                    0.90               0.98               43582
11   "Scleroderma"                            1.0                0.92               24199
12   "Diabetes"                               0.96               0.96               51224
13   … "Stroke" …                             0.64               0.63               28162
     Overall                                  0.93 [0.90; 0.96]  0.78 [0.71; 0.85]

Figure 1. Overview of the two methods of searching for documents in our data warehouse. Method 1 is document oriented and method 2 is keyword oriented.

Acknowledgements. The authors would like to thank the AP-HP data warehouse, which provided the data and the computing power to carry out this study under good conditions. We would like to thank all the medical colleges, including internal medicine, rheumatology, dermatology, nephrology, pneumology, hepato-gastroenterology, hematology, endocrinology, gynecology, infectiology, cardiology, oncology, emergency and intensive care units, that gave their agreement for the use of the clinical data.

Disclosure of Interests. None declared.
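The keyword-oriented method described in this abstract can be illustrated with a small sketch: expand a key term into a set of synonyms, then keep the documents that match any of them. This is only an illustration under stated assumptions. The `SYNONYMS` table is a hypothetical stand-in for the UMLS French synonyms and the CODER embedding neighbours, the example documents are invented, and the authors' actual matching rules are not given in the abstract.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical synonym table; in the study, synonyms come from the UMLS
# and from CODER embedding similarity.
SYNONYMS: Dict[str, Set[str]] = {
    "pericarditis": {"pericardial effusion", "pericardial inflammation"},
}


def expand_query(term: str, synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the key term together with its known synonyms, lower-cased."""
    key = term.lower()
    return {key} | {s.lower() for s in synonyms.get(key, set())}


def build_pattern(terms: Iterable[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive, whole-word pattern for all terms."""
    alternatives = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def search(documents: Dict[str, str], term: str) -> List[str]:
    """Return the sorted ids of documents matching the expanded query."""
    pattern = build_pattern(expand_query(term, SYNONYMS))
    return sorted(doc_id for doc_id, text in documents.items() if pattern.search(text))


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a retrieved set against a reference set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented documents and reference labels, for illustration only.
docs = {
    "d1": "Patient with lupus and pericardial effusion.",
    "d2": "No cardiac involvement.",
    "d3": "History of PERICARDITIS in 2019.",
}
hits = search(docs, "pericarditis")
precision, recall = precision_recall(set(hits), {"d1", "d3", "d4"})
```

A single compiled regular expression keeps the per-document cost low, which fits the abstract's stated goal of screening millions of documents cheaply before running heavier models.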
Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study
Background. Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in a language other than English. Objective. We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases. Methods. Our multistep algorithm includes a named-entity recognition step, a multilabel classification using the Medical Subject Headings ontology, and the computation of patient similarity. A selection of cohorts of similar patients on a priori annotated phenotypes was performed. Six phenotypes were selected for their clinical significance: P1, osteoporosis; P2, nephritis in systemic lupus erythematosus; P3, interstitial lung disease in systemic sclerosis; P4, lung infection; P5, obstetric antiphospholipid syndrome; and P6, Takayasu arteritis. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, with annotated phenotypes, both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. For each phenotype, we evaluated the 3 patients closest to the index patient with precision-at-3, and we also report recall and average precision. Results. For P1-P4, the precision-at-3 ranged from 0.85 (95% CI 0.75-0.95) to 0.99 (95% CI 0.98-1), the recall ranged from 0.53 (95% CI 0.50-0.55) to 0.83 (95% CI 0.81-0.84), and the average precision ranged from 0.58 (95% CI 0.54-0.62) to 0.88 (95% CI 0.85-0.90). P5-P6 phenotypes could not be analyzed due to the limited number of phenotypes. Conclusions. Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm for extracting cohorts of similar patients.
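The evaluation metrics named in this abstract can be sketched as follows: rank the other patients by similarity to an index patient, then score the ranking with precision-at-3 and average precision. Cosine similarity over concept-count vectors is an assumption made for illustration, since the abstract does not specify the similarity measure. The patient ids and vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_neighbours(index_id: str, vectors: Dict[str, Sequence[float]]) -> List[str]:
    """Other patients sorted from most to least similar to the index patient."""
    others = [pid for pid in vectors if pid != index_id]
    return sorted(others, key=lambda pid: cosine(vectors[index_id], vectors[pid]), reverse=True)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """Fraction of the k closest patients that share the phenotype."""
    return sum(1 for pid in ranked[:k] if pid in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant patient appears."""
    hits, total = 0, 0.0
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical concept-count vectors over three ontology headings.
vectors = {
    "index": [1, 1, 0],
    "a": [1, 1, 0],
    "b": [1, 0, 0],
    "c": [0, 0, 1],
    "d": [1, 1, 1],
}
ranked = rank_neighbours("index", vectors)
p_at_3 = precision_at_k(ranked, relevant={"a", "b"})
ap = average_precision(ranked, relevant={"a", "b"})
```

Precision-at-3 only looks at the top of the ranking, whereas average precision rewards placing every relevant patient early, so the two can diverge on the same ranking.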