    Impact of translation on biomedical information extraction from real-life clinical notes

    The objective of our study is to determine whether using English tools to extract and normalize French medical concepts from English translations of the notes provides performance comparable to that of French models trained on a set of annotated French clinical notes. We compare two methods: one using French language models and one using English language models. For the native French method, the named entity recognition (NER) and normalization steps are performed separately. For the translated English method, after an initial translation step, we compare a two-step method with a terminology-oriented method that performs extraction and normalization simultaneously. We used French, English and bilingual annotated datasets to evaluate all steps (NER, normalization and translation) of our algorithms. The native French method performs better than the translated English methods, with a global F1 score of 0.51 [0.47;0.55] against 0.39 [0.34;0.44] and 0.38 [0.36;0.40] for the two English methods tested. In conclusion, despite recent improvements in translation models, there is a significant performance difference between the two approaches in favor of the native French method, which is more efficient on French medical texts, even with few annotated documents. Comment: 26 pages, 2 figures, 5 tables
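    The global F1 score combines precision and recall over the extracted entities. The sketch below is a minimal illustration that assumes strict matching (same note, same character span, same label or normalized concept code), which the abstract does not specify; the tuple layout, the label names and the example entities are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical entity layout: (note_id, start_char, end_char, label_or_concept_code)
Entity = Tuple[str, int, int, str]


def micro_f1(gold: Iterable[Entity], predicted: Iterable[Entity]) -> float:
    """Micro-averaged F1 under strict matching: a predicted entity is a true
    positive only if note, span boundaries and label all match a gold entity."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for two notes
gold = [("note1", 0, 9, "DISO"), ("note1", 15, 22, "CHEM"), ("note2", 4, 12, "PROC")]
pred = [("note1", 0, 9, "DISO"), ("note1", 15, 22, "DISO"), ("note2", 4, 12, "PROC")]
print(round(micro_f1(gold, pred), 2))  # 2 true positives, P = R = 2/3, prints 0.67
```

    A relaxed (overlapping-span) criterion would score the same predictions higher; which variant the study used is not stated in the abstract.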

    A Papillary Thyroid Microcarcinoma Revealed by a Single Bone Lesion with No Poor Prognostic Factors

    Objectives. The incidence of thyroid carcinomas, in particular papillary variants, is increasing. These cancers are generally considered to have an excellent prognosis, and papillary microcarcinomas are usually noninvasive. Many histopathological prognostic factors have been described to guide therapeutic decisions. Most patients are treated with total thyroidectomy without radioiodine therapy, or with partial surgery. Case Summary. A 65-year-old man with no significant medical history presented with pain in the left chest wall that had been present for several months. Computed tomography (CT) showed a large 4 cm tissue mass causing lysis of the middle arc of the left 4th rib. It was a single lesion, highly hypermetabolic on 18F-FDG PET/CT. Histological analysis of the biopsy and surgical specimen favored an adenocarcinoma with positive immunostaining for TTF1 and thyroglobulin (Tg). The total thyroidectomy performed subsequently revealed a well-delimited 4 mm papillary microcarcinoma of the right lobe with vesicular (follicular) architecture, distant from the capsule and without vascular emboli. After two radioiodine treatments, the patient is in complete clinical, biological, and radiological remission. Conclusion. This extremely rare case of a single bone metastasis revealing a papillary thyroid microcarcinoma illustrates the need for further research to better characterize the forms of papillary thyroid microcarcinoma with a potentially poor prognosis.

    Monitoring the proportion of the population infected by SARS-CoV-2 using age-stratified hospitalisation and serological data: a modelling study.

    BACKGROUND: Regional monitoring of the proportion of the population who have been infected by SARS-CoV-2 is important to guide local management of the epidemic, but is difficult in the absence of regular nationwide serosurveys. We aimed to estimate in near real time the proportion of adults who have been infected by SARS-CoV-2. METHODS: In this modelling study, we developed a method to reconstruct the proportion of adults who have been infected by SARS-CoV-2 and the proportion of infections being detected, using the joint analysis of age-stratified seroprevalence, hospitalisation, and case data, with deconvolution methods. We developed our method on a dataset consisting of seroprevalence estimates from 9782 participants (aged ≥20 years) in the two worst-affected regions of France in May, 2020, and applied our approach to the 13 French metropolitan regions over the period March, 2020, to January, 2021. We validated our method externally using data from a national seroprevalence study done between May and June, 2020. FINDINGS: We estimate that 5·7% (95% CI 5·1-6·4) of adults in metropolitan France had been infected with SARS-CoV-2 by May 11, 2020. This proportion remained stable until August, 2020, and increased to 14·9% (13·2-16·9) by Jan 15, 2021. With 26·5% (23·4-29·8) of adult residents having been infected in Île-de-France (Paris region) compared with 5·1% (4·5-5·8) in Brittany by January, 2021, regional variations remained large (coefficient of variation [CV] 0·50), although less so than in May, 2020 (CV 0·74). The proportion infected was twice as high in 20-49-year-olds (20·4%, 15·6-26·3) as in individuals aged 50 years or older (9·7%, 6·9-14·1). 40·2% (34·3-46·3) of infections in adults were detected in June to August, 2020, compared with 49·3% (42·9-55·9) in November, 2020, to January, 2021. Our regional estimates of seroprevalence were strongly correlated with the external validation dataset (coefficient of correlation 0·89).
INTERPRETATION: Our simple approach to estimating the proportion of adults who have been infected with SARS-CoV-2 can help to characterise the burden of SARS-CoV-2 infection, epidemic dynamics, and the performance of surveillance in different regions. FUNDING: EU RECOVER, Agence Nationale de la Recherche, Fondation pour la Recherche Médicale, Institut National de la Santé et de la Recherche Médicale (Inserm).
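    The core idea of combining seroprevalence with hospitalisation data can be illustrated by a simplified cumulative back-calculation. This is not the deconvolution method used in the study: it ignores the delay between infection and hospitalisation, handles a single age group, and uses hypothetical function names and numbers. A serosurvey calibrates an infection-hospitalisation ratio (IHR), which then converts later hospitalisation counts into estimated infections and a detected fraction; in an age-stratified analysis the same steps would be repeated per age group.

```python
def infection_hospitalisation_ratio(cum_hosp: float, seroprevalence: float,
                                    population: int) -> float:
    """Hospitalisations per infection, calibrated at the date of a serosurvey.
    Simplification: ignores the infection-to-hospitalisation delay."""
    return cum_hosp / (seroprevalence * population)


def proportion_infected(cum_hosp: float, ihr: float, population: int) -> float:
    """Cumulative proportion infected implied by cumulative hospitalisations."""
    return cum_hosp / (ihr * population)


def proportion_detected(cum_cases: float, cum_hosp: float, ihr: float) -> float:
    """Share of estimated infections that were reported as confirmed cases."""
    estimated_infections = cum_hosp / ihr
    return cum_cases / estimated_infections


# Hypothetical numbers for one age group in one region
population = 1_000_000
ihr = infection_hospitalisation_ratio(cum_hosp=1_000, seroprevalence=0.05, population=population)
p_later = proportion_infected(cum_hosp=2_600, ihr=ihr, population=population)
detected = proportion_detected(cum_cases=52_000, cum_hosp=2_600, ihr=ihr)
print(f"IHR={ihr:.3f} infected={p_later:.2f} detected={detected:.2f}")
# prints IHR=0.020 infected=0.13 detected=0.40
```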

    Pandemic Influenza Due to pH1N1/2009 Virus: Estimation of Infection Burden in Reunion Island through a Prospective Serosurvey, Austral Winter 2009

    BACKGROUND: To date, there is little information reflecting the true extent of spread of the pH1N1/2009v influenza pandemic at the community level, as infection often results in mild or no clinical symptoms. This prospective study aimed to assess the attack rate of the pH1N1/2009 virus in Reunion Island, and the risk factors for infection, during the 2009 season. METHODOLOGY/PRINCIPAL FINDINGS: A serosurvey was conducted during the 2009 austral winter, within a prospective population study. Pairs of sera were collected from 1687 individuals belonging to 772 households, during and after the passage of the pandemic wave. Antibodies to pH1N1/2009v were titrated using the hemagglutination inhibition assay (HIA), with titers ≥ 1/40 considered positive. Seroprevalence during the first two weeks of detection of pH1N1/2009v in Reunion Island was 29.8% in people under 20 years of age, 35.6% in adults (20-59 years) and 73.3% in the elderly (≥ 60 years) (P<0.0001). Baseline-corrected cumulative incidence rates were 42.9%, 13.9% and 0% in these age groups, respectively (P<0.0001). A significant decline in antibody titers occurred soon after the passage of the epidemic wave. Seroconversion rates to pH1N1/2009 correlated negatively with age: 63.2%, 39.4% and 16.7% in each age group, respectively (P<0.0001). Seroconversion occurred in 65.2% of individuals who were seronegative at inclusion, compared with 6.8% of those who were initially seropositive. CONCLUSIONS: Seroincidence of pH1N1/2009v infection was three times that estimated from clinical surveillance, indicating that almost two thirds of infections occurring at the community level escaped medical detection. People under 20 years of age were the most affected group. Pre-epidemic titers ≥ 1/40 prevented seroconversion and are likely protective against infection. A concern was raised about the long-term stability of the antibody responses.
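    The serological classification of paired sera can be sketched as follows. The ≥ 1/40 positivity threshold comes from the abstract; the seroconversion rule (a positive follow-up titer together with an at least 4-fold rise) is a conventional definition assumed here because the abstract does not state one, and the titer pairs and function names are hypothetical.

```python
from typing import List, Tuple

POSITIVE_THRESHOLD = 40  # reciprocal HIA titer: ">= 1/40 considered positive" (from the abstract)
FOLD_RISE = 4            # assumption: conventional >= 4-fold rise, not stated in the abstract


def is_seropositive(reciprocal_titer: int) -> bool:
    """A serum is positive when its HIA titer is >= 1/40."""
    return reciprocal_titer >= POSITIVE_THRESHOLD


def seroconverted(baseline: int, follow_up: int) -> bool:
    """Assumed definition: follow-up serum positive AND >= 4-fold titer rise."""
    return is_seropositive(follow_up) and follow_up >= FOLD_RISE * baseline


def seroconversion_rate(pairs: List[Tuple[int, int]]) -> float:
    """Proportion of (baseline, follow-up) serum pairs showing seroconversion."""
    if not pairs:
        return 0.0
    return sum(seroconverted(b, f) for b, f in pairs) / len(pairs)


# Hypothetical paired reciprocal titers (baseline, follow-up) for one age group
pairs = [(10, 80), (5, 20), (40, 160), (80, 80), (20, 40)]
seronegative_at_inclusion = [p for p in pairs if not is_seropositive(p[0])]
seropositive_at_inclusion = [p for p in pairs if is_seropositive(p[0])]

print(seroconversion_rate(pairs))                                # 0.4
print(round(seroconversion_rate(seronegative_at_inclusion), 3))  # 0.333
print(seroconversion_rate(seropositive_at_inclusion))            # 0.5
```

    Splitting the pairs by baseline status mirrors the comparison between initially seronegative and initially seropositive individuals reported above.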

    AB1767-HPR DOCUMENT SEARCH IN LARGE RHEUMATOLOGY DATABASES: ADVANCED KEYWORD QUERIES TO SELECT HOMOGENEOUS PHENOTYPES

    Background: Natural language processing tools are powerful for mining rheumatology databases, extracting patient information directly from clinical notes. However, these algorithms have a high computational cost and often cannot be applied at the scale of very large databases within the time frame of clinical practice. Objectives: The objective of our study is the automatic detection, at low computational cost, of clinical documents relevant to a specific clinical question, in a way that can be applied to a database of millions of documents. These sets of relevant documents constitute a pre-screening step that enables the development of more complex algorithms. Methods: The task was framed as an information retrieval task on French clinical texts. Two methods were compared. For the first method, we used several state-of-the-art document vector representations (TF-IDF, doc2vec, docBERT) and tested whether the closest documents were relevant. The second method consists of building a query expansion from an entered key term, its French synonyms from the UMLS, and the synonyms found by similarity with the embeddings of the CODER algorithm. These methods were developed and evaluated on sets of 8 and 20 phenotypes, respectively (e.g. “pericarditis in lupus”). Our database comprises 2 million documents from the AP-HP data warehouse, from a cohort of patients with one of four autoimmune diseases: systemic lupus erythematosus, scleroderma, antiphospholipid syndrome, and Takayasu’s disease. Results: Our experience does not support the use of vector representations of clinical notes for searching similar patients.
    However, an advanced synonym search method can yield very good results without additional burden for the clinician: we achieved a precision (positive predictive value) of 0.93 [0.90; 0.96], evaluated manually by a physician, and a recall (sensitivity) of 0.78 [0.71; 0.85], evaluated on the basis of the ICD-10 codes of the retrieved patients. Conclusion: We propose a new advanced keyword search method with automatic synonym search that achieves very good precision and recall.
    References
    [1] Callahan A, Polony V, Posada JD, Banda JM, Gombar S, Shah NH. ACE: the Advanced Cohort Engine for searching longitudinal patient records. Journal of the American Medical Informatics Association. 2021;28(7):1468–1479.
    [2] Yuan Z, et al. CODER: Knowledge-infused cross-lingual medical term embedding for term normalization. Journal of Biomedical Informatics. 2022;126:103983.
    [3] Gérardin C, Mageau A, Mékinian A, Tannier X, Carrat F. Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study. JMIR Med Inform. 2022;10(12):e42379.
    Table 1. Precision and recall results for 13 of the 20 queries.
    No.  Query                                   Precision   Recall   Documents
    1    “Rheumatoid arthritis”                  0.98        0.73     15189
    2    “Takayasu”                              1.00        0.94     2459
    3    “Pericarditis in lupus”                 0.92        0.93     7490
    4    “Kidney transplantation”                0.92        0.98     10716
    5    “Autoimmune hepatitis”                  0.80        0.85     2797
    6    “Dermatomyositis”                       1.00        0.77     3510
    7    “Idiopathic thrombocytopenic purpura”   0.98        0.81     3749
    8    “Acute kidney injury”                   0.86        0.81     15775
    9    “Raynaud syndrome”                      0.98        0.98     31900
    10   “HIV”                                   0.90        0.98     43582
    11   “Scleroderma”                           1.00        0.92     24199
    12   “Diabetes”                              0.96        0.96     51224
    13   “Stroke”                                0.64        0.63     28162
    Overall: precision 0.93 [0.90; 0.96]; recall 0.78 [0.71; 0.85].
    Precision was evaluated on 50 manually annotated documents per query; recall was evaluated against the corresponding ICD-10 codes; Documents is the number of corresponding documents.
    Figure 1. Overview of the two methods for searching documents in our data warehouse. Method 1 is document-oriented and method 2 is keyword-oriented.
    Acknowledgements: The authors thank the AP-HP data warehouse, which provided the data and the computing power to carry out this study under good conditions. We also thank all the medical departments, including internal medicine, rheumatology, dermatology, nephrology, pneumology, hepato-gastroenterology, hematology, endocrinology, gynecology, infectiology, cardiology, oncology, emergency and intensive care units, that agreed to the use of their clinical data.
    Disclosure of Interests: None declared.
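    The keyword-oriented method (method 2) can be sketched as a query expansion followed by a keyword match. This is a minimal illustration under stated assumptions: the synonym dictionary stands in for the UMLS French synonyms, the toy vectors stand in for CODER embeddings, and the 0.8 cosine threshold, the function names and the example notes are all hypothetical. Matching is a case-insensitive whole-word regular-expression scan, which suits a toy corpus; a database of millions of documents would more likely rely on a full-text index.

```python
import math
import re
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(term: str, synonyms: Dict[str, List[str]],
                 embeddings: Dict[str, List[float]], threshold: float = 0.8) -> Set[str]:
    """Key term + dictionary synonyms + terms whose embedding is close to it."""
    expanded = {term, *synonyms.get(term, [])}
    query_vec = embeddings.get(term)
    if query_vec is not None:
        for candidate, vec in embeddings.items():
            if candidate != term and cosine(query_vec, vec) >= threshold:
                expanded.add(candidate)
    return expanded


def search(documents: Dict[str, str], terms: Set[str]) -> List[str]:
    """Return ids of documents containing any expanded term as a whole word."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms]
    return sorted(doc_id for doc_id, text in documents.items()
                  if any(p.search(text) for p in patterns))


# Hypothetical stand-ins for UMLS synonyms and CODER embeddings
synonyms = {"sclérodermie": ["sclérose systémique"]}
embeddings = {
    "sclérodermie": [0.9, 0.1, 0.0],
    "ScS": [0.85, 0.15, 0.05],
    "diabète": [0.0, 0.2, 0.95],
}
notes = {
    "d1": "Patiente suivie pour une sclérose systémique.",
    "d2": "Antécédent de ScS diffuse.",
    "d3": "Diabète de type 2 équilibré.",
    "d4": "Sclérodermie localisée (morphée).",
}
terms = expand_query("sclérodermie", synonyms, embeddings)
print(sorted(terms))         # ['ScS', 'sclérodermie', 'sclérose systémique']
print(search(notes, terms))  # ['d1', 'd2', 'd4']
```

    Because the expansion happens once per query, the per-document cost is a plain string match, which is consistent with the low computational cost the abstract aims for.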

    The Authors' Reply

    No abstract available.

    Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study

    Background. Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in languages other than English. Objective. We aimed to provide automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases. Methods. Our multistep algorithm includes a named entity recognition step, a multilabel classification using the Medical Subject Headings (MeSH) ontology, and the computation of patient similarity. Cohorts of similar patients were selected on a priori annotated phenotypes. Six phenotypes were selected for their clinical significance: P1, osteoporosis; P2, nephritis in systemic lupus erythematosus; P3, interstitial lung disease in systemic sclerosis; P4, lung infection; P5, obstetric antiphospholipid syndrome; and P6, Takayasu arteritis. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes with annotated phenotypes, both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. For each phenotype, we evaluated the 3 patients closest to the index patient using precision-at-3, recall, and average precision. Results. For P1-P4, precision-at-3 ranged from 0.85 (95% CI 0.75-0.95) to 0.99 (95% CI 0.98-1), recall ranged from 0.53 (95% CI 0.50-0.55) to 0.83 (95% CI 0.81-0.84), and average precision ranged from 0.58 (95% CI 0.54-0.62) to 0.88 (95% CI 0.85-0.90). Phenotypes P5 and P6 could not be analyzed because of the limited number of cases with these phenotypes. Conclusions. Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm for extracting cohorts of similar patients.
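    To make the ranking metrics concrete, the sketch below ranks candidate patients by similarity to an index patient and scores the ranking with precision-at-3 and average precision. It assumes, purely for illustration, that each patient is represented by the set of MeSH descriptors predicted by the classification step and that similarity is the Jaccard index; the abstract does not specify the similarity measure, and the profiles, relevance labels and function names are hypothetical. Average precision here is the sum of the precision values at the rank of each relevant patient, divided by the total number of relevant patients; the study's exact definitions of recall and average precision may differ.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index between two sets of MeSH descriptors."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar(index_id: str, profiles: Dict[str, Set[str]]) -> List[str]:
    """Rank all other patients by decreasing similarity to the index patient
    (ties broken by patient id so the order is deterministic)."""
    query = profiles[index_id]
    others = [pid for pid in profiles if pid != index_id]
    return sorted(others, key=lambda pid: (-jaccard(query, profiles[pid]), pid))


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked patients that share the index phenotype."""
    return sum(pid in relevant for pid in ranking[:k]) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Precision@rank summed over ranks holding a relevant patient, / |relevant|."""
    hits, total = 0, 0.0
    for rank, pid in enumerate(ranking, start=1):
        if pid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical MeSH profiles produced by the classification step
profiles = {
    "index": {"Osteoporosis", "Fractures, Bone", "Glucocorticoids"},
    "p1": {"Osteoporosis", "Fractures, Bone"},
    "p2": {"Osteoporosis", "Glucocorticoids", "Lupus Nephritis"},
    "p3": {"Lupus Nephritis", "Proteinuria"},
    "p4": {"Osteoporosis"},
    "p5": {"Lung Diseases, Interstitial", "Scleroderma, Systemic"},
}
relevant = {"p1", "p4"}  # hypothetical: patients annotated with the index phenotype

ranking = rank_similar("index", profiles)
print(ranking)                                          # ['p1', 'p2', 'p4', 'p3', 'p5']
print(round(precision_at_k(ranking, relevant, 3), 2))   # 0.67
print(round(average_precision(ranking, relevant), 2))   # 0.83
```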
