267 research outputs found

    PIN36 Six Years Observational Study of the Cost of Highly Active Antiretroviral Therapy and HIV/AIDS Control

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

    Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy

    Radiotherapy (RT) toxicities can impair survival and quality-of-life, yet remain under-studied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. We fine-tuned statistical and pre-trained BERT-based models for three esophagitis classification tasks: Task 1) presence of esophagitis, Task 2) severe esophagitis or not, and Task 3) no esophagitis vs. grade 1 vs. grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. Fine-tuning PubmedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for Task 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by over 2% for all tasks. Silver-labeled data improved the macro-F1 by over 3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for Task 1, 2, and 3, respectively, without additional fine-tuning. To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinic notes. The promising performance provides proof-of-concept for NLP-based automated detailed toxicity monitoring in expanded domains. Comment: 17 pages, 6 tables, 1 figure, submitting to JCO-CCI for review
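
The abstract above reports macro-F1 for each task. As a reminder of what that metric measures, here is a minimal sketch that computes the unweighted mean of per-class F1 scores. The function name `macro_f1` and the toy labels are assumptions made for illustration; they are not taken from the paper, and the numbers have nothing to do with its results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical three-class labels, shaped like Task 3:
# 0 = no esophagitis, 1 = grade 1, 2 = grade 2-3
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred), 3))
```

Because every class contributes equally to the average, a rare class (such as severe esophagitis) affects macro-F1 as much as a common one, which is presumably why it suits imbalanced toxicity grades.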

    Extracting information from the text of electronic medical records to improve case detection: a systematic review

    Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions: Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall).
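
The review above compares case detection from codes alone against codes combined with text, using sensitivity (recall) and positive predictive value (precision). The sketch below shows how those two metrics are computed for a set of flagged patients. The helper name `sensitivity_and_ppv` and all patient IDs are hypothetical, chosen only to illustrate the definitions; they do not reproduce any study in the review.

```python
def sensitivity_and_ppv(true_cases, flagged):
    """Return (sensitivity, PPV) for a set of flagged patient IDs."""
    true_cases, flagged = set(true_cases), set(flagged)
    tp = len(true_cases & flagged)  # flagged patients who really are cases
    sensitivity = tp / len(true_cases) if true_cases else 0.0
    ppv = tp / len(flagged) if flagged else 0.0
    return sensitivity, ppv


# Hypothetical reference standard and detector outputs
true_cases = {1, 2, 3, 4, 5}
by_codes = {1, 2, 3, 9}   # patients flagged by structured codes
by_text = {3, 4, 10}      # patients flagged by free-text extraction
combined = by_codes | by_text  # a patient counts if either source flags them

print(sensitivity_and_ppv(true_cases, by_codes))
print(sensitivity_and_ppv(true_cases, combined))
```

In this toy data the union finds one more true case, so sensitivity rises, but it also adds a false positive, so PPV falls. That trade-off is one reason to report both metrics together.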

    Association of Diabetic Ketoacidosis and HbA1c at Onset with Year-Three HbA1c in Children and Adolescents with Type 1 Diabetes: Data from the International SWEET Registry

    Objective: To establish whether diabetic ketoacidosis (DKA) or HbA1c at onset is associated with year-three HbA1c in children with type 1 diabetes (T1D). Methods: Children with T1D from the SWEET registry, diagnosed <18 years, with documented clinical presentation, HbA1c at onset and follow-up were included. Participants were categorized according to T1D onset: (a) DKA (DKA with coma, DKA without coma, no DKA); (b) HbA1c at onset (low [<10%], medium [10 to <12%], high [≥12%]). To adjust for demographics, linear regression was applied with interaction terms for DKA and HbA1c at onset groups (adjusted means with 95% CI). Association between year-three HbA1c and both HbA1c and presentation at onset was analyzed (Vuong test). Results: Among 1420 children (54% males; median age at onset 9.1 years [Q1;Q3: 5.8;12.2]), 6% of children experienced DKA with coma, 37% DKA without coma, and 57% no DKA. Year-three HbA1c was lower in the low compared to high HbA1c at onset group, both in the DKA without coma (7.1% [6.8;7.4] vs 7.6% [7.5;7.8], P = .03) and in the no DKA group (7.4% [7.2;7.5] vs 7.8% [7.6;7.9], P = .01), without differences between low and medium HbA1c at onset groups. Year-three HbA1c did not differ among HbA1c at onset groups in the DKA with coma group. HbA1c at onset as an explanatory variable was more closely associated with year-three HbA1c compared to presentation at onset groups (P = .02). Conclusions: Year-three HbA1c is more closely related to HbA1c than to DKA at onset; earlier hyperglycemia detection might be crucial to improving year-three HbA1c.

    Clinical narrative analytics challenges

    Precision medicine, or evidence-based medicine, relies on extracting knowledge from medical records to provide individuals with the appropriate treatment at the appropriate moment according to the patient's features. Despite efforts to use clinical narratives for clinical decision support, many challenges remain today, such as multilingualism, diversity of terms and formats across different services, acronyms, and negation, to name but a few. The same problems exist when one wants to analyze narratives in the literature, whose analysis would provide physicians and researchers with highlights. In this talk we will analyze challenges, solutions, and open problems, and will review several frameworks and tools that can perform NLP over free text to extract medical entities by means of Named Entity Recognition. We will also analyze a framework we have developed to extract and validate medical terms. In particular, we present two use cases: (i) extraction of medical entities from a set of infectious disease description texts provided by MedlinePlus and (ii) identification of stroke scales in clinical narratives written in Spanish.