357 research outputs found

    Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit

    Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of information extraction (IE) technologies to enable clinical analysis. We present the open source Medical Concept Annotation Toolkit (MedCAT) that provides: (a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; (b) a feature-rich annotation interface for customizing and training IE models; and (c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1: 0.448-0.738 vs 0.429-0.650). Further real-world validation demonstrates SNOMED-CT extraction at three large London hospitals with self-supervised training over ∼8.8B words from ∼17M clinical records and further fine-tuning with ∼6K clinician-annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets and concept types, indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.

    Characterization of patients with idiopathic normal pressure hydrocephalus using natural language processing within an electronic healthcare record system

    OBJECTIVE: Idiopathic normal pressure hydrocephalus (iNPH) is an underdiagnosed, progressive, and disabling condition. Early treatment is associated with better outcomes and improved quality of life. In this paper, the authors aimed to identify features associated with patients with iNPH using natural language processing (NLP) to characterize this cohort, with the intention to later target the development of artificial intelligence–driven tools for early detection.
    METHODS: The electronic health records of patients with shunt-responsive iNPH were retrospectively reviewed using an NLP algorithm. Participants were selected from a prospectively maintained single-center database of patients undergoing CSF diversion for probable iNPH (March 2008–July 2020). Analysis was conducted on preoperative health records including clinic letters, referrals, and radiology reports accessed through CogStack. Clinical features were extracted from these records as SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) concepts using a named entity recognition machine learning model. In the first phase, a base model was generated using unsupervised training on 1 million electronic health records and supervised training with 500 double-annotated documents. The model was fine-tuned to improve accuracy using 300 records from patients with iNPH double annotated by two blinded assessors. Thematic analysis of the concepts identified by the machine learning algorithm was performed, and the frequency and timing of terms were analyzed to describe this patient group.
    RESULTS: In total, 293 eligible patients responsive to CSF diversion were identified. The median age at CSF diversion was 75 years, with a male predominance (69% male). The algorithm performed with a high degree of precision and recall (F1 score 0.92). Thematic analysis revealed the most frequently documented symptoms related to mobility, cognitive impairment, and falls or balance. The most frequent comorbidities were related to cardiovascular and hematological problems.
    CONCLUSIONS: This model demonstrates accurate, automated recognition of iNPH features from medical records. Opportunities for translation include detecting patients with undiagnosed iNPH from primary care records, with the aim to ultimately improve outcomes for these patients through artificial intelligence–driven early detection of iNPH and prompt treatment.

    Ontology-Based Clinical Information Extraction Using SNOMED CT

    Extracting and encoding clinical information captured in unstructured clinical documents with standard medical terminologies is vital to enable secondary use of clinical data from practice. SNOMED CT is the most comprehensive medical ontology, with broad types of concepts and detailed relationships, and it has been widely used for many clinical applications. However, few studies have investigated the use of SNOMED CT in clinical information extraction. In this dissertation research, we developed a fine-grained information model based on SNOMED CT and built novel information extraction systems to recognize clinical entities and identify their relations, as well as to encode them to SNOMED CT concepts. Our evaluation shows that such ontology-based information extraction systems using SNOMED CT could achieve state-of-the-art performance, indicating their potential in clinical natural language processing.

    Mapping of electronic health records in Spanish to the unified medical language system metathesaurus

    This work presents a preliminary approach to annotate Spanish electronic health records with concepts of the Unified Medical Language System Metathesaurus. The prototype uses Apache Lucene to index the Metathesaurus and generate mapping candidates from input text. In addition, it relies on UKB to resolve ambiguities. The tool has been evaluated by measuring its agreement with MetaMap in two English-Spanish parallel corpora, one consisting of titles and abstracts of papers in the clinical domain, and the other of real electronic health record excerpts.

    A Dimensional Modeling Approach to Internet-Delivered Psychological Treatments

    Mental health problems are becoming an increasingly significant public health concern on a global scale. While effective psychological treatments exist, they scale poorly to the number of people who require help, meaning many continue to suffer due to a lack of care. Internet-Delivered Psychological Treatments (IDPT) have emerged as an innovative alternative to traditional treatments that aims to be more scalable, cost-effective, and accessible. Although the use of IDPT has yielded promising results, it is also associated with a number of challenges. One challenge is preventing patient dropout, leading to an interest in adaptive IDPT that focuses on personalizing the treatment based on user needs. Furthermore, IDPT systems may generate large amounts of complex user data that must be structured sensibly in order to facilitate data analysis. In this thesis, we demonstrate a dimensional modeling approach to organizing the components of IDPT. We focus on two use cases for this approach, namely (1) facilitating reuse of treatment materials and (2) adapting treatment to user needs. Using the design science methodology, we have developed a dimensional model for IDPT as our artifact. In addition, we discuss the implementation of the dimensional modeling approach in IDPT systems and related challenges. The artifact was primarily evaluated through a semi-structured interview with domain experts in psychology. Based on this, we found that the artifact represents a suitable starting point for future research within this topic.
    Master's thesis in Software Development in collaboration with HVL. PROG399. MAMN-PRO.

    Characterization of Time-variant and Time-invariant Assessment of Suicidality on Reddit using C-SSRS

    Suicide is the 10th leading cause of death in the U.S. (1999-2019). However, predicting when someone will attempt suicide has been nearly impossible. In the modern world, many individuals suffering from mental illness seek emotional support and advice on well-known and easily accessible social media platforms such as Reddit. While prior artificial intelligence research has demonstrated the ability to extract valuable information from social media on suicidal thoughts and behaviors, these efforts have not considered both severity and temporality of risk. The insights made possible by access to such data have enormous clinical potential, most dramatically envisioned as a trigger to employ timely and targeted interventions (i.e., voluntary and involuntary psychiatric hospitalization) to save lives. In this work, we address this knowledge gap by developing deep learning algorithms to assess suicide risk in terms of severity and temporality from Reddit data based on the Columbia Suicide Severity Rating Scale (C-SSRS). In particular, we employ two deep learning approaches, time-variant and time-invariant modeling, for user-level suicide risk assessment, and evaluate their performance against a clinician-adjudicated gold standard Reddit corpus annotated based on the C-SSRS. Our results suggest that the time-variant approach outperforms the time-invariant method in the assessment of suicide-related ideations and supportive behaviors (AUC: 0.78), while the time-invariant model performed better in predicting suicide-related behaviors and suicide attempt (AUC: 0.64). The proposed approach can be integrated with clinical diagnostic interviews for improving suicide risk assessments.
    Comment: 24 pages, 8 tables, 6 figures; accepted by PLoS One. One of the two datasets mentioned in the manuscript has closed access. We will make it public after PLoS One produces the manuscript.