
    Obtaining structured clinical data from unstructured data using natural language processing software

    Background: Free-text documents in healthcare settings contain a wealth of information not captured in electronic healthcare records (EHRs). Epilepsy clinic letters are one example: an unstructured data source containing a large amount of intricate disease information. Extracting meaningful and contextually correct clinical information from free-text sources to enhance EHRs remains a significant challenge. SCANR (Swansea University Collaborative in the Analysis of NLP Research) was set up to use natural language processing (NLP) technology to extract structured data from unstructured sources. IBM Watson Content Analytics software (ICA) uses NLP technology. It enables users to define annotations, based on dictionaries and language characteristics, to create parsing rules that highlight relevant items. These include clinical details such as symptoms and diagnoses, medication and test results, as well as personal identifiers. Approach: To use ICA to build a pipeline that accurately extracts detailed epilepsy information from clinic letters. Methods: We used ICA to retrieve important epilepsy information from 41 pseudo-anonymized, unstructured epilepsy clinic letters. The 41 letters consisted of 13 ‘new’ and 28 ‘follow-up’ letters (for 15 different patients) written by 12 different doctors in different styles. We designed dictionaries and annotators to enable ICA to extract epilepsy type (focal, generalized or unclassified), epilepsy cause, age of onset, investigation results (EEG, CT and MRI), medication, and clinic date. Epilepsy clinicians assessed the accuracy of the pipeline. Results: The accuracy (sensitivity, specificity) of each concept was: epilepsy diagnosis 98% (97%, 100%), focal epilepsy 100%, generalized epilepsy 98% (93%, 100%), medication 95% (93%, 100%), age of onset 100% and clinic date 95% (95%, 100%).
Precision and recall were, respectively, 98% and 97% for epilepsy diagnosis; 100% each for focal epilepsy; 100% and 93% for generalized epilepsy; 100% each for age of onset; 100% and 93% for medication; 100% and 96% for EEG results; 100% and 83% for MRI scan results; and 100% and 95% for clinic date. Conclusions: ICA is capable of extracting detailed, structured epilepsy information from unstructured clinic letters to a high degree of accuracy. These data can be used to populate relational databases and be linked to EHRs. Researchers can build in custom rules to identify concepts of interest in letters and produce structured information. We plan to extend our work to hundreds and then thousands of clinic letters, to provide phenotypically rich epilepsy data to link with other anonymised, routinely collected data.
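    The dictionary-and-rule approach described above can be illustrated with a minimal Python sketch. It is not ICA and uses no ICA API. The term lists, the `annotate` function, the age-of-onset pattern and the crude negation check are all hypothetical. They are chosen only to show how dictionary matches and parsing rules turn letter text into structured fields.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-dictionaries for illustration only; a real pipeline
# would use curated clinical vocabularies and many more terms.
EPILEPSY_TYPE_TERMS = {
    "focal": ["focal epilepsy", "focal seizures", "partial epilepsy"],
    "generalized": ["generalised epilepsy", "generalized epilepsy"],
}
MEDICATION_TERMS = ["lamotrigine", "levetiracetam", "carbamazepine",
                    "sodium valproate", "phenytoin"]

# Hypothetical parsing rule: capture the number after common onset phrases.
AGE_OF_ONSET = re.compile(
    r"\b(?:onset at age|age of onset(?: was)?|since (?:the )?age(?: of)?)\s*(\d{1,3})\b",
    re.IGNORECASE,
)
NEGATION_CUE = re.compile(r"\b(?:no|not|without|ruled out)\b", re.IGNORECASE)


@dataclass
class Annotation:
    concept: str  # structured field, e.g. "epilepsy_type"
    value: str    # normalised value written to that field
    start: int    # character offsets of the match in the letter
    end: int


def is_negated(text: str, start: int) -> bool:
    """Crude check: a negation cue earlier in the same sentence, within 40 characters."""
    window = text[max(0, start - 40):start].split(".")[-1]
    return NEGATION_CUE.search(window) is not None


def find_terms(text: str, term: str):
    # Whole-word, case-insensitive dictionary lookup.
    return re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)


def annotate(letter: str) -> list:
    found = []
    for label, terms in EPILEPSY_TYPE_TERMS.items():
        for term in terms:
            for m in find_terms(letter, term):
                if not is_negated(letter, m.start()):
                    found.append(Annotation("epilepsy_type", label, m.start(), m.end()))
    for drug in MEDICATION_TERMS:
        for m in find_terms(letter, drug):
            found.append(Annotation("medication", drug, m.start(), m.end()))
    for m in AGE_OF_ONSET.finditer(letter):
        found.append(Annotation("age_of_onset", m.group(1), m.start(), m.end()))
    return sorted(found, key=lambda a: a.start)


letter = ("Diagnosis: focal epilepsy with seizure onset at age 14. "
          "No evidence of generalised epilepsy. Continue lamotrigine 100 mg bd.")
for a in annotate(letter):
    print(a.concept, a.value)
```

    The negated mention of generalised epilepsy is skipped, so the sample letter yields a focal epilepsy type, an age of onset of 14 and lamotrigine as medication. The precision and recall figures above would come from comparing output like this against clinician-reviewed letters.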

    Validating epilepsy diagnoses in routinely collected data

    Purpose: Anonymised, routinely collected healthcare data are increasingly being used for epilepsy research. We validated algorithms using general practitioner (GP) primary healthcare records to identify people with epilepsy from anonymised healthcare data within the Secure Anonymised Information Linkage (SAIL) databank in Wales, UK. Method: A reference population of 150 people with definite epilepsy and 150 people without epilepsy was ascertained from hospital records and linked to records contained within SAIL (containing GP records for 2.4 million people). We used three different algorithms, based on combinations of GP epilepsy diagnosis and anti-epileptic drug (AED) prescription codes, to identify the reference population. Results: Combining diagnosis and AED prescription codes had a sensitivity of 84% (95% CI 77–90) and a specificity of 98% (95–100) in identifying people with epilepsy; diagnosis codes alone had a sensitivity of 86% (80–91) and a specificity of 97% (92–99); and AED prescription codes alone achieved a sensitivity of 92% (70–83) and a specificity of 73% (65–80). Using AED codes alone was more accurate in children, achieving a sensitivity of 88% (75–95) and a specificity of 98% (88–100). Conclusion: GP epilepsy diagnosis and AED prescription codes can be confidently used to identify people with epilepsy using anonymised healthcare records in Wales, UK.
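    Sensitivity and specificity here are the proportions of the 150 people with epilepsy and of the 150 people without epilepsy that an algorithm classifies correctly. The following is a minimal sketch of the calculation. It assumes a Wilson score interval for the 95% CI, because the abstract does not state which interval method was used. It also uses hypothetical counts rather than the study's data. The names `validate` and `wilson_interval` are illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def validate(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity and specificity of a case-finding algorithm against a reference population."""
    sensitivity = tp / (tp + fn)  # share of people with epilepsy the algorithm identifies
    specificity = tn / (tn + fp)  # share of people without epilepsy it correctly excludes
    return {
        "sensitivity": (sensitivity, wilson_interval(tp, tp + fn)),
        "specificity": (specificity, wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts: 150 reference cases and 150 reference non-cases.
result = validate(tp=126, fn=24, tn=147, fp=3)
for name, (estimate, (low, high)) in result.items():
    print(f"{name}: {estimate:.0%} (95% CI {low:.0%}-{high:.0%})")
```

    The Wilson interval is used in this sketch because it stays within 0 to 1 even when a proportion is close to 100%. A simple Wald interval does not have that guarantee.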

    SARS-CoV-2 infection risk among 77,587 healthcare workers: a national observational longitudinal cohort study in Wales, United Kingdom, April to November 2020

    Objectives: To better understand the risk of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection among healthcare workers, leading to recommendations for the prioritisation of personal protective equipment, testing, training and vaccination. Design: Observational, longitudinal, national cohort study. Setting: Our cohort comprised secondary care (hospital-based) healthcare workers employed by NHS Wales (United Kingdom) organisations from 1 April 2020 to 30 November 2020. Participants: We included 577,756 monthly observations among 77,587 healthcare workers. Using linked anonymised datasets, participants were grouped into 20 staff roles, and each role was additionally classed as patient-facing, non-patient-facing or undetermined. These data were linked to individual demographic details and dates of positive SARS-CoV-2 PCR tests. Main outcome measures: We used univariable and multivariable logistic regression models to determine odds ratios (ORs) for the risk of a positive SARS-CoV-2 PCR test. Results: Patient-facing healthcare workers were at the highest risk of SARS-CoV-2 infection, with an adjusted OR of 2.28 (95% confidence interval [CI] 2.10–2.47). After adjustment, foundation year doctors (OR 1.83 [95% CI 1.47–2.27]), healthcare support workers (OR 1.36 [95% CI 1.20–1.54]) and hospital nurses (OR 1.27 [95% CI 1.12–1.44]) were at the highest risk of infection among all staff groups. Younger healthcare workers and those living in more deprived areas were at a higher risk of infection. Infection rates also varied over time and by organisation. Conclusions: These findings have important policy implications for prioritising vaccination, testing, training and personal protective equipment for patient-facing roles and higher-risk staff groups.
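    An odds ratio compares the odds of a positive test in one group with the odds in another. The sketch below computes an unadjusted odds ratio and a Woolf (log-scale) 95% CI from a hypothetical 2×2 table. It does not reproduce the study's multivariable logistic regression, which adjusts for other covariates. The counts and the function name `odds_ratio_woolf` are invented for illustration.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """
    Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval.

    a: exposed, outcome present     b: exposed, outcome absent
    c: unexposed, outcome present   d: unexposed, outcome absent
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, (low, high)


# Hypothetical table: patient-facing vs non-patient-facing staff, positive vs negative PCR.
or_, (low, high) = odds_ratio_woolf(a=40, b=960, c=20, d=980)
print(f"OR {or_:.2f} (95% CI {low:.2f}-{high:.2f})")
```

    Because the interval is symmetric on the log scale, the point estimate sits at the geometric mean of the bounds. This gives a quick consistency check on reported values: for example, the square root of 2.10 × 2.47 is approximately 2.28.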

    Using GWAS top hits to inform priors in Bayesian fine-mapping association studies

    The default causal single‐nucleotide polymorphism (SNP) effect size prior in Bayesian fine‐mapping studies is usually the Normal distribution. This choice is often based on computational convenience rather than on evidence that it is the most suitable prior distribution. The choice of prior matters because previous studies have shown that causal SNP Bayes factors are considerably sensitive to the form of the prior. For some well‐studied diseases there are now large numbers of genome‐wide association study (GWAS) top hits, along with estimates of the number of yet‐to‐be‐discovered causal SNPs. We show how the effect sizes of the top hits and estimates of the number of yet‐to‐be‐discovered causal SNPs can be used to choose between the Laplace and Normal priors, to estimate the prior parameters and to quantify the uncertainty in this estimation. The methodology can readily be applied to other priors. We show that the top hits available from breast cancer GWAS provide overwhelming support for the Laplace over the Normal prior, which has important consequences for variant prioritisation. The work in this paper enables practitioners to derive more objective priors than those currently in use, and could lead to different variants being prioritised.
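    The following simplified sketch illustrates the choice between the two priors. It fits zero-mean Normal and Laplace distributions to a set of effect sizes by maximum likelihood and compares the maximised log-likelihoods. Each model has one free parameter, so this is equivalent to comparing AIC. It is a stand-in that rests on assumptions and is not the paper's method: it ignores that top hits are selected for having large effects, and it does not use estimates of the number of yet‐to‐be‐discovered causal SNPs. The effect sizes and function names are hypothetical.

```python
import math


def normal_fit(effects):
    """Zero-mean Normal: the MLE of the variance is the mean squared effect size."""
    n = len(effects)
    variance = sum(b * b for b in effects) / n
    loglik = -0.5 * n * math.log(2 * math.pi * variance) - 0.5 * n
    return loglik, math.sqrt(variance)


def laplace_fit(effects):
    """Zero-mean Laplace: the MLE of the scale is the mean absolute effect size."""
    n = len(effects)
    scale = sum(abs(b) for b in effects) / n
    loglik = -n * math.log(2 * scale) - n
    return loglik, scale


def compare_priors(effects):
    ll_normal, sd = normal_fit(effects)
    ll_laplace, scale = laplace_fit(effects)
    # Positive values favour the Laplace; both models have one parameter, so no penalty term.
    return {"log_lr_laplace_vs_normal": ll_laplace - ll_normal,
            "normal_sd": sd, "laplace_scale": scale}


# Hypothetical log-odds-ratio effect sizes: mostly small, with one large effect.
effects = [0.01, -0.01, 0.01, -0.01, 1.0]
print(compare_priors(effects))
```

    Heavy-tailed sets of effects, where many are near zero and a few are large, favour the Laplace. Effects of similar magnitude favour the Normal.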

    Risk factors for self-harm in people with epilepsy

    Objective: To estimate the risk of self-harm in people with epilepsy and identify factors that influence this risk. Methods: We identified people with incident epilepsy in the Clinical Practice Research Datalink (CPRD), linked to hospitalization and mortality data, in England (1 January 1998 to 31 March 2014). In Phase 1, we estimated the risk of self-harm among people with epilepsy, versus those without, in a matched cohort study using a stratified Cox proportional hazards model. In Phase 2, we delineated a nested case-control study from the incident epilepsy cohort. People who had self-harmed (cases) were matched with up to 20 controls. Using conditional logistic regression models, we estimated the relative risk of self-harm associated with mental and physical illness comorbidity, contact with healthcare services and antiepileptic drug (AED) use. Results: Phase 1 included 11,690 people with epilepsy and 215,569 individuals without. We observed an adjusted hazard ratio of 5.31 (95% CI 4.08–6.89) for self-harm in the first year following epilepsy diagnosis and 3.31 (95% CI 2.85–3.84) in subsequent years. In Phase 2, there were 273 cases and 3,790 controls. Elevated self-harm risk was associated with mental illness (OR 4.08, 95% CI 3.06–5.42), multiple general practitioner consultations, treatment with two AEDs versus monotherapy (OR 1.84, 95% CI 1.33–2.55) and AED treatment augmentation (OR 2.12, 95% CI 1.38–3.26). Conclusion: People with epilepsy have an elevated risk of self-harm, especially in the first year following diagnosis. Clinicians should monitor these individuals closely and be especially vigilant for self-harm in people with epilepsy who have comorbid mental illness or frequent contact with healthcare services, who take multiple AEDs, or who are undergoing treatment augmentation.
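    In the nested case-control phase, each case is compared only with its own matched controls, which is what conditional logistic regression does. The following minimal sketch handles a single binary exposure and is fitted by Newton-Raphson. The function name and the toy matched sets are hypothetical, and the study's models included several covariates rather than one.

```python
import math


def conditional_logit_or(matched_sets, max_iter: int = 50, tol: float = 1e-10):
    """
    Odds ratio for one binary exposure from conditional logistic regression.

    matched_sets: list of (case_exposure, [control_exposure, ...]), exposures coded 0/1.
    Each set contributes beta*x_case - log(sum over the set of exp(beta*x_j)),
    so only within-set contrasts inform the estimate.
    """
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0        # first derivative of the conditional log-likelihood
        information = 0.0  # minus the second derivative
        for case_x, control_xs in matched_sets:
            xs = [case_x, *control_xs]
            weights = [math.exp(beta * x) for x in xs]
            total = sum(weights)
            mean_x = sum(w * x for w, x in zip(weights, xs)) / total
            mean_x2 = sum(w * x * x for w, x in zip(weights, xs)) / total
            score += case_x - mean_x
            information += mean_x2 - mean_x ** 2
        if information == 0:
            raise ValueError("no matched set has both exposed and unexposed members")
        step = score / information  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta)


# Hypothetical 1:1 matched pairs (case exposure, [control exposure]).
pairs = [(1, [0])] * 6 + [(0, [1])] * 2 + [(1, [1])] * 5 + [(0, [0])] * 7
print(round(conditional_logit_or(pairs), 4))  # 3.0: for 1:1 pairs this is 6/2 discordant pairs
```

    Concordant sets, where the case and all controls share the same exposure, contribute nothing to the score. The same function accepts sets with up to 20 controls, as in the study design.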

    Widespread divergence of the CEACAM/PSG genes in vertebrates and humans suggests sensitivity to selection

    In mammals, carcinoembryonic antigen cell adhesion molecules (CEACAMs) and pregnancy-specific glycoproteins (PSGs) play important roles in the regulation of pathogen transmission, tumorigenesis, insulin signaling turnover, and fetal–maternal interactions. However, how these genes evolved, and to what extent they have diverged in humans, remains to be investigated. Based on syntenic mapping of chordate genomes, we reveal that diverging homologs with a prototypic CEACAM architecture, including an extracellular domain with immunoglobulin variable and constant domain-like regions and an intracellular domain containing an ITAM motif, are present from cartilaginous fish to humans but are absent in sea lamprey, cephalochordates and urochordates. Interestingly, the CEACAM/PSG gene inventory underwent radical divergence in various vertebrate lineages, ranging from zero genes in avian species to dozens in therian mammals. In addition, analyses of genetic variation in human populations showed the presence of various types of copy number variations (CNVs) at the CEACAM/PSG locus. These copy number polymorphisms have 3–80% frequency in select populations and encompass from one to more than six PSG genes. Furthermore, we found that CEACAM/PSG genes contain a significantly higher density of nonsynonymous single nucleotide polymorphisms (SNPs) than the chromosome average, and many CEACAM/PSG SNPs exhibit high population differentiation. Taken together, our study suggests that CEACAM/PSG genes have had a more dynamic evolutionary history in vertebrates than previously thought.
Given that CEACAM/PSGs play important roles in maternal–fetal interaction and pathogen recognition, these data lay the groundwork for future analysis of adaptive CEACAM/PSG genotype–phenotype relationships in normal and complicated pregnancies, as well as in other etiologies.
    Chia Lin Chang, Jenia Semyonov, Po Jen Cheng, Shang Yu Huang, Jae Il Park, Huai-Jen Tsai, Cheng-Yung Lin, Frank Grützner, Yung Kuei Soong, James J. Cai, Sheau Yu Teddy Hs
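    Population differentiation for a SNP is commonly summarised with an F_ST-type statistic. The abstract does not say which measure was used, so the sketch below assumes Nei's G_ST for a biallelic SNP with populations weighted equally. The function name and the allele frequencies in the usage example are hypothetical.

```python
def nei_gst(allele_freqs):
    """
    Nei's G_ST, an F_ST-type differentiation measure, for one biallelic SNP.

    allele_freqs: frequency of the same allele in each population (equal weights).
    0 means identical frequencies; 1 means every population is fixed, not all for the same allele.
    """
    k = len(allele_freqs)
    p_bar = sum(allele_freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)                          # pooled expected heterozygosity
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / k  # mean within-population heterozygosity
    if h_total == 0:
        return 0.0  # monomorphic across all populations: nothing to differentiate
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in three populations.
print(round(nei_gst([0.10, 0.50, 0.85]), 3))
```

    A SNP is called highly differentiated when its value falls in the upper tail of the genome-wide distribution of the same statistic, not against a fixed cut-off.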

    Trends in the first antiepileptic drug prescribed for epilepsy between 2000 and 2010

    Purpose: To investigate changes in the choice of first anti-epileptic drug (AED) and co-prescription of folic acid after a new diagnosis of epilepsy. Methods: We searched anonymised electronic primary care records dated between 2000 and 2010 for patients with a new diagnosis of epilepsy, and recorded the first AED prescribed and whether folic acid was co-prescribed. Results: From 13.3 million patient-years of primary care records, we identified 3714 patients with a new diagnosis of epilepsy (925 children and 649 women aged 14–45 years). Comparing first-time AED prescriptions in 2000 and 2001 with those in 2009 and 2010 showed a significant decrease in the proportion of carbamazepine and phenytoin prescribed and a significant increase in the proportion of lamotrigine and levetiracetam prescribed. In women aged 14–45 years and girls aged under 18, there was a significant decrease in the proportion of sodium valproate prescribed. Women aged 14–45 years were significantly more likely to be co-prescribed folic acid with their first AED than all other patients (20% vs 3%, p < 0.001). The proportion of patients co-prescribed folic acid with their first AED did not change significantly between 2000 and 2010. Conclusion: The changing trends in the first AED prescribed over the last decade, particularly in women of childbearing age, reflect published evidence on AED efficacy, tolerability and safety.
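    Comparisons such as 20% vs 3% folic acid co-prescription are typically tested as a difference between two independent proportions. The abstract does not name the test. The sketch below therefore assumes a two-sided pooled z-test, which is equivalent to a 2×2 chi-squared test without continuity correction. It uses hypothetical counts, not the study's, and the function name is illustrative.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z-test for a difference between two independent proportions, pooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 100 of 500 (20%) vs 30 of 1000 (3%) co-prescribed folic acid.
z, p = two_proportion_z_test(100, 500, 30, 1000)
print(f"z = {z:.1f}, p = {p:.2g}")
```

    The same function could compare the share of first prescriptions for a given AED in 2000–2001 against 2009–2010, given the corresponding counts.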