
    Health record hiccups—5,526 real-world time series with change points labelled by crowdsourced visual inspection

    Background: Large routinely collected data such as electronic health records (EHRs) are increasingly used in research, but the statistical methods and processes used to check such data for temporal data quality issues have not moved beyond manual, ad hoc production and visual inspection of graphs. With the prospect of EHR data being used for disease surveillance via automated pipelines and public-facing dashboards, automation of data quality checks will become increasingly valuable. Findings: We generated 5,526 time series from 8 different EHR datasets and engaged >2,000 citizen-science volunteers to label the locations of all suspicious-looking change points in the resulting graphs. Consensus labels were produced using density-based clustering with noise, with validation conducted using 956 images containing labels produced by an experienced data scientist. Parameter tuning was done against 670 images and performance calculated against 286 images, resulting in a final sensitivity of 80.4% (95% CI, 77.1%–83.3%), specificity of 99.8% (99.7%–99.8%), positive predictive value of 84.5% (81.4%–87.2%), and negative predictive value of 99.7% (99.6%–99.7%). In total, 12,745 change points were found within 3,687 of the time series. Conclusions: This large collection of labelled EHR time series can be used to validate automated methods for change point detection in real-world settings, encouraging the development of methods that can successfully be applied in practice. It is particularly valuable since change point detection methods are typically validated using synthetic data, so their performance in real-world settings cannot be assumed to be comparable. While the dataset focusses on EHRs and data quality, it should also be applicable in other fields.
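
    The consensus-labelling step described above lends itself to a short illustration. The sketch below clusters volunteers' change-point clicks along the time axis with DBSCAN (density-based clustering with a noise class) and takes one consensus location per dense cluster; the eps and min_samples values and the use of the cluster median are illustrative assumptions, not the parameters tuned in the study.

        # Sketch of consensus change-point labelling via density-based clustering.
        # eps (how close clicks must be) and min_labellers (how many volunteers must
        # agree) are assumed values, not the study's tuned parameters.
        import numpy as np
        from sklearn.cluster import DBSCAN

        def consensus_change_points(label_positions, eps=5.0, min_labellers=3):
            """Cluster 1-D volunteer labels (x-positions on a graph) and return
            one consensus change-point location per dense cluster."""
            x = np.asarray(label_positions, dtype=float).reshape(-1, 1)
            if len(x) == 0:
                return []
            clustering = DBSCAN(eps=eps, min_samples=min_labellers).fit(x)
            consensus = []
            for cluster_id in set(clustering.labels_):
                if cluster_id == -1:        # -1 marks noise: isolated, unsupported clicks
                    continue
                members = x[clustering.labels_ == cluster_id]
                consensus.append(float(np.median(members)))   # robust cluster centre
            return sorted(consensus)

        # Three volunteers agree near t=120; one stray click at t=400 is treated as noise.
        print(consensus_change_points([118, 121, 123, 400]))   # -> [121.0]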

    Evaluation of methods for detecting human reads in microbial sequencing datasets

    Sequencing data from host-associated microbes can often be contaminated with DNA from the investigator or research subject. Human DNA is typically removed from microbial reads either by subtractive alignment (dropping all reads that map to the human genome) or by using a read classification tool to predict those of human origin, and then discarding them. To inform best practice guidelines, we benchmarked eight alignment-based and two classification-based methods of human read detection using simulated data from 10 clinically prevalent bacteria and three viruses, into which contaminating human reads had been added. While the majority of methods successfully detected >99 % of the human reads, they were distinguishable by variance. The most precise methods, with negligible variance, were Bowtie2 and SNAP, both of which misidentified few, if any, bacterial reads (and no viral reads) as human. While correctly detecting a similar number of human reads, methods based on taxonomic classification, such as Kraken2 and Centrifuge, could misclassify bacterial reads as human, although the extent of this was species-specific. Among the most sensitive methods of human read detection was BWA, although this also made the greatest number of false positive classifications. Across all methods, the human reads that went undetected represented only a small fraction of the total. For longer (>300 bp) bacterial reads, the highest-performing approaches were classification-based, using Kraken2 or Centrifuge. For shorter (c. 150 bp) bacterial reads, combining multiple methods of human read detection maximized the recovery of human reads from contaminated short read datasets without being compromised by false positives. A particularly high-performance approach with shorter bacterial reads was a two-stage classification using Bowtie2 followed by SNAP. Using this approach, we re-examined 11 577 publicly archived bacterial read sets for hitherto undetected human contamination. We were able to extract a sufficient number of reads to call known human SNPs, including those with clinical significance, in 6 % of the samples. These results show that phenotypically distinct human sequence is detectable in publicly archived microbial read datasets.
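
    Subtractive alignment, the first of the two strategies described above, can be sketched as follows. The example wraps Bowtie2 so that read pairs failing to align to a human reference are kept as the "clean" microbial output; the index and file names are placeholders, and the exact flags should be checked against the installed Bowtie2 version. The two-stage approach described above would then pass the surviving reads through a second aligner (SNAP).

        # Sketch of subtractive alignment: discard read pairs that map to a human
        # reference and keep the rest. Paths are placeholders; verify the flags
        # against your Bowtie2 version before use.
        import subprocess

        def remove_human_reads(r1, r2, human_index, out_prefix, threads=4):
            """Align paired FASTQ files to a human index with Bowtie2 and write
            pairs that do NOT align concordantly (the presumed microbial reads)
            to files derived from out_prefix; the SAM output itself is discarded."""
            cmd = [
                "bowtie2",
                "-x", human_index,        # prebuilt human genome index
                "-1", r1, "-2", r2,       # paired-end FASTQ input
                "--un-conc", out_prefix,  # pairs failing to align are kept as "clean"
                "--very-sensitive",
                "-p", str(threads),
                "-S", "/dev/null",        # human-read alignments themselves are not needed
            ]
            subprocess.run(cmd, check=True)

        # remove_human_reads("sample_R1.fastq", "sample_R2.fastq",
        #                    "GRCh38_index", "cleaned_reads.fastq")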

    Short-term genome stability of serial Clostridium difficile ribotype 027 isolates in an experimental gut model and recurrent human disease

    Clostridium difficile whole genome sequencing has the potential to identify related isolates, even among otherwise indistinguishable strains, but interpretation depends on understanding genomic variation within isolates and individuals. Serial isolates from two scenarios were whole genome sequenced: firstly, 62 isolates from 29 timepoints from three in vitro gut models inoculated with a NAP1/027 strain; secondly, 122 isolates from 44 patients (2–8 samples/patient) with mostly recurrent or ongoing symptomatic NAP1/027 C. difficile infection. Reference-based mapping was used to identify single nucleotide variants (SNVs). Across three gut model inductions (two with antibiotic treatment; 137 days in total), only two new SNVs became established. Pre-existing minority SNVs became dominant in two models. Several SNVs were detected that were present only in a minority of colonies at one or two timepoints. The median (inter-quartile range) [range] time between patients' first and last samples was 60 (29.5–118.5) [0–561] days. Within-patient C. difficile evolution was 0.45 SNVs/called genome/year (95% CI 0.00–1.28) and within-host diversity was 0.28 SNVs/called genome (0.05–0.53). 26/28 gut model and patient SNVs were non-synonymous, affecting a range of gene targets. The consistency of whole genome sequencing data from gut model C. difficile isolates, and the high stability of genomic sequences in isolates from patients, supports the use of whole genome sequencing in detailed transmission investigations.
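
    The within-patient evolution rate quoted above (SNVs per called genome per year) is, at its simplest, an event count divided by follow-up time. The sketch below computes such a rate with an exact Poisson confidence interval; the numbers are hypothetical, and this simple calculation stands in for, rather than reproduces, the study's estimation procedure.

        # Sketch: SNV accumulation rate with an exact (Garwood) Poisson CI.
        # The counts and follow-up below are hypothetical illustration values.
        from scipy.stats import chi2

        def poisson_rate_ci(total_snvs, total_genome_years, alpha=0.05):
            """Rate = events / exposure, with an exact Poisson confidence interval."""
            k, t = total_snvs, total_genome_years
            lower = 0.0 if k == 0 else chi2.ppf(alpha / 2, 2 * k) / (2 * t)
            upper = chi2.ppf(1 - alpha / 2, 2 * (k + 1)) / (2 * t)
            return k / t, lower, upper

        rate, lo, hi = poisson_rate_ci(total_snvs=3, total_genome_years=8.0)
        print(f"{rate:.2f} SNVs/genome/year (95% CI {lo:.2f}-{hi:.2f})")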

    The quality of vital signs measurements and value preferences in electronic medical records varies by hospital, specialty, and patient demographics

    We aimed to assess the frequency of value preferences in the recording of vital signs in electronic healthcare records (EHRs) and the associated patient and hospital factors. We used EHR data from Oxford University Hospitals, UK, between 01-January-2016 and 30-June-2019 and a maximum likelihood estimator to determine the prevalence of value preferences in measurements of systolic and diastolic blood pressure (SBP/DBP), heart rate (HR) (readings ending in zero), respiratory rate (RR) (multiples of 2 or 4), and temperature (readings of 36.0 °C). We used multivariable logistic regression to investigate associations between value preferences and patient age, sex, ethnicity, deprivation, comorbidities, calendar time, hour of day, days into admission, hospital, day of week and speciality. In 4,375,654 records from 135,173 patients, there was an excess of temperature readings of 36.0 °C above that expected from the underlying distribution, affecting 11.3% (95% CI 10.6–12.1%) of measurements, i.e. these observations were likely inappropriately recorded as 36.0 °C instead of the true value. SBP, DBP and HR were rounded to the nearest 10 in 2.2% (1.4–2.8%), 2.0% (1.3–5.1%) and 2.4% (1.7–3.1%) of measurements, respectively. RR was also more commonly recorded as multiples of 2. BP digit preference and the excess of temperature recordings of 36.0 °C were more common in older and male patients, increased with length of stay, were more likely following a previous normal set of vital signs, and were typically more common in medical than in surgical specialities. Differences were seen between hospitals; however, digit preference reduced over calendar time. Vital signs may not always be accurately documented, and this may vary by patient group and hospital setting. Allowances and adjustments may be needed in delivering care to patients and in observational analyses and predictive tools using these factors as outcomes or exposures.
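
    To make the idea of a value preference concrete, the sketch below estimates the excess of heart-rate readings ending in zero over what the underlying distribution would produce, treating the recorded values as a mixture of true readings and rounded ones. It is a simplified moment-style estimate under an assumed near-uniform final-digit distribution, not the maximum likelihood estimator used in the study; the simulated data are likewise illustrative.

        # Sketch: estimate the proportion of readings preferentially recorded with a
        # terminal zero, beyond what would occur by chance. Assumes non-zero final
        # digits are unaffected by rounding.
        import numpy as np

        def excess_terminal_zero(values):
            last_digits = np.asarray(values, dtype=int) % 10
            obs_zero = np.mean(last_digits == 0)
            # Without preference each final digit is roughly equally likely, so the
            # "natural" share of zeros is inferred from the nine other digits.
            natural_zero = (1 - obs_zero) / 9
            # Mixture view: obs_zero = preference + natural_zero.
            return max(0.0, obs_zero - natural_zero)

        rng = np.random.default_rng(0)
        hr = rng.normal(80, 12, 10_000).round().astype(int)
        hr[:1_500] = (hr[:1_500] / 10).round().astype(int) * 10   # inject 15% rounding
        print(f"estimated value preference: {excess_terminal_zero(hr):.1%}")   # ~15%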

    Hybrid Vibrio vulnificus

    Hybridization between natural populations of Vibrio vulnificus results in a hyperinvasive clone.

    Effect of COVID-19 vaccination on transmission of Alpha and Delta variants

    BACKGROUND: Before the emergence of the B.1.617.2 (delta) variant of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), vaccination reduced transmission of SARS-CoV-2 from vaccinated persons who became infected, potentially by reducing viral loads. Although vaccination still lowers the risk of infection, similar viral loads in vaccinated and unvaccinated persons who are infected with the delta variant call into question the degree to which vaccination prevents transmission. METHODS: We used contact-testing data from England to perform a retrospective observational cohort study involving adult contacts of SARS-CoV-2–infected adult index patients. We used multivariable Poisson regression to investigate associations between transmission and the vaccination status of index patients and contacts and to determine how these associations varied with the B.1.1.7 (alpha) and delta variants and time since the second vaccination. RESULTS: Among 146,243 tested contacts of 108,498 index patients, 54,667 (37%) had positive SARS-CoV-2 polymerase-chain-reaction (PCR) tests. In index patients who became infected with the alpha variant, two vaccinations with either BNT162b2 or ChAdOx1 nCoV-19 (also known as AZD1222), as compared with no vaccination, were independently associated with reduced PCR positivity in contacts (adjusted rate ratio with BNT162b2, 0.32; 95% confidence interval [CI], 0.21 to 0.48; and with ChAdOx1 nCoV-19, 0.48; 95% CI, 0.30 to 0.78). Vaccine-associated reductions in transmission of the delta variant were smaller than those with the alpha variant, and reductions in transmission of the delta variant after two BNT162b2 vaccinations were greater (adjusted rate ratio for the comparison with no vaccination, 0.50; 95% CI, 0.39 to 0.65) than after two ChAdOx1 nCoV-19 vaccinations (adjusted rate ratio, 0.76; 95% CI, 0.70 to 0.82). Variation in cycle-threshold (Ct) values (indicative of viral load) in index patients explained 7 to 23% of vaccine-associated reductions in transmission of the two variants. The reductions in transmission of the delta variant declined over time after the second vaccination, reaching levels that were similar to those in unvaccinated persons by 12 weeks in index patients who had received ChAdOx1 nCoV-19 and attenuating substantially in those who had received BNT162b2. Protection in contacts also declined in the 3-month period after the second vaccination. CONCLUSIONS: Vaccination was associated with a smaller reduction in transmission of the delta variant than of the alpha variant, and the effects of vaccination decreased over time. PCR Ct values at diagnosis of the index patient only partially explained decreased transmission. (Funded by the U.K. Government Department of Health and Social Care and others.)
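
    The multivariable Poisson regression described above can be sketched roughly as follows, modelling whether a contact tested PCR-positive as a function of index-patient and contact vaccination status and variant, with robust standard errors. The data frame, variable names and the (much reduced) adjustment set are illustrative assumptions, not the study's dataset or model specification.

        # Sketch of a multivariable Poisson model for contact PCR positivity.
        # Column names and the adjustment set are placeholders.
        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        def fit_transmission_model(df: pd.DataFrame):
            """df needs: contact_positive (0/1), index_vaccine, contact_vaccine,
            variant, plus whatever confounders are added to the formula."""
            model = smf.glm(
                "contact_positive ~ C(index_vaccine) * C(variant) + C(contact_vaccine)",
                data=df,
                family=sm.families.Poisson(),
            ).fit(cov_type="HC1")               # robust SEs: binary outcome, Poisson model
            rate_ratios = np.exp(model.params)  # adjusted rate ratios
            return model, rate_ratios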

    Mortality risks associated with empirical antibiotic activity in E. coli bacteraemia: an analysis of electronic health records

    Background: Reported bacteraemia outcomes following inactive empirical antibiotics (based on in vitro testing) are conflicting, potentially reflecting heterogeneity in causative species, MIC breakpoints defining resistance/susceptibility, and times to rescue therapy. Methods: We investigated adult inpatients with Escherichia coli bacteraemia at Oxford University Hospitals, UK, from 4 February 2014 to 30 June 2021 who were receiving empirical amoxicillin/clavulanate with/without other antibiotics. We used Cox regression to analyse 30-day all-cause mortality by in vitro amoxicillin/clavulanate susceptibility (activity) using the EUCAST resistance breakpoint (>8/2 mg/L), categorical MIC, and a higher resistance breakpoint (>32/2 mg/L), adjusting for other antibiotic activity and confounders including comorbidities, vital signs and blood tests. Results: A total of 1720 E. coli bacteraemias (1626 patients) were treated with empirical amoxicillin/clavulanate. Thirty-day mortality was 193/1400 (14%) for any active baseline therapy and 52/320 (16%) for inactive baseline therapy (P = 0.17). With EUCAST breakpoints, there was no evidence that mortality differed for inactive versus active amoxicillin/clavulanate [adjusted HR (aHR) = 1.27 (95% CI 0.83–1.93); P = 0.28], nor of an association with active aminoglycoside (P = 0.93) or other active antibiotics (P = 0.18). Considering categorical amoxicillin/clavulanate MIC, MICs > 32/2 mg/L were associated with mortality [aHR = 1.85 versus MIC = 2/2 mg/L (95% CI 0.99–3.73); P = 0.054]. A higher resistance breakpoint (>32/2 mg/L) was independently associated with higher mortality [aHR = 1.82 (95% CI 1.07–3.10); P = 0.027], as were MICs > 32/2 mg/L with active empirical aminoglycosides [aHR = 2.34 (95% CI 1.40–3.89); P = 0.001], but not MICs > 32/2 mg/L with active non-aminoglycoside antibiotic(s) [aHR = 0.87 (95% CI 0.40–1.89); P = 0.72]. Conclusions: We found no evidence that EUCAST-defined amoxicillin/clavulanate resistance was associated with increased mortality, but a higher resistance breakpoint (MIC > 32/2 mg/L) was. Additional active baseline non-aminoglycoside antibiotics attenuated amoxicillin/clavulanate resistance-associated mortality, but aminoglycosides did not. Granular phenotyping and comparison with clinical outcomes may improve AMR breakpoints.
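
    A minimal sketch of the kind of Cox model used above, relating 30-day all-cause mortality to in vitro amoxicillin/clavulanate activity, might look like the following (using the lifelines package). Column names and the covariates shown are placeholders; the study adjusted for many more confounders, including comorbidities, vital signs and blood tests.

        # Sketch of a Cox proportional hazards model for 30-day mortality by
        # empirical antibiotic activity. Column names are assumed placeholders.
        import pandas as pd
        from lifelines import CoxPHFitter

        def fit_mortality_model(df: pd.DataFrame) -> CoxPHFitter:
            """df needs one row per bacteraemia episode with columns:
            followup_days (capped at 30), died (0/1), inactive_amoxiclav (0/1)
            and any adjustment variables named in the formula."""
            cph = CoxPHFitter()
            cph.fit(
                df,
                duration_col="followup_days",
                event_col="died",
                formula="inactive_amoxiclav + age + active_aminoglycoside",
            )
            return cph   # cph.print_summary() reports adjusted hazard ratios (exp(coef))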

    Distinct patterns of vital sign and inflammatory marker responses in adults with suspected bloodstream infection

    Objectives: To identify patterns in inflammatory marker and vital sign responses in adults with suspected bloodstream infection (BSI) and define expected trends in normal recovery. Methods: We included patients aged ≥16 years from Oxford University Hospitals with a blood culture taken between 1-January-2016 and 28-June-2021. We used linear and latent class mixed models to estimate trajectories in C-reactive protein (CRP), white blood cell count, heart rate, respiratory rate and temperature, and to identify CRP response subgroups. Centile charts for expected CRP responses were constructed via the lambda-mu-sigma method. Results: Of 88,348 suspected BSI episodes, 6908 (7.8%) were culture-positive with a probable pathogen, 4309 (4.9%) contained potential contaminants, and 77,131 (87.3%) were culture-negative. CRP levels generally peaked 1–2 days after blood culture collection, with varying responses for different pathogens and infection sources (p < 0.0001). We identified five CRP trajectory subgroups: peak on day 1 (36,091; 46.3%) or day 2 (4529; 5.8%), slow recovery (10,666; 13.7%), peak on day 6 (743; 1.0%), and low response (25,928; 33.3%). Centile reference charts tracking normal responses were constructed from episodes peaking on days 1 or 2. Conclusions: CRP and other infection response markers rise and recover differently depending on the clinical syndrome and pathogen involved. However, centile reference charts that account for these differences can be used to track whether patients are recovering as expected and to help personalise infection treatment.
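
    The centile reference charts described above can be approximated crudely with empirical quantiles of CRP by day since blood culture, restricted to episodes whose CRP peaked on day 1 or 2. The study fitted smoothed centiles with the lambda-mu-sigma method, so this raw-quantile version only illustrates the idea; the column names are assumptions.

        # Sketch: empirical CRP centiles by day since culture for episodes with a
        # day-1/2 peak, as a crude stand-in for smoothed LMS centile curves.
        import pandas as pd

        def crp_centile_chart(df: pd.DataFrame) -> pd.DataFrame:
            """df needs one row per CRP measurement with columns:
            episode_id, day_since_culture, crp, peak_day (day of that episode's CRP peak)."""
            normal_recovery = df[df["peak_day"].isin([1, 2])]
            return (
                normal_recovery
                .groupby("day_since_culture")["crp"]
                .quantile([0.05, 0.25, 0.5, 0.75, 0.95])
                .unstack()              # one row per day, one column per centile
            )

        # A patient whose CRP sits above the 95th centile several days after their
        # blood culture is recovering more slowly than the reference population.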