82 research outputs found

    Applications of deep convolutional neural networks to digitized natural history collections

    Natural history collections contain data that are critical for many scientific endeavors. Recent efforts in mass digitization are generating large datasets from these collections that can provide unprecedented insight. Here, we present examples of how deep convolutional neural networks can be applied in analyses of imaged herbarium specimens. We first demonstrate that a convolutional neural network can detect mercury-stained specimens across a collection with 90% accuracy. We then show that such a network can correctly distinguish two morphologically similar plant families 96% of the time. Discarding the most challenging specimen images increases accuracy to 94% and 99%, respectively. These results highlight the importance of mass digitization and deep learning approaches and reveal how they can together deliver powerful new investigative tools

    Issues With Variability in Electronic Health Record Data About Race and Ethnicity: Descriptive Analysis of the National COVID Cohort Collaborative Data Enclave

    Background: The adverse impact of COVID-19 on marginalized and under-resourced communities of color has highlighted the need for accurate, comprehensive race and ethnicity data. However, a significant technical challenge related to integrating race and ethnicity data in large, consolidated databases is the lack of consistency in how data about race and ethnicity are collected and structured by health care organizations. Objective: This study aims to evaluate and describe variations in how health care systems collect and report information about the race and ethnicity of their patients and to assess how well these data are integrated when aggregated into a large clinical database. Methods: At the time of our analysis, the National COVID Cohort Collaborative (N3C) Data Enclave contained records from 6.5 million patients contributed by 56 health care institutions. We quantified the variability in the harmonized race and ethnicity data in the N3C Data Enclave by analyzing the conformance to health care standards for such data. We conducted a descriptive analysis by comparing the harmonized data available for research purposes in the database to the original source data contributed by health care institutions. To make the comparison, we tabulated the original source codes, enumerating how many patients had been reported with each encoded value and how many distinct ways each category was reported. The nonconforming data were also cross tabulated by 3 factors: patient ethnicity, the number of data partners using each code, and which data models utilized those particular encodings. For the nonconforming data, we used an inductive approach to sort the source encodings into categories. For example, values such as “Declined” were grouped with “Refused,” and “Multiple Race” was grouped with “Two or more races” and “Multiracial.” Results: “No matching concept” was the second largest harmonized concept used by the N3C to describe the race of patients in their database. In addition, 20.7% of the race data did not conform to the standard; the largest category was data that were missing. Hispanic or Latino patients were overrepresented in the nonconforming racial data, and data from American Indian or Alaska Native patients were obscured. Although only a small proportion of the source data had not been mapped to the correct concepts (0.6%), Black or African American and Hispanic/Latino patients were overrepresented in this category. Conclusions: Differences in how race and ethnicity data are conceptualized and encoded by health care institutions can affect the quality of the data in aggregated clinical databases. The impact of data quality issues in the N3C Data Enclave was not equal across all races and ethnicities, which has the potential to introduce bias in analyses and conclusions drawn from these data. Transparency about how data have been transformed can help users make accurate analyses and inferences and eventually better guide clinical care and public policy

    Surgical preferences of patients at risk of hip fractures: hemiarthroplasty versus total hip arthroplasty

    BACKGROUND: The optimal treatment of displaced femoral neck fractures in patients over 60 years is controversial. While much research has focused on the impact of total hip arthroplasty (THA) and hemiarthroplasty (HA) on surgical outcomes, little is known about patient preferences for either alternative. The purpose of this study was to elicit surgical preferences of patients at risk of sustaining hip fracture using a novel decision board. METHODS: We developed a decision board for the surgical management of displaced femoral neck fractures presenting risks and outcomes of HA and THA. The decision board was presented to 81 elderly patients at risk for developing femoral neck fractures identified from an osteoporosis clinic. The participants were faced with the scenario of sustaining a displaced femoral neck fracture and were asked to state their treatment option preference and rationale for operative procedure. RESULTS: Eighty-five percent (85%) of participants were between the age of 60 and 80 years; 89% were female; 88% were Caucasian; and 49% had some post-secondary education. Ninety-three percent (93%; 95% confidence interval [CI], 87-99%) of participants chose THA as their preferred operative choice. Participants identified several factors important to their decision, including the perception of greater walking distance (63%), less residual pain (29%), less reoperative risk (28%) and lower mortality risk (20%) with THA. Participants who preferred HA (7%; 95% CI, 1-13%) did so for perceived less invasiveness (50%), lower dislocation risk (33%), lower infection risk (33%), and shorter operative time (17%). CONCLUSION: The overwhelming majority of patients preferred THA to HA for the treatment of a displaced femoral neck fracture when confronted with risks and outcomes of both procedures on a decision board

    Use of Electronic Health Records to Support a Public Health Response to the COVID-19 Pandemic in the United States: A Perspective from Fifteen Academic Medical Centers

    Our goal is to summarize the collective experience of 15 organizations in dealing with uncoordinated efforts that result in unnecessary delays in understanding, predicting, preparing for, containing, and mitigating the COVID-19 pandemic in the US. Response efforts involve the collection and analysis of data corresponding to healthcare organizations, public health departments, socioeconomic indicators, as well as additional signals collected directly from individuals and communities. We focused on electronic health record (EHR) data, since EHRs can be leveraged and scaled to improve clinical care, research, and to inform public health decision-making. We outline the current challenges in the data ecosystem and the technology infrastructure that are relevant to COVID-19, as witnessed in our 15 institutions. The infrastructure includes registries and clinical data networks to support population-level analyses. We propose a specific set of strategic next steps to increase interoperability, overall organization, and efficiencies

    Renal artery sympathetic denervation: observations from the UK experience

    Background: Renal denervation (RDN) may lower blood pressure (BP); however, it is unclear whether medication changes may be confounding results. Furthermore, limited data exist on pattern of ambulatory blood pressure (ABP) response—particularly in those prescribed aldosterone antagonists at the time of RDN. Methods: We examined all patients treated with RDN for treatment-resistant hypertension in 18 UK centres. Results: Results from 253 patients treated with five technologies are shown. Pre-procedural mean office BP (OBP) was 185/102 mmHg (SD 26/19; n = 253) and mean daytime ABP was 170/98 mmHg (SD 22/16; n = 186). Median number of antihypertensive drugs was 5.0: 96 % ACEi/ARB; 86 % thiazide/loop diuretic and 55 % aldosterone antagonist. OBP, available in 90 % at 11 months follow-up, was 163/93 mmHg (reduction of 22/9 mmHg). ABP, available in 70 % at 8.5 months follow-up, was 158/91 mmHg (fall of 12/7 mmHg). Mean drug changes post RDN were: 0.36 drugs added, 0.91 withdrawn. Dose changes appeared neutral. Quartile analysis by starting ABP showed mean reductions in systolic ABP after RDN of: 0.4; 6.5; 14.5 and 22.1 mmHg, respectively (p < 0.001 for trend). Use of aldosterone antagonist did not predict response (p < 0.2). Conclusion: In 253 patients treated with RDN, office BP fell by 22/9 mmHg. Ambulatory BP fell by 12/7 mmHg, though little response was seen in the lowermost quartile of starting blood pressure. Fall in BP was not explained by medication changes and aldosterone antagonist use did not affect response

    HIV Promoter Integration Site Primarily Modulates Transcriptional Burst Size Rather Than Frequency

    Mammalian gene expression patterns, and their variability across populations of cells, are regulated by factors specific to each gene in concert with its surrounding cellular and genomic environment. Lentiviruses such as HIV integrate their genomes into semi-random genomic locations in the cells they infect, and the resulting viral gene expression provides a natural system to dissect the contributions of genomic environment to transcriptional regulation. Previously, we showed that expression heterogeneity and its modulation by specific host factors at HIV integration sites are key determinants of infected-cell fate and a possible source of latent infections. Here, we assess the integration context dependence of expression heterogeneity from diverse single integrations of a HIV-promoter/GFP-reporter cassette in Jurkat T-cells. Systematically fitting a stochastic model of gene expression to our data reveals an underlying transcriptional dynamic, by which multiple transcripts are produced during short, infrequent bursts, that quantitatively accounts for the wide, highly skewed protein expression distributions observed in each of our clonal cell populations. Interestingly, we find that the size of transcriptional bursts is the primary systematic covariate over integration sites, varying from a few to tens of transcripts across integration sites, and correlating well with mean expression. In contrast, burst frequencies are scattered about a typical value of several per cell-division time and demonstrate little correlation with the clonal means. This pattern of modulation generates consistently noisy distributions over the sampled integration positions, with large expression variability relative to the mean maintained even for the most productive integrations, and could contribute to specifying heterogeneous, integration-site-dependent viral production patterns in HIV-infected cells. Genomic environment thus emerges as a significant control parameter for gene expression variation that may contribute to structuring mammalian genomes, as well as be exploited for survival by integrating viruses