80 research outputs found

    SUDO: a framework for evaluating clinical artificial intelligence systems without ground-truth annotations

    A clinical artificial intelligence (AI) system is often validated on a held-out set of data which it has not been exposed to before (e.g., data from a different hospital with a distinct electronic health record system). This evaluation process is meant to mimic the deployment of an AI system on data in the wild: data that are currently unseen by the system yet are expected to be encountered in a clinical setting. However, when data in the wild differ from the held-out set of data, a phenomenon referred to as distribution shift, and lack ground-truth annotations, the extent to which AI-based findings can be trusted on data in the wild becomes unclear. Here, we introduce SUDO, a framework for evaluating AI systems without ground-truth annotations. SUDO assigns temporary labels to data points in the wild and directly uses them to train distinct models, with the highest performing model indicative of the most likely label. Through experiments with AI systems developed for dermatology images, histopathology patches, and clinical reports, we show that SUDO can be a reliable proxy for model performance and thus identify unreliable predictions. We also demonstrate that SUDO informs the selection of models and allows for the previously out-of-reach assessment of algorithmic bias for data in the wild without ground-truth annotations. The ability to triage unreliable predictions for further inspection and assess the algorithmic bias of AI systems can improve the integrity of research findings and contribute to the deployment of ethical AI systems in medicine.
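The pseudo-labelling step in the SUDO abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the one-dimensional features, the nearest-centroid classifier, and every name below (`train_centroids`, `pseudo_label_scores`, the sample values) are assumptions chosen only to show the idea of trying each temporary label, training a separate model for it, and comparing the models on held-out data that does have labels.

```python
from statistics import mean

def train_centroids(xs, ys):
    """Fit a toy nearest-centroid model: one mean feature value per class."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in sorted(set(ys))}

def accuracy(centroids, xs, ys):
    """Share of points whose nearest centroid matches the true label."""
    preds = [min(centroids, key=lambda c: abs(x - centroids[c])) for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

def pseudo_label_scores(wild, labelled, held_x, held_y):
    """Score each candidate pseudo-label (0 or 1) for a group of wild points.

    For each candidate label, the wild points receive that temporary label and
    are combined with labelled examples of the opposite class (an assumption of
    this sketch). A model trained on that mix is scored on held-out data.
    """
    scores = {}
    for candidate in (0, 1):
        other = 1 - candidate
        xs = list(wild) + list(labelled[other])
        ys = [candidate] * len(wild) + [other] * len(labelled[other])
        scores[candidate] = accuracy(train_centroids(xs, ys), held_x, held_y)
    return scores

# Hypothetical data: class 0 clusters near 0, class 1 clusters near 1.
labelled = {0: [0.0, 0.2, -0.1], 1: [1.0, 0.9, 1.4]}
wild = [0.95, 1.1, 1.05]                   # unlabelled points "in the wild"
held_x = [0.1, -0.2, 0.3, 0.8, 0.9, 1.0]   # held-out data with ground truth
held_y = [0, 0, 0, 1, 1, 1]

scores = pseudo_label_scores(wild, labelled, held_x, held_y)
most_likely = max(scores, key=scores.get)  # label whose model performs best
```

In this toy setting the model trained with the temporary label 1 separates the held-out classes better than the one trained with label 0, so label 1 is taken as the more plausible one for the wild points.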

    Performances of Different Global Positioning System Devices for Time-Location Tracking in Air Pollution Epidemiological Studies

    Background People's time-location patterns are important in air pollution exposure assessment because pollution levels may vary considerably by location. A growing number of studies are using global positioning systems (GPS) to track people's time-location patterns. Many portable GPS units that archive location are commercially available at a cost that makes their use feasible for epidemiological studies. Methods We evaluated the performance of five portable GPS data loggers and two GPS cell phones by examining positional accuracy in typical locations (indoor, outdoor, in-vehicle) and factors that influence satellite reception (building material, building type), acquisition time (cold and warm start), battery life, and adequacy of memory for data storage. We examined stationary locations (eg, indoor, outdoor) and mobile environments (eg, walking, traveling by vehicle or bus) and compared GPS locations to highly-resolved US Geological Survey (USGS) and Digital Orthophoto Quarter Quadrangle (DOQQ) maps. Results The battery life of our tested instruments ranged from <9 hours to 48 hours. The acquisition of location time after startup ranged from a few seconds to >20 minutes and varied significantly by building structure type and by cold or warm start. No GPS device was found to have consistently superior performance with regard to spatial accuracy and signal loss. At fixed outdoor locations, 65%-95% of GPS points fell within 20-m of the corresponding DOQQ locations for all the devices. At fixed indoor locations, 50%-80% of GPS points fell within 20-m of the corresponding DOQQ locations for all the devices except one. Most of the GPS devices performed well during commuting on a freeway, with >80% of points within 10-m of the DOQQ route, but the performance was significantly impacted by surrounding structures on surface streets in highly urbanized areas. 
Conclusions All the tested GPS devices had limitations, but we identified several devices which showed promising performance for tracking subjects’ time-location patterns in epidemiological studies.
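The accuracy figures in the abstract above (share of GPS points within 20 m of a reference location) correspond to a simple calculation. The sketch below is an assumption about how such a share could be computed, not the study's procedure: it uses the haversine great-circle distance on a spherical Earth, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def share_within(fixes, reference, threshold_m=20.0):
    """Fraction of GPS fixes lying within threshold_m of a reference point."""
    hits = sum(haversine_m(lat, lon, *reference) <= threshold_m for lat, lon in fixes)
    return hits / len(fixes)

# Hypothetical fixes logged at one stationary site.
reference = (34.0, -118.0)
fixes = [(34.0001, -118.0), (34.0003, -118.0), (34.0, -118.0), (34.00015, -118.0)]
share = share_within(fixes, reference)
```

Here three of the four fixes lie within 20 m of the reference, so the share is 0.75.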

    Spatial disparity in the distribution of superfund sites in South Carolina: an ecological study

    BACKGROUND: According to the US Environmental Protection Agency (EPA), Superfund is a federal government program implemented to clean up uncontrolled hazardous waste sites. Twenty-six sites in South Carolina (SC) have been included on the National Priorities List (NPL), which has serious human health and environmental implications. The purpose of this study was to assess spatial disparities in the distribution of Superfund sites in SC. METHODS: The 2000 US census tract and block level data were used to generate population characteristics, which included race/ethnicity, socioeconomic status (SES), education, home ownership, and home built before 1950. Geographic Information Systems (GIS) were used to map Superfund facilities and develop choropleth maps based on the aforementioned sociodemographic variables. Spatial methods, including mean and median distance analysis, buffer analysis, and spatial approximation were employed to characterize burden disparities. Regression analysis was performed to assess the relationship between the number of Superfund facilities and population characteristics. RESULTS: Spatial coincidence results showed that of the 29.5% of Blacks living in SC, 55.9% live in Superfund host census tracts. Among all populations in SC living below poverty (14.2%), 57.2% were located in Superfund host census tracts. Buffer analyses results (0.5 mi, 1.0 mi, 5.0 mi, 0.5 km, 1.0 km, and 5.0 km) showed a higher percentage of Whites compared to Blacks hosting a Superfund facility. Conversely, a slightly higher percentage of Blacks hosted (30.2%) a Superfund facility than those not hosting (28.8%) while their White counterparts had more equivalent values (66.7% and 67.8%, respectively). Regression analyses in the reduced model (Adj. R² = 0.038) only explained a small percentage of the variance. In addition, the mean distance for percent of Blacks in the 90th percentile for Superfund facilities was 0.48 mi. 
CONCLUSION: Burden disparities exist in the distribution of Superfund facilities in SC at the block and census tract levels across varying levels of demographic composition for race/ethnicity and SES

    AD-BERT: Using Pre-trained contextualized embeddings to Predict the Progression from Mild Cognitive Impairment to Alzheimer's Disease

    Objective: We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model using unstructured clinical notes from electronic health records (EHRs) to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). Materials and Methods: We identified 3657 patients diagnosed with MCI together with their progress notes from Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000-2020. The progress notes no later than the first MCI diagnosis were used for the prediction. We first preprocessed the notes by deidentification, cleaning and splitting, and then pretrained a BERT model for AD (AD-BERT) based on the publicly available Bio+Clinical BERT on the preprocessed notes. The embeddings of all the sections of a patient's notes processed by AD-BERT were combined by MaxPooling to compute the probability of MCI-to-AD progression. For replication, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. Results: Compared with the 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with Area Under receiver operating characteristic Curve (AUC) of 0.8170 and F1 score of 0.4178 on NMEDW dataset and AUC of 0.8830 and F1 score of 0.6836 on WCM dataset. Conclusion: We developed a deep learning framework using BERT models which provide an effective solution for prediction of MCI-to-AD progression using clinical note analysis
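The pooling step mentioned in the AD-BERT abstract above (section embeddings combined by MaxPooling to compute a progression probability) can be sketched as follows. Only the max-pooling itself comes from the abstract; the linear layer with a sigmoid, the weights, and the function names are assumptions for illustration, and real embeddings would have hundreds of dimensions rather than two.

```python
import math

def max_pool(section_embeddings):
    """Element-wise maximum across the embeddings of a patient's note sections."""
    return [max(dim) for dim in zip(*section_embeddings)]

def progression_probability(section_embeddings, weights, bias):
    """Map pooled embeddings to a probability.

    The linear layer plus sigmoid is an assumed classification head; the
    abstract does not describe the actual one.
    """
    pooled = max_pool(section_embeddings)
    logit = sum(w * v for w, v in zip(weights, pooled)) + bias
    return 1 / (1 + math.exp(-logit))

# Hypothetical two-dimensional embeddings for two note sections.
sections = [[1.0, -2.0], [0.0, 3.0]]
pooled = max_pool(sections)
prob = progression_probability(sections, weights=[3.0, -1.0], bias=0.0)
```

Max-pooling keeps, for each embedding dimension, the strongest signal found in any section, so the result does not depend on how many sections a patient's notes were split into.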

    Travel patterns during pregnancy: comparison between Global Positioning System (GPS) tracking and questionnaire data

    Maternal exposures to traffic-related air pollution have been associated with adverse pregnancy outcomes. Exposures to traffic-related air pollutants are strongly influenced by time spent near traffic. However, little is known about women’s travel activities during pregnancy and whether questionnaire-based data can provide reliable information on travel patterns during pregnancy. We examined women’s in-vehicle travel behavior during pregnancy and the difference between travel data collected by questionnaire and by global positioning system (GPS), along with their potential for exposure error. We measured work-related travel patterns in 56 pregnant women using a questionnaire and one-week GPS tracking three times during pregnancy (30 weeks of gestation). We compared self-reported activities with GPS-derived trip distance and duration, and examined potentially influential factors that may contribute to differences. We also described in-vehicle travel behavior by pregnancy periods and influences of demographic and personal factors on daily travel times. Finally, we estimated personal exposure to particle-bound polycyclic aromatic hydrocarbon (PB-PAH) and examined the magnitude of exposure misclassification using self-reported vs. GPS travel data. Subjects overestimated both trip duration and trip distance compared to the GPS data. We observed moderately high correlations between self-reported and GPS-recorded travel distance (home to work trips: r = 0.88; work to home trips: r = 0.80). Better agreement was observed between the GPS and the self-reported travel time for home to work trips (r = 0.77) than work to home trips (r = 0.64). The subjects on average spent 69 and 93 minutes traveling in vehicles daily based on the GPS and self-reported data, respectively. Longer daily travel time was observed among participants in early pregnancy, and during certain pregnancy periods in women with higher education attainment, higher income, and no children. 
When comparing self-reported vs. GPS data, we found that estimated personal exposure to PB-PAH did not differ remarkably at the population level, but the difference was large at an individual level. Self-reported home-to-work data overestimated both trip duration and trip distance compared to GPS data. Significant differences in PAH exposure estimates were observed at individual level using self-reported vs. GPS data, which has important implications in air pollution epidemiological studies. https://doi.org/10.1186/1476-069X-12-8

    Correlation between preconception maternal non-occupational exposure to interior decoration or oil paint odour and average birth weight of neonates: findings from a nationwide cohort study in China's rural areas

    BACKGROUND: Birth weight is a critical indicator of neonatal health and foretells people's health in adolescence and even adulthood. Some researchers have warned against the adverse effects on babies' birth weight of exposure to pollutants in interior decoration or oil paint by odour intake. This study evaluated the effects of maternal exposure to such factors before conception on the birth weights of neonates. METHODS: Data on 213 461 cases in this study were from the database of the free National Pre-pregnancy Checkups Project. Defined as 'exposed' were those women exposed to oil paint odour or interior decoration at home or in the workplace within 6 months before their pregnancy. The study focused on revealing the correlation between such exposure and the birth weight of the neonates of these women, especially the incidence of macrosomia and low birth weight (LBW). Statistical analysis was conducted using the Kruskal-Wallis H test, the Mann-Whitney U test and logistic regression. RESULTS: The birth weight of babies from mothers non-occupationally exposed to such settings averaged 3465 g (range 3150-3650 g), whereas the birth weight of those from mothers free of such exposure averaged 3300 g (range 3000-3600 g). Maternal exposure preconception to interior decoration or oil paint odour reduced the incidence of LBW in their babies (p=0.003, OR 0.749, 95% CI 0.617 to 0.909). Such exposure may also augment the probability of macrosomia (p < 0.001, OR 1.297, 95% CI 1.133 to 1.484). CONCLUSION: Maternal exposure to interior decoration or oil paint odour preconception may increase the average birth weight of neonates, as well as the incidence of macrosomia.
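For readers unfamiliar with the odds ratios (OR) and 95% confidence intervals quoted in the abstract above, the sketch below shows how an unadjusted OR with a Wald interval is computed from a 2x2 table. This is a simplification: the study reports ORs from logistic regression, which may adjust for covariates, and the counts used here are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_ci(exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 for 95%)."""
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 10 of 100 exposed and 20 of 100 unexposed have the outcome.
odds_ratio, lower, upper = odds_ratio_ci(10, 90, 20, 80)
```

An OR below 1 with an interval that excludes 1, as in the LBW result quoted above, indicates lower odds of the outcome in the exposed group; in the hypothetical table here the interval includes 1, so no such conclusion would follow.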

    Being overburdened and medically underserved: assessment of this double disparity for populations in the state of Maryland

    Environmental justice research has shown that many communities of color and low-income persons are differentially burdened by noxious land uses including Toxic Release Inventory (TRI) facilities. However, limited work has been performed to assess how these populations tend to be both overburdened and medically underserved. We explored this “double disparity” for the first time in Maryland. We assessed spatial disparities in the distribution of TRI facilities in Maryland across varying levels of sociodemographic composition using 2010 US Census Health Professional Shortage Area (HPSA) data. Univariate and multivariate regression in addition to geographic information systems (GIS) were used to examine relationships between sociodemographic measures and location of TRI facilities. Buffer analysis was also used to assess spatial disparities. Four buffer categories included: 1) census tracts hosting one or more TRI facilities; 2) tracts located more than 0 and up to 0.5 km from the closest TRI facility; 3) tracts located more than 0.5 km and up to 1 km from a TRI facility; and 4) tracts located more than 1 km and up to 5 km from a TRI facility. We found that tracts with higher proportions of non-white residents and people living in poverty were more likely to be closer to TRI facilities. A significant increase in income was observed with an increase in distance between a census tract and the closest TRI facility. In general, percent non-white was higher in HPSA tracts that host at least one TRI facility than in non-HPSA tracts that host at least one TRI facility. Additionally, percent poverty, unemployment, less than high school education, and homes built pre-1950 were higher in HPSA tracts hosting TRI facilities than in non-HPSA tracts hosting TRI facilities. We found that people of color and low-income groups are differentially burdened by TRI facilities in Maryland. 
We also found that both low-income groups and persons without a high school education are both overburdened and medically underserved. The results of this study provide insight into how state agencies can better address the double disparity of disproportionate environmental hazards and limited access to health care resources facing vulnerable communities in Maryland. https://doi.org/10.1186/1476-069X-13-2
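The four buffer categories listed in the Maryland abstract above translate directly into a classification rule. The sketch below encodes only the distance bands stated in the abstract; the function name, the argument names, and the choice to return `None` for tracts beyond 5 km are assumptions, and how tract-to-facility distance was measured in the study is not specified here.

```python
def buffer_category(hosts_facility, distance_km):
    """Assign a census tract to one of the four buffer categories.

    1: tract hosts one or more TRI facilities
    2: more than 0 and up to 0.5 km from the closest facility
    3: more than 0.5 km and up to 1 km
    4: more than 1 km and up to 5 km
    None: farther than 5 km (outside the listed categories; an assumption)
    """
    if hosts_facility:
        return 1
    if 0 < distance_km <= 0.5:
        return 2
    if 0.5 < distance_km <= 1:
        return 3
    if 1 < distance_km <= 5:
        return 4
    return None

# Hypothetical tracts: (hosts a facility?, distance to closest facility in km)
tracts = [(True, 0.0), (False, 0.3), (False, 0.5), (False, 0.9), (False, 4.2), (False, 7.5)]
categories = [buffer_category(hosts, dist) for hosts, dist in tracts]
```

Each upper bound is inclusive and each lower bound exclusive, matching the "more than ... and up to ..." wording, so a tract exactly 0.5 km away falls in category 2.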

    Distinct miRNAs associated with various clinical presentations of SARS-CoV-2 infection.

    MicroRNAs (miRNAs) have been shown to play important roles in viral infections, but their associations with SARS-CoV-2 infection remain poorly understood. Here, we detected 85 differentially expressed miRNAs (DE-miRNAs) from 2,336 known and 361 novel miRNAs that were identified in 233 plasma samples from 61 healthy controls and 116 patients with COVID-19 using high-throughput sequencing and computational analysis. These DE-miRNAs were associated with SARS-CoV-2 infection, disease severity, and viral persistence in the patients with COVID-19, respectively. Gene ontology and KEGG pathway analyses of the DE-miRNAs revealed their connections to viral infections, immune responses, and lung diseases. Finally, we established a machine learning model using the DE-miRNAs between various groups for classification of COVID-19 cases with different clinical presentations. Our findings may help understand the contribution of miRNAs to the pathogenesis of COVID-19 and identify potential biomarkers and molecular targets for diagnosis and treatment of SARS-CoV-2 infection.