10 research outputs found

    An interpretable machine learning framework for measuring urban perceptions from panoramic street view images

    The proliferation of street view images (SVIs) and the constant advancements in deep learning techniques have enabled urban analysts to extract and evaluate urban perceptions from large-scale urban streetscapes. However, many existing analytical frameworks have been found to lack interpretability due to their end-to-end structure and “black-box” nature, thereby limiting their value as a planning support tool. In this context, we propose a five-step machine learning framework for extracting neighborhood-level urban perceptions from panoramic SVIs, specifically emphasizing feature and result interpretability. By utilizing the MIT Place Pulse data, the developed framework can systematically extract six dimensions of urban perceptions from the given panoramas, including perceptions of wealth, boredom, depression, beauty, safety, and liveliness. The practical utility of this framework is demonstrated through its deployment in Inner London, where it was used to visualize urban perceptions at the Output Area (OA) level and to verify against real-world crime rate

    COVID-19 trajectories among 57 million adults in England: a cohort study using electronic health records

    BACKGROUND: Updatable estimates of COVID-19 onset, progression, and trajectories underpin pandemic mitigation efforts. To identify and characterise disease trajectories, we aimed to define and validate ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework. METHODS: In this cohort study, we used eight linked National Health Service (NHS) datasets for people in England alive on Jan 23, 2020. Data on COVID-19 testing, vaccination, primary and secondary care records, and death registrations were collected until Nov 30, 2021. We defined ten COVID-19 phenotypes reflecting clinically relevant stages of disease severity and encompassing five categories: positive SARS-CoV-2 test, primary care diagnosis, hospital admission, ventilation modality (four phenotypes), and death (three phenotypes). We constructed patient trajectories illustrating transition frequency and duration between phenotypes. Analyses were stratified by pandemic waves and vaccination status. FINDINGS: Among 57 032 174 individuals included in the cohort, 13 990 423 COVID-19 events were identified in 7 244 925 individuals, equating to an infection rate of 12·7% during the study period. Of 7 244 925 individuals, 460 737 (6·4%) were admitted to hospital and 158 020 (2·2%) died. Of 460 737 individuals who were admitted to hospital, 48 847 (10·6%) were admitted to the intensive care unit (ICU), 69 090 (15·0%) received non-invasive ventilation, and 25 928 (5·6%) received invasive ventilation. Among 384 135 patients who were admitted to hospital but did not require ventilation, mortality was higher in wave 1 (23 485 [30·4%] of 77 202 patients) than wave 2 (44 220 [23·1%] of 191 528 patients), but remained unchanged for patients admitted to the ICU. Mortality was highest among patients who received ventilatory support outside of the ICU in wave 1 (2569 [50·7%] of 5063 patients). 
15 486 (9·8%) of 158 020 COVID-19-related deaths occurred within 28 days of the first COVID-19 event without a COVID-19 diagnosis on the death certificate. 10 884 (6·9%) of 158 020 deaths were identified exclusively from mortality data with no previous COVID-19 phenotype recorded. We observed longer patient trajectories in wave 2 than wave 1. INTERPRETATION: Our analyses illustrate the wide spectrum of disease trajectories as shown by differences in incidence, survival, and clinical pathways. We have provided a modular analytical framework that can be used to monitor the impact of the pandemic and generate evidence of clinical and policy relevance using multiple EHR sources. FUNDING: British Heart Foundation Data Science Centre, led by Health Data Research UK.
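The percentages in the FINDINGS section can be reproduced from the counts quoted in the abstract. The following is a minimal sketch for checking that arithmetic; the variable names and the `pct` helper are illustrative and not part of the study's own code.

```python
# Counts copied from the FINDINGS section of the abstract above.
cohort = 57_032_174        # individuals included in the cohort
infected = 7_244_925       # individuals with at least one COVID-19 event
hospitalised = 460_737     # individuals admitted to hospital
died = 158_020             # COVID-19-related deaths
icu = 48_847               # admitted to the intensive care unit
niv = 69_090               # received non-invasive ventilation
imv = 25_928               # received invasive ventilation


def pct(numerator, denominator):
    """Return a percentage rounded to one decimal place, as in the abstract."""
    return round(100 * numerator / denominator, 1)


rates = {
    "infection": pct(infected, cohort),
    "hospital_admission": pct(hospitalised, infected),
    "death": pct(died, infected),
    "icu_given_admission": pct(icu, hospitalised),
    "niv_given_admission": pct(niv, hospitalised),
    "imv_given_admission": pct(imv, hospitalised),
}

for name, value in rates.items():
    print(f"{name}: {value}%")
```

Note that each denominator follows the abstract's wording: admission and death are expressed relative to infected individuals, and the ICU and ventilation figures relative to those admitted to hospital.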

    Investigating the association of environmental exposures and all-cause mortality in the UK Biobank using sparse principal component analysis

    Multicollinearity refers to the presence of collinearity between multiple variables and renders the results of statistical inference erroneous (Type II error). This is particularly important in environmental health research, where multicollinearity can hinder inference. To address this, correlated variables are often excluded from the analysis, limiting the discovery of new associations. An alternative approach is the use of principal component analysis (PCA). This method combines and projects a group of correlated variables onto a new orthogonal space. While this resolves the multicollinearity problem, it poses another challenge in relation to the interpretability of results. Standard hypothesis testing methods can be used to evaluate the association of the projected predictors, called principal components, with the outcomes of interest; however, there is no established way to trace the significance of principal components back to individual variables. To address this problem, we investigated the use of sparse principal component analysis (SPCA), which enforces a parsimonious projection. We hypothesise that this parsimony could facilitate the interpretability of findings. To this end, we investigated the association of 20 environmental predictors with all-cause mortality, adjusting for demographic, socioeconomic, physiological, and behavioural factors. The study was conducted in a cohort of 379,690 individuals in the UK. During an average follow-up of 8.05 years (3,055,166 total person-years), 14,996 deaths were observed. We used Cox regression models to estimate the hazard ratio (HR) and 95% confidence intervals (CI). The Cox models were fitted to the standardised environmental predictors (a) without any transformation, (b) transformed with PCA, and (c) transformed with SPCA. The comparison of findings underlined the potential of SPCA for conducting inference in scenarios where multicollinearity can increase the risk of Type II error. Our analysis unravelled a significant association between average noise pollution and increased risk of all-cause mortality. Specifically, those in the upper deciles of noise exposure have between 5 and 10% increased risk of all-cause mortality compared to the lowest decile.

    Hi-BEHRT: Hierarchical transformer-based model for accurate prediction of clinical events using multimodal longitudinal electronic health records

    Electronic health records (EHR) represent a holistic overview of patients’ trajectories. Their increasing availability has fueled new hopes to leverage them and develop accurate risk prediction models for a wide range of diseases. Given the complex interrelationships of medical records and patient outcomes, deep learning models have shown clear merits in achieving this goal. However, a key limitation of current studies remains their capacity to process long sequences, and long sequence modelling and its application in the context of healthcare and EHR remain unexplored. Capturing the whole history of medical encounters is expected to lead to more accurate predictions, but the inclusion of records collected over decades and from multiple resources can inevitably exceed the receptive field of most existing deep learning architectures. This can result in missing crucial, long-term dependencies. To address this gap, we present Hi-BEHRT, a hierarchical Transformer-based model that can significantly expand the receptive field of Transformers and extract associations from much longer sequences. Using a multimodal large-scale linked longitudinal EHR, Hi-BEHRT exceeds the state-of-the-art deep learning models by 1% to 5% for area under the receiver operating characteristic (AUROC) curve and 1% to 8% for area under the precision recall (AUPRC) curve on average, and by 2% to 8% (AUROC) and 2% to 11% (AUPRC) for patients with long medical history, for 5-year heart failure, diabetes, chronic kidney disease, and stroke risk prediction. Additionally, because pretraining for hierarchical Transformers is not well-established, we provide an effective end-to-end contrastive pre-training strategy for Hi-BEHRT using EHR, improving its transferability for predicting clinical events with relatively small training datasets.
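One common way a hierarchical model extends its receptive field is to cut a long record sequence into overlapping windows, encode each window locally, and then let a second stage attend over the much shorter list of window summaries. The sketch below illustrates only the windowing step. It assumes a sliding-window segmentation; the function name and the `window` and `stride` values are hypothetical and are not taken from the abstract.

```python
def segment_sequence(tokens, window, stride):
    """Split a long sequence into overlapping fixed-size windows.

    Each window would be summarised by a local encoder; a second-level
    encoder then only has to attend over len(segments) items rather than
    len(tokens) items. (Illustrative assumption, not the paper's code.)
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    segments = []
    start = 0
    while True:
        segments.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the sequence
        start += stride
    return segments


# Toy example: ten encounter codes, windows of four with a stride of three.
encounters = list(range(10))
segments = segment_sequence(encounters, window=4, stride=3)
print(segments)
```

With a stride smaller than the window, neighbouring segments share records, so information near a window boundary is not lost to either side.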

    Road Traffic Noise and Incidence of Primary Hypertension: A Prospective Analysis in UK Biobank.

    Background The quality of evidence regarding the associations between road traffic noise and hypertension is low due to the limitations of cross-sectional study design, and the role of air pollution remains to be further clarified. Objectives The purpose of this study was to evaluate the associations of long-term road traffic noise exposure with incident primary hypertension; we conducted a prospective population-based analysis in UK Biobank. Methods Road traffic noise was estimated at baseline residential address using the common noise assessment method model. Incident hypertension was ascertained through linkage with medical records. Cox proportional hazard models were used to estimate hazard ratios (HRs) for association in an analytical sample of over 240,000 participants free of hypertension at baseline, adjusting for covariates determined via directed acyclic graph. Results During a median of 8.1 years follow-up, 21,140 incident cases of primary hypertension (International Classification of Diseases-10th Revision [ICD-10]: I10) were ascertained. The HR for a 10 dB[A] increment in mean weighted average 24-hour road traffic noise level (Lden) exposure was 1.07 (95% CI: 1.02-1.13). A dose-response relationship was found, with HR of 1.13 (95% CI: 1.03-1.25) for Lden >65 dB[A] vs ≤55 dB[A] (P for trend …). Conclusions Long-term exposure to road traffic noise was associated with increased incidence of primary hypertension, and the effect estimates were stronger in the presence of higher air pollution.
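A hazard ratio reported per 10 dB[A] increment can be rescaled to other increments when the exposure enters the Cox model linearly on the log-hazard scale. The sketch below shows that conversion for the 1.07 estimate quoted above. The log-linear assumption and the function name are illustrative, and the result is not a substitute for the categorical estimates the study itself reports.

```python
import math

# HR per 10 dB[A] increment in Lden, as quoted in the abstract.
HR_PER_10_DB = 1.07

# Under a log-linear exposure term, the Cox coefficient per 1 dB[A] is
# log(HR per 10 dB) divided by 10. This is an assumption of the sketch.
BETA_PER_DB = math.log(HR_PER_10_DB) / 10


def hazard_ratio(delta_db):
    """Hazard ratio implied for an exposure difference of delta_db dB[A]."""
    return math.exp(BETA_PER_DB * delta_db)


for delta in (5, 10, 20):
    print(f"{delta} dB[A]: HR = {hazard_ratio(delta):.4f}")
```

Because the relationship is multiplicative, a 20 dB[A] difference corresponds to 1.07 squared rather than to twice the excess risk.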

    Sodium-based paracetamol: impact on blood pressure, cardiovascular events, and all-cause mortality

    Background and Aims Effervescent formulations of paracetamol containing sodium bicarbonate have been reported to be associated with increased blood pressure and a higher risk of cardiovascular diseases and all-cause mortality. Given the major implications of these findings, the reported associations were re-examined. Methods Using linked electronic health records data, a cohort of 475 442 UK individuals with at least one prescription of paracetamol, aged between 60 and 90 years, was identified. Outcomes in patients taking sodium-based paracetamol were compared with those taking non–sodium-based formulations of the same. Using a deep learning approach, associations with systolic blood pressure (SBP), major cardiovascular events (myocardial infarction, heart failure, and stroke), and all-cause mortality within 1 year after baseline were investigated. Results A total of 460 980 and 14 462 patients were identified for the non–sodium-based and sodium-based paracetamol exposure groups, respectively (mean age: 74 years; 64% women). Analysis revealed no difference in SBP [mean difference −0.04 mmHg (95% confidence interval −0.51, 0.43)] and no association with major cardiovascular events [relative risk (RR) 1.03 (0.91, 1.16)]. Sodium-based paracetamol showed a positive association with all-cause mortality [RR 1.46 (1.40, 1.52)]. However, after further accounting for other sources of residual confounding, the observed association attenuated towards the null [RR 1.08 (1.01, 1.16)]. Exploratory analyses revealed dysphagia and related conditions as major sources of uncontrolled confounding by indication for this association. Conclusions This study does not support previous suggestions of increased SBP and an elevated risk of cardiovascular events from short-term use of sodium bicarbonate paracetamol in routine clinical practice.

    Uncertainty-Aware Interpretable Deep Learning for Slum Mapping and Monitoring

    Over a billion people live in slums, with poor sanitation, education, property rights and working conditions having a direct impact on current residents and future generations. Slum mapping is one of the key problems concerning slums. Policymakers need to delineate slum settlements to make informed decisions about infrastructure development and allocation of aid. A wide variety of machine learning and deep learning methods have been applied to multispectral satellite images to map slums with outstanding performance. Since the physical and visual manifestation of slums varies significantly with geographical region and comprehensive slum maps are rare, it is important to quantify the uncertainty of predictions for reliable and confident application of models to downstream tasks. In this study, we train a U-Net model with Monte Carlo Dropout (MCD) on 13-band Sentinel-2 images, allowing us to calculate pixelwise uncertainty in the predictions. The results show that the proposed model outperforms the previous state-of-the-art model, having both higher AUPRC and lower uncertainty when tested on unseen geographical regions of Mumbai using the regional testing framework introduced in this study. We also use SHapley Additive exPlanations (SHAP) values to investigate how the different features contribute to our model’s predictions; these indicate that a certain shortwave infrared image band is a powerful feature for determining the locations of slums within images. With our results, we demonstrate the usefulness of including an uncertainty quantification approach in detecting slum area changes over time.
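Monte Carlo Dropout estimates uncertainty by keeping dropout active at prediction time, running several stochastic forward passes, and summarising the spread of the outputs for each pixel. The sketch below shows that idea on a toy logistic model in plain Python. It does not reproduce the U-Net or the Sentinel-2 data from the study; the weights, feature values, drop probability, and number of passes are all hypothetical.

```python
import math
import random
import statistics


def stochastic_forward(features, weights, drop_prob, rng):
    """One forward pass of a toy logistic model with dropout left switched on."""
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            total += w * x / keep  # inverted-dropout scaling of kept inputs
    return 1.0 / (1.0 + math.exp(-total))


def mc_dropout_predict(pixels, weights, drop_prob=0.5, passes=50, seed=0):
    """Return per-pixel mean prediction and standard deviation over the passes."""
    rng = random.Random(seed)
    means, stds = [], []
    for features in pixels:
        samples = [
            stochastic_forward(features, weights, drop_prob, rng)
            for _ in range(passes)
        ]
        means.append(statistics.fmean(samples))
        stds.append(statistics.pstdev(samples))  # spread = uncertainty proxy
    return means, stds


# Two toy "pixels" with three band values each (made-up numbers).
pixels = [[0.0, 0.0, 0.0], [1.0, 2.0, -1.0]]
weights = [0.5, 1.0, 2.0]
means, stds = mc_dropout_predict(pixels, weights)
print(means, stds)
```

The all-zero pixel gives the same output on every pass, so its spread is zero, while the second pixel's prediction changes with the dropout mask and therefore carries a non-zero uncertainty estimate.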