29 research outputs found

    Improving prediction of risk of hospital admission in chronic obstructive pulmonary disease: application of machine learning to telemonitoring data

    Background: Telemonitoring of symptoms and physiological signs has been suggested as a means of early detection of exacerbations of chronic obstructive pulmonary disease (COPD) with a view to instituting timely treatment. However, current algorithms to identify exacerbations result in frequent false positive results and increased workload. Machine learning, when applied to predictive modelling, can determine patterns of risk factors useful for improving quality of predictions. Objective: To establish if machine learning techniques applied to telemonitoring datasets improve prediction of hospital admissions and decisions to start steroids, and to determine if the addition of weather data further improves such predictions. Methods: We used daily symptoms, physiological measures and medication data, with baseline demography, COPD severity, quality of life, and hospital admissions from a pilot and large randomised controlled trial of telemonitoring in COPD. In addition, we linked weather data from the UK Meteorological Office. We used feature selection and extraction techniques for time-series to construct up to 153 predictive patterns (features) from symptom, medication, and physiological measurements. The resulting variables were used for the construction of predictive models fitted to training sets of patients and compared to common algorithms. Results: We had a mean 363 days of telemonitoring data from 135 patients. The two most practical traditional score-counting algorithms, restricted to cases with complete data, resulted in AUC estimates of 0.60 [95% CI 0.51, 0.69] and 0.58 [0.50, 0.67] for predicting admissions based on a single day’s readings. However, in a real-world scenario allowing for missing data, with greater numbers of patient daily data and hospitalisations (N = 57,150, N+=17), the performance of all the traditional algorithms fell, including those based on two days’ data. One of the most frequently used algorithms performed no better than chance. Machine learning models demonstrated significant improvements; the best machine learning algorithm based on 57,150 episodes resulted in an aggregated AUC = 0.73 [0.67, 0.79]. Addition of weather data measurements resulted in a negligible improvement in the predictive performance of the best model (AUC = 0.74 [0.69, 0.79]). In order to achieve an 80% true positive rate (sensitivity), the traditional algorithms were associated with an 80% false positive rate: our algorithm halved this rate to approximately 40% (specificity approximately 60%). The machine learning algorithm was moderately superior to the best standard algorithm (AUC = 0.77 [0.74, 0.79] vs AUC = 0.66 [0.63, 0.68]) at predicting the need for steroids. Conclusions: The early detection and management of COPD remains an important goal given the huge personal and economic costs of the condition. Machine learning approaches, which can be tailored to an individual’s baseline profile and can learn from experience of the individual patient, are superior to existing predictive algorithms and show promise in achieving this goal.
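
The abstract above reports AUC values and a false positive rate at a fixed 80% sensitivity. As a rough illustration of how those two quantities relate to a set of risk scores, here is a minimal sketch in plain Python. The function names and the toy scores and labels are invented for illustration; they are not the study's data or its algorithm.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fpr_at_sensitivity(scores, labels, target_tpr):
    """Lowest false positive rate among thresholds reaching target_tpr."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 1.0  # flagging everyone always reaches any sensitivity
    for t in sorted(set(scores)):
        tpr = sum(p >= t for p in pos) / len(pos)
        fpr = sum(n >= t for n in neg) / len(neg)
        if tpr >= target_tpr:
            best = min(best, fpr)
    return best


# Hypothetical toy data: 1 = admitted, 0 = not admitted.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

toy_auc = auc(scores, labels)
toy_fpr = fpr_at_sensitivity(scores, labels, 0.8)
toy_specificity = 1.0 - toy_fpr
```

Specificity is one minus the false positive rate, which is why a 40% false positive rate in the abstract corresponds to a specificity of about 60%.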

    Model Selection Approach Suggests Causal Association between 25-Hydroxyvitamin D and Colorectal Cancer

    Vitamin D deficiency has been associated with increased risk of colorectal cancer (CRC), but a causal relationship has not yet been confirmed. We investigate the direction of causation between vitamin D and CRC by extending the conventional approaches to allow pleiotropic relationships and by explicitly modelling unmeasured confounders. Plasma 25-hydroxyvitamin D (25-OHD), genetic variants associated with 25-OHD and CRC, and other relevant information were available for 2645 individuals (1057 CRC cases and 1588 controls) and included in the model. We investigate whether 25-OHD is likely to be causally associated with CRC, or vice versa, by selecting the best modelling hypothesis according to Bayesian predictive scores. We examine consistency for a range of prior assumptions. Model comparison showed preference for the causal association between low 25-OHD and CRC over the reverse causal hypothesis. This was confirmed for posterior mean deviances obtained for both models (11.5 natural log units in favour of the causal model), and also for deviance information criteria (DIC) computed for a range of prior distributions. Overall, models ignoring hidden confounding or pleiotropy had significantly poorer DIC scores. Results suggest a causal association between 25-OHD and colorectal cancer, and support the need for randomised clinical trials for further confirmation.

    Validity of a 2-component imaging-derived disease activity score (2C-DAS28) for improved assessment of synovitis in early rheumatoid arthritis

    Objectives. Imaging of joint inflammation provides a standard against which to derive an updated DAS for RA. Our objectives were to develop and validate a DAS based on reweighting the DAS28 components to maximize association with US-assessed synovitis. Methods. Early RA patients from two observational cohorts (n = 434 and n = 117) and a clinical trial (n = 59) were assessed at intervals up to 104 weeks from baseline; all US scans were within 1 week of clinical exam. There were 899, 163 and 183 visits in each cohort. Associations of combined US grey scale and power Doppler scores (GSPD) with 28 tender joint count and 28 swollen joint count (SJC28), CRP, ESR and general health visual analogue scale were examined in linear mixed model regressions. Cross-validation evaluated model predictive ability. Coefficients learned from training data defined a re-weighted DAS28 that was validated against radiographic progression in independent data (3037 observations; 717 patients). Results. Of the conventional DAS28 components only SJC28 and CRP were associated with GSPD in all three development cohorts. A two-component model including SJC28 and CRP outperformed a four-component model (R² = 0.235, 0.392, 0.380 vs 0.232, 0.380, 0.375, respectively). The re-weighted two-component DAS28CRP outperformed conventional DAS28 definitions in predicting GSPD (test log-likelihood <2.6, P < 0.01), Larsen score and presence of erosions. Conclusion. A score based on SJC28 and CRP alone demonstrated stronger associations with synovitis and radiographic progression than the original DAS28 and should be considered in research on pathophysiological manifestations of early RA. Implications for clinical management of RA remain to be established.

    Serum kidney injury molecule 1 and β2-microglobulin perform as well as larger biomarker panels for prediction of rapid decline in renal function in type 2 diabetes

    Aims/hypothesis: As part of the Surrogate Markers for Micro- and Macrovascular Hard Endpoints for Innovative Diabetes Tools (SUMMIT) programme we previously reported that large panels of biomarkers derived from three analytical platforms maximised prediction of progression of renal decline in type 2 diabetes. Here, we hypothesised that smaller (n ≤ 5), platform-specific combinations of biomarkers selected from these larger panels might achieve similar prediction performance when tested in three additional type 2 diabetes cohorts. Methods: We used 657 serum samples, held under differing storage conditions, from the Scania Diabetes Registry (SDR) and Genetics of Diabetes Audit and Research Tayside (GoDARTS), and a further set of 183 nested case–control samples from the Collaborative Atorvastatin in Diabetes Study (CARDS). We analysed 42 biomarkers measured on the SDR and GoDARTS samples by a variety of methods including standard ELISA, multiplexed ELISA (Luminex) and mass spectrometry. The subset of 21 Luminex biomarkers was also measured on the CARDS samples. We used the event definition of loss of >20% of baseline eGFR during follow-up from a baseline eGFR of 30–75 ml min⁻¹ [1.73 m]⁻². A total of 403 individuals experienced an event during a median follow-up of 7 years. We used discrete-time logistic regression models with tenfold cross-validation to assess association of biomarker panels with loss of kidney function. Results: Twelve biomarkers showed significant association with eGFR decline adjusted for covariates in one or more of the sample sets when evaluated singly. Kidney injury molecule 1 (KIM-1) and β2-microglobulin (B2M) showed the most consistent effects, with standardised odds ratios for progression of at least 1.4 (p < 0.0003) in all cohorts. A combination of B2M and KIM-1 added to clinical covariates, including baseline eGFR and albuminuria, modestly improved prediction, increasing the area under the curve in the SDR, GoDARTS and CARDS by 0.079, 0.073 and 0.239, respectively. Neither the inclusion of additional Luminex biomarkers on top of B2M and KIM-1 nor a sparse mass spectrometry panel, nor the larger multiplatform panels previously identified, consistently improved prediction further across all validation sets. Conclusions/interpretation: Serum KIM-1 and B2M independently improve prediction of renal decline from an eGFR of 30–75 ml min⁻¹ [1.73 m]⁻² in type 2 diabetes beyond clinical factors and prior eGFR and are robust to varying sample storage conditions. Larger panels of biomarkers did not improve prediction beyond these two biomarkers.

    Genomic prediction of complex human traits: relatedness, trait architecture and predictive meta-models

    We explore the prediction of individuals' phenotypes for complex traits using genomic data. We compare several widely used prediction models, including Ridge Regression, LASSO and Elastic Nets estimated from cohort data, and polygenic risk scores constructed using published summary statistics from genome-wide association meta-analyses (GWAMA). We evaluate the interplay between relatedness, trait architecture and optimal marker density, by predicting height, body mass index (BMI) and high-density lipoprotein level (HDL) in two data cohorts, originating from Croatia and Scotland. We empirically demonstrate that dense models are better when all genetic effects are small (height and BMI) and target individuals are related to the training samples, while sparse models predict better in unrelated individuals and when some effects have moderate size (HDL). For HDL sparse models achieved good across-cohort prediction, performing similarly to the GWAMA risk score and to models trained within the same cohort, which indicates that, for predicting traits with moderately sized effects, large sample sizes and familial structure become less important, though still potentially useful. Finally, we propose a novel ensemble of whole-genome predictors with GWAMA risk scores and demonstrate that the resulting meta-model achieves higher prediction accuracy than either model on its own. We conclude that although current genomic predictors are not accurate enough for diagnostic purposes, performance can be improved without requiring access to large-scale individual-level data. Our methodologically simple meta-model is a means of performing predictive meta-analysis for optimizing genomic predictions and can be easily extended to incorporate multiple population-level summary statistics or other domain knowledge
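
The abstract above describes combining a whole-genome predictor with a GWAMA risk score into a meta-model, but does not say how the two are combined. One common way to build such an ensemble is linear stacking: regress the observed phenotype on the two sets of predictions and use the fitted weights to blend them. The sketch below assumes that approach. The function names and numbers are hypothetical and this is not necessarily the authors' method.

```python
def fit_stack(p1, p2, y):
    """Ordinary least-squares fit of y ~ a + b1*p1 + b2*p2.

    p1: predictions from a whole-genome model (hypothetical input)
    p2: a summary-statistic risk score (hypothetical input)
    Returns (a, b1, b2). Assumes p1 and p2 are not perfectly collinear.
    """
    n = len(y)
    m1, m2, my = sum(p1) / n, sum(p2) / n, sum(y) / n
    c1 = [v - m1 for v in p1]
    c2 = [v - m2 for v in p2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * w for u, w in zip(c1, cy))
    s2y = sum(v * w for v, w in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # zero only if p1 and p2 are collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


def predict_stack(weights, p1, p2):
    """Blend two predictors with the fitted stacking weights."""
    a, b1, b2 = weights
    return [a + b1 * u + b2 * v for u, v in zip(p1, p2)]


# Toy check: the phenotype is an exact linear blend, so the fit recovers it.
p1 = [0.0, 1.0, 2.0, 3.0]
p2 = [1.0, 0.0, 1.0, 0.0]
y = [1 + 2 * u + 3 * v for u, v in zip(p1, p2)]
weights = fit_stack(p1, p2, y)
blended = predict_stack(weights, p1, p2)
```

In practice the weights would be fitted on held-out individuals, so that the blend is not tuned on the same data used to train the whole-genome model.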

    Glycosylation of plasma IgG in colorectal cancer prognosis

    In this study we demonstrate the potential value of Immunoglobulin G (IgG) glycosylation as a novel prognostic biomarker of colorectal cancer (CRC). We analysed plasma IgG glycans in 1229 CRC patients and correlated with survival outcomes. We assessed the predictive value of clinical algorithms and compared this to algorithms that also included glycan predictors. Decreased galactosylation, decreased sialylation (of fucosylated IgG glycan structures) and increased bisecting GlcNAc in IgG glycan structures were strongly associated with all-cause (q < 0.01) and CRC mortality (q = 0.04 for galactosylation and sialylation). Clinical algorithms showed good prediction of all-cause and CRC mortality (Harrell’s C: 0.73, 0.77; AUC: 0.75, 0.79, IDI: 0.02, 0.04 respectively). The inclusion of IgG glycan data did not lead to any statistically significant improvements overall, but it improved the prediction over clinical models for stage 4 patients with the shortest follow-up time until death, with the median gain in the test AUC of 0.08. These glycan differences are consistent with significantly increased IgG pro-inflammatory activity being associated with poorer CRC prognosis, especially in late stage CRC. In the absence of validated biomarkers to improve upon prognostic information from existing clinicopathological factors, the potential of these novel IgG glycan biomarkers merits further investigation

    Genome-wide Association Study of Response to Methotrexate in Early Rheumatoid Arthritis Patients

    Methotrexate (MTX) monotherapy is a common first treatment for rheumatoid arthritis (RA), but many patients do not respond adequately. In order to identify genetic predictors of response, we have combined data from two consortia to carry out a genome-wide study of response to MTX in 1424 early RA patients of European ancestry. Clinical endpoints were change from baseline to 6 months after starting treatment in swollen 28-joint count, tender 28-joint count, C-reactive protein and the overall 3-component disease activity score (DAS28). No single nucleotide polymorphism (SNP) reached genome-wide statistical significance for any outcome measure. The strongest evidence for association was with rs168201 in NRG3 (p = 10⁻⁷ for change in DAS28). Some support was also seen for association with ZMIZ1, previously highlighted in a study of response to MTX in juvenile idiopathic arthritis. Follow-up in two smaller cohorts of 429 and 177 RA patients did not support these findings, although these cohorts were more heterogeneous.