78 research outputs found

    Predicting from aggregated data

    Aggregated data, a collection of data summarized from multiple sources, is commonly used in many fields of research, including healthcare, web applications, and sensor networks. It is often employed to handle issues such as privacy, scalability, and reliability. However, accurately predicting individual outcomes from grouped datasets can be very difficult. In this thesis, we designed a new learning method, a Mixture of Experts (MoE) model, focused on individual-level prediction when training variables are aggregated. We used the MoE model, trained and validated on the eICU Collaborative Research patient datasets, to conduct a series of studies. Our results showed that applying grouping functions to the classification of aggregated data across demographic and behavior metrics could remain effective. This technique was verified by comparing two separately trained MoE models that were evaluated on the same datasets. Finally, we estimated non-aggregated datasets from spatio-temporal aggregated records by expressing the problem in the frequency domain, and trained an autoregressive model for predicting future stock prices. This process can be repeated, offering a potential solution to the issue of learning from aggregated data.

    Implementation and Application of Genomic Association Methods to Clostridium Difficile Toxicity and Clinical Infection Outcomes

    Clostridium difficile is a major cause of healthcare-associated infections in the United States. A C. difficile infection can lead to a range of outcomes including diarrhea, intensive care unit admission, abdominal surgery, or death. Pathogenesis is mediated by the release of toxin from C. difficile cells growing in the intestines. Some patients are more vulnerable to infection, including those with previous antibiotic exposure and advanced age. Host factors can affect not only the likelihood of infection but also its severity. Additionally, infection severity can be influenced by the genome of the infecting strain(s). Host-pathogen interactions are extremely complex, and very little is known about the interplay between host factors and C. difficile genomic variation with respect to infection likelihood and outcomes. With the recent deluge of whole genome sequencing data, the contribution of bacterial genomic variation to infections can be more comprehensively evaluated than ever before. The work described in this dissertation used two different approaches to test for associations between C. difficile genomic variation and clinically relevant phenotypes. In the first approach, we implemented and applied a novel convergence-based bacterial genome-wide association study (bGWAS) algorithm for quantitative traits. We introduce the algorithm using a set of data generated in silico to realistically model bacterial genome variation and phenotypes under various evolutionary regimes. When the algorithm was applied to C. difficile genomic variants and toxin activity, our bGWAS identified known toxin regulatory genes associated with toxin activity, supporting the value of our approach. Besides identifying key cis-regulatory variants in the toxin-producing locus, we observed several associations that connect toxin activity to a complex network of trans-regulatory genes.
Many highly associated variants occur in flagellar genes and indicate coregulation of toxicity and motility. We propose new variants associated with toxin activity for future functional validation. This study focused on a complex phenotype, toxin activity, within a highly controlled in vitro system. We next investigated the impact of bacterial genetic variation on human infections. The increased complexity of this human-pathogen interaction justified a different association approach to better understand the independent contribution of bacterial genomic variation to infection. In a set of clinically derived isolates, we tested for the association between variants in trehalose metabolism operons and infection severity while incorporating and controlling for infection severity-modulating patient characteristics. Trehalose utilization variants were recently proposed to modulate C. difficile infections in a mouse model. Interestingly, we observed that this in vivo result did not translate to our clinical cohort, as we found no evidence of an association between any of the trehalose utilization variants and patient infection outcomes. Taken together, these results demonstrate the utility of applying multiple approaches for identifying genomic variants associated with clinical outcomes that account for either bacterial population structure or host factors. PhD, Microbiology & Immunology, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/166125/1/katiephd_1.pd

    Timely and reliable evaluation of the effects of interventions: a framework for adaptive meta-analysis (FAME)

    Most systematic reviews are retrospective and use aggregate data (AD) from publications, meaning they can be unreliable, lag behind therapeutic developments and fail to influence ongoing or new trials. Commonly, the potential influence of unpublished or ongoing trials is overlooked when interpreting results, or when determining the value of updating the meta-analysis or the need to collect individual participant data (IPD). Therefore, we developed a Framework for Adaptive Meta-analysis (FAME) to determine prospectively the earliest opportunity for reliable AD meta-analysis. We illustrate FAME using two systematic reviews in men with metastatic (M1) and non-metastatic (M0) hormone-sensitive prostate cancer (HSPC).

    Preface


    Development and use of methods to estimate chronic disease prevalence in small populations

    Introduction: National data on the prevalence of chronic diseases on general practice registers are now available. The aim of this PhD was to develop and validate epidemiological models for the expected prevalence of chronic obstructive pulmonary disease (COPD), coronary heart disease (CHD), stroke, hypertension, overall cardiovascular disease (CVD) and high CVD risk at general practice and small area level, and to explore the extent of undiagnosed disease, factors associated with it, and its impact on population health. Methods: Multinomial logistic regression models were fitted to pooled Health Survey for England data to derive odds ratios for disease risk factors. These were applied to general practice and small area level population data, split by age, sex, ethnicity, deprivation, rurality and smoking status, to estimate expected disease prevalence at these levels. Validation was carried out using external data, including population-based epidemiological research and case-finding initiatives. Practice-level undiagnosed disease prevalence (i.e. expected minus registered disease prevalence) and hospital admission rates for these conditions were evaluated as outcome indicators of the quality and supply of primary health care services, using ordinary least squares (OLS) regression, geographically-weighted regression (GWR), and other spatial analytic methods. Results: Risk factors, odds of disease and expected prevalence were consistent with external data sources. Spatial analysis showed strong evidence of spatial non-stationarity of undiagnosed disease prevalence, with high levels of undiagnosed disease in London and other conurbations, and associations with low supply of primary health care services. Higher hospital admission rates were associated with population deprivation, poorer quality and supply of primary health care services and poorer access to them, and for COPD, with higher levels of undiagnosed disease.
Conclusion: The epidemiologic prevalence models have been implemented in national data sources such as NHS Comparators, the Association of Public Health Observatories website, and a number of national reports. Early experience suggests that they are useful for guiding case-finding at practice level and improving and regulating the quality of primary health care. Comparisons with external data, in particular prevalence of disease detected by general practices, suggest that model predictions are valid. Practice-level spatial analyses of undiagnosed disease prevalence and hospital admission rates failed to demonstrate superiority of GWR over OLS methods. Disease modellers should be encouraged to collaborate more effectively, and to validate and compare modelling methods using an agreed framework. National leadership is needed to further develop and implement disease models. It is likely that prevalence models will prove to be most useful for identifying undiagnosed diseases with a slow and insidious onset, such as COPD, diabetes and hypertension.

    Causes and consequences of adult sepsis in Blantyre, Malawi

    Sepsis, defined as a life-threatening organ dysfunction triggered by infection, carries a high mortality. Recent improvements in outcome in high-income settings have been driven by prompt antimicrobial therapy and fluid resuscitation, but mortality remains disproportionately high in low-resource settings like the nations of sub-Saharan Africa (sSA). Sepsis therapy here often consists of empiric, prolonged courses of broad-spectrum antimicrobials, especially third generation cephalosporins like ceftriaxone, which may be driving the rise of ceftriaxone-resistant extended-spectrum β-lactamase producing Enterobacteriaceae (ESBL-E). However, the aetiology of sepsis in sSA is far from clear, and in this thesis I hypothesise that it may be possible to improve outcomes in sepsis whilst reducing selection pressure for ESBL-E, with novel, targeted, antimicrobial strategies tailored to the pathogens that are truly causing sepsis here. To that end, I present findings from a clinical cohort study of sepsis in Blantyre, Malawi, with two aims: first, a description of the presentation and outcomes of sepsis in Blantyre, with a focus on aetiology and an analysis of the determinants of mortality; and secondly, a description of the gut mucosal carriage of ESBL-E in sepsis survivors (as well as antibiotic unexposed inpatient and community controls) as they pass through the hospital to identify determinants of carriage. An expanded package of diagnostic tests was used to define sepsis aetiology, and serial stool sampling with selective culture for ESBL-E was used to define ESBL-E carriage. I use whole-genome sequencing of cultured ESBL E. coli to track bacteria and mobile genetic elements within participants over time, and continuous time Markov models to provide insight into the drivers of carriage. I find that the majority of participants with sepsis are young and HIV-infected.
Disseminated tuberculosis (TB) dominates as a cause of sepsis, and there is an association of receipt of antituberculous chemotherapy with survival that suggests an expanded role for TB therapy in these very unwell patients may be beneficial. Sepsis mortality seems to have improved compared to historic cohorts, but post 28-day mortality in HIV-infected individuals is significant. At baseline, gut mucosal ESBL-E carriage is common, with cultured ESBL-E present in the stool of 49% of participants with sepsis on the day of admission. There is a further rapid increase in colonisation prevalence following admission and antibacterial exposure. Associations of baseline colonisation (household crowding and unprotected water sources) suggest both within-household and environmental routes of transmission are important. Genomic analysis suggests unrestricted mixing of ESBL E. coli at multiple spatial levels and rapid turnover within the individual, perhaps suggestive of frequent re-exposure. By using the genetic environment of ESBL genes as a proxy for mobile genetic elements (which are difficult to assemble from short read sequencing), I show that, within individuals, the E. coli strain-mobile genetic element combination is conserved over time whereas the strain or mobile genetic element alone is not; this suggests that the unit of transmission of ESBL genes to study participants is the bacterium, rather than the mobile genetic element. Longitudinal modelling provides further insight into ESBL-E carriage dynamics: hospitalisation and antibacterial exposure act synergistically to bring about rapid and prolonged carriage driven, in part, by a significant post-antibiotic effect. This effect means that antibacterials act to prolong carriage long after antibacterial exposure stops.
In terms of ESBL-E carriage, short courses of antibacterials have a similar effect to longer courses, such that the data generated in this study do not support my hypothesis and it may not be possible to reduce ESBL-E carriage by truncating courses of ceftriaxone. Nevertheless, the post-antibiotic effect deserves further scrutiny, both to understand its mechanism and as a potential therapeutic target. In addition, the modelling approach suggests cotrimoxazole preventative therapy (CPT) may be a significant driver of long-term ESBL-E carriage, and I suggest that a more nuanced approach to its deployment may be necessary in an era of increasing Gram-negative resistance.