3,373 research outputs found

    Controlling the Precision-Recall Tradeoff in Differential Dependency Network Analysis

    Graphical models have recently gained considerable attention as a tool for learning and representing dependencies among variables in multivariate data. Often, domain scientists are looking specifically for differences among the dependency networks of different conditions or populations (e.g., differences between regulatory networks of different species, or differences between dependency networks of diseased versus healthy populations). The standard method for finding these differences is to learn the dependency networks for each condition independently and compare them. We show that this approach is prone to high false discovery rates (low precision) that can render the analysis useless. We then show that by imposing a bias towards learning similar dependency networks for each condition, the false discovery rates can be reduced to acceptable levels, at the cost of finding a reduced number of differences. Algorithms developed in the transfer learning literature can be used to vary the strength of the imposed similarity bias and provide a natural mechanism to smoothly adjust this differential precision-recall tradeoff to cater to the requirements of the analysis conducted. We present real case studies (oncological and neurological) where domain experts use the proposed technique to extract useful differential networks that shed light on the biological processes involved in cancer and brain function.
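The precision and recall of a set of detected network differences can be made concrete with a small sketch. The code below is only an illustration of the "learn independently, then compare" baseline described in the abstract; the toy edge sets and the helper names (`edge`, `differential_edges`, `precision_recall`) are assumptions for the example and do not come from the paper.

```python
def edge(u, v):
    """Represent an undirected edge as an order-independent key."""
    return frozenset((u, v))


def differential_edges(network_a, network_b):
    """Edges present in exactly one of the two networks."""
    return set(network_a) ^ set(network_b)


def precision_recall(predicted, actual):
    """Precision and recall of predicted differences against true differences."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical ground truth: the two conditions really differ in two edges.
true_diff = {edge(3, 4), edge(4, 5)}

# Hypothetical networks learned independently for each condition.
# Estimation noise adds spurious edges, which show up as false differences.
learned_a = {edge(1, 2), edge(2, 3), edge(3, 4), edge(1, 5)}
learned_b = {edge(1, 2), edge(4, 5), edge(2, 4)}

predicted_diff = differential_edges(learned_a, learned_b)
precision, recall = precision_recall(predicted_diff, true_diff)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case every true difference is found, but three of the five reported differences are spurious, which is the low-precision behaviour the abstract attributes to independent learning.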

    Mortality modelling and forecasting: a review of methods

    Machine learning approaches to optimise the management of patients with sepsis

    The goal of this PhD was to generate novel tools to improve the management of patients with sepsis by applying machine learning techniques to routinely collected electronic health records. Machine learning is an application of artificial intelligence (AI), where a machine analyses data and becomes able to execute complex tasks without being explicitly programmed. Sepsis is the third leading cause of death worldwide and the main cause of mortality in hospitals, but the best treatment strategy remains uncertain. In particular, evidence suggests that current practices in the administration of intravenous fluids and vasopressors are suboptimal and likely induce harm in a proportion of patients. This represents a key clinical challenge and a top research priority. The main contribution of the research has been the development of a reinforcement learning framework and algorithms to tackle this sequential decision-making problem. The model was built and then validated on three large non-overlapping intensive care databases, containing data collected from adult patients in the U.S.A. and the U.K. Our agent extracted implicit knowledge from an amount of patient data that exceeds the lifetime experience of human clinicians many-fold and learned optimal treatment by analysing myriad (mostly sub-optimal) treatment decisions. We used state-of-the-art evaluation techniques (called high confidence off-policy evaluation) and demonstrated that the value of the treatment strategy of the AI agent was on average reliably higher than that of the human clinicians. In two large validation cohorts independent from the training data, mortality was lowest in patients where clinicians’ actual doses matched the AI policy. We also gained insight into the model representations and confirmed that the AI agent relied on clinically and biologically meaningful parameters when making its suggestions.
We conducted extensive testing and exploration of the behaviour of the AI agent down to the level of individual patient trajectories, identified potential sources of inappropriate behaviour and offered suggestions for future model refinements. If validated, our model could provide individualized and clinically interpretable treatment decisions for sepsis that may improve patient outcomes.

    Pathophysiological characterization of traumatic brain injury using novel analytical methods

    Severity of traumatic brain injury is usually classified by the Glasgow Coma Scale (GCS) as “mild”, “moderate” or “severe”, which does not capture the heterogeneity of the disease. According to current guidelines, intracranial pressure (ICP) should not exceed 22 mmHg, with no further recommendations concerning individualization or tolerable duration of intracranial hypertension. The aims of this thesis were to identify subgroups of patients beyond characterization using GCS, and to investigate the impact of duration and magnitude of intracranial hypertension on outcome, using data from the observational prospective study Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI). To investigate the temporal aspect of tolerable ICP elevations, we examined the correlation between dose of ICP and outcome represented by the 6-month Glasgow Outcome Scale Extended (GOSE). ICP dose was represented both by the number of events above thresholds for ICP magnitude and duration and by area under the ICP curve (i.e., “pressure time dose” (PTD)). A variation in tolerable ICP thresholds of 18 mmHg +/- 4 mmHg (2 standard deviations (SD)) for events with duration longer than five minutes was identified using a bootstrapping technique. PTD was correlated with both mortality and unfavorable outcome. A cerebrovascular autoregulation (CA) dependent ICP tolerability was identified. If CA was impaired, no tolerable ICP magnitude and duration thresholds were identified, while if CA was intact, both 19 mmHg for 5 minutes or longer and 15 mmHg for 50 minutes or longer were correlated with worse outcome. While no significant difference in PTD was seen between favorable and unfavorable outcome if CA was intact, there was a significant difference if CA was impaired. In a multivariable analysis, PTD did not remain a significant predictor of outcome when adjusting for other known predictors in TBI.
In a causal inference analysis, both cerebrovascular autoregulation status and ICP-lowering therapies represented by the therapy intensity level (TIL) had a directional relationship with outcome. However, no direct causal relationship of ICP towards outcome was found. By applying an unsupervised clustering method, we identified six distinct admission clusters defined by GCS, lactate, oxygen saturation (SpO2), creatinine, glucose, base excess, pH, PaCO2, and body temperature. These clusters can be summarized in clinical presentation and metabolic profile. When clustering longitudinal features during the first week in the intensive care unit (ICU), no optimal number of clusters could be seen. However, glucose variation, a panel of brain biomarkers, and creatinine consistently described trajectories. Although no information on outcome was included in the models, both admission clusters and trajectories showed clear outcome differences, with mortality ranging from 7% to 40% in the admission clusters and from 4% to 85% in the trajectories. Adding cluster or trajectory labels to the established outcome prediction IMPACT model significantly improved outcome predictions. The results in this thesis support the importance of cerebrovascular autoregulation status, as CA status was found to be more informative towards outcome than ICP magnitude and duration. There was a variation in tolerable ICP intensity and duration dependent on whether CA was intact. Distinct clusters defined by GCS and metabolic profiles related to outcome suggest the importance of an extracranial evaluation in addition to GCS in TBI patients. Longitudinal trajectories of TBI patients in the ICU are highly characterized by glucose variation, brain biomarkers and creatinine.
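The two ICP dose measures named in the abstract (a count of events exceeding a magnitude and duration threshold, and an area-based "pressure time dose") can be sketched as follows. This is a minimal illustration, not the thesis's method: it assumes evenly sampled ICP values, takes PTD as the trapezoidal area of ICP above a chosen threshold, and approximates an episode's duration as the number of consecutive samples above threshold times the sampling interval. The function names and the sample trace are hypothetical.

```python
def pressure_time_dose(icp, threshold, dt_minutes=1.0):
    """Trapezoidal area (mmHg * min) of the ICP trace above `threshold`.

    Assumption: PTD is computed on the excess over the threshold.
    """
    excess = [max(value - threshold, 0.0) for value in icp]
    return sum((a + b) / 2.0 * dt_minutes for a, b in zip(excess, excess[1:]))


def count_events(icp, threshold, min_duration, dt_minutes=1.0):
    """Count episodes where ICP stays above `threshold` for at least `min_duration` minutes."""
    events = 0
    run_length = 0  # consecutive samples above threshold
    for value in icp:
        if value > threshold:
            run_length += 1
        else:
            if run_length * dt_minutes >= min_duration:
                events += 1
            run_length = 0
    # Close an episode that is still open at the end of the recording.
    if run_length * dt_minutes >= min_duration:
        events += 1
    return events


# Hypothetical minute-by-minute ICP trace in mmHg.
icp_trace = [15, 18, 21, 23, 24, 22, 20, 18, 21, 17]

ptd = pressure_time_dose(icp_trace, threshold=19)
events = count_events(icp_trace, threshold=19, min_duration=5)
print(f"PTD above 19 mmHg: {ptd} mmHg*min, events >= 5 min: {events}")
```

The threshold of 19 mmHg and the 5-minute duration are taken from the abstract only as example inputs; the trace itself is invented for illustration.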

    Data-driven modelling of biological multi-scale processes

    Biological processes involve a variety of spatial and temporal scales. A holistic understanding of many biological processes therefore requires multi-scale models which capture the relevant properties on all these scales. In this manuscript we review mathematical modelling approaches used to describe the individual spatial scales and how they are integrated into holistic models. We discuss the relation between spatial and temporal scales and its implications for multi-scale modelling. Based upon this overview of state-of-the-art modelling approaches, we formulate key challenges in mathematical and computational modelling of biological multi-scale and multi-physics processes. In particular, we consider the availability of analysis tools for multi-scale models and model-based multi-scale data integration. We provide a compact review of methods for model-based data integration and model-based hypothesis testing. Furthermore, novel approaches and recent trends are discussed, including computation time reduction using reduced order and surrogate models, which contribute to the solution of inference problems. We conclude the manuscript by providing a few ideas for the development of tailored multi-scale inference methods. Comment: This manuscript will appear in the Journal of Coupled Systems and Multiscale Dynamics (American Scientific Publishers).

    A blood gene expression marker of early Alzheimer's disease.

    A marker of Alzheimer's disease (AD) that can accurately diagnose disease at the earliest stage would significantly support efforts to develop treatments for early intervention. We have sought to determine the sensitivity and specificity of peripheral blood gene expression as a diagnostic marker of AD using data generated on HT-12v3 BeadChips. We first developed an AD diagnostic classifier in a training cohort of 78 AD and 78 control blood samples and then tested its performance in a validation group of 26 AD, 26 control, and 118 mild cognitive impairment (MCI) subjects who were likely to have an AD-endpoint. A 48-gene classifier achieved an accuracy of 75% in the AD and control validation group. Comparisons were made with a classifier developed using structural MRI measures, where both measures were available in the same individuals. In AD and control subjects, the gene expression classifier achieved an accuracy of 70% compared to 85% using MRI. Bootstrapping validation produced expression and MRI classifiers with mean accuracies of 76% and 82%, respectively, demonstrating better concordance between these two classifiers than achieved in a single validation population. We conclude there is potential for blood expression to be a marker for AD. The classifier also predicts that a large number of people with MCI, who are likely to develop AD, are more AD-like than normal, with 76% of subjects classified as AD rather than control. Many of these people do not have overt brain atrophy, which is known to emerge around the time of AD diagnosis, suggesting the expression classifier may detect AD earlier in the prodromal phase.
However, we accept these results could also represent a marker of diseases sharing common etiology.
InnoMed, European Union of the Sixth Framework program; Alzheimer’s Research UK; John and Lucille van Geest Foundation; NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Foundation Trust; Institute of Psychiatry Kings College London; NIA/NIH RC

    DM-PhyClus: A Bayesian phylogenetic algorithm for infectious disease transmission cluster inference

    Background. Conventional phylogenetic clustering approaches rely on arbitrary cutpoints applied a posteriori to phylogenetic estimates. Although, in practice, Bayesian and bootstrap-based clustering tend to lead to similar estimates, they often produce conflicting measures of confidence in clusters. The current study proposes a new Bayesian phylogenetic clustering algorithm, which we refer to as DM-PhyClus, that identifies sets of sequences resulting from quick transmission chains, thus yielding easily interpretable clusters, without using any ad hoc distance or confidence requirement. Results. Simulations reveal that DM-PhyClus can outperform conventional clustering methods, as well as the Gap procedure, a pure distance-based algorithm, in terms of mean cluster recovery. We apply DM-PhyClus to a sample of real HIV-1 sequences, producing a set of clusters whose inference is in line with the conclusions of a previous thorough analysis. Conclusions. DM-PhyClus, by eliminating the need for cutpoints and producing sensible inference for cluster configurations, can facilitate transmission cluster detection. Future efforts to reduce incidence of infectious diseases, like HIV-1, will need reliable estimates of transmission clusters. It follows that algorithms like DM-PhyClus could serve to better inform public health strategies.

    A semi-supervised approach for rapidly creating clinical biomarker phenotypes in the UK Biobank using different primary care EHR and clinical terminology systems

    Objectives: The UK Biobank (UKB) is making primary care electronic health records (EHRs) for 500 000 participants available for COVID-19-related research. Data are extracted from four sources, recorded using five clinical terminologies and stored in different schemas. The aims of our research were to: (a) develop a semi-supervised approach for bootstrapping EHR phenotyping algorithms in UKB EHR, and (b) evaluate our approach by implementing and evaluating phenotypes for 31 common biomarkers. Materials and Methods: We describe an algorithmic approach to phenotyping biomarkers in primary care EHR involving (a) bootstrapping definitions using existing phenotypes, (b) excluding generic, rare, or semantically distant terms, (c) forward-mapping terminology terms, (d) expert review, and (e) data extraction. We evaluated the phenotypes by assessing the ability to reproduce known epidemiological associations with all-cause mortality using Cox proportional hazards models. Results: We created and evaluated phenotyping algorithms for 31 biomarkers, many of which are directly related to COVID-19 complications, for example, diabetes, cardiovascular disease, and respiratory disease. Our algorithm identified 1651 Read v2 and Clinical Terms Version 3 terms and automatically excluded 1228 terms. Clinical review excluded 103 terms and included 44 terms, resulting in 364 terms for data extraction (sensitivity 0.89, specificity 0.92). We extracted 38 190 682 events and identified 220 978 participants with at least one biomarker measured. Discussion and conclusion: Bootstrapping phenotyping algorithms from similar EHR can potentially address pre-existing methodological concerns that undermine the outputs of biomarker discovery pipelines and provide research-quality phenotyping algorithms.
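The term counts reported in the Results can be checked with simple arithmetic. The sketch below follows one reading of the abstract that reproduces the stated total: automatic exclusions are subtracted from the identified terms, then clinical review removes some terms and adds others. The variable names and the `sensitivity_specificity` helper are illustrative assumptions; the abstract does not report the underlying confusion-matrix counts, so none are supplied here.

```python
# Counts as reported in the abstract.
identified_terms = 1651
automatically_excluded = 1228
review_excluded = 103
review_included = 44

# Terms remaining after the automatic exclusion step.
after_automatic_filter = identified_terms - automatically_excluded

# Terms remaining after clinical review removes some and adds others.
final_terms = after_automatic_filter - review_excluded + review_included

print(after_automatic_filter)  # 423
print(final_terms)             # 364, matching the reported total


def sensitivity_specificity(tp, fp, tn, fn):
    """Standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)
```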