Controlling the Precision-Recall Tradeoff in Differential Dependency Network Analysis
Graphical models have gained a lot of attention recently as a tool for
learning and representing dependencies among variables in multivariate data.
Often, domain scientists are looking specifically for differences among the
dependency networks of different conditions or populations (e.g. differences
between regulatory networks of different species, or differences between
dependency networks of diseased versus healthy populations). The standard
method for finding these differences is to learn the dependency networks for
each condition independently and compare them. We show that this approach is
prone to high false discovery rates (low precision) that can render the
analysis useless. We then show that by imposing a bias towards learning similar
dependency networks for each condition the false discovery rates can be reduced
to acceptable levels, at the cost of finding a reduced number of differences.
Algorithms developed in the transfer learning literature can be used to vary
the strength of the imposed similarity bias and provide a natural mechanism to
smoothly adjust this differential precision-recall tradeoff to cater to the
requirements of the analysis conducted. We present real case studies
(oncological and neurological) where domain experts use the proposed technique
to extract useful differential networks that shed light on the biological
processes involved in cancer and brain function.
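The precision-recall tradeoff discussed in this abstract can be illustrated with a minimal sketch. It assumes that the true differential edges are known (which is only the case in simulations) and that each learned network is given as a list of undirected edges. All function names, gene labels, and the toy data are hypothetical and not taken from the paper.

```python
def differential_edges(edges_a, edges_b):
    """Return edges present in exactly one of the two undirected networks."""
    def normalise(edges):
        # frozenset makes ("g1", "g2") and ("g2", "g1") the same edge
        return {frozenset(edge) for edge in edges}
    return normalise(edges_a) ^ normalise(edges_b)


def precision_recall(predicted, truth):
    """Precision and recall of predicted differential edges against the truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Toy networks (hypothetical): learned independently for two conditions.
net_healthy = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
net_disease = [("g2", "g1"), ("g2", "g3"), ("g4", "g5"), ("g1", "g5")]

# Assumed ground truth: only these two edges really differ.
true_diff = {frozenset(("g3", "g4")), frozenset(("g4", "g5"))}

predicted = differential_edges(net_healthy, net_disease)
precision, recall = precision_recall(predicted, true_diff)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case one of the three reported differences is spurious, so precision falls below one while recall stays at one. A stronger similarity bias would be expected to remove such spurious differences at the risk of also dropping true ones.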
Computational framework for longevity risk management
Longevity risk threatens the financial stability of private and government-sponsored defined-benefit pension systems as well as social security schemes, in an environment already characterized by persistently low interest rates and heightened financial uncertainty. The mortality experience of countries in the industrialized world suggests a substantial age-time interaction, with the two dominant trends affecting different age groups at different times. From a statistical point of view, this indicates a dependence structure. It is observed that mortality improvements are similar for individuals of contiguous ages (Wills and Sherris, Integrating financial and demographic longevity risk models: an Australian model for financial applications, Discussion Paper PI-0817, 2008). Moreover, considering the dataset by single ages, the correlations between the residuals for adjacent age groups tend to be high (as noted in Denton et al., J Population Econ 18:203-227, 2005). This suggests that there is value in exploring the dependence structure across time as well, in other words the inter-period correlation. In this research, we focus on the projections of mortality rates, contravening the most commonly encountered dependence property, which is the "lack of dependence" (Denuit et al., Actuarial theory for dependent risks: measures, orders and models, Wiley, New York, 2005). By taking into account the presence of dependence across age and time, which, if ignored, leads to systematic over-estimation or under-estimation of uncertainty in the estimates (Liu and Braun, J Probability Stat, 813583:15, 2010), the paper analyzes a tailor-made bootstrap methodology for capturing the spatial dependence in deriving confidence intervals for mortality projection rates. We propose a method which leads to a prudent measure of longevity risk, avoiding the structural incompleteness of the ordinary simulation bootstrap methodology, which involves the assumption of independence.
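To illustrate why resampling under an independence assumption discards dependence, the following sketch shows a generic moving block bootstrap. This is a standard technique swapped in for illustration and is not the tailor-made method of the paper. The function name, block length, and residual values are assumptions.

```python
import random


def moving_block_bootstrap(residuals, block_len, rng):
    """Resample a series by concatenating randomly chosen contiguous blocks.

    Keeping neighbours together preserves short-range dependence, which an
    ordinary bootstrap (sampling single values independently) would destroy.
    """
    n = len(residuals)
    starts = range(n - block_len + 1)  # every start that yields a full block
    sample = []
    while len(sample) < n:
        start = rng.choice(starts)
        sample.extend(residuals[start:start + block_len])
    return sample[:n]  # trim the last block to the original length


# Hypothetical model residuals ordered by adjacent ages.
residuals = [0.1, 0.3, 0.25, -0.1, -0.4, -0.2, 0.0, 0.2]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
resampled = moving_block_bootstrap(residuals, block_len=3, rng=rng)
print(resampled)
```

Repeating the resampling many times and refitting the mortality model on each replicate would give an empirical distribution from which confidence intervals could be read off. The choice of block length is a tuning decision that this sketch does not address.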
Machine learning approaches to optimise the management of patients with sepsis
The goal of this PhD was to generate novel tools to improve the management of patients with sepsis, by applying machine learning techniques to routinely collected electronic health records. Machine learning is an application of artificial intelligence (AI), where a machine analyses data and becomes able to execute complex tasks without being explicitly programmed. Sepsis is the third leading cause of death worldwide and the main cause of mortality in hospitals, but the best treatment strategy remains uncertain. In particular, evidence suggests that current practices in the administration of intravenous fluids and vasopressors are suboptimal and likely induce harm in a proportion of patients. This represents a key clinical challenge and a top research priority.
The main contribution of the research has been the development of a reinforcement learning framework and algorithms to tackle this sequential decision-making problem. The model was built and then validated on three large non-overlapping intensive care databases, containing data collected from adult patients in the U.S.A. and the U.K. Our agent extracted implicit knowledge from an amount of patient data that exceeds many-fold the lifetime experience of human clinicians and learned optimal treatment by having analysed myriads of (mostly sub-optimal) treatment decisions. We used state-of-the-art evaluation techniques (called high confidence off-policy evaluation) and demonstrated that the value of the treatment strategy of the AI agent was on average reliably higher than that of the human clinicians. In two large validation cohorts independent from the training data, mortality was lowest in patients where clinicians’ actual doses matched the AI policy. We also gained insight into the model representations and confirmed that the AI agent relied on clinically and biologically meaningful parameters when making its suggestions. We conducted extensive testing and exploration of the behaviour of the AI agent down to the level of individual patient trajectories, identified potential sources of inappropriate behaviour and offered suggestions for future model refinements.
If validated, our model could provide individualized and clinically interpretable treatment decisions for sepsis that may improve patient outcomes.
Pathophysiological characterization of traumatic brain injury using novel analytical methods
Severity of traumatic brain injury is usually classified by the Glasgow coma scale (GCS) as “mild”,
“moderate” or “severe”, which does not capture the heterogeneity of the disease. According to
current guidelines, intracranial pressure (ICP) should not exceed 22 mmHg, with no further
recommendations concerning individualization or tolerable duration of intracranial
hypertension. The aims of this thesis were to identify subgroups of patients beyond
characterization using GCS, and to investigate the impact of duration and magnitude of
intracranial hypertension on outcome, using data from the observational prospective study
Collaborative European neurotrauma effectiveness research in TBI (CENTER-TBI).
To investigate the temporal aspect of tolerable ICP elevations, we examined the correlation
between dose of ICP and outcome represented by 6-month Glasgow outcome scale extended
(GOSE). ICP dose was represented both by the number of events above thresholds for ICP
magnitude and duration and by area under the ICP curve (i.e., “pressure time dose” (PTD)). A
variation in tolerable ICP thresholds of 18 mmHg +/- 4 mmHg (2 standard deviations (SD)) for
events with duration longer than five minutes was identified using a bootstrapping technique.
PTD was correlated to both mortality and unfavorable outcome.
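The two ICP dose measures described above can be sketched as follows. This is a minimal illustration that assumes evenly sampled ICP values and assumes that the pressure time dose is the area between the ICP curve and a chosen threshold; the thesis may define and compute both quantities differently. Function names and the sample readings are hypothetical.

```python
def pressure_time_dose(icp, threshold, dt_minutes=1.0):
    """Area between the ICP curve and the threshold, in mmHg * minutes.

    Only samples above the threshold contribute (rectangle rule).
    """
    return sum(max(value - threshold, 0.0) for value in icp) * dt_minutes


def count_events(icp, threshold, min_duration, dt_minutes=1.0):
    """Count runs of consecutive samples above the threshold that last
    at least min_duration minutes."""
    events, run = 0, 0
    # The sentinel below the threshold closes a run that reaches the end.
    for value in list(icp) + [float("-inf")]:
        if value > threshold:
            run += 1
        else:
            if run > 0 and run * dt_minutes >= min_duration:
                events += 1
            run = 0
    return events


# Hypothetical ICP readings in mmHg, one per minute.
icp = [15, 20, 22, 25, 24, 21, 20, 18, 23, 24, 17]

dose = pressure_time_dose(icp, threshold=19)
long_events = count_events(icp, threshold=19, min_duration=5)
print(dose, long_events)
```

With these toy readings there is one six-minute run and one two-minute run above 19 mmHg, so only the first counts as an event of five minutes or longer, while both runs contribute to the dose.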
A cerebrovascular autoregulation (CA) dependent ICP tolerability was identified. If CA was
impaired, no tolerable ICP magnitude and duration thresholds were identified, while if CA was
intact, both 19 mmHg for 5 minutes or longer and 15 mmHg for 50 minutes or longer were
correlated to worse outcome. While no significant difference in PTD was seen between
favorable and unfavorable outcome if CA was intact, there was a significant difference if CA
was impaired. In a multivariable analysis, PTD did not remain a significant predictor of
outcome when adjusting for other known predictors in TBI. In a causal inference analysis, both
cerebrovascular autoregulation status and ICP-lowering therapies represented by the therapy
intensity level (TIL) have a directional relationship with outcome. However, no direct causal
relationship of ICP towards outcome was found.
By applying an unsupervised clustering method, we identified six distinct admission clusters
defined by GCS, lactate, oxygen saturation (SpO2), creatinine, glucose, base excess, pH,
PaCO2, and body temperature. These clusters can be summarized in clinical presentation and
metabolic profile. When clustering longitudinal features during the first week in the intensive
care unit (ICU), no optimal number of clusters could be seen. However, glucose variation, a
panel of brain biomarkers, and creatinine consistently described trajectories. Although no
information on outcome was included in the models, both admission clusters and trajectories
showed clear outcome differences, with mortality from 7 to 40% in the admission clusters and
4 to 85% in the trajectories. Adding cluster or trajectory labels to the established outcome
prediction IMPACT model significantly improved outcome predictions.
The results in this thesis support the importance of cerebrovascular autoregulation status as it
was found that CA status was more informative towards outcome than ICP magnitude and
duration. There was a variation in tolerable ICP intensity and duration dependent on whether
CA was intact. Distinct clusters defined by GCS and metabolic profiles related to outcome
suggest the importance of an extracranial evaluation in addition to GCS in TBI patients.
Longitudinal trajectories of TBI patients in the ICU are highly characterized by glucose
variation, brain biomarkers and creatinine.
Data-driven modelling of biological multi-scale processes
Biological processes involve a variety of spatial and temporal scales. A
holistic understanding of many biological processes therefore requires
multi-scale models which capture the relevant properties on all these scales.
In this manuscript we review mathematical modelling approaches used to describe
the individual spatial scales and how they are integrated into holistic models.
We discuss the relation between spatial and temporal scales and its implications
for multi-scale modelling. Based upon this overview of
state-of-the-art modelling approaches, we formulate key challenges in
mathematical and computational modelling of biological multi-scale and
multi-physics processes. In particular, we consider the availability of
analysis tools for multi-scale models and model-based multi-scale data
integration. We provide a compact review of methods for model-based data
integration and model-based hypothesis testing. Furthermore, novel approaches
and recent trends are discussed, including computation time reduction using
reduced order and surrogate models, which contribute to the solution of
inference problems. We conclude the manuscript by providing a few ideas for the
development of tailored multi-scale inference methods.
Comment: This manuscript will appear in the Journal of Coupled Systems and
Multiscale Dynamics (American Scientific Publishers)
A blood gene expression marker of early Alzheimer's disease.
A marker of Alzheimer's disease (AD) that can accurately diagnose disease at the earliest stage would significantly support efforts to develop treatments for early intervention. We have sought to determine the sensitivity and specificity of peripheral blood gene expression as a diagnostic marker of AD using data generated on HT-12v3 BeadChips. We first developed an AD diagnostic classifier in a training cohort of 78 AD and 78 control blood samples and then tested its performance in a validation group of 26 AD, 26 control, and 118 mild cognitive impairment (MCI) subjects who were likely to have an AD endpoint. A 48-gene classifier achieved an accuracy of 75% in the AD and control validation group. Comparisons were made with a classifier developed using structural MRI measures, where both measures were available in the same individuals. In AD and control subjects, the gene expression classifier achieved an accuracy of 70% compared to 85% using MRI. Bootstrapping validation produced expression and MRI classifiers with mean accuracies of 76% and 82%, respectively, demonstrating better concordance between these two classifiers than achieved in a single validation population. We conclude there is potential for blood expression to be a marker for AD. The classifier also predicts that a large number of people with MCI who are likely to develop AD are more AD-like than normal, with 76% of subjects classified as AD rather than control. Many of these people do not have overt brain atrophy, which is known to emerge around the time of AD diagnosis, suggesting the expression classifier may detect AD earlier in the prodromal phase.
However, we accept these results could also represent a marker of diseases sharing common etiology.
InnoMed, European Union of the Sixth Framework program; Alzheimer’s Research UK; John and Lucille van Geest Foundation; NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Foundation Trust; Institute of Psychiatry Kings College London; NIA/NIH RC
DM-PhyClus: A Bayesian phylogenetic algorithm for infectious disease transmission cluster inference
Background. Conventional phylogenetic clustering approaches rely on arbitrary
cutpoints applied a posteriori to phylogenetic estimates. Although in practice,
Bayesian and bootstrap-based clustering tend to lead to similar estimates, they
often produce conflicting measures of confidence in clusters. The current study
proposes a new Bayesian phylogenetic clustering algorithm, which we refer to as
DM-PhyClus, that identifies sets of sequences resulting from quick transmission
chains, thus yielding easily-interpretable clusters, without using any ad hoc
distance or confidence requirement. Results. Simulations reveal that DM-PhyClus
can outperform conventional clustering methods, as well as the Gap procedure, a
pure distance-based algorithm, in terms of mean cluster recovery. We apply
DM-PhyClus to a sample of real HIV-1 sequences, producing a set of clusters
whose inference is in line with the conclusions of a previous thorough
analysis. Conclusions. DM-PhyClus, by eliminating the need for cutpoints and
producing sensible inference for cluster configurations, can facilitate
transmission cluster detection. Future efforts to reduce incidence of
infectious diseases, like HIV-1, will need reliable estimates of transmission
clusters. It follows that algorithms like DM-PhyClus could serve to better
inform public health strategies.
A semi-supervised approach for rapidly creating clinical biomarker phenotypes in the UK Biobank using different primary care EHR and clinical terminology systems
Objectives:
The UK Biobank (UKB) is making primary care electronic health records (EHRs) for 500 000 participants available for COVID-19-related research. Data are extracted from four sources, recorded using five clinical terminologies and stored in different schemas. The aims of our research were to: (a) develop a semi-supervised approach for bootstrapping EHR phenotyping algorithms in UKB EHR, and (b) evaluate our approach by implementing and assessing phenotypes for 31 common biomarkers.
Materials and Methods:
We describe an algorithmic approach to phenotyping biomarkers in primary care EHR involving (a) bootstrapping definitions using existing phenotypes, (b) excluding generic, rare, or semantically distant terms, (c) forward-mapping terminology terms, (d) expert review, and (e) data extraction. We evaluated the phenotypes by assessing the ability to reproduce known epidemiological associations with all-cause mortality using Cox proportional hazards models.
Results:
We created and evaluated phenotyping algorithms for 31 biomarkers, many of which are directly related to COVID-19 complications, for example diabetes, cardiovascular disease, and respiratory disease. Our algorithm identified 1651 Read v2 and Clinical Terms Version 3 terms and automatically excluded 1228 terms. Clinical review excluded 103 terms and included 44 terms, resulting in 364 terms for data extraction (sensitivity 0.89, specificity 0.92). We extracted 38 190 682 events and identified 220 978 participants with at least one biomarker measured.
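The term counts reported above can be checked with simple arithmetic. The sketch below assumes that the automatic exclusions are applied first and that clinical review then removes and adds terms; the variable names are illustrative only.

```python
# Counts taken from the reported results.
identified = 1651       # Read v2 and CTV3 terms found by the algorithm
auto_excluded = 1228    # terms removed automatically
review_excluded = 103   # terms removed by clinical review
review_included = 44    # terms added by clinical review

# Assumed order of operations: automatic filtering, then expert review.
after_auto_filter = identified - auto_excluded
final_terms = after_auto_filter - review_excluded + review_included

print(after_auto_filter, final_terms)
```

Under that assumption 423 terms reach clinical review, and the final count matches the 364 terms reported for data extraction.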
Discussion and conclusion:
Bootstrapping phenotyping algorithms from similar EHR can potentially address pre-existing methodological concerns that undermine the outputs of biomarker discovery pipelines and provide research-quality phenotyping algorithms.