Semantically Enhanced Dynamic Bayesian Network for Detecting Sepsis Mortality Risk in ICU Patients with Infection
Although timely sepsis diagnosis and prompt interventions in Intensive Care
Unit (ICU) patients are associated with reduced mortality, early clinical
recognition is frequently impeded by non-specific signs of infection and
failure to detect signs of sepsis-induced organ dysfunction in a constellation
of dynamically changing physiological data. The goal of this work is to
identify patients at risk of life-threatening sepsis utilizing a data-centered
and machine learning-driven approach. We derive a mortality risk predictive
dynamic Bayesian network (DBN) guided by a customized sepsis knowledgebase and
compare the predictive accuracy of the derived DBN with the Sepsis-related
Organ Failure Assessment (SOFA) score, the Quick SOFA (qSOFA) score, the
Simplified Acute Physiological Score (SAPS-II) and the Modified Early Warning
Score (MEWS) tools.
A customized sepsis ontology was used to derive the DBN node structure and
semantically characterize temporal features derived from both structured
physiological data and unstructured clinical notes. We assessed the performance
in predicting mortality risk of the DBN predictive model and compared
performance to other models using Receiver Operating Characteristic (ROC)
curves, area under curve (AUROC), calibration curves, and risk distributions.
The derived dataset consists of 24,506 ICU stays from 19,623 patients with
evidence of suspected infection, with 2,829 patients deceased at discharge. The
DBN AUROC was found to be 0.91, which outperformed the SOFA (0.843), qSOFA
(0.66), MEWS (0.73), and SAPS-II (0.77) scoring tools. Continuous Net
Reclassification Index and Integrated Discrimination Improvement analysis
supported the superiority of the DBN. Compared with conventional rule-based risk
scoring tools, the sepsis knowledgebase-driven DBN algorithm offers improved
performance for predicting mortality of infected patients in ICUs.
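The AUROC comparison described in this abstract can be illustrated with a minimal sketch. It computes AUROC through the rank-sum identity: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name `auroc` and the toy scores are assumptions made for illustration; they are not data or code from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the pairwise rank-sum identity.

    scores: predicted risk per case (higher means higher predicted risk).
    labels: 1 for the positive class (for example, deceased), 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up toy data (not from the study): two hypothetical risk scores
# evaluated against the same outcomes.
toy_labels = [1, 1, 0, 0, 0]
model_a = [0.9, 0.6, 0.7, 0.2, 0.1]
model_b = [0.5, 0.4, 0.6, 0.5, 0.1]
print(auroc(model_a, toy_labels), auroc(model_b, toy_labels))
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; for a dataset of the size reported above, a sort-based implementation would be the more practical choice.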
EigenEvent: An Algorithm for Event Detection from Complex Data Streams in Syndromic Surveillance
Syndromic surveillance systems continuously monitor multiple pre-diagnostic
daily streams of indicators from different regions with the aim of early
detection of disease outbreaks. The main objective of these systems is to
detect outbreaks hours or days before the clinical and laboratory confirmation.
The type of data that is being generated via these systems is usually
multivariate and seasonal with spatial and temporal dimensions. The algorithm
What's Strange About Recent Events (WSARE) is the state-of-the-art method for
such problems. It exhaustively searches for contrast sets in the multivariate
data and signals an alarm when it finds statistically significant rules. This
bottom-up approach presents a much lower detection delay compared with the existing
top-down approaches. However, WSARE is very sensitive to the small-scale
changes and subsequently comes with a relatively high rate of false alarms. We
propose a new approach called EigenEvent that is neither fully top-down nor
bottom-up. Instead of performing a top-down or bottom-up search, this method tracks
changes in the data correlation structure via eigenspace techniques. This new
methodology enables us to detect both overall changes (via eigenvalues) and
dimension-level changes (via eigenvectors). Experimental results on one hundred
sets of benchmark data reveal that EigenEvent achieves better overall
performance than the state of the art, in particular in terms of the false
alarm rate.
Comment: To appear in Intelligent Data Analysis Journal, vol. 19(3), 201
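The idea of tracking a correlation structure through its eigenvalues and eigenvectors can be sketched as follows. This is not the EigenEvent algorithm itself, only a minimal illustration under stated assumptions: the helper names, the toy count data, and the alarm threshold of 0.5 are all hypothetical, and the leading eigenpair is found by plain power iteration.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows`.

    Assumes every column has non-zero variance.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    norms = [math.sqrt(sum(c[j] ** 2 for c in centered)) for j in range(d)]
    return [[sum(c[i] * c[j] for c in centered) / (norms[i] * norms[j])
             for j in range(d)] for i in range(d)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    d = len(matrix)
    # Slightly uneven start vector, so it is unlikely to be orthogonal
    # to the leading eigenvector.
    vec = [1.0 + 0.1 * i for i in range(d)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(d))
                for i in range(d))
    return value, vec


# Made-up daily counts for three indicator streams (one row per day).
baseline = [[11, 6, 8], [9, 6, 8], [11, 4, 8], [9, 4, 8], [11, 5, 5], [9, 5, 5]]
current = [[11, 6, 8], [9, 4, 8], [11, 6, 8], [9, 4, 8], [11, 6, 5], [9, 4, 5]]

base_val, base_vec = leading_eigenpair(correlation_matrix(baseline))
cur_val, cur_vec = leading_eigenpair(correlation_matrix(current))

# Overall change: shift in the leading eigenvalue (hypothetical threshold).
alarm = abs(cur_val - base_val) > 0.5
# Dimension-level change: which streams load on the leading eigenvector.
print(alarm, [round(x, 3) for x in cur_vec])
```

In the toy data the first two streams start moving together in the current window, so the leading eigenvalue rises and the eigenvector loadings concentrate on those two streams, which points at the dimensions involved.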
EMR-based medical knowledge representation and inference via Markov random fields and distributed representation learning
Objective: Electronic medical records (EMRs) contain a body of medical
knowledge that can be used for clinical decision support (CDS). Our objective
is a general system that can extract and represent the knowledge contained in
EMRs to support three CDS tasks: test recommendation, initial diagnosis, and
treatment plan recommendation, given the condition of a single patient.
Methods: We extracted four kinds of medical entities from records and
constructed an EMR-based medical knowledge network (EMKN), in which nodes are
entities and edges reflect their co-occurrence in a single record. Three
bipartite subgraphs (bi-graphs) were extracted from the EMKN to support each
task. One part of the bi-graph was the given condition (e.g., symptoms), and
the other was the condition to be inferred (e.g., diseases). Each bi-graph was
regarded as a Markov random field to support the inference. Three lazy energy
functions and one parameter-based energy function were proposed, as well as two
knowledge representation learning-based energy functions, which can provide a
distributed representation of medical entities. Three measures were utilized
for performance evaluation. Results: On the initial diagnosis task, 80.11% of
the test records identified at least one correct disease from the top 10
candidates. Test and treatment recommendation results were 87.88% and 92.55%,
respectively. These results altogether indicate that the proposed system
outperformed the baseline methods. The distributed representation of medical
entities does reflect similarity relationships at the knowledge level.
Conclusion: Combining EMKN and MRF is an effective approach for general medical
knowledge representation and inference. Different tasks, however, require
designing their energy functions individually.
A glass-box interactive machine learning approach for solving NP-hard problems with the human-in-the-loop
The goal of Machine Learning is to automatically learn from data, extract
knowledge, and make decisions without any human intervention. Such automatic
(aML) approaches show impressive success. Recent results even demonstrate
intriguingly that deep learning applied for automatic classification of skin
lesions is on par with the performance of dermatologists, yet outperforms the
average. As human perception is inherently limited, such approaches can
discover patterns, e.g. that two objects are similar, in arbitrarily
high-dimensional spaces, which no human is able to do. Humans can deal only with
limited amounts of data, whilst big data is beneficial for aML; however, in
health informatics, we are often confronted with a small number of data sets,
where aML suffers from insufficient training samples and many problems are
computationally hard. Here, interactive machine learning (iML) may be of help,
where a human-in-the-loop contributes to reducing the complexity of NP-hard
problems. A further motivation for iML is that standard black-box approaches
lack transparency, hence do not foster trust and acceptance of ML among
end-users. Rising legal and privacy aspects, e.g. with the new European General
Data Protection Regulation, make black-box approaches difficult to use,
because they often are not able to explain why a decision has been made. In
this paper, we present some experiments to demonstrate the effectiveness of the
human-in-the-loop approach, particularly in opening the black-box to a
glass-box and thus enabling a human to interact directly with a learning
algorithm. We selected the Ant Colony Optimization framework and applied it to
the Traveling Salesman Problem, which is a good example, due to its relevance
for health informatics, e.g. for the study of protein folding. From studies of
how humans extract so much from so little data, fundamental ML-research also
may benefit.
Comment: 26 pages, 5 figure
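A minimal sketch of Ant Colony Optimization on a small Traveling Salesman instance is given here to make the setup concrete. It is a textbook-style variant, not the authors' implementation. The parameter values, the toy city coordinates, and in particular the `adjust_pheromone` callback (a hypothetical hook through which a human could edit the pheromone matrix between iterations) are assumptions made for illustration.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def ant_colony_tsp(points, ants=10, iterations=50, alpha=1.0, beta=2.0,
                   evaporation=0.5, seed=0, adjust_pheromone=None):
    """Basic ACO for the TSP. Assumes all points are distinct."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")
    for _ in range(iterations):
        tours = []
        for _ in range(ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                here = tour[-1]
                candidates = sorted(unvisited)
                # Edge attractiveness: pheromone^alpha * (1 / distance)^beta.
                weights = [pheromone[here][c] ** alpha * (1.0 / dist[here][c]) ** beta
                           for c in candidates]
                nxt = rng.choices(candidates, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate (with a small floor), then deposit 1/length on used edges.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] = max(pheromone[i][j] * (1.0 - evaporation), 1e-9)
        for tour, length in tours:
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pheromone[a][b] += 1.0 / length
                pheromone[b][a] += 1.0 / length
        # Hypothetical human-in-the-loop hook: may modify pheromone in place.
        if adjust_pheromone is not None:
            adjust_pheromone(pheromone, best_tour)
    return best_tour, best_len


# Toy instance: the four corners of a unit square.
cities = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
best_tour, best_len = ant_colony_tsp(cities)
print(best_tour, best_len)
```

Exposing the pheromone matrix through a callback is one simple way to let a person steer the search, for example by boosting an edge they believe belongs to a good tour.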
Early Stage Influenza Detection from Twitter
Influenza is an acute respiratory illness that occurs virtually every year
and results in substantial disease, death and expense. Detection of Influenza
in its earliest stage would facilitate timely action that could reduce the
spread of the illness. Existing systems such as CDC and EISS, which try to
collect diagnosis data, are almost entirely manual, resulting in about two-week
delays for clinical data acquisition. Twitter, a popular microblogging service,
provides us with a perfect source for early-stage flu detection due to its
real-time nature. For example, when the flu breaks out, people who get the flu
may post related tweets, which enables prompt detection of the outbreak.
In this paper, we investigate the real-time flu detection problem on
Twitter data by proposing Flu Markov Network (Flu-MN): a spatio-temporal
unsupervised Bayesian algorithm based on a four-phase Markov Network, trying to
identify the flu breakout at the earliest stage. We test our model on real
Twitter datasets from the United States along with baselines in multiple
applications, such as real-time flu breakout detection, future epidemic phase
prediction, or Influenza-like illness (ILI) physician visits. Experimental
results show the robustness and effectiveness of our approach. We build a
real-time flu reporting system based on the proposed approach, and we are
hopeful that it will help government or health organizations in identifying
flu outbreaks and facilitating timely actions to decrease unnecessary
mortality.
FWDA: a Fast Wishart Discriminant Analysis with its Application to Electronic Health Records Data Classification
Linear Discriminant Analysis (LDA) on Electronic Health Records (EHR) data is
widely used for early detection of diseases. Classical LDA for EHR data
classification, however, suffers from two handicaps: the ill-posed estimation
of LDA parameters (e.g., covariance matrix), and the "linear inseparability" of
EHR data. To handle these two issues, in this paper, we propose a novel
classifier FWDA -- Fast Wishart Discriminant Analysis, that makes predictions
in an ensemble way. Specifically, FWDA first models the distribution of
inverse covariance matrices using a Wishart distribution estimated from the
training data, then "weighted-averages" the classification results of multiple
LDA classifiers parameterized by the sampled inverse covariance matrices via a
Bayesian Voting scheme. The weights for voting are optimally updated to adapt
to each new input, so as to enable nonlinear classification. Theoretical
analysis indicates that FWDA possesses a fast convergence rate and a robust
performance on high dimensional data. Extensive experiments on a large-scale EHR
dataset show that our approach outperforms state-of-the-art algorithms by a
large margin.
Time Series Imputation
Multivariate time series analysis is a very active topic in the research community, and
many machine learning techniques are used to extract information from
this type of data. However, in real-world problems data has missing values,
which may hinder the application of machine learning techniques to extract
information. In this paper we focus on the task of imputation of time series.
Many imputation methods for time series are based on regression methods.
Unfortunately, these methods perform poorly when the variables are categorical.
To address this case, we propose a new imputation method based on Expectation
Maximization over dynamic Bayesian networks. The approach is assessed with
synthetic and real data, and it outperforms several state-of-the-art methods.
Comment: Master paper, draft to be submitte
Evaluation of Predictive Data Mining Algorithms in Erythemato-Squamous Disease Diagnosis
A lot of time is spent searching for the best-performing data mining
algorithms applied in clinical diagnosis. The study set out to identify the
best-performing predictive data mining algorithms applied in the diagnosis of
Erythemato-squamous diseases. The study used Naive Bayes, Multilayer Perceptron
and J48 decision tree induction to build predictive data mining models on 366
instances of an Erythemato-squamous diseases dataset. Also, 10-fold
cross-validation and sets of performance metrics were used to evaluate the
baseline predictive performance of the classifiers. The comparative analysis
shows that Naive Bayes performed best with an accuracy of 97.4%, Multilayer
Perceptron came second with an accuracy of 96.6%, and J48 performed worst
with an accuracy of 93.5%. The evaluation of these classifiers on clinical
datasets gave an insight into the predictive ability of different data mining
algorithms applicable in clinical diagnosis, especially in the diagnosis of
Erythemato-squamous diseases.
Comment: 10 pages, 3 figures 2 table
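The 10-fold cross-validation protocol mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not the study's setup: the helper names, the majority-class stand-in classifier, and the toy labels are assumptions. Any classifier exposing a `fit` and a `predict` function of the same shape could be dropped in.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Pooled accuracy over all k held-out folds."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in test)
    return correct / len(labels)


# Stand-in classifier (hypothetical): always predict the majority training label.
def fit_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, feature_row):
    return model


# Made-up toy data: 30 cases of class 0 and 10 cases of class 1.
toy_features = [[float(i)] for i in range(40)]
toy_labels = [0] * 30 + [1] * 10
accuracy = cross_validated_accuracy(toy_features, toy_labels,
                                    fit_majority, predict_majority)
print(accuracy)
```

Each sample is held out exactly once, so the pooled accuracy is an estimate on data the model did not see during fitting, which is what makes accuracies from different classifiers comparable.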
A New Approach to Adaptive Signal Processing
A unified linear algebraic approach to adaptive signal processing (ASP) is
presented. Starting from just Ax=b, key ASP algorithms are derived in a simple,
systematic, and integrated manner without requiring any background knowledge of
the field. Algorithms covered are Steepest Descent, LMS, Normalized LMS,
Kaczmarz, Affine Projection, RLS, Kalman filter, and MMSE/Least Square Wiener
filters. By following this approach, readers will discover a synthesis; they
will learn that one and only one equation is involved in all these algorithms.
They will also learn that this one equation forms the basis of more advanced
algorithms like reduced rank adaptive filters, extended Kalman filter, particle
filters, multigrid methods, preconditioning methods, Krylov subspace methods
and conjugate gradients. This will enable them to enter many sophisticated
realms of modern research and development. Eventually, this one equation will
become their passport not only to ASP but also to many highly specialized areas
of computational science and engineering.
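As an illustration of starting from just Ax=b, the following minimal sketch implements the Kaczmarz iteration, one of the algorithms listed in this abstract. It repeatedly projects the current estimate onto the hyperplane defined by one row of the system. The function name, the sweep count, and the 2x2 example system are assumptions chosen for illustration, not material from the paper.

```python
def kaczmarz(A, b, sweeps=200):
    """Solve a consistent system Ax = b by cyclic row projections.

    Assumes no row of A is all zeros.
    """
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, target in zip(A, b):
            # Residual of the current row equation, a_i . x = b_i.
            residual = target - sum(a * xi for a, xi in zip(row, x))
            norm_sq = sum(a * a for a in row)
            # Project x onto the hyperplane of this row.
            step = residual / norm_sq
            x = [xi + step * a for xi, a in zip(x, row)]
    return x


# Toy system: 2x + y = 5 and x + 3y = 10, whose solution is x = 1, y = 3.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [5.0, 10.0]
solution = kaczmarz(A, b)
print(solution)
```

Each update uses only one row of A and divides the correction by that row's squared norm, which is the same normalization that appears in the Normalized LMS update.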
Simulation-Based Inference for Global Health Decisions
The COVID-19 pandemic has highlighted the importance of in-silico
epidemiological modelling in predicting the dynamics of infectious diseases to
inform health policy and decision makers about suitable prevention and
containment strategies. Work in this setting involves solving challenging
inference and control problems in individual-based models of ever increasing
complexity. Here we discuss recent breakthroughs in machine learning,
specifically in simulation-based inference, and explore its potential as a
novel avenue for model calibration to support the design and evaluation of
public health interventions. To further stimulate research, we are developing
software interfaces that turn two cornerstone COVID-19 and malaria epidemiology
models, COVID-sim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria
(https://github.com/SwissTPH/openmalaria), into probabilistic programs, enabling
efficient interpretable Bayesian inference within those simulators.