A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs
A large fraction of the electronic health records (EHRs) consists of clinical
measurements collected over time, such as lab tests and vital signs, which
provide important information about a patient's health status. These sequences
of clinical measurements are naturally represented as time series,
characterized by multiple variables and large amounts of missing data, which
complicate the analysis. In this work, we propose a novel kernel which is
capable of exploiting both the information from the observed values as well as the
information hidden in the missing patterns in multivariate time series (MTS)
originating e.g. from EHRs. The kernel, called TCK, is designed using an
ensemble learning strategy in which the base models are novel mixed mode
Bayesian mixture models which can effectively exploit informative missingness
without having to resort to imputation methods. Moreover, the ensemble approach
ensures robustness to hyperparameters and therefore TCK is particularly
well suited if there is a lack of labels - a known challenge in medical
applications. Experiments on three real-world clinical datasets demonstrate the
effectiveness of the proposed kernel.
Comment: 2020 International Workshop on Health Intelligence, AAAI-20. arXiv
admin note: text overlap with arXiv:1907.0525
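The abstract does not spell out how the ensemble of mixture models becomes a kernel. One plausible construction, assumed here for illustration and not taken from the abstract, sums the inner products of each base model's soft cluster assignments. The function name `ensemble_kernel` and the example posteriors are hypothetical, and the sketch leaves out the handling of missing values inside the base models.

```python
def ensemble_kernel(posteriors):
    """Sum, over base models, the inner products of soft cluster assignments.

    posteriors: list of matrices, one per base model; each matrix has one row
    per time series, holding that series' cluster membership probabilities.
    """
    n = len(posteriors[0])
    kernel = [[0.0] * n for _ in range(n)]
    for memberships in posteriors:
        for i in range(n):
            for j in range(n):
                kernel[i][j] += sum(
                    a * b for a, b in zip(memberships[i], memberships[j])
                )
    return kernel


# Hypothetical posteriors from two base models for three time series.
model_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
model_b = [[0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
K = ensemble_kernel([model_a, model_b])
```

Series that the base models repeatedly place in the same cluster receive a large kernel value, so no single model's hyperparameters dominate the result.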
Noisy multi-label semi-supervised dimensionality reduction
Noisy labeled data are a rich source of information and are often easily
accessible and cheap to obtain, but label noise might also have many
negative consequences if not accounted for. How to fully utilize noisy labels
has been studied extensively within the framework of standard supervised
machine learning over a period of several decades. However, very little
research has been conducted on solving the challenge posed by noisy labels in
non-standard settings. This includes situations where only a fraction of the
samples are labeled (semi-supervised) and each high-dimensional sample is
associated with multiple labels. In this work, we present a novel
semi-supervised and multi-label dimensionality reduction method that
effectively utilizes information from both noisy multi-labels and unlabeled
data. With the proposed Noisy multi-label semi-supervised dimensionality
reduction (NMLSDR) method, the noisy multi-labels are denoised and unlabeled
data are labeled simultaneously via a specially designed label propagation
algorithm. NMLSDR then learns a projection matrix for reducing the
dimensionality by maximizing the dependence between the enlarged and denoised
multi-label space and the features in the projected space. Extensive
experiments on synthetic data, benchmark datasets, as well as a real-world case
study, demonstrate the effectiveness of the proposed algorithm and show that it
outperforms state-of-the-art multi-label feature extraction algorithms.
Comment: 38 page
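The abstract describes its label propagation step only as "specially designed". As a rough illustration of the general idea, the sketch below uses a generic graph-based propagation rule instead, in which each sample mixes its neighbours' label scores with its own initial labels. The function name `propagate_labels`, the mixing weight `alpha`, and the toy graph are all assumptions, not the NMLSDR algorithm itself.

```python
def propagate_labels(affinity, initial, alpha=0.5, iters=50):
    """Generic label propagation: F <- alpha * P @ F + (1 - alpha) * Y.

    affinity: symmetric non-negative similarity matrix between samples.
    initial:  initial multi-label matrix Y (all-zero rows for unlabeled samples).
    """
    n = len(affinity)
    n_labels = len(initial[0])
    # Row-normalise the affinities so that each row sums to one.
    transition = []
    for row in affinity:
        total = sum(row)
        transition.append([w / total if total > 0 else 0.0 for w in row])
    scores = [[float(v) for v in row] for row in initial]
    for _ in range(iters):
        scores = [
            [
                alpha * sum(transition[i][j] * scores[j][k] for j in range(n))
                + (1 - alpha) * initial[i][k]
                for k in range(n_labels)
            ]
            for i in range(n)
        ]
    return scores


# Toy chain graph: sample 1 is unlabeled and sits between two labeled samples.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
Y = [[1, 0], [0, 0], [1, 0]]
F = propagate_labels(W, Y)
```

In this toy graph the unlabeled middle sample picks up a positive score for the first label from its neighbours, while the second label, which no sample carries, stays at zero.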
Using anchors from free text in electronic health records to diagnose postoperative delirium
Objectives: Postoperative delirium is a common complication after major surgery
among the elderly. Despite its potentially serious consequences, the
complication often goes undetected and undiagnosed. In order to provide
diagnosis support one could potentially exploit the information hidden in free
text documents from electronic health records using data-driven clinical
decision support tools. However, these tools depend on labeled training data,
which can be both time-consuming and expensive to create.
Methods: The recent learning with anchors framework resolves this problem by
transforming key observations (anchors) into labels. This is a promising
framework, but it is heavily reliant on clinicians' knowledge for specifying
good anchor choices in order to perform well. In this paper, we propose a novel
method for specifying anchors from free text documents, following an
exploratory data analysis approach based on clustering and data visualization
techniques. We investigate the use of the new framework as a way to detect
postoperative delirium.
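The idea of transforming anchors into labels can be sketched in a few lines. This is a minimal illustration only: the function name `anchor_labels`, the anchor terms, and the example notes are hypothetical, and the clustering and visualization steps used to find the anchors are not shown.

```python
def anchor_labels(documents, anchors):
    """Mark a document 1 if it mentions any anchor term, else 0.

    A 0 only means that no anchor was observed; it is not a confirmed
    negative, so such documents are better treated as unlabeled.
    """
    anchor_set = {a.lower() for a in anchors}
    labels = []
    for doc in documents:
        tokens = set(doc.lower().split())
        labels.append(1 if tokens & anchor_set else 0)
    return labels


# Hypothetical free-text notes and anchor terms, for illustration only.
notes = [
    "patient confused and agitated during the night",
    "wound healing well and patient mobilised",
    "started haloperidol after surgery",
]
labels = anchor_labels(notes, ["confused", "haloperidol"])
```

A classifier can then be trained on the remaining text with these anchor-derived labels standing in for manual annotation.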
Results: By applying the proposed method to medical data gathered from a
Norwegian university hospital, we increase the area under the precision-recall
curve from 0.51 to 0.96 compared to baselines.
Conclusions: The proposed approach can be used as a framework for clinical
decision support for postoperative deliriu