Multi-layer Representation Learning for Medical Concepts
Learning efficient representations for concepts has been proven to be an
important basis for many applications such as machine translation or document
classification. Proper representations of medical concepts such as diagnosis,
medication, procedure codes and visits will have broad applications in
healthcare analytics. However, in Electronic Health Records (EHR) the visit
sequences of patients include multiple concepts (diagnosis, procedure, and
medication codes) per visit. This structure provides two types of relational
information, namely sequential order of visits and co-occurrence of the codes
within each visit. In this work, we propose Med2Vec, which not only learns
distributed representations for both medical codes and visits from a large EHR
dataset with over 3 million visits, but also yields interpretable
representations that clinical experts have positively confirmed. In the experiments,
Med2Vec displays significant improvement in key medical applications compared
to popular baselines such as Skip-gram, GloVe and stacked autoencoder, while
providing clinically meaningful interpretation.
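The two kinds of relational information named in this abstract (co-occurrence of codes within a visit, and sequential order of visits) can be illustrated with a minimal sketch. This is not Med2Vec's training procedure, which the abstract does not detail; it only enumerates the raw pairs such a model could learn from. The function names, the `window` parameter, and the code strings are hypothetical.

```python
from itertools import combinations


def intra_visit_pairs(visit):
    """Unordered pairs of distinct codes that co-occur within one visit."""
    return list(combinations(sorted(set(visit)), 2))


def neighbor_visit_pairs(visits, window=1):
    """Index pairs (i, j) of visits that lie within `window` steps of each other."""
    pairs = []
    for i in range(len(visits)):
        lo = max(0, i - window)
        hi = min(len(visits), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((i, j))
    return pairs


# Hypothetical patient: each visit is a list of made-up code strings.
patient = [["D1", "M1"], ["D1", "P1", "M2"], ["D2"]]

code_pairs = [pair for visit in patient for pair in intra_visit_pairs(visit)]
visit_pairs = neighbor_visit_pairs(patient, window=1)
```

Here `code_pairs` captures the within-visit signal and `visit_pairs` the sequential signal; a representation learner would consume both.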
Knowledge Transfer with Medical Language Embeddings
Identifying relationships between concepts is a key aspect of scientific
knowledge synthesis. Finding these links often requires a researcher to
laboriously search through scientific papers and databases, as the size of
these resources grows ever larger. In this paper we describe how distributional
semantics can be used to unify structured knowledge graphs with unstructured
text to predict new relationships between medical concepts, using a
probabilistic generative model. Our approach is also designed to ameliorate
data sparsity and scarcity issues in the medical domain, which make language
modelling more challenging. Specifically, we integrate the medical relational
database (SemMedDB) with text from electronic health records (EHRs) to perform
knowledge graph completion. We further demonstrate the ability of our model to
predict relationships between tokens not appearing in the relational database.
Comment: 6 pages, 2 figures, to appear at SDM-DMMH 201
Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis
The past decade has seen an explosion in the amount of digital information
stored in electronic health records (EHR). While primarily designed for
archiving patient clinical information and administrative healthcare tasks,
many researchers have found secondary use of these records for various clinical
informatics tasks. Over the same period, the machine learning community has
seen widespread advances in deep learning techniques, which also have been
successfully applied to the vast amount of EHR data. In this paper, we review
these deep EHR systems, examining architectures, technical aspects, and
clinical applications. We also identify shortcomings of current techniques and
discuss avenues of future research for EHR-based deep learning.
Comment: Accepted for publication with Journal of Biomedical and Health Informatics: http://ieeexplore.ieee.org/abstract/document/8086133
Medical Concept Representation Learning from Electronic Health Records and its Application on Heart Failure Prediction
Objective: To transform heterogeneous clinical data from electronic health
records into clinically meaningful constructed features using data-driven
methods that rely, in part, on temporal relations among data. Materials and
Methods: Clinically meaningful representations of medical concepts and
patients are key to health analytic applications. Most existing
approaches directly construct features mapped to raw data (e.g., ICD or CPT
codes), or utilize some ontology mapping such as SNOMED codes. However, none of
the existing approaches leverage EHR data directly for learning such concept
representation. We propose a new way to represent heterogeneous medical
concepts (e.g., diagnoses, medications and procedures) based on co-occurrence
patterns in longitudinal electronic health records. The intuition behind the
method is to map medical concepts that co-occur closely in time to
similar concept vectors so that their distance will be small. We also derive a
simple method to construct patient vectors from the related medical concept
vectors. Results: For qualitative evaluation, we study similar medical concepts
across diagnosis, medication and procedure. In quantitative evaluation, our
proposed representation significantly improves the predictive modeling
performance for onset of heart failure (HF), where classification methods (e.g.
logistic regression, neural network, support vector machine and K-nearest
neighbors) achieve up to 23% improvement in area under the ROC curve (AUC)
using this proposed representation. Conclusion: We proposed an effective method
for patient and medical concept representation learning. The resulting
representation can map relevant concepts together and also improves predictive
modeling performance.
Comment: 45 page
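The abstract mentions a "simple method" for building patient vectors from concept vectors without saying what it is. A minimal sketch, assuming the method is an element-wise average of the vectors of the concepts in a patient's record (this averaging rule, the function name, and the toy vectors are assumptions, not taken from the paper):

```python
def patient_vector(codes, concept_vectors):
    """Average the vectors of the known concepts in a patient's record.

    Assumption: aggregation by mean; the source abstract does not specify it.
    """
    known = [concept_vectors[c] for c in codes if c in concept_vectors]
    if not known:
        raise ValueError("no known concept codes for this patient")
    dim = len(known[0])
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


# Hypothetical 2-dimensional concept vectors.
concept_vectors = {
    "HF": [1.0, 0.0],
    "DIURETIC": [0.0, 1.0],
    "ECHO": [1.0, 1.0],
}

# Codes missing from the vocabulary ("UNKNOWN") are skipped.
vec = patient_vector(["HF", "DIURETIC", "UNKNOWN"], concept_vectors)
```

The resulting vector could then serve as input features for a classifier such as logistic regression.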
Unsupervised Extraction of Phenotypes from Cancer Clinical Notes for Association Studies
The recent adoption of Electronic Health Records (EHRs) by health care
providers has introduced an important source of data that provides detailed and
highly specific insights into patient phenotypes over large cohorts. These
datasets, in combination with machine learning and statistical approaches,
generate new opportunities for research and clinical care. However, many
methods require the patient representations to be in structured formats, while
the information in the EHR is often locked in unstructured texts designed for
human readability. In this work, we develop the methodology to automatically
extract clinical features from clinical narratives from large EHR corpora
without the need for prior knowledge. We consider medical terms and sentences
appearing in clinical narratives as atomic information units. We propose an
efficient clustering strategy suitable for the analysis of large text corpora
and use the clusters to represent information about the patient
compactly. To demonstrate the utility of our approach, we perform an
association study of clinical features with somatic mutation profiles from
4,007 cancer patients and their tumors. We apply the proposed algorithm to a
dataset consisting of about 65 thousand documents with a total of about 3.2
million sentences. We identify 341 significant statistical associations between
the presence of somatic mutations and clinical features. We annotated these
associations according to their novelty, and report several known associations.
We also propose 32 testable hypotheses where the underlying biological
mechanism does not appear to be known but is plausible. These results illustrate
that the automated discovery of clinical features is possible and the joint
analysis of clinical and genetic datasets can generate appealing new
hypotheses.
Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records
The rapid growth of Electronic Health Records (EHRs), as well as the
accompanying opportunities in Data-Driven Healthcare (DDH), has attracted
widespread interest and attention. Recent progress in the design and
applications of deep learning methods has shown promising results and is
forcing massive changes in healthcare academia and industry, but most of these
methods rely on massive labeled data. In this work, we propose a general deep
learning framework which is able to boost risk prediction performance with
limited EHR data. Our model uses a modified generative adversarial network,
namely ehrGAN, which can provide plausible labeled EHR data by mimicking real
patient records, to augment the training dataset in a semi-supervised learning
manner. We use this generative model together with a convolutional neural
network (CNN) based prediction model to improve the onset prediction
performance. Experiments on two real healthcare datasets demonstrate that our
proposed framework produces realistic data samples and achieves significant
improvements on classification tasks with the generated data over several
state-of-the-art baselines.
Comment: To appear in ICDM 2017. This is the full version of paper with 8 page
ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
Clinical notes contain information about patients that goes beyond structured
data like lab values and medications. However, clinical notes have been
underused relative to structured data, because notes are high-dimensional and
sparse. This work develops and evaluates representations of clinical notes
using bidirectional transformers (ClinicalBERT). ClinicalBERT uncovers
high-quality relationships between medical concepts as judged by humans.
ClinicalBERT outperforms baselines on 30-day hospital readmission prediction
using both discharge summaries and the first few days of notes in the intensive
care unit. Code and model parameters are available.
Comment: CHIL 2020 Worksho
Learning Hierarchical Representations of Electronic Health Records for Clinical Outcome Prediction
Clinical outcome prediction based on the Electronic Health Record (EHR) plays
a crucial role in improving the quality of healthcare. Conventional deep
sequential models fail to capture the rich temporal patterns encoded in the
long and irregular clinical event sequences. We make the observation that
clinical events at a long time scale exhibit strong temporal patterns, while
events within a short time period tend to be disordered co-occurrence. We thus
propose differentiated mechanisms to model clinical events at different time
scales. Our model learns hierarchical representations of event sequences, to
adaptively distinguish between short-range and long-range events, and
accurately capture core temporal dependencies. Experimental results on real
clinical data show that our model greatly improves over previous
state-of-the-art models, achieving AUC scores of 0.94 and 0.90 for predicting
death and ICU admission respectively. Our model also successfully identifies
important events for different clinical outcome prediction tasks.
Comment: 10 pages, 2 figures, accepted by AMIA annual symposiu
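The distinction this abstract draws between short-range events (roughly unordered co-occurrences) and long-range events (ordered) can be illustrated with a simple grouping step. The paper describes an adaptive, learned mechanism; the fixed time-gap threshold below is a deliberate simplification, and `split_by_gap`, `max_gap_hours`, and the event names are hypothetical.

```python
def split_by_gap(events, max_gap_hours):
    """Group time-sorted (hour, name) events into short-range segments.

    A new segment starts whenever the gap to the previous event exceeds
    `max_gap_hours`. Assumption: a fixed threshold stands in for the
    adaptive mechanism described in the abstract.
    """
    groups = []
    for hour, name in events:
        if groups and hour - groups[-1][-1][0] <= max_gap_hours:
            groups[-1].append((hour, name))
        else:
            groups.append([(hour, name)])
    return groups


# Hypothetical event stream: (hours since admission, event type).
events = [(0, "lab"), (1, "med"), (2, "lab"), (30, "procedure"), (31, "med")]
groups = split_by_gap(events, max_gap_hours=6)

# Within a segment, treat events as an unordered set; across segments, keep order.
segment_sets = [sorted({name for _, name in group}) for group in groups]
```

A hierarchical model could then encode each segment as a set and feed the ordered sequence of segment encodings to a sequential layer.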
Inpatient2Vec: Medical Representation Learning for Inpatients
Representation learning (RL) plays an important role in extracting proper
representations from complex medical data for various analysis tasks, such as
patient grouping, clinical endpoint prediction and medication recommendation.
Medical data can be divided into two typical categories, outpatient and
inpatient, that have different data characteristics. However, few existing
RL methods are designed specifically for inpatient data, which has strong
temporal relations and consistent diagnosis. In addition, for an unordered set of
medical activities, existing medical RL methods utilize a simple pooling strategy,
which would result in indistinguishable contributions among the activities for
learning. In this work, we propose Inpatient2Vec, a novel model for learning three
kinds of representations for inpatients, including medical activity, hospital
day and diagnosis. A multi-layer self-attention mechanism with two training
tasks is designed to capture the inpatient data characteristics and process the
unordered set. Using a real-world dataset, we demonstrate that the proposed
approach outperforms the competitive baselines on semantic similarity
measurement and clinical events prediction tasks.
Bidirectional Recurrent Neural Networks for Medical Event Detection in Electronic Health Records
Sequence labeling for extraction of medical events and their attributes from
unstructured text in Electronic Health Record (EHR) notes is a key step towards
semantic understanding of EHRs. It has important applications in health
informatics including pharmacovigilance and drug surveillance. The
state-of-the-art supervised machine learning models in this domain are based on
Conditional Random Fields (CRFs) with features calculated from fixed context
windows. In this application, we explored various recurrent neural network
frameworks and showed that they significantly outperformed the CRF models.
Comment: In proceedings of NAACL HLT 201
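For readers unfamiliar with the baseline, "features calculated from fixed context windows" can be sketched as follows. This is a generic illustration of window features for sequence labeling, not the feature set used in the paper; `window_features`, the feature-key format, and the example sentence are hypothetical.

```python
def window_features(tokens, index, size=2, pad="<PAD>"):
    """Lower-cased tokens at offsets -size..+size around tokens[index].

    Positions that fall outside the sentence are filled with `pad`.
    """
    feats = {}
    for offset in range(-size, size + 1):
        pos = index + offset
        inside = 0 <= pos < len(tokens)
        feats[f"w[{offset:+d}]"] = tokens[pos].lower() if inside else pad
    return feats


# Hypothetical clinical sentence, already tokenized.
tokens = ["Patient", "developed", "rash", "after", "penicillin"]
feats = window_features(tokens, index=2, size=1)
edge_feats = window_features(tokens, index=0, size=1)
```

Each token's feature dictionary sees only a bounded neighborhood, which is the limitation that recurrent models avoid by carrying state across the whole sequence.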