A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting,
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research. Comment: 20 pages, 5 figures
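As an illustration of the sample-based idea described in the abstract above, the following is a minimal sketch of importance weighting, not the review's own method. It assumes a one-dimensional covariate whose source and target densities are known Gaussians (the names `SRC`, `TGT`, `importance_weights`, and `weighted_risk` are hypothetical); in practice the density ratio is unknown and has to be estimated from data.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, src, tgt):
    """Weight each source sample by w(x) = p_target(x) / p_source(x)."""
    return [gaussian_pdf(x, *tgt) / gaussian_pdf(x, *src) for x in xs]


def weighted_risk(losses, weights):
    """Self-normalized importance-weighted average of per-sample losses."""
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total


random.seed(0)
SRC = (0.0, 1.0)  # assumed source domain: N(0, 1)
TGT = (1.0, 1.0)  # assumed target domain: N(1, 1)

# Only source samples are drawn; the target is seen through the weights.
source_x = [random.gauss(*SRC) for _ in range(20000)]

# Toy per-sample loss: squared error of the constant prediction 0 for y = x.
losses = [x * x for x in source_x]
weights = importance_weights(source_x, SRC, TGT)

unweighted = sum(losses) / len(losses)   # estimates the source risk
weighted = weighted_risk(losses, weights)  # estimates the target risk

print(f"source risk estimate: {unweighted:.2f}")
print(f"target risk estimate: {weighted:.2f}")
```

Under these assumptions the true source risk is 1 and the true target risk is 2, so the unweighted average should land near the former and the weighted average near the latter. The self-normalized form is a design choice here: it trades a small bias for lower variance when a few weights are large.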
Predicting Chronic Disease Hospitalizations from Electronic Health Records: An Interpretable Classification Approach
Urban living in modern large cities has significant adverse effects on
health, increasing the risk of several chronic diseases. We focus on the two
leading clusters of chronic disease, heart disease and diabetes, and develop
data-driven methods to predict hospitalizations due to these conditions. We
base these predictions on the patients' medical history, recent and more
distant, as described in their Electronic Health Records (EHR). We formulate
the prediction problem as a binary classification problem and consider a
variety of machine learning methods, including kernelized and sparse Support
Vector Machines (SVM), sparse logistic regression, and random forests. To
strike a balance between accuracy and interpretability of the prediction, which
is important in a medical setting, we propose two novel methods: K-LRT, a
likelihood ratio test-based method, and a Joint Clustering and Classification
(JCC) method which identifies hidden patient clusters and adapts classifiers to
each cluster. We develop theoretical out-of-sample guarantees for the latter
method. We validate our algorithms on large datasets from the Boston Medical
Center, the largest safety-net hospital system in New England.