Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models
Large-scale clinical data is invaluable to driving many computational
scientific advances today. However, understandable concerns regarding patient
privacy hinder the open dissemination of such data and give rise to suboptimal
siloed research. De-identification methods attempt to address these concerns
but have been shown to be susceptible to adversarial attacks. In this work, we
focus on the vast amounts of unstructured natural language data stored in
clinical notes and propose automatically generating synthetic clinical notes,
which are more amenable to sharing, using generative models trained on real
de-identified records. To evaluate the merit of such notes, we measure both
their privacy-preservation properties and their utility in training clinical
NLP models.
Experiments using neural language models yield notes whose utility is close to
that of the real ones in some clinical NLP tasks, yet leave ample room for
future improvements.
Comment: Clinical NLP Workshop 201
Depression and Self-Harm Risk Assessment in Online Forums
Users suffering from mental health conditions often turn to online resources
for support, including specialized online support communities or general
communities such as Twitter and Reddit. In this work, we present a neural
framework for supporting and studying users in both types of communities. We
propose methods for identifying posts in support communities that may indicate
a risk of self-harm, and demonstrate that our approach outperforms strong
previously proposed methods for identifying such posts. Self-harm is closely
related to depression, which makes identifying depressed users on general
forums a crucial related task. We introduce a large-scale general forum dataset
("RSDD") consisting of users with self-reported depression diagnoses matched
with control users. We show how our method can be applied to effectively
identify depressed users from their use of language alone. We demonstrate that
our method outperforms strong baselines on this general forum dataset.
Comment: Expanded version of EMNLP17 paper. Added sections 6.1, 6.2, 6.4,
FastText baseline, and CNN-
Distinguishing Asthma Phenotypes Using Machine Learning Approaches.
Asthma is not a single disease but an umbrella term for a number of distinct diseases, each of which is caused by a distinct underlying pathophysiological mechanism. These discrete disease entities are often labelled as asthma endotypes. The discovery of different asthma subtypes has moved from subjective approaches, in which putative phenotypes are assigned by experts, to data-driven approaches that incorporate machine learning. This review focuses on the methodological developments of one such machine learning technique, latent class analysis, and on how it has contributed to distinguishing asthma and wheezing subtypes in childhood. It also gives a clinical perspective, presenting the findings of studies from the past 5 years that used this approach. The identification of true asthma endotypes may be a crucial step towards understanding their distinct pathophysiological mechanisms, which could ultimately lead to more precise prevention strategies, the identification of novel therapeutic targets and the development of effective personalized therapies.
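The review gives no code; as a rough illustration of what latent class analysis does, the sketch below fits a latent class model to binary symptom indicators (for example, wheeze present or absent at successive ages) with the EM algorithm. The function name `fit_lca`, the random initialisation and the toy data are assumptions for illustration only, not the review's or any cited study's implementation; real analyses typically use dedicated software and compare models with different numbers of classes.

```python
import math
import random


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary indicators by EM (illustrative sketch).

    data: list of equal-length 0/1 lists, one per subject.
    Returns class weights, item probabilities theta[k][j] = P(x_j = 1 | class k),
    and each subject's posterior class probabilities.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    eps = 1e-6  # keeps probabilities away from 0 and 1 so logs stay finite
    weights = [1.0 / n_classes] * n_classes
    # Random starting values break the symmetry between classes.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class membership of each subject, in log space.
        resp = []
        for row in data:
            logp = []
            for k in range(n_classes):
                lp = math.log(weights[k])
                for j, x in enumerate(row):
                    lp += math.log(theta[k][j] if x else 1.0 - theta[k][j])
                logp.append(lp)
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: re-estimate class sizes and item probabilities.
        for k in range(n_classes):
            nk = max(sum(r[k] for r in resp), eps)
            weights[k] = max(nk / len(data), eps)
            for j in range(n_items):
                pj = sum(r[k] * row[j] for r, row in zip(resp, data)) / nk
                theta[k][j] = min(max(pj, eps), 1.0 - eps)
    return weights, theta, resp


# Hypothetical toy data: four binary wheeze indicators per child.
data = ([[1, 1, 1, 1]] * 18 + [[1, 1, 1, 0]] * 2
        + [[0, 0, 0, 0]] * 18 + [[0, 0, 0, 1]] * 2)
weights, theta, resp = fit_lca(data, n_classes=2)
for w, t in zip(weights, theta):
    print(round(w, 2), [round(p, 2) for p in t])
```

On this toy data the two recovered classes correspond roughly to a "persistent" and a "never" pattern; which class index gets which label depends on the starting values.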
Interpretable Subgroup Discovery in Treatment Effect Estimation with Application to Opioid Prescribing Guidelines
The dearth of prescribing guidelines for physicians is one key driver of the
current opioid epidemic in the United States. In this work, we analyze medical
and pharmaceutical claims data to draw insights on characteristics of patients
who are more prone to adverse outcomes after an initial synthetic opioid
prescription. Toward this end, we propose a generative model that allows the
discovery, from observational data, of subgroups that demonstrate an enhanced
or diminished causal effect due to treatment. Our approach models these
sub-populations as a mixture distribution, using sparsity to enhance
interpretability, while jointly learning nonlinear predictors of the potential
outcomes to better adjust for confounding. The approach yields
human-interpretable insights into the discovered subgroups, improving its
practical utility for decision support.
Bayesian cluster detection via adjacency modelling
Disease mapping aims to estimate the spatial pattern in disease risk across an area, identifying units that have elevated disease risk. Existing methods use Bayesian hierarchical models with spatially smooth conditional autoregressive priors to estimate risk, but these methods are unable to identify the geographical extent of spatially contiguous high-risk clusters of areal units. Our proposed solution to this problem is a two-stage approach, which produces a set of potential cluster structures for the data and then chooses the optimal structure via a Bayesian hierarchical model. The first stage uses a spatially adjusted hierarchical agglomerative clustering algorithm. The second stage fits a Poisson log-linear model to the data to estimate the optimal cluster structure and the spatial pattern in disease risk. The methodology was applied to a study of chronic obstructive pulmonary disease (COPD) in local authorities in England, where a number of high-risk clusters were identified.
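As a rough picture of the first stage, the sketch below merges only clusters that share a border, so every candidate cluster stays spatially contiguous, and records the partition after each merge; each recorded partition is then a candidate structure for the second-stage Poisson model. The dissimilarity used here (absolute difference in mean log risk between two clusters) and the function name `spatial_agglomerative` are assumptions for illustration; the authors' spatially adjusted algorithm may define distances and linkage differently.

```python
def spatial_agglomerative(log_risk, adjacency):
    """Candidate cluster structures from contiguity-constrained merging (sketch).

    log_risk: per-area log risk estimates, e.g. from data before the study period.
    adjacency: dict mapping each area index to the set of its neighbours.
    Returns the partition recorded after every merge, starting from singletons.
    """
    clusters = {i: {i} for i in range(len(log_risk))}

    def snapshot():
        return sorted(sorted(c) for c in clusters.values())

    def mean(members):
        return sum(log_risk[i] for i in members) / len(members)

    def touching(a, b):
        # Two clusters may merge only if some area in one borders the other.
        return any(adjacency[i] & clusters[b] for i in clusters[a])

    structures = [snapshot()]
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                if touching(a, b):
                    dist = abs(mean(clusters[a]) - mean(clusters[b]))
                    if best is None or dist < best[0]:
                        best = (dist, a, b)
        if best is None:  # remaining clusters are not connected to each other
            break
        _, a, b = best
        clusters[a] |= clusters.pop(b)
        structures.append(snapshot())
    return structures


# Hypothetical example: five areas in a row, 0-1-2-3-4.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
log_risk = [0.0, 0.1, 1.0, 1.2, 0.2]
for structure in spatial_agglomerative(log_risk, adjacency):
    print(structure)
```

In this toy example area 4 has a low log risk, but because it borders only area 3 it ends up merged with the high-risk areas 2 and 3 rather than with the low-risk areas 0 and 1; the contiguity constraint is what makes each candidate cluster a geographically connected region.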
Identifying Clusters in Bayesian Disease Mapping
Disease mapping is the field of spatial epidemiology interested in estimating
the spatial pattern in disease risk across areal units. One aim is to
identify units exhibiting elevated disease risks, so that public health
interventions can be made. Bayesian hierarchical models with a spatially smooth
conditional autoregressive prior are used for this purpose, but they cannot
identify the spatial extent of high-risk clusters. Therefore, we propose a
two-stage solution to this problem, the first stage being a spatially adjusted
hierarchical agglomerative clustering algorithm. This algorithm is applied to
data from before the study period and produces potential cluster structures
for the disease data. The second stage fits a separate Poisson log-linear model
to the study data for each cluster structure, which allows for step-changes in
risk where two clusters meet. The most appropriate cluster structure is chosen
by model comparison techniques, specifically by minimising the Deviance
Information Criterion. The efficacy of the methodology is established by a
simulation study and illustrated by a study of respiratory disease risk in
Glasgow, Scotland.
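For reference, the Deviance Information Criterion used to compare the candidate cluster structures has the standard form below. Here \theta denotes the model parameters, p(y \mid \theta) the Poisson likelihood of the study data y, \bar{\theta} the posterior mean of \theta, and s indexes the candidate cluster structures; this notation is ours, not the paper's.

```latex
\begin{aligned}
D(\theta) &= -2 \log p(y \mid \theta), \\
\bar{D} &= \mathbb{E}_{\theta \mid y}\left[ D(\theta) \right], \qquad
p_D = \bar{D} - D(\bar{\theta}), \\
\mathrm{DIC} &= \bar{D} + p_D = D(\bar{\theta}) + 2\,p_D, \\
\hat{s} &= \arg\min_{s} \mathrm{DIC}_s .
\end{aligned}
```

A smaller DIC indicates a better trade-off between fit (\bar{D}) and effective model complexity (p_D), so the selected structure \hat{s} is the one with the lowest value.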
Defining and Estimating Intervention Effects for Groups that will Develop an Auxiliary Outcome
It has recently become popular to define treatment effects for subsets of the
target population characterized by variables not observable at the time a
treatment decision is made. Characterizing and estimating such treatment
effects is tricky; the most popular but naive approach inappropriately adjusts
for variables affected by treatment and so is biased. We consider several
appropriate ways to formalize the effects: principal stratification,
stratification on a single potential auxiliary variable, stratification on an
observed auxiliary variable and stratification on expected levels of auxiliary
variables. We then outline identifying assumptions for each type of estimand.
We evaluate the utility of these estimands and estimation procedures for
decision making and understanding causal processes, contrasting them with the
concepts of direct and indirect effects. We motivate our development with
examples from nephrology and cancer screening, and use simulated data and real
data on cancer screening to illustrate the estimation methods.
Comment: Published at http://dx.doi.org/10.1214/088342306000000655 in
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
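To make the bias of the naive approach concrete, the toy calculation below (entirely hypothetical, not one of the paper's nephrology or cancer-screening examples) enumerates a model exactly in which treatment has no effect on the outcome but does affect the auxiliary variable. Conditioning on the observed auxiliary variable produces a spurious effect, while effects defined within principal strata, i.e. groups defined by the pair of potential auxiliary values (S(0), S(1)), are correctly zero. The names `naive_effect` and `principal_stratum_effect` are illustrative.

```python
from fractions import Fraction

# Hypothetical toy model (not from the paper), with exact probabilities:
#   U ~ Bernoulli(1/2)  unmeasured characteristic of the unit
#   T ~ Bernoulli(1/2)  randomised treatment, independent of U
#   S(t) = 1 if t + U >= 1 else 0   auxiliary outcome, affected by treatment
#   Y(t) = U                       primary outcome: treatment has NO effect
P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def S(t, u):
    return 1 if t + u >= 1 else 0


def Y(t, u):
    return u


def naive_effect():
    """E[Y | T=1, S=1] - E[Y | T=0, S=1]: adjusts for a post-treatment variable."""
    means = {}
    for t in (0, 1):
        kept = [u for u in P_U if S(t, u) == 1]
        means[t] = sum(P_U[u] * Y(t, u) for u in kept) / sum(P_U[u] for u in kept)
    return means[1] - means[0]


def principal_stratum_effect(s0, s1):
    """E[Y(1) - Y(0) | S(0)=s0, S(1)=s1]: effect within one principal stratum."""
    kept = [u for u in P_U if S(0, u) == s0 and S(1, u) == s1]
    if not kept:
        raise ValueError("empty principal stratum")
    return (sum(P_U[u] * (Y(1, u) - Y(0, u)) for u in kept)
            / sum(P_U[u] for u in kept))


print(naive_effect())                  # -1/2 (spurious)
print(principal_stratum_effect(1, 1))  # 0: units with S = 1 under either arm
print(principal_stratum_effect(0, 1))  # 0: units with S = 1 only if treated
```

The naive contrast is -1/2 because among untreated units only those with U = 1 have S = 1, whereas treatment also pulls units with U = 0 into the S = 1 group, so the two conditional groups are not comparable.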