Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models

Author: Melamud Oren
Shivade Chaitanya
Publication date: 01/01/2019

Large-scale clinical data is invaluable to driving many computational scientific advances today. However, understandable concerns regarding patient privacy hinder the open dissemination of such data and give rise to suboptimal siloed research. De-identification methods attempt to address these concerns but were shown to be susceptible to adversarial attacks. In this work, we focus on the vast amounts of unstructured natural language data stored in clinical notes and propose to automatically generate synthetic clinical notes that are more amenable to sharing using generative models trained on real de-identified records. To evaluate the merit of such notes, we measure both their privacy preservation properties and their utility in training clinical NLP models. Experiments using neural language models yield notes whose utility is close to that of the real ones in some clinical NLP tasks, yet leave ample room for future improvements. Comment: Clinical NLP Workshop 201

arXiv.org e-Print Archive

Crossref

Depression and Self-Harm Risk Assessment in Online Forums

Author: Cohan Arman
Goharian Nazli
Yates Andrew
Publication date: 01/01/2017

Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts. Self-harm is closely related to depression, which makes identifying depressed users on general forums a crucial related task. We introduce a large-scale general forum dataset ("RSDD") consisting of users with self-reported depression diagnoses matched with control users. We show how our method can be applied to effectively identify depressed users from their use of language alone. We demonstrate that our method outperforms strong baselines on this general forum dataset. Comment: Expanded version of EMNLP17 paper. Added sections 6.1, 6.2, 6.4, FastText baseline, and CNN-

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Distinguishing Asthma Phenotypes Using Machine Learning Approaches.

Author: A Custovic
A Fraser
A Høst
A Pickles
A Simpson
A Wijga
Adnan Custovic
AJ Lowe
AV Berg
B Clarisse
BD Spycher
BG Toelle
BL Jones
C-M Chen
CA Figueiredo
CE Kuehni
CJ Lodge
CL Storr
D Barber
D Belgrave
D Caudri
D Nagin
DA Linzer
DC Belgrave
DCM Belgrave
F Kauffmann
FD Martinez
FL Garden
FP Perera
G Bochenek
G Weinmayr
GB Marks
GP Anderson
J Hagenaars
J Henderson
J Lotvall
J Magidson
J Sunyer
J Winn
JA Smith
JK Vermunt
K Burnham
KE Wonderen Van
KL Nylund
L García-Marcos Álvarez
L Hunt
L Lowe
L Panico
LA Lowe
M Depner
M Herr
M Scott
Magnus Rattray
Mattia Prosperi
MJ Ege
ML Barreto
MM Hagendorens
MW Pijnenburg
N Lazic
NC Nicolaou
NG Papadopoulos
OE Savenije
P Burney
P Haldar
P Rzehak
PD Sly
Q Chen
Q Vuong
Rebecca Howard
RJP Valk van der
RL Bergmann
RL Miller
RO Crapo
RT Stein
S American Thoracic
S Havstad
S Mihrshahi
S Rabe-Hesketh
S Stanojevic
SE Wenzel
SK Weiland
ST Lanza
T Jung
T Minka
The European Community Respiratory Health Survey
V Siroux
WC Moore
X Robin
Y Lo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Asthma is not a single disease, but an umbrella term for a number of distinct diseases, each of which is caused by a distinct underlying pathophysiological mechanism. These discrete disease entities are often labelled as asthma endotypes. The discovery of different asthma subtypes has moved from subjective approaches, in which putative phenotypes are assigned by experts, to data-driven ones that incorporate machine learning. This review focuses on the methodological developments of one such machine learning technique, latent class analysis, and how it has contributed to distinguishing asthma and wheezing subtypes in childhood. It also gives a clinical perspective, presenting the findings of studies from the past 5 years that used this approach. The identification of true asthma endotypes may be a crucial step towards understanding their distinct pathophysiological mechanisms, which could ultimately lead to more precise prevention strategies, identification of novel therapeutic targets and the development of effective personalized therapies.
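
As an illustration of the latent class analysis mentioned in the abstract above, here is a minimal sketch of expectation-maximisation for a latent class model with binary indicators (for example, wheeze present or absent at several ages). It is not the method of any study covered by the review; the function name `fit_lca`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def fit_lca(data, n_classes, n_iter=50, seed=0):
    """EM for a latent class model with binary indicators (illustrative).

    data: list of equal-length lists of 0/1 responses (hypothetical input).
    Returns class weights, per-class item probabilities, and the
    log-likelihood recorded at the start of each iteration.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random start so that the classes are not identical at initialisation.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    trace = []
    for _ in range(n_iter):
        # E-step: posterior class membership for every subject.
        resp = []
        loglik = 0.0
        for row in data:
            joint = []
            for k in range(n_classes):
                p = weights[k]
                for j, x in enumerate(row):
                    p *= probs[k][j] if x else 1.0 - probs[k][j]
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        trace.append(loglik)
        # M-step: update class weights and item probabilities.
        for k in range(n_classes):
            mass = sum(r[k] for r in resp)
            weights[k] = mass / len(data)
            for j in range(n_items):
                hits = sum(r[k] * row[j] for r, row in zip(resp, data))
                # Clamp to keep probabilities strictly inside (0, 1).
                probs[k][j] = min(max(hits / mass, 1e-6), 1 - 1e-6)
    return weights, probs, trace


# Toy data: two visibly different response patterns across four indicators.
toy = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1],
       [0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 0, 1]]
weights, probs, trace = fit_lca(toy, n_classes=2)
```

In practice the number of classes is itself chosen by comparing fitted models, which this sketch leaves out.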

Crossref

Springer - Publisher Connector

PubMed Central

Spiral - Imperial College Digital Repository

The University of Manchester - Institutional Repository

Interpretable Subgroup Discovery in Treatment Effect Estimation with Application to Opioid Prescribing Guidelines

Author: Che Zhengping
Crofford Leslie J.
Davis Mellar P.
Dusseldorp Elise
Gebhart G. F.
Harbaugh Calista M.
Johansson Fredrik
Kingma Diederik P
Marlin Benjamin M.
Parente Stephen T.
Patil Pravinkumar R.
Rubin Donald B.
Shalit Uri
Su Xiaogang
Zhang Jinghe
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/03/2020

The dearth of prescribing guidelines for physicians is one key driver of the current opioid epidemic in the United States. In this work, we analyze medical and pharmaceutical claims data to draw insights on characteristics of patients who are more prone to adverse outcomes after an initial synthetic opioid prescription. Toward this end, we propose a generative model that allows discovery from observational data of subgroups that demonstrate an enhanced or diminished causal effect due to treatment. Our approach models these sub-populations as a mixture distribution, using sparsity to enhance interpretability, while jointly learning nonlinear predictors of the potential outcomes to better adjust for confounding. The approach leads to human-interpretable insights on discovered subgroups, improving the practical utility for decision support.

arXiv.org e-Print Archive

Crossref

Bayesian cluster detection via adjacency modelling

Author: Anderson Craig
Dean Nema
Lee Duncan
Publication venue: Elsevier
Publication date: 01/02/2016

Disease mapping aims to estimate the spatial pattern in disease risk across an area, identifying units which have elevated disease risk. Existing methods use Bayesian hierarchical models with spatially smooth conditional autoregressive priors to estimate risk, but these methods are unable to identify the geographical extent of spatially contiguous high-risk clusters of areal units. Our proposed solution to this problem is a two-stage approach, which produces a set of potential cluster structures for the data and then chooses the optimal structure via a Bayesian hierarchical model. The first stage uses a spatially adjusted hierarchical agglomerative clustering algorithm. The second stage fits a Poisson log-linear model to the data to estimate the optimal cluster structure and the spatial pattern in disease risk. The methodology was applied to a study of chronic obstructive pulmonary disease (COPD) in local authorities in England, where a number of high-risk clusters were identified.
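
The first stage described in the abstract above can be illustrated with a minimal sketch of agglomerative clustering restricted to adjacent areal units. The abstract does not specify the merge rule, so this sketch assumes one: merge the two adjacent clusters whose mean values are closest. The function name `candidate_cluster_structures` and its inputs are hypothetical.

```python
def candidate_cluster_structures(values, neighbours):
    """Return one partition per merge level, from n singletons down to 1 cluster.

    values: list of per-unit risk summaries (hypothetical input).
    neighbours: dict mapping unit index -> set of adjacent unit indices.
    The merge rule (closest cluster means) is an assumption of this sketch.
    """
    clusters = {i: [i] for i in range(len(values))}  # cluster id -> member units
    structures = [sorted(sorted(c) for c in clusters.values())]

    def mean(members):
        return sum(values[m] for m in members) / len(members)

    def adjacent(a, b):
        # Two clusters touch if any pair of their units shares a border.
        return any(v in neighbours[u] for u in clusters[a] for v in clusters[b])

    while len(clusters) > 1:
        # Find the adjacent pair of clusters whose mean values are closest.
        best = None
        ids = sorted(clusters)
        for idx, a in enumerate(ids):
            for b in ids[idx + 1:]:
                if adjacent(a, b):
                    gap = abs(mean(clusters[a]) - mean(clusters[b]))
                    if best is None or gap < best[0]:
                        best = (gap, a, b)
        if best is None:  # disconnected map: no further contiguous merges
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        structures.append(sorted(sorted(c) for c in clusters.values()))
    return structures


# Four units in a row (0-1-2-3) with two low-risk and two high-risk units.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
structures = candidate_cluster_structures([1.0, 1.1, 3.0, 3.2], chain)
```

Each entry of `structures` is one candidate cluster structure, which a second-stage model would then compare.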

OPUS - University of Technology Sydney

Enlighten

Identifying Clusters in Bayesian Disease Mapping

Author: Anderson Craig
Dean Nema
Lee Duncan
Publication date: 04/11/2013

Disease mapping is the field of spatial epidemiology interested in estimating the spatial pattern in disease risk across n areal units. One aim is to identify units exhibiting elevated disease risks, so that public health interventions can be made. Bayesian hierarchical models with a spatially smooth conditional autoregressive prior are used for this purpose, but they cannot identify the spatial extent of high-risk clusters. Therefore we propose a two stage solution to this problem, with the first stage being a spatially adjusted hierarchical agglomerative clustering algorithm. This algorithm is applied to data prior to the study period, and produces n potential cluster structures for the disease data. The second stage fits a separate Poisson log-linear model to the study data for each cluster structure, which allows for step-changes in risk where two clusters meet. The most appropriate cluster structure is chosen by model comparison techniques, specifically by minimising the Deviance Information Criterion. The efficacy of the methodology is established by a simulation study, and is illustrated by a study of respiratory disease risk in Glasgow, Scotland.
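
The model comparison step named in the abstract above, minimising the Deviance Information Criterion (DIC), can be sketched as follows. The sketch uses the standard definition DIC = mean deviance + p_D, where p_D = mean deviance minus the deviance at the posterior mean. It assumes the deviance values have already been obtained from fitted models; the function names and the numbers in the example are hypothetical and do not come from the paper.

```python
def dic(deviance_samples, deviance_at_posterior_mean):
    """Deviance Information Criterion from posterior deviance draws."""
    mean_dev = sum(deviance_samples) / len(deviance_samples)
    p_d = mean_dev - deviance_at_posterior_mean  # effective number of parameters
    return mean_dev + p_d


def select_structure(fits):
    """Pick the candidate with the smallest DIC.

    fits: dict mapping a label to (deviance_samples, deviance_at_posterior_mean).
    Returns the winning label and the DIC of every candidate.
    """
    scores = {name: dic(*fit) for name, fit in fits.items()}
    return min(scores, key=scores.get), scores


# Made-up deviance values for two candidate cluster structures.
fits = {
    "2 clusters": ([100.0, 102.0, 104.0], 99.0),
    "3 clusters": ([96.0, 98.0, 100.0], 90.0),
}
best, scores = select_structure(fits)
```

With these made-up numbers the three-cluster candidate has the lower mean deviance but the larger complexity penalty, so the two-cluster candidate is selected.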

arXiv.org e-Print Archive

CiteSeerX

OPUS - University of Technology Sydney

Enlighten

Defining and Estimating Intervention Effects for Groups that will Develop an Auxiliary Outcome

Author: Hsu Chi-Yuan
Joffe Marshall M.
Small Dylan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2007

It has recently become popular to define treatment effects for subsets of the target population characterized by variables not observable at the time a treatment decision is made. Characterizing and estimating such treatment effects is tricky; the most popular but naive approach inappropriately adjusts for variables affected by treatment and so is biased. We consider several appropriate ways to formalize the effects: principal stratification, stratification on a single potential auxiliary variable, stratification on an observed auxiliary variable and stratification on expected levels of auxiliary variables. We then outline identifying assumptions for each type of estimand. We evaluate the utility of these estimands and estimation procedures for decision making and understanding causal processes, contrasting them with the concepts of direct and indirect effects. We motivate our development with examples from nephrology and cancer screening, and use simulated data and real data on cancer screening to illustrate the estimation methods. Comment: Published at http://dx.doi.org/10.1214/088342306000000655 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

ScholarlyCommons@Penn