14 research outputs found

Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models

Author: Melamud Oren
Shivade Chaitanya
Publication date: 01/01/2019

Large-scale clinical data is invaluable to driving many computational scientific advances today. However, understandable concerns regarding patient privacy hinder the open dissemination of such data and give rise to suboptimal siloed research. De-identification methods attempt to address these concerns but have been shown to be susceptible to adversarial attacks. In this work, we focus on the vast amounts of unstructured natural language data stored in clinical notes and propose to automatically generate synthetic clinical notes that are more amenable to sharing, using generative models trained on real de-identified records. To evaluate the merit of such notes, we measure both their privacy preservation properties and their utility in training clinical NLP models. Experiments using neural language models yield notes whose utility is close to that of the real ones in some clinical NLP tasks, yet leave ample room for future improvements. Comment: Clinical NLP Workshop 201

arXiv.org e-Print Archive

Crossref

A Simple Language Model based on PMI Matrix Approximations

Author: Dagan Ido
Goldberger Jacob
Melamud Oren
Publication date: 01/01/2017

In this study, we introduce a new approach for learning language models by training them to estimate word-context pointwise mutual information (PMI), and then deriving the desired conditional probabilities from PMI at test time. Specifically, we show that with minor modifications to word2vec's algorithm, we get principled language models that are closely related to the well-established Noise Contrastive Estimation (NCE) based language models. A compelling aspect of our approach is that our models are trained with the same simple negative sampling objective function that is commonly used in word2vec to learn word embeddings. Comment: Accepted to EMNLP 201
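
As a rough illustration of the test-time step this abstract mentions, the sketch below recovers conditional probabilities from PMI scores using the definition PMI(w, c) = log(p(w|c) / p(w)), which gives p(w|c) proportional to p(w) * exp(PMI(w, c)). The function name, the dictionary inputs, and the explicit renormalization are assumptions made for illustration; this is not the authors' implementation, and the abstract does not specify how the PMI estimates are produced.

```python
import math

def conditional_from_pmi(pmi_scores, unigram_probs):
    """Turn PMI scores for one context into a distribution over words.

    pmi_scores:    hypothetical dict word -> estimated PMI(word, context)
    unigram_probs: hypothetical dict word -> unigram probability p(word)

    Uses p(w|c) proportional to p(w) * exp(PMI(w, c)). The result is
    renormalized because estimated PMI values need not sum to one exactly.
    """
    unnormalized = {
        w: unigram_probs[w] * math.exp(pmi_scores[w]) for w in unigram_probs
    }
    total = sum(unnormalized.values())
    return {w: v / total for w, v in unnormalized.items()}

# Toy check: with exact PMI values, the true conditional is recovered.
unigram = {"a": 0.5, "b": 0.3, "c": 0.2}
true_conditional = {"a": 0.2, "b": 0.2, "c": 0.6}
pmi = {w: math.log(true_conditional[w] / unigram[w]) for w in unigram}
recovered = conditional_from_pmi(pmi, unigram)
```

With learned (approximate) PMI scores, the renormalization step is what keeps the output a valid probability distribution.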

arXiv.org e-Print Archive

Crossref

Dotted interval graphs and high throughput genotyping

Author: Moshe Lewenstein
Oren Melamud
Ron Pinter
Yonatan Aumann
Zohar Yakhini

We introduce a generalization of interval graphs, which we call dotted interval graphs (DIG). A dotted interval graph is an intersection graph of arithmetic progressions (= dotted intervals). Coloring of dotted interval graphs naturally arises in the context of high throughput genotyping. We study the properties of dotted interval graphs, with a focus on coloring. We show that any graph is a DIG but that DIG_d graphs, i.e. DIGs in which the arithmetic progressions have a jump of at most d, form a strict hierarchy. We show that coloring DIG_d graphs is NP-complete even for d = 2. For any fixed d, we provide a (7/8)d approximation for the coloring of DIG_d graphs.
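
To make the definition concrete, here is a minimal sketch that builds the intersection graph of a few dotted intervals and colors it. The `(start, jump, count)` representation and all function names are assumptions chosen for illustration. The coloring is a plain greedy heuristic, not the approximation algorithm the abstract refers to.

```python
from itertools import combinations

def dotted_interval(start, jump, count):
    """Points of the arithmetic progression start, start+jump, ... (count terms)."""
    return frozenset(start + i * jump for i in range(count))

def intersection_graph(dotted):
    """Adjacency sets: two dotted intervals are adjacent iff they share a point."""
    adj = {i: set() for i in range(len(dotted))}
    for i, j in combinations(range(len(dotted)), 2):
        if dotted[i] & dotted[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def greedy_coloring(adj):
    """Give each vertex the smallest color unused by its already-colored neighbors."""
    colors = {}
    for v in sorted(adj):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

# {0, 2, 4} and {1, 3, 5} interleave without sharing a point,
# while {4, 5} touches both of them.
intervals = [dotted_interval(0, 2, 3), dotted_interval(1, 2, 3), dotted_interval(4, 1, 2)]
graph = intersection_graph(intervals)
coloring = greedy_coloring(graph)
```

The first two progressions would overlap as ordinary intervals ([0, 4] and [1, 5]) but are non-adjacent as dotted intervals, so they can share a color.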

CiteSeerX