SYNTHNOTES: TOWARDS SYNTHETIC CLINICAL TEXT GENERATION

Abstract

SynthNotes is a statistical natural language generation tool for the creation of realistic medical text notes for use by researchers in clinical language processing. Currently, advancements in medical analytics research face barriers due to patient privacy concerns which limits the numbers of researchers who have access to valuable data. Furthermore, privacy protections restrict the computing environments where data can be processed. This often adds prohibitive costs to researchers. The generation method described here provides domain-independent statistical methods for learning to generate text by extracting and ranking templates from a training corpus. The primary contribution in this work is automating the process of template selection and generation of text through classic machine learning methods. SynthNotes removes the need for human domain experts to construct templates, which can be time intensive and expensive. Furthermore, by using machine learning methods, this approach leads to greater realism and variability in the generated notes than could be achieved through classical language generation methods

    Similar works