Topic Modeling for Classification of Clinical Reports
Electronic health records (EHRs) contain important clinical information about
patients. Efficient and effective use of this information could supplement or
even replace manual chart review as a means of studying and improving the
quality and safety of healthcare delivery. However, some of these clinical data
are in the form of free text and require pre-processing before use in automated
systems. A common free text data source is radiology reports, typically
dictated by radiologists to explain their interpretations. We sought to
demonstrate machine learning classification of computed tomography (CT) imaging
reports into binary outcomes, i.e. positive and negative for fracture, using
regular text classification and classifiers based on topic modeling. Topic
modeling provides interpretable themes (topic distributions) in reports, a
representation that is more compact than the commonly used bag-of-words
representation and can be processed faster than raw text in subsequent
automated processes. We demonstrate new classifiers based on this topic
modeling representation of the reports. The aggregate topic classifier (ATC)
and the confidence-based topic classifier (CTC) each classify the test reports
using a single topic, selected from the training dataset by a different measure
for each classifier. In contrast, the similarity-based topic classifier (STC)
predicts the class by measuring the similarity between the reports' topic
distributions. Our proposed topic modeling-based classifiers are shown to be
competitive with existing text classification techniques while providing an
efficient and interpretable representation.
Comment: 18 pages
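The text does not specify the measures used to pick the single topic in ATC and CTC, nor the similarity measure in STC, so the following is only a minimal Python sketch under stated assumptions. The per-report topic distributions are taken as given; in practice they would come from a topic model such as LDA (for example, scikit-learn's LatentDirichletAllocation.transform). The toy training data are hypothetical. The single-topic variant assumes the topic is chosen by the largest gap between class-mean topic proportions, with a midpoint threshold. The similarity variant assumes cosine similarity with a 1-nearest-neighbour rule. The names select_topic, predict_single_topic and predict_similarity are illustrative and are not the authors' code.

```python
import math
from statistics import mean

# Hypothetical toy data: one topic distribution per training report (3 topics,
# each row sums to 1), as a topic model such as LDA might produce.
# Labels: 1 = positive for fracture, 0 = negative.
TRAIN_X = [
    [0.80, 0.10, 0.10],
    [0.60, 0.30, 0.10],
    [0.10, 0.40, 0.50],
    [0.20, 0.50, 0.30],
]
TRAIN_Y = [1, 1, 0, 0]


def select_topic(xs, ys):
    """Choose one topic and a decision threshold from the training set.

    Assumed criterion (not taken from the source): the topic whose mean
    proportion differs most between positive and negative reports; the
    threshold is the midpoint of the two class means on that topic.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    best_topic, best_gap = 0, -1.0
    for k in range(len(xs[0])):
        gap = abs(mean(p[k] for p in pos) - mean(n[k] for n in neg))
        if gap > best_gap:
            best_topic, best_gap = k, gap
    pos_mean = mean(p[best_topic] for p in pos)
    neg_mean = mean(n[best_topic] for n in neg)
    threshold = (pos_mean + neg_mean) / 2
    # Record which side of the threshold corresponds to the positive class.
    return best_topic, threshold, pos_mean > neg_mean


def predict_single_topic(x, topic, threshold, pos_higher):
    """Label a report using only its proportion of the selected topic."""
    above = x[topic] > threshold
    return 1 if above == pos_higher else 0


def cosine(a, b):
    """Cosine similarity between two topic distributions."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def predict_similarity(x, xs, ys):
    """Assumed similarity rule: copy the label of the most similar training report."""
    nearest = max(range(len(xs)), key=lambda i: cosine(x, xs[i]))
    return ys[nearest]


if __name__ == "__main__":
    topic, threshold, pos_higher = select_topic(TRAIN_X, TRAIN_Y)
    for report in ([0.65, 0.25, 0.10], [0.15, 0.45, 0.40]):
        print(
            report,
            predict_single_topic(report, topic, threshold, pos_higher),
            predict_similarity(report, TRAIN_X, TRAIN_Y),
        )
```

The single-topic rule ties each decision to one interpretable theme, which matches the interpretability motivation given for topic modeling. The nearest-neighbour rule instead compares entire topic distributions. Any other selection measure or similarity function (for example, Hellinger distance) could be swapped in without changing the overall structure.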