Exploring Optimal Granularity for Extractive Summarization of Unstructured Health Records: Analysis of the Largest Multi-Institutional Archive of Health Records in Japan
Automated summarization of clinical texts can reduce the burden on medical
professionals. "Discharge summaries" are one promising application of
summarization, because they can be generated from daily inpatient records. Our
preliminary experiment suggests that 20-31% of the descriptions in discharge
summaries overlap with the content of the inpatient records. However, it
remains unclear how the summaries should be generated from the unstructured
source. To decompose the physician's summarization process, this study aimed to
identify the optimal granularity in summarization. We first defined three types
of summarization units with different granularities to compare the performance
of the discharge summary generation: whole sentences, clinical segments, and
clauses. We defined clinical segments in this study, aiming to express the
smallest medically meaningful concepts. To obtain the clinical segments, it was
necessary to automatically split the texts in the first stage of the pipeline.
Accordingly, we compared rule-based methods and a machine learning method, and
the latter outperformed the former with an F1 score of 0.846 in the splitting
task. Next, we experimentally measured the accuracy of extractive summarization
using the three types of units, based on the ROUGE-1 metric, on a
multi-institutional national archive of health records in Japan. The measured
accuracies of extractive summarization using whole sentences, clinical
segments, and clauses were 31.91, 36.15, and 25.18, respectively. We found that
the clinical segments yielded higher accuracy than sentences and clauses. This
result indicates that summarization of inpatient records demands finer
granularity than sentence-oriented processing. Although we used only Japanese
health records, this result can be interpreted as follows: physicians extract "concepts
of medical significance" from patient records and recombine them ..
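
The abstract reports extractive summarization accuracy with the ROUGE-1 metric, which scores a candidate summary by its unigram overlap with a reference summary. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `rouge_1` and the pre-tokenized inputs are assumptions made for illustration; Japanese text would first need a word segmenter, and the abstract does not state which ROUGE-1 variant (recall, precision, or F1) was reported, so all three are returned.

```python
from collections import Counter


def rouge_1(candidate_tokens, reference_tokens):
    """Unigram-overlap ROUGE-1 between two token lists (hypothetical helper).

    Returns precision, recall, and F1 as fractions in [0, 1].
    """
    cand_counts = Counter(candidate_tokens)
    ref_counts = Counter(reference_tokens)

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)

    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy English tokens stand in for segmented clinical text (assumption).
    candidate = "patient admitted with fever".split()
    reference = "patient was admitted with high fever".split()
    print(rouge_1(candidate, reference))
```

Under this sketch, comparing sentences, clinical segments, and clauses would amount to building the candidate summary from units of each granularity and scoring each result against the same reference discharge summary.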