Pain is a common reason for accessing healthcare resources and is a growing
area of research, especially in its overlap with mental health. Mental health
electronic health records are a good data source to study this overlap.
However, much information on pain is held in the free text of these records,
where mentions of pain present a unique natural language processing problem due
to its ambiguous nature. This project uses data from an anonymised mental
health electronic health records database. The data are used to train a machine
learning based classification algorithm to classify sentences as discussing
patient pain or not. This will facilitate the extraction of relevant pain
information from large databases, and the use of such outputs for further
studies on pain and mental health. 1,985 documents were manually
triple-annotated for creation of gold standard training data, which was used to
train three commonly used classification algorithms. The best performing model
achieved an F1-score of 0.98 (95% CI 0.98-0.99).Comment: 5 pages, 2 tables, submitted to MEDINFO 2023 conferenc