The accurate classification of student help requests with respect to the type
of help being sought can enable the tailoring of effective responses.
Automatically classifying such requests is non-trivial, but large language
models (LLMs) appear to offer an accessible, cost-effective solution. This
study evaluates the performance of the GPT-3.5 and GPT-4 models for classifying
help requests from students in an introductory programming class. In zero-shot
trials, GPT-3.5 and GPT-4 exhibited comparable performance on most categories,
while GPT-4 outperformed GPT-3.5 in classifying sub-categories for requests
related to debugging. Fine-tuning the GPT-3.5 model improved its performance to
such an extent that it approximated the accuracy and consistency across
categories observed between two human raters. Overall, this study demonstrates
the feasibility of using LLMs to enhance educational systems through the
automated classification of student needs