Rigorous and interactive class discussions that support students in engaging in
high-level thinking and reasoning are essential to learning and are a central
component of most teaching interventions. However, formally assessing
discussion quality 'at scale' is expensive and infeasible for most researchers.
In this work, we experimented with various modern natural language processing
(NLP) techniques to automatically generate rubric scores for individual
dimensions of classroom text discussion quality. Specifically, we worked on a
dataset of 90 classroom discussion transcripts comprising over 18,000 turns
annotated with fine-grained Analyzing Teaching Moves (ATM) codes and focused on
four Instructional Quality Assessment (IQA) rubrics. Despite the limited amount
of data, our work shows encouraging results for some of the rubrics while
suggesting room for improvement in the others. We also found that
certain NLP approaches work better for certain rubrics.