Comprehending and elucidating the purpose of code is often cited as a
key learning objective in introductory programming courses. To address this
objective, ``Explain-in-Plain-English'' (EiPE) questions, in which students are shown a
segment of code and asked to provide an abstract description of the code's
purpose, have been adopted. However, because EiPE questions require a natural
language response, they typically must be graded manually, which is time-consuming
for course staff and delays feedback for students. With the advent of large
language models (LLMs) capable of generating code, responses to EiPE questions
can be used to generate code segments, the correctness of which can then be
easily verified using test cases. We refer to this approach as ``Code Generation
Based Grading'' (CGBG), and in this paper we explore its agreement with human
graders using EiPE responses from past exams in an introductory programming
course taught in Python. Overall, we find that CGBG achieves moderate agreement
with human graders, with the primary area of disagreement being its leniency
toward low-level and line-by-line descriptions of code.
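To make the pipeline concrete, the sketch below shows one way such an approach could be wired up in Python. It is a minimal illustration only: the generate\_code() stub stands in for whatever LLM API is used, and the function name and test cases are hypothetical placeholders rather than the setup used in this work.
\begin{verbatim}
# Minimal sketch of a CGBG-style pipeline. generate_code() stands in for
# a real LLM API call; the function name and test cases below are
# illustrative placeholders, not those used in the study.

def generate_code(description: str) -> str:
    # A real implementation would send the student's EiPE response to a
    # code-generating model and return the Python source it produces.
    return "def solution(a, b):\n    return a if a > b else b"

def grade_response(description, test_cases, func_name="solution"):
    """Generate code from an EiPE response and check it against tests."""
    source = generate_code(description)
    namespace = {}
    try:
        exec(source, namespace)              # load the generated definition
        func = namespace[func_name]
        return all(func(*args) == expected
                   for args, expected in test_cases)
    except Exception:
        return False                         # failures are marked incorrect

tests = [((3, 5), 5), ((-1, -7), -1), ((2, 2), 2)]
print(grade_response("returns the larger of two numbers", tests))
\end{verbatim}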