Investigating the Essential of Meaningful Automated Formative Feedback for Programming Assignments
This study investigated what makes automated formative feedback for programming assignments meaningful. Three types of feedback were tested: (a) What's wrong - which behaviors the test cases checked and which tests failed, (b) Gap - comparisons between expected and actual outputs, and (c) Hint - hints on how to fix the problem when a test case failed. 46 students taking a CS2 course participated in the study. They were divided into three groups, each receiving a different feedback configuration: (1) Group One - What's wrong, (2) Group Two - What's wrong + Gap, (3) Group Three - What's wrong + Gap + Hint. The study found that simply knowing which tests failed did not help students sufficiently and could encourage gaming of the system. Hints had no measurable impact on student performance or on students' use of the automated feedback. Based on these findings, the study offers practical guidance on the design of automated feedback.
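To make the three feedback levels concrete, here is a minimal, hypothetical autograder sketch in Python showing how "What's wrong", "Gap", and "Hint" messages might be rendered for one failing test case; the task, function names, and hint wording are illustrative assumptions, not details from the study.

    # Hypothetical autograder sketch: renders the three feedback levels from the
    # study ("What's wrong", "Gap", "Hint") for one failing test case.
    # The task, function names, and hint text are illustrative assumptions.

    def student_sum_even(numbers):
        # Hypothetical buggy student submission: sums odd numbers instead of even.
        return sum(n for n in numbers if n % 2 == 1)

    def reference_sum_even(numbers):
        # Instructor reference solution used to compute the expected output.
        return sum(n for n in numbers if n % 2 == 0)

    def grade(test_input, level):
        expected = reference_sum_even(test_input)
        actual = student_sum_even(test_input)
        if actual == expected:
            return "Test 'sum of even numbers' passed."
        # Level (a): What's wrong - what the test checks and that it failed.
        feedback = ["Test 'sum of even numbers' FAILED for input {}.".format(test_input)]
        if level >= 2:
            # Level (b): Gap - expected vs. actual output.
            feedback.append("Expected {}, but your code returned {}.".format(expected, actual))
        if level >= 3:
            # Level (c): Hint - a nudge toward the likely fix.
            feedback.append("Hint: check the condition that decides which numbers are added.")
        return "\n".join(feedback)

    if __name__ == "__main__":
        for level in (1, 2, 3):  # Group One, Two, and Three configurations
            print("--- feedback level", level, "---")
            print(grade([1, 2, 3, 4], level))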
ItsSQL: Intelligent Tutoring System for SQL
SQL is a central component of any database course. Despite the small number of SQL commands, students struggle to put the concepts into practice. To address this challenge, we developed an intelligent tutoring system (ITS) that guides the learning process with little effort from the lecturer. Other systems often give only basic feedback (correct or incorrect) or require hundreds of instance-specific rules defined by a lecturer. In contrast, our system provides individual feedback based on a semi-automatically and intelligently growing pool of reference solutions, i.e., sensible approaches. Moreover, we introduce the concept of good and bad reference solutions. The system was developed and evaluated in three steps following Design Science research guidelines. The results of the study demonstrate that providing multiple reference solutions, supported by harmonization, is useful for delivering individual, real-time feedback and thus improves the learning process for students.
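The abstract describes the system only at a conceptual level. A minimal sketch of the core idea, matching a student's SQL query against a small pool of good and bad reference solutions after a naive harmonization step, might look like the following; the normalization rules, pool contents, and feedback text are assumptions for illustration only.

    # Minimal sketch of feedback from a pool of reference solutions (assumed design).
    # "Harmonization" here is just whitespace/case/semicolon normalization.
    import re

    REFERENCE_POOL = [
        # (query, is_good_reference, comment shown to the student)
        ("SELECT name FROM students WHERE credits > 30", True,
         "Matches a sensible reference solution."),
        ("SELECT * FROM students WHERE credits > 30", False,
         "Known bad approach: selects all columns instead of only 'name'."),
    ]

    def harmonize(sql):
        # Collapse whitespace, lowercase, and drop a trailing ';' (very naive).
        sql = re.sub(r"\s+", " ", sql.strip().rstrip(";"))
        return sql.lower()

    def feedback(student_sql):
        normalized = harmonize(student_sql)
        for ref_sql, is_good, comment in REFERENCE_POOL:
            if normalized == harmonize(ref_sql):
                verdict = "Correct." if is_good else "Not quite."
                return "{} {}".format(verdict, comment)
        # Unmatched solutions could be queued for the lecturer, growing the pool.
        return "No matching reference yet; your query was stored for review."

    print(feedback("select name\nfrom students where credits > 30;"))
    print(feedback("SELECT * FROM students WHERE credits > 30"))
    print(feedback("SELECT name FROM students"))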
Automatically Repairing Programs Using Both Tests and Bug Reports
The success of automated program repair (APR) depends significantly on its
ability to localize the defects it is repairing. For fault localization (FL),
APR tools typically use either spectrum-based (SBFL) techniques that use test
executions or information-retrieval-based (IRFL) techniques that use bug
reports. These two approaches often complement each other, patching different
defects. No existing repair tool uses both SBFL and IRFL. We develop RAFL
(Rank-Aggregation-Based Fault Localization), a novel FL approach that combines
multiple FL techniques. We also develop Blues, a new IRFL technique that localizes defects from bug reports using an unsupervised approach. On a dataset of
818 real-world defects, SBIR (combined SBFL and Blues) consistently localizes
more bugs and ranks buggy statements higher than the two underlying techniques.
For example, SBIR correctly identifies a buggy statement as the most suspicious
for 18.1% of the defects, while SBFL does so for 10.9% and Blues for 3.1%. We
extend SimFix, a state-of-the-art APR tool, to use SBIR, SBFL, and Blues.
SimFix patches 112 of the 818 defects when using SBIR, 110 when using SBFL, and 55 when using Blues. The 112 patched defects include 55 patched exclusively using SBFL, 7 patched exclusively using IRFL, 47 patched using both SBFL and IRFL, and 3 new defects. SimFix using Blues significantly outperforms
iFixR, the state-of-the-art IRFL-based APR tool. Overall, SimFix using our FL
techniques patches ten defects no prior tools could patch. By evaluating on a
benchmark of 818 defects, 442 previously unused in APR evaluations, we find
that prior evaluations on the overused Defects4J benchmark have led to overly
generous findings. Our paper is the first to (1) use combined FL for APR, (2)
apply a more rigorous methodology for measuring patch correctness, and (3)
evaluate on the new, substantially larger version of Defects4J.
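The abstract describes RAFL only as a rank-aggregation-based combination of fault-localization techniques, without specifying the aggregation scheme. The sketch below shows one generic way such a combination could work, merging an SBFL ranking and an IRFL ranking of suspicious statements with a simple Borda-count style scheme; the statement names and the choice of Borda count are assumptions, not RAFL's actual algorithm.

    # Generic rank-aggregation sketch for fault localization (assumed scheme).
    # A simple Borda count over per-technique rankings is used for illustration.

    def borda_aggregate(rankings):
        """rankings: list of lists of statement ids, most suspicious first."""
        scores = {}
        for ranking in rankings:
            n = len(ranking)
            for position, stmt in enumerate(ranking):
                # Higher score for statements ranked nearer the top.
                scores[stmt] = scores.get(stmt, 0) + (n - position)
        return sorted(scores, key=scores.get, reverse=True)

    # Hypothetical per-technique rankings for one defect.
    sbfl_ranking = ["Foo.java:42", "Foo.java:10", "Bar.java:7"]  # from test executions
    irfl_ranking = ["Bar.java:7", "Foo.java:42", "Baz.java:99"]  # from the bug report

    combined = borda_aggregate([sbfl_ranking, irfl_ranking])
    print(combined)  # statements ranked highly by both techniques float to the top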
Verifix: Verified Repair of Programming Assignments
Automated feedback generation for introductory programming assignments is
useful for programming education. Most works try to generate feedback to
correct a student program by comparing its behavior with an instructor's
reference program on selected tests. In this work, our aim is to generate
verifiably correct program repairs as student feedback. The student assignment
is aligned and composed with a reference solution in terms of control flow, and
differences in data variables are automatically summarized via predicates to
relate the variable names. Failed verification attempts for the equivalence of
the two programs are exploited to obtain a collection of maxSMT queries, whose
solutions point to repairs of the student assignment. We have conducted
experiments on student assignments curated from a widely deployed intelligent
tutoring system. Our results indicate that we can generate verified feedback for up to 58% of the assignments. More importantly, our system indicates when it is able to generate verified feedback, which novice students can then use with high confidence.
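The repair-by-verification idea can be illustrated with a toy example. The sketch below uses the Z3 solver (assuming the z3-solver Python package) to check whether a student's expression is equivalent to the reference and, if not, to solve for a corrected constant; it is a drastically simplified stand-in for the control-flow alignment and maxSMT machinery described above, and the example task, variable names, and repair space are assumptions.

    # Toy sketch of verification-guided repair feedback (assumes the z3-solver package).
    # Hypothetical task: total price = subtotal + a fixed shipping fee of 20.
    # The student hard-coded the wrong fee.
    from z3 import Int, Solver, ForAll, sat

    subtotal = Int("subtotal")
    reference = subtotal + 20   # instructor's reference expression
    student = subtotal + 15     # student's (buggy) expression

    # Step 1: attempt to verify equivalence; a satisfiable query yields a
    # counterexample input, i.e., the verification attempt fails.
    check = Solver()
    check.add(subtotal >= 0, reference != student)
    if check.check() == sat:
        print("Programs differ, e.g., for subtotal =", check.model()[subtotal])

    # Step 2: replace the student's constant with an unknown and ask the solver
    # for a value that makes the expressions equal on all valid inputs. A real
    # system would pose this as a maxSMT problem over many candidate edits.
    fee = Int("fee")
    repair = Solver()
    repair.add(ForAll([subtotal], subtotal + fee == reference))
    if repair.check() == sat:
        print("Verified repair: change the constant 15 to", repair.model()[fee])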