Deduplicating and Ranking Solution Programs for Suggesting Reference Solutions
Referring to solution programs written by other users is helpful for learners
in programming education. However, current online judge systems simply list all solution programs submitted by users, sorted by submission date and time, execution time, or user rating, ignoring how helpful each program is as a reference. In
addition, users struggle to refer to a variety of solution approaches since
there are too many duplicated and near-duplicated programs. To motivate
learners to refer to various solutions to learn better solution approaches, in
this paper, we propose an approach to deduplicate and rank common solution
programs in each programming problem. Based on the observation that a frequently duplicated program adopts a more common approach and can serve as a general reference, we remove near-duplicated solution programs and rank the unique programs by duplicate count. Experiments on solution programs submitted to a real-world online judge system demonstrate that deduplication reduces the number of programs by 60.20%, whereas the baseline reduces it by only 29.59%, meaning that users need to refer to only 39.80% of the programs on average. Furthermore, our analysis shows that the top-10 ranked
programs on average. Furthermore, our analysis shows that top-10 ranked
programs cover 29.95% of programs on average, indicating that users can grasp
29.95% of solution approaches by referring to only 10 programs. The proposed
approach shows the potential of reducing the learners' burden of referring to
too many solutions and motivating them to learn a variety of solution
approaches.Comment: 7 pages, 5 figures, accepted to ASSE 202
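
To illustrate the deduplicate-and-rank idea, here is a minimal Python sketch. It assumes, as a deliberate simplification of the paper's method, that two submissions are near-duplicates when they are identical after stripping comments and whitespace; the surviving unique programs are then ranked by duplicate count:

    import re
    from collections import Counter

    def normalize(source: str) -> str:
        # Crude near-duplicate key: drop comments, then all whitespace.
        code = re.sub(r"#.*", "", source)
        return re.sub(r"\s+", "", code)

    def dedupe_and_rank(programs):
        counts = Counter()
        representative = {}
        for src in programs:
            key = normalize(src)
            counts[key] += 1
            representative.setdefault(key, src)  # keep the first submission seen
        # Unique programs, most-duplicated (most common approach) first.
        return [(representative[key], n) for key, n in counts.most_common()]

    submissions = [
        "x = int(input())\nprint(x * 2)",
        "x=int(input())\nprint(x*2)  # double it",  # near-duplicate of the above
        "n = int(input())\nprint(n + n)",           # a different approach
    ]
    for program, dup_count in dedupe_and_rank(submissions):
        print(dup_count, repr(program))

A production version would need a stronger notion of near-duplication (e.g., token- or AST-level normalization with identifier renaming), but the ranking step stays the same: more duplicates means a more common approach.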
Composing control flow and formula rules for computing on grids
We define computation on grids as the composition, through pushout constructions, of control flows, carried across adjacency relations between grid cells, with formulas that update the value of some attribute. The approach is based on identifying a subcategory of attributed typed graphs suitable for defining pushouts on grids, and is illustrated in the context of the Cyberfilm visual language.
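
As a loose illustration only (it does not capture the categorical pushout machinery of the paper), the following Python sketch shows the two ingredients being composed: a control flow that visits grid cells along their adjacency relation, and a formula rule that updates one attribute of each cell. All names are invented for the example:

    def neighbors(r, c, rows, cols):
        # Adjacency relation: the four orthogonal neighbors of a cell.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            if 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc

    def step(grid, formula):
        # Control flow: apply the formula rule to every cell, passing in the
        # attribute values carried across the adjacency relation.
        rows, cols = len(grid), len(grid[0])
        return [[formula(grid[r][c],
                         [grid[nr][nc] for nr, nc in neighbors(r, c, rows, cols)])
                 for c in range(cols)]
                for r in range(rows)]

    # Formula rule: replace each cell's attribute by the mean of itself
    # and its neighbors.
    mean_rule = lambda value, nbrs: (value + sum(nbrs)) / (1 + len(nbrs))

    print(step([[0.0, 1.0], [2.0, 3.0]], mean_rule))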
Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey
Automated code evaluation systems (AESs) are designed to reliably assess user-submitted code. Due to their extensive range of applications and
the accumulation of valuable resources, AESs are becoming increasingly popular.
Research on the application of AESs and the exploration of their real-world resources for diverse coding tasks is still lacking. In this study, we conducted a
comprehensive survey on AESs and their resources. This survey explores the
application areas of AESs, available resources, and resource utilization for
coding tasks. AESs are categorized into programming contests, programming
learning and education, recruitment, online compilers, and additional modules,
depending on their application. We explore the available datasets and other
resources of these systems for research, analysis, and coding tasks. Moreover,
we provide an overview of machine learning-driven coding tasks, such as bug
detection, code review, comprehension, refactoring, search, representation, and
repair. These tasks are performed using real-life datasets. In addition, we
briefly discuss the Aizu Online Judge (AOJ) platform as a real-world example of an AES from the perspectives of system design (hardware and software), operation (competition and education), and research. We focus on AOJ because of its scalability (programming education, competitions, and practice), its open internal features (hardware and software), the attention it has received from the research community, its openly available data (e.g., solution codes and submission documents), and its transparency. We also analyze the overall performance of the system and the challenges perceived over the years.
Refactoring Programs Using Large Language Models with Few-Shot Examples
A less complex and more straightforward program is easier to maintain and makes it easier to write secure, bug-free code. However, due to the heavy workload involved and the risk of breaking working programs, programmers are reluctant to refactor their code, which also costs them potential learning experiences. To mitigate this, we
demonstrate the use of a large language model (LLM), GPT-3.5, to suggest less complex versions of user-written Python programs, aiming to encourage users to learn how to write better programs. We propose a method that leverages few-shot prompting by selecting the best-suited code refactoring examples for each target programming problem, based on a prior evaluation of one-shot prompting. The quantitative evaluation shows that 95.68% of programs can be refactored by generating 10 candidates each, resulting in a 17.35% reduction in average cyclomatic complexity and a 25.84% decrease in the average number of lines after keeping only the generated programs that are semantically correct.
Furthermore, the qualitative evaluation shows an outstanding capability in code formatting, although unnecessary behaviors, such as deleting or translating comments, are also observed.
Comment: 10 pages, 10 figures, accepted to the 30th Asia-Pacific Software Engineering Conference (APSEC 2023)
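
A minimal sketch of this pipeline, assuming the openai and radon Python packages, and treating the model name, prompt wording, and example selection as placeholders rather than the paper's exact setup:

    from openai import OpenAI
    from radon.complexity import cc_visit  # cyclomatic complexity per function

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def refactor_candidates(target: str, examples, n: int = 10):
        # Few-shot prompt: show (before, after) refactoring pairs, then the target.
        messages = [{"role": "system",
                     "content": "Rewrite the Python program to be less complex "
                                "while preserving its behavior."}]
        for before, after in examples:
            messages.append({"role": "user", "content": before})
            messages.append({"role": "assistant", "content": after})
        messages.append({"role": "user", "content": target})
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo", messages=messages, n=n)
        return [choice.message.content for choice in resp.choices]

    def mean_complexity(source: str) -> float:
        blocks = cc_visit(source)  # one block per function/method/class
        return sum(b.complexity for b in blocks) / max(len(blocks), 1)

Candidates would still have to be filtered for semantic correctness (e.g., by running the problem's test cases) before their cyclomatic complexity and line counts are compared with the original program, as in the evaluation described above.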
Rule-Based Error Classification for Analyzing Differences in Frequent Errors
Finding and fixing errors is a time-consuming task not only for novice
programmers but also for expert programmers. Prior work has identified frequent
error patterns among various levels of programmers. However, the differences in
the tendencies between novices and experts have yet to be revealed. With knowledge of the frequent errors at each level, instructors can provide advice tailored to each level of learner. In this paper, we propose a rule-based error classification tool to classify errors in code pairs consisting of a wrong and a correct program. We classify errors in 95,631 code pairs submitted by programmers of various levels on an online judge system, identifying 3.47 errors per code pair on average. The classified errors are used
to analyze the differences in frequent errors between novice and expert
programmers. The results show that, for the same introductory problems, errors made by novices stem from a lack of programming knowledge, and these mistakes are considered an essential part of the learning process. On the other hand, errors made by experts stem from misunderstandings caused by careless reading of the problem statement or from the challenge of solving a problem differently than usual. The proposed tool can be used to create error-labeled datasets and for further code-related educational research.
Comment: 7 pages, 4 figures, accepted to TALE 202
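
As a hypothetical illustration of the rule-based approach (the rules below are invented examples, not the paper's actual rule set), each rule inspects the lines that differ between the wrong and correct program:

    import difflib
    import re

    # Each rule: (error label, pattern that must match a removed "-" line).
    RULES = [
        ("assignment instead of comparison", re.compile(r"if\s.*[^=!<>]=[^=]")),
        ("missing colon after header",       re.compile(r"^\s*(if|for|while|def)\b[^:]*$")),
    ]

    def classify(wrong: str, correct: str):
        labels = []
        diff = difflib.unified_diff(wrong.splitlines(), correct.splitlines(),
                                    lineterm="")
        for line in diff:
            if line.startswith("-") and not line.startswith("---"):
                removed = line[1:]  # a line present only in the wrong program
                labels += [label for label, pat in RULES if pat.search(removed)]
        return labels

    wrong   = "if x = 0:\n    print('zero')"
    correct = "if x == 0:\n    print('zero')"
    print(classify(wrong, correct))  # ['assignment instead of comparison']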
Program Repair with Minimal Edits Using CodeT5
Programmers often struggle to identify and fix bugs in their programs. In
recent years, many language models (LMs) have been proposed to fix erroneous
programs and support error recovery. However, LMs tend to generate solutions that differ from the original input programs, which can make the fixes hard for users to comprehend. In this paper, we propose an approach to
suggest a correct program with minimal repair edits using CodeT5. We fine-tune
a pre-trained CodeT5 on code pairs of wrong and correct programs and evaluate
its performance against several baseline models. The experimental results show that the fine-tuned CodeT5 achieves a pass@100 of 91.95% and an average edit distance to the most similar correct program of 6.84, indicating that at least one correct program can be suggested by generating 100 candidate programs. We demonstrate the effectiveness of LMs in suggesting program repairs with minimal edits for introductory programming problems.
Comment: 7 pages, 6 figures, accepted to iCAST 202
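
A rough sketch of the inference side, assuming the Hugging Face transformers library; the base Salesforce/codet5-base checkpoint stands in for the fine-tuned model, and the edit-distance proxy is illustrative:

    import difflib
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    # Stand-in for a CodeT5 model fine-tuned on (wrong, correct) program pairs.
    name = "Salesforce/codet5-base"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = T5ForConditionalGeneration.from_pretrained(name)

    def suggest_repairs(wrong_program: str, num_candidates: int = 100):
        inputs = tokenizer(wrong_program, return_tensors="pt", truncation=True)
        outputs = model.generate(**inputs, do_sample=True, max_length=512,
                                 num_return_sequences=num_candidates)
        return [tokenizer.decode(out, skip_special_tokens=True) for out in outputs]

    def edit_distance(a: str, b: str) -> int:
        # Proxy via difflib opcodes; a true Levenshtein distance also works.
        matcher = difflib.SequenceMatcher(None, a, b)
        return sum(max(i2 - i1, j2 - j1)
                   for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                   if tag != "equal")

    # Among the candidates that pass the problem's tests, suggest the one
    # with the smallest edit distance to the learner's original program.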
ChatGPT for Education and Research: Opportunities, Threats, and Strategies
In recent years, the rise of advanced artificial intelligence technologies has had a profound impact on many fields, including education and research. One such technology is ChatGPT, a powerful large language model developed by OpenAI. This technology offers exciting opportunities for students and educators, including personalized feedback, increased accessibility, interactive conversations, lesson preparation, evaluation, and new ways to teach complex concepts. However, ChatGPT also poses several threats to the traditional education and research system, including the possibility of cheating on online exams, human-like text generation, diminished critical thinking skills, and difficulties in evaluating information generated by ChatGPT. This study explores the potential opportunities and threats that ChatGPT poses to education overall, from the perspective of students and educators. Furthermore, for programming learning, we explore how ChatGPT helps students improve their programming skills. To demonstrate this, we conducted coding-related experiments with ChatGPT, including code generation from problem descriptions, pseudocode generation of algorithms from text, and code correction. The generated code is validated with an online judge system to evaluate its accuracy. In addition, we conducted several surveys with students and teachers to find out how ChatGPT supports programming learning and teaching. Finally, we present the survey results and our analysis.
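
Validating a generated program the way an online judge does boils down to running it on sample input and comparing output; a minimal sketch follows (the file name and verdict labels are illustrative):

    import subprocess

    def judge(source_path: str, stdin_text: str, expected: str,
              time_limit: float = 2.0) -> str:
        try:
            result = subprocess.run(["python", source_path], input=stdin_text,
                                    capture_output=True, text=True,
                                    timeout=time_limit)
        except subprocess.TimeoutExpired:
            return "TLE"  # time limit exceeded
        if result.returncode != 0:
            return "RE"   # runtime error
        return "AC" if result.stdout.strip() == expected.strip() else "WA"

    # e.g., judge("generated_solution.py", "2 3\n", "5") -> "AC" / "WA" / ...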
Efficient visualisation of the relative distribution of keyword search results in a corpus data cube
Most keyword searches target precision, finding the most relevant documents; some, however, target recall, finding all relevant documents. Our system supports high-recall searches that return hundreds or thousands of relevant results. In particular, it provides a visualisation that shows the distribution of search results relative to the distribution of items in the entire corpus. Such relative distributional features include over- and under-representation, clusters, and outliers. The contribution of this paper is efficient visualisation, that is, how to provide the best relative-distribution view for a given data cube size. This requirement translates to: for which limited-size metadata summary cube are search results disambiguated the most in our relative-distribution view? We identify metrics and several algorithms for such summary cube selection.
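
As a small sketch of the relative-distribution idea (the field names and data are invented), the view compares how the search results are distributed over one cube dimension against how the whole corpus is distributed over it:

    from collections import Counter

    def relative_distribution(corpus, results, dim):
        # Ratio of the result share to the corpus share per category of `dim`:
        # > 1 means over-represented among the results, < 1 under-represented.
        corpus_counts = Counter(doc[dim] for doc in corpus)
        result_counts = Counter(doc[dim] for doc in results)
        return {cat: (result_counts[cat] / len(results)) / (n / len(corpus))
                for cat, n in corpus_counts.items()}

    corpus  = [{"year": 2020}, {"year": 2020}, {"year": 2021}, {"year": 2022}]
    results = [{"year": 2020}, {"year": 2021}]
    print(relative_distribution(corpus, results, "year"))
    # {2020: 1.0, 2021: 2.0, 2022: 0.0}

A summary cube is then good when it preserves large deviations of these ratios from 1, which is plausibly what the selection metrics aim to score.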