Search CORE

3 research outputs found

SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering

Author: Dou Wensheng
Gao Chushu
Huang Tao
Wang Jie
Wei Jun
Xu Liang
Zhong Hua
Publication venue
Publication date: 27/04/2017
Field of study

Version information plays an important role in spreadsheet understanding, maintaining and quality improving. However, end users rarely use version control tools to document spreadsheet version information. Thus, the spreadsheet version information is missing, and different versions of a spreadsheet coexist as individual and similar spreadsheets. Existing approaches try to recover spreadsheet version information through clustering these similar spreadsheets based on spreadsheet filenames or related email conversation. However, the applicability and accuracy of existing clustering approaches are limited due to the necessary information (e.g., filenames and email conversation) is usually missing. We inspected the versioned spreadsheets in VEnron, which is extracted from the Enron Corporation. In VEnron, the different versions of a spreadsheet are clustered into an evolution group. We observed that the versioned spreadsheets in each evolution group exhibit certain common features (e.g., similar table headers and worksheet names). Based on this observation, we proposed an automatic clustering algorithm, SpreadCluster. SpreadCluster learns the criteria of features from the versioned spreadsheets in VEnron, and then automatically clusters spreadsheets with the similar features into the same evolution group. We applied SpreadCluster on all spreadsheets in the Enron corpus. The evaluation result shows that SpreadCluster could cluster spreadsheets with higher precision and recall rate than the filename-based approach used by VEnron. Based on the clustering result by SpreadCluster, we further created a new versioned spreadsheet corpus VEnron2, which is much bigger than VEnron. We also applied SpreadCluster on the other two spreadsheet corpora FUSE and EUSES. The results show that SpreadCluster can cluster the versioned spreadsheets in these two corpora with high precision.Comment: 12 pages, MSR 201

arXiv.org e-Print Archive

Crossref

Improving Feedback from Automated Reviews of Student Spreadsheets

Author: Kammer Frank
Kuche Jonas-Ian
Reid Sören Aguirre
Ritzke Pia-Doreen
Siepermann Markus
Stephan Max
Wagenknecht Armin
Publication venue
Publication date: 14/10/2023
Field of study

Spreadsheets are one of the most widely used tools for end users. As a result, spreadsheets such as Excel are now included in many curricula. However, digital solutions for assessing spreadsheet assignments are still scarce in the teaching context. Therefore, we have developed an Intelligent Tutoring System (ITS) to review students' Excel submissions and provide individualized feedback automatically. Although the lecturer only needs to provide one reference solution, the students' submissions are analyzed automatically in several ways: value matching, detailed analysis of the formulas, and quality assessment of the solution. To take the students' learning level into account, we have developed feedback levels for an ITS that provide gradually more information about the error by using one of the different analyses. Feedback at a higher level has been shown to lead to a higher percentage of correct submissions and was also perceived as well understandable and helpful by the students

arXiv.org e-Print Archive