
SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering

Author: Dou Wensheng
Gao Chushu
Huang Tao
Wang Jie
Wei Jun
Xu Liang
Zhong Hua
Publication date: 27/04/2017

Version information plays an important role in spreadsheet understanding, maintenance, and quality improvement. However, end users rarely use version control tools to document spreadsheet version information. Thus, the spreadsheet version information is missing, and different versions of a spreadsheet coexist as individual and similar spreadsheets. Existing approaches try to recover spreadsheet version information by clustering these similar spreadsheets based on spreadsheet filenames or related email conversations. However, the applicability and accuracy of existing clustering approaches are limited because the necessary information (e.g., filenames and email conversations) is usually missing. We inspected the versioned spreadsheets in VEnron, which is extracted from the Enron Corporation. In VEnron, the different versions of a spreadsheet are clustered into an evolution group. We observed that the versioned spreadsheets in each evolution group exhibit certain common features (e.g., similar table headers and worksheet names). Based on this observation, we proposed an automatic clustering algorithm, SpreadCluster. SpreadCluster learns the criteria of features from the versioned spreadsheets in VEnron, and then automatically clusters spreadsheets with similar features into the same evolution group. We applied SpreadCluster to all spreadsheets in the Enron corpus. The evaluation result shows that SpreadCluster could cluster spreadsheets with higher precision and recall than the filename-based approach used by VEnron. Based on the clustering result of SpreadCluster, we further created a new versioned spreadsheet corpus, VEnron2, which is much bigger than VEnron. We also applied SpreadCluster to two other spreadsheet corpora, FUSE and EUSES. The results show that SpreadCluster can cluster the versioned spreadsheets in these two corpora with high precision.

Comment: 12 pages, MSR 201
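The idea of grouping spreadsheets by shared features can be illustrated with a minimal sketch. This is not the SpreadCluster algorithm itself: the abstract says the criteria are learned from VEnron, whereas the sketch below assumes a fixed, hypothetical Jaccard threshold of 0.5, hypothetical feature strings (worksheet names and table headers), and simple single-linkage grouping.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two feature sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_similarity(features: dict, threshold: float = 0.5) -> list:
    """Group spreadsheets whose feature sets overlap by at least `threshold`.

    `threshold` is an illustrative assumption, not a value from the paper.
    Groups are formed by single linkage using a small union-find.
    """
    parent = {name: name for name in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link every pair of spreadsheets that is similar enough.
    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect members of each linked component.
    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical spreadsheets described by worksheet names and table headers.
sheets = {
    "budget_v1.xls": {"Sheet:Summary", "Header:Region", "Header:Q1", "Header:Q2"},
    "budget_final.xls": {"Sheet:Summary", "Header:Region", "Header:Q1",
                         "Header:Q2", "Header:Q3"},
    "trades.xls": {"Sheet:Deals", "Header:Counterparty", "Header:Volume"},
}
groups = cluster_by_similarity(sheets)
```

Here the two budget files share four of five features and fall into one group, while the unrelated file stays alone, regardless of how the files are named.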

arXiv.org e-Print Archive

Crossref

Automated Refactoring of Nested-IF Formulae in Spreadsheets

Author: Han Shi
Hao Dan
Zhang Dongmei
Zhang Jie
Zhang Lu
Publication date: 28/12/2017

Spreadsheets are the most popular end-user programming software, where formulae act like programs and also have smells. One well-recognized common smell of spreadsheet formulae is nest-IF expressions, which have low readability and high cognitive cost for users, and are error-prone during reuse or maintenance. However, end users usually lack the essential programming language knowledge and skills to tackle or even recognize the problem. Previous research has made only very initial attempts in this direction, and no effective and automated approach is currently available. This paper proposes, for the first time, an AST-based automated approach to systematically refactoring nest-IF formulae. The general idea is two-fold. First, we detect and remove logic redundancy on the AST. Second, we identify higher-level semantics that have been fragmented and scattered, and reassemble the syntax using concise built-in functions. A comprehensive evaluation has been conducted against a real-world spreadsheet corpus, which was collected in a leading IT company for research purposes. The results, covering over 68,000 spreadsheets with 27 million nest-IF formulae, reveal that our approach is able to relieve the smell of over 99% of nest-IF formulae. Over 50% of the refactorings have reduced the nesting levels of the nest-IFs by more than half. In addition, a survey involving 49 participants indicates that in most cases the participants prefer the refactored formulae, and agree that such an automated refactoring approach is necessary and helpful.
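The two steps named in the abstract can be sketched on a toy AST. This is an illustration under stated assumptions, not the authors' implementation: the `If` node, `simplify`, and `render` are hypothetical names, conditions are treated as opaque strings, and the choice of Excel's `IFS` as the concise built-in target is an assumption (the abstract does not say which functions are used).

```python
from dataclasses import dataclass
from typing import Union

Expr = Union["If", str]  # a leaf is just the formula text of a value


@dataclass(frozen=True)
class If:
    """Hypothetical AST node for IF(cond, then, other)."""
    cond: str
    then: Expr
    other: Expr


def simplify(e: Expr, known: dict = None) -> Expr:
    """Step 1: drop branches made redundant by an enclosing IF.

    `known` maps a condition string to the truth value it must have
    at this point in the tree.
    """
    known = known or {}
    if not isinstance(e, If):
        return e
    if e.cond in known:  # already decided by an outer IF
        return simplify(e.then if known[e.cond] else e.other, known)
    then = simplify(e.then, {**known, e.cond: True})
    other = simplify(e.other, {**known, e.cond: False})
    # IF(c, x, x) is just x.
    return then if then == other else If(e.cond, then, other)


def render(e: Expr) -> str:
    """Step 2: flatten an else-chain of IFs into a single IFS(...) call."""
    if not isinstance(e, If):
        return e
    pairs = []
    while isinstance(e, If):
        pairs.append((e.cond, render(e.then)))
        e = e.other
    if len(pairs) == 1:
        return f"IF({pairs[0][0]},{pairs[0][1]},{e})"
    args = [part for pair in pairs for part in pair] + ["TRUE", e]
    return "IFS(" + ",".join(args) + ")"


# IF(A1>90,"A",IF(A1>90,"A+",IF(A1>80,"B","C"))) has an unreachable branch.
nested = If("A1>90", '"A"', If("A1>90", '"A+"', If("A1>80", '"B"', '"C"')))
result = render(simplify(nested))
```

With this input, the unreachable `"A+"` branch is removed and the remaining chain is rendered as `IFS(A1>90,"A",A1>80,"B",TRUE,"C")`, reducing three nesting levels to one.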

arXiv.org e-Print Archive

Crossref

Business Rule Mining from Spreadsheets

Author: Roy Sohon
Publication date: 19/03/2015

Business rules represent the knowledge that guides the operations of a business organization. They are implemented in software applications used by organizations, and the activity of extracting them from software is known as business rule mining. It has various purposes, among which migration and generating documentation are the most common. However, apart from conventional software, organizations also use spreadsheets for a large part of their operations and decision-making activities. Therefore, we believe that spreadsheets are also rich in business rules. We thus propose to develop an automated system for extracting business rules from spreadsheets in a human-comprehensible natural language format. This position paper describes our motivation, the problem description, related work, and the challenges we foresee.

Comment: In Proceedings of the 2nd Workshop on Software Engineering Methods in Spreadsheets (http://spreadsheetlab.org/sems15/

arXiv.org e-Print Archive

TU Delft Repository

Copy-paste Tracking: Fixing Spreadsheets Without Breaking Them

Author: Hermans F. (Felienne)
Storm T. (Tijs) van der
Publication date: 01/01/2015

Spreadsheets are the most popular live programming environments, but they are also notoriously fault-prone. One reason for this is that users actively rely on copy-paste to make up for the lack of abstraction mechanisms. Adding abstraction, however, introduces indirection and thus cognitive distance. In this paper we propose an alternative: copy-paste tracking. Tracking the copies that spreadsheet users make allows them to directly edit copy-pasted formulas, but instead of changing only a single instance, the changes will be propagated to all formulas copied from the same source. As a result, spreadsheet users will enjoy the benefits of abstraction without its drawbacks.
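The propagation behaviour described above can be sketched in a few lines. The abstract does not describe the data structures used, so everything here is an assumption for illustration: the `CopyTracker` class, its method names, and the use of relative R1C1-style formula text (so that the same text is valid in every copied cell) are all hypothetical.

```python
from itertools import count


class CopyTracker:
    """Toy model: cells copied from the same source share a clone group."""

    def __init__(self):
        self.formulas = {}      # cell -> formula text (relative R1C1 style)
        self.group = {}         # cell -> id of its clone group
        self._next_id = count()

    def set_formula(self, cell, formula):
        """Enter a fresh formula; the cell starts a new group of its own."""
        self.formulas[cell] = formula
        self.group[cell] = next(self._next_id)

    def copy(self, src, dst):
        """Copy-paste: the destination joins the source's clone group."""
        self.formulas[dst] = self.formulas[src]
        self.group[dst] = self.group[src]

    def edit(self, cell, formula):
        """Edit one instance; the change reaches every cell in its group."""
        target = self.group[cell]
        for other, gid in self.group.items():
            if gid == target:
                self.formulas[other] = formula


tracker = CopyTracker()
tracker.set_formula("C1", "=RC[-2]+RC[-1]")
tracker.copy("C1", "C2")
tracker.copy("C2", "C3")                   # a copy of a copy, same group
tracker.set_formula("D1", "=RC[-1]*2")     # unrelated formula
tracker.edit("C3", "=RC[-2]-RC[-1]")       # fix one instance only
```

After the edit, `C1`, `C2`, and `C3` all hold the corrected formula, while `D1` is untouched because it was never copied from the same source.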

CWI's Institutional Repository

A Literature Review of Spreadsheet Technology

Author: Bock Alexander
Publication date: 01/11/2016

The IT University of Copenhagen's Repository

Copy-paste tracking: Fixing spreadsheets without breaking them

Author: Hermans F.F.J.
Storm T. van der
Publication date: 13/07/2015

Leiden University Scholarly Publications