63 research outputs found
SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering
Version information plays an important role in spreadsheet understanding,
maintaining and quality improving. However, end users rarely use version
control tools to document spreadsheet version information. Thus, the
spreadsheet version information is missing, and different versions of a
spreadsheet coexist as individual and similar spreadsheets. Existing approaches
try to recover spreadsheet version information through clustering these similar
spreadsheets based on spreadsheet filenames or related email conversation.
However, the applicability and accuracy of existing clustering approaches are
limited due to the necessary information (e.g., filenames and email
conversation) is usually missing. We inspected the versioned spreadsheets in
VEnron, which is extracted from the Enron Corporation. In VEnron, the different
versions of a spreadsheet are clustered into an evolution group. We observed
that the versioned spreadsheets in each evolution group exhibit certain common
features (e.g., similar table headers and worksheet names). Based on this
observation, we proposed an automatic clustering algorithm, SpreadCluster.
SpreadCluster learns the criteria of features from the versioned spreadsheets
in VEnron, and then automatically clusters spreadsheets with the similar
features into the same evolution group. We applied SpreadCluster on all
spreadsheets in the Enron corpus. The evaluation result shows that
SpreadCluster could cluster spreadsheets with higher precision and recall rate
than the filename-based approach used by VEnron. Based on the clustering result
by SpreadCluster, we further created a new versioned spreadsheet corpus
VEnron2, which is much bigger than VEnron. We also applied SpreadCluster on the
other two spreadsheet corpora FUSE and EUSES. The results show that
SpreadCluster can cluster the versioned spreadsheets in these two corpora with
high precision.Comment: 12 pages, MSR 201
Automated Refactoring of Nested-IF Formulae in Spreadsheets
Spreadsheets are the most popular end-user programming software, where
formulae act like programs and also have smells. One well recognized common
smell of spreadsheet formulae is nest-IF expressions, which have low
readability and high cognitive cost for users, and are error-prone during reuse
or maintenance. However, end users usually lack essential programming language
knowledge and skills to tackle or even realize the problem. The previous
research work has made very initial attempts in this aspect, while no effective
and automated approach is currently available.
This paper firstly proposes an AST-based automated approach to systematically
refactoring nest-IF formulae. The general idea is two-fold. First, we detect
and remove logic redundancy on the AST. Second, we identify higher-level
semantics that have been fragmented and scattered, and reassemble the syntax
using concise built-in functions. A comprehensive evaluation has been conducted
against a real-world spreadsheet corpus, which is collected in a leading IT
company for research purpose. The results with over 68,000 spreadsheets with 27
million nest-IF formulae reveal that our approach is able to relieve the smell
of over 99\% of nest-IF formulae. Over 50% of the refactorings have reduced
nesting levels of the nest-IFs by more than a half. In addition, a survey
involving 49 participants indicates that for most cases the participants prefer
the refactored formulae, and agree on that such automated refactoring approach
is necessary and helpful
Smelling faults in spreadsheets
Despite being staggeringly error prone, spreadsheets are a highly flexible programming environment that is widely used in industry. In fact, spreadsheets are widely adopted for decision making, and decisions taken upon wrong (spreadsheet-based) assumptions may have serious economical impacts on businesses, among other consequences. This paper proposes a technique to automatically pinpoint potential faults in spreadsheets. It combines a catalog of spreadsheet smells that provide a first indication of a potential fault, with a generic spectrum-based fault localization strategy in order to improve (in terms of accuracy and false positive rate) on these initial results. Our technique has been implemented in a tool which helps users detecting faults.To validate the proposed technique, we consider a wellknown and well-documented catalog of faulty spreadsheets. Our experiments yield two main results: we were able to distinguish between smells that can point to faulty cells from smells and those that are not capable of doing so; and we provide a technique capable of detecting a significant number of errors: two thirds of the cells labeled as faulty are in fact (documented) errors
30 Years of Software Refactoring Research: A Systematic Literature Review
Peer Reviewedhttps://deepblue.lib.umich.edu/bitstream/2027.42/155872/4/30YRefactoring.pd
30 Years of Software Refactoring Research:A Systematic Literature Review
Due to the growing complexity of software systems, there has been a dramatic
increase and industry demand for tools and techniques on software refactoring
in the last ten years, defined traditionally as a set of program
transformations intended to improve the system design while preserving the
behavior. Refactoring studies are expanded beyond code-level restructuring to
be applied at different levels (architecture, model, requirements, etc.),
adopted in many domains beyond the object-oriented paradigm (cloud computing,
mobile, web, etc.), used in industrial settings and considered objectives
beyond improving the design to include other non-functional requirements (e.g.,
improve performance, security, etc.). Thus, challenges to be addressed by
refactoring work are, nowadays, beyond code transformation to include, but not
limited to, scheduling the opportune time to carry refactoring, recommendations
of specific refactoring activities, detection of refactoring opportunities, and
testing the correctness of applied refactorings. Therefore, the refactoring
research efforts are fragmented over several research communities, various
domains, and objectives. To structure the field and existing research results,
this paper provides a systematic literature review and analyzes the results of
3183 research papers on refactoring covering the last three decades to offer
the most scalable and comprehensive literature review of existing refactoring
research studies. Based on this survey, we created a taxonomy to classify the
existing research, identified research trends, and highlighted gaps in the
literature and avenues for further research.Comment: 23 page
- …