Structured Spreadsheet Modeling and Implementation
Developing an error-free spreadsheet has been a problem since the beginning
of end-user computing. In this paper, we present a methodology that separates
the modeling from the implementation. Using proven techniques from Information
Systems and Software Engineering, we present strict but simple rules that
govern how the implementation is derived from the model. The resulting
spreadsheet should be easier to understand, audit, and maintain.
Comment: In Proceedings of the 2nd Workshop on Software Engineering Methods in
Spreadsheets
Enron versus EUSES: A Comparison of Two Spreadsheet Corpora
Spreadsheets are widely used within companies and often form the basis for
business decisions. Numerous cases are known where incorrect information in
spreadsheets has led to incorrect decisions. Such cases underline the
relevance of research on the professional use of spreadsheets.
Recently, a new dataset became available for research, containing over 15,000
business spreadsheets that were extracted from the Enron E-mail Archive. With
this dataset, we 1) aim to obtain a thorough understanding of the
characteristics of spreadsheets used within companies, and 2) compare the
characteristics of the Enron spreadsheets with the EUSES corpus, which is the
existing state-of-the-art set of spreadsheets frequently used in
spreadsheet studies.
Our analysis shows that 1) the majority of spreadsheets are not large in
terms of worksheets and formulas, do not have a high degree of coupling, and
contain relatively simple formulas; and 2) the spreadsheets from the EUSES
corpus are, with respect to the measured characteristics, quite similar to
the Enron spreadsheets.
Comment: In Proceedings of the 2nd Workshop on Software Engineering Methods in
Spreadsheets
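The characteristics compared above (size in worksheets and formulas, coupling,
formula complexity) can be approximated with simple counts. The following is a
minimal sketch under stated assumptions, not the metric definitions used in the
study: it takes a hypothetical in-memory workbook (sheet name mapped to cell
contents), treats any cell starting with "=" as a formula, counts
sheet-qualified references (e.g. Inputs!B2) as a crude coupling proxy, and uses
the number of cell references in a formula as a crude complexity proxy. The
names REF, WorkbookStats, and analyze are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical in-memory workbook: sheet name -> {cell address -> cell content}.
Workbook = dict[str, dict[str, str]]

# Simplified A1-style reference pattern with an optional sheet prefix
# (Sheet2!A1 or 'My Sheet'!$B$4). The lookbehind/lookahead keep function
# names such as LOG10( from matching. R1C1 notation, named ranges and
# whole-column references (A:A) are deliberately ignored.
REF = re.compile(r"(?:(?:'[^']+'|\w+)!)?(?<![A-Za-z_])\$?[A-Z]{1,3}\$?\d+(?![\w(])")


@dataclass
class WorkbookStats:
    worksheets: int
    formulas: int
    cross_sheet_refs: int      # sheet-qualified refs: crude coupling proxy
    max_refs_per_formula: int  # refs in the busiest formula: crude complexity proxy


def analyze(book: Workbook) -> WorkbookStats:
    formulas = cross = max_refs = 0
    for cells in book.values():
        for content in cells.values():
            if not content.startswith("="):
                continue  # constants and labels are not formulas
            formulas += 1
            refs = REF.findall(content)  # a range like A1:B2 counts as two refs
            cross += sum("!" in r for r in refs)
            max_refs = max(max_refs, len(refs))
    return WorkbookStats(len(book), formulas, cross, max_refs)
```

For example, a workbook with an "Inputs" sheet of constants and a "Calc"
sheet containing =Inputs!B2*(1+Inputs!B1) and =SUM(A1:A1)+LOG10(A1) yields
2 worksheets, 2 formulas, 2 sheet-qualified references, and at most 3
references in one formula.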
SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering
Version information plays an important role in spreadsheet understanding,
maintenance, and quality improvement. However, end users rarely use version
control tools to document spreadsheet version information. As a result, this
information is missing, and different versions of a spreadsheet coexist as
separate but similar spreadsheets. Existing approaches try to recover
spreadsheet version information by clustering these similar spreadsheets based
on their filenames or related email conversations. However, the applicability
and accuracy of these approaches are limited because the necessary information
(e.g., filenames and email conversations) is usually missing. We inspected the
versioned spreadsheets in VEnron, which was extracted from the spreadsheets of
the Enron Corporation. In VEnron, the different versions of a spreadsheet are
clustered into an evolution group. We observed that the versioned spreadsheets
in each evolution group exhibit certain common features (e.g., similar table
headers and worksheet names). Based on this observation, we propose an
automatic clustering algorithm, SpreadCluster. SpreadCluster learns the
criteria for these features from the versioned spreadsheets in VEnron and then
automatically clusters spreadsheets with similar features into the same
evolution group. We applied SpreadCluster to all spreadsheets in the Enron
corpus. The evaluation results show that SpreadCluster clusters spreadsheets
with higher precision and recall than the filename-based approach used by
VEnron. Based on the clustering results of SpreadCluster, we further created a
new versioned spreadsheet corpus, VEnron2, which is much larger than VEnron. We
also applied SpreadCluster to two other spreadsheet corpora, FUSE and EUSES.
The results show that SpreadCluster can cluster the versioned spreadsheets in
these two corpora with high precision.
Comment: 12 pages, MSR 201
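The feature-based clustering idea above can be illustrated with a small
similarity sketch. This is not SpreadCluster itself: instead of learned
criteria, it assumes fixed, hypothetical Jaccard-similarity thresholds on
worksheet names and table headers, links two spreadsheets when both thresholds
are met, and takes connected components (via union-find) as evolution groups.
It also shows one common way to score a clustering, pairwise precision and
recall; the paper's exact evaluation procedure may differ. All function names,
thresholds, and file names are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(sheets: dict[str, tuple[set[str], set[str]]],
            name_threshold: float = 0.5,
            header_threshold: float = 0.5) -> list[set[str]]:
    """Group files whose worksheet names AND table headers are both similar.

    sheets maps a file id to (worksheet names, table header strings).
    The thresholds are fixed assumptions; SpreadCluster instead learns its
    criteria from VEnron.
    """
    parent = {f: f for f in sheets}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sheets, 2):
        names_a, headers_a = sheets[a]
        names_b, headers_b = sheets[b]
        if (jaccard(names_a, names_b) >= name_threshold
                and jaccard(headers_a, headers_b) >= header_threshold):
            parent[find(a)] = find(b)  # union: same evolution group

    groups: dict[str, set[str]] = {}
    for f in sheets:
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())


def pairwise_precision_recall(predicted: list[set[str]],
                              truth: list[set[str]]) -> tuple[float, float]:
    """Precision/recall over pairs of files placed in the same group."""
    def pairs(groups: list[set[str]]) -> set[frozenset[str]]:
        return {frozenset(p) for g in groups for p in combinations(sorted(g), 2)}

    pred, gold = pairs(predicted), pairs(truth)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(gold) if gold else 1.0
    return precision, recall
```

The file ids serve only as dictionary keys; no filename information enters
the similarity test, which reflects the motivation for not relying on
filenames. Requiring both feature similarities to pass is a conservative
choice that favors precision over recall.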
Spreadsheet Explanation Through Table Abstraction
Spreadsheets are a pervasive technology in both personal and industrial use. Often, the user of a spreadsheet is not its author, which contributes to a lack of understanding of the spreadsheet's purpose and functionality. Furthermore, this lack of understanding is a major cause of mistakes in the use and maintenance of spreadsheets.
I present an approach, called explanation sheets, that eases the understanding and maintenance of spreadsheets. I define the notion of explanation soundness and show that explanation sheets conforming to simple rules of formula convergence provide sound explanations. I also present a practical evaluation of explanation sheets based on samples drawn from widely used spreadsheet corpora and on a small user study.
In addition, I describe the process of inferring explanation sheets from a spreadsheet. By analyzing example spreadsheets, I derive a set of inference rules that describe the relationship between a spreadsheet and its explanation
- …