SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering
Version information plays an important role in spreadsheet understanding,
maintenance, and quality improvement. However, end users rarely use version
control tools to document spreadsheet version information. Thus, the
spreadsheet version information is missing, and different versions of a
spreadsheet coexist as individual and similar spreadsheets. Existing approaches
try to recover spreadsheet version information through clustering these similar
spreadsheets based on spreadsheet filenames or related email conversations.
However, the applicability and accuracy of existing clustering approaches are
limited because the necessary information (e.g., filenames and email
conversations) is usually missing. We inspected the versioned spreadsheets in
VEnron, which is extracted from the Enron Corporation. In VEnron, the different
versions of a spreadsheet are clustered into an evolution group. We observed
that the versioned spreadsheets in each evolution group exhibit certain common
features (e.g., similar table headers and worksheet names). Based on this
observation, we proposed an automatic clustering algorithm, SpreadCluster.
SpreadCluster learns the criteria of features from the versioned spreadsheets
in VEnron, and then automatically clusters spreadsheets with similar
features into the same evolution group. We applied SpreadCluster to all
spreadsheets in the Enron corpus. The evaluation results show that
SpreadCluster clusters spreadsheets with higher precision and recall
than the filename-based approach used by VEnron. Based on the clustering result
by SpreadCluster, we further created a new versioned spreadsheet corpus
VEnron2, which is much bigger than VEnron. We also applied SpreadCluster to
two other spreadsheet corpora, FUSE and EUSES. The results show that
SpreadCluster can cluster the versioned spreadsheets in these two corpora with
high precision.
Comment: 12 pages, MSR 201
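The idea of grouping spreadsheets by shared features can be illustrated with a
minimal sketch. This is not the SpreadCluster algorithm itself: the abstract
says SpreadCluster learns its feature criteria from VEnron, whereas the sketch
below assumes a hand-picked threshold, equal feature weights, Jaccard
similarity, and single-link grouping. The function names, the dictionary
layout, and the sample file names are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(s1, s2, w_headers=0.5):
    """Weighted mix of header overlap and worksheet-name overlap (weights assumed)."""
    h = jaccard(s1["headers"], s2["headers"])
    n = jaccard(s1["sheets"], s2["sheets"])
    return w_headers * h + (1 - w_headers) * n


def cluster(spreadsheets, threshold=0.6):
    """Single-link grouping: merge any pair whose similarity reaches the threshold."""
    parent = list(range(len(spreadsheets)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(spreadsheets)), 2):
        if similarity(spreadsheets[i], spreadsheets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(spreadsheets):
        groups.setdefault(find(i), []).append(s["name"])
    return sorted(sorted(g) for g in groups.values())


# Made-up sample data: two versions of one spreadsheet plus an unrelated one.
books = [
    {"name": "budget_v1.xls",
     "headers": {"Date", "Amount", "Desk"},
     "sheets": {"Summary", "Detail"}},
    {"name": "budget_final.xls",
     "headers": {"Date", "Amount", "Desk", "Trader"},
     "sheets": {"Summary", "Detail"}},
    {"name": "pipeline.xls",
     "headers": {"Pipe", "Volume"},
     "sheets": {"Sheet1"}},
]

evolution_groups = cluster(books)
```

With this sample data the two budget files share most headers and all
worksheet names, so they end up in one group, while the unrelated file stays
alone. A learned criterion, as described in the abstract, would replace the
fixed threshold and weights assumed here.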
WFIRST Coronagraph Technology Requirements: Status Update and Systems Engineering Approach
The coronagraphic instrument (CGI) on the Wide-Field Infrared Survey
Telescope (WFIRST) will demonstrate technologies and methods for high-contrast
direct imaging and spectroscopy of exoplanet systems in reflected light,
including polarimetry of circumstellar disks. The WFIRST management and CGI
engineering and science investigation teams have developed requirements for the
instrument, motivated by the objectives and technology development needs of
potential future flagship exoplanet characterization missions such as the NASA
Habitable Exoplanet Imaging Mission (HabEx) and the Large UV/Optical/IR
Surveyor (LUVOIR). The requirements have been refined to support
recommendations from the WFIRST Independent External Technical/Management/Cost
Review (WIETR) that the WFIRST CGI be classified as a technology demonstration
instrument instead of a science instrument. This paper provides a description
of how the CGI requirements flow from the top of the overall WFIRST mission
structure through the Level 2 requirements, with a focus on
capturing the detailed context and rationales for the CGI Level 2 requirements.
The WFIRST requirements flow starts with the top Program Level Requirements
Appendix (PLRA), which contains both high-level mission objectives and
the CGI-specific baseline technical and data requirements (BTR and BDR,
respectively)... We also present the process and collaborative tools used in
the L2 requirements development and management, including the collection and
organization of science inputs, an open-source approach to managing the
requirements database, and automating documentation. The tools created for the
CGI L2 requirements have the potential to improve the design and planning of
other projects, streamlining requirement management and maintenance. [Abstract
Abbreviated]
Comment: 16 pages, 4 figures
Spreadsheet Error Correction Using an Activity Framework and a Cognitive Fit Perspective
Errors in spreadsheets are a serious concern for organizations and academics alike. Efforts to reduce errors are ongoing, and one of them is the design and development of visualization tools that support error correction. In this paper, we propose a framework for classifying the activities associated with spreadsheet error correction. The framework is intended to clarify which activities matter for correcting different types of spreadsheet errors and how visualization tools can help by effectively supporting those activities. An experiment is designed to test the effectiveness of a visualization tool that supports one of the most important activities from the framework: chaining activity. Two groups of subjects, one with and one without the visualization tool, are required to correct two types of errors. Our hypotheses are derived from the notion of cognitive fit between problem representation and task, and the results of the experiment support most of the hypotheses. Thus, this study demonstrates the usefulness of the activity-based framework for spreadsheet error correction and provides guidelines for designing and developing tools for spreadsheet audit. It also provides empirical evidence for cognitive fit theory by showing that performance is significantly better when visual support tools result in a match between problem representation and the task at hand, as in the case of correcting link errors with the tool used in this study. Theoretical and practical implications of the findings are discussed.