    SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering

    Version information plays an important role in spreadsheet understanding, maintenance, and quality improvement. However, end users rarely use version control tools to document spreadsheet version information. Thus, the spreadsheet version information is missing, and different versions of a spreadsheet coexist as individual and similar spreadsheets. Existing approaches try to recover spreadsheet version information by clustering these similar spreadsheets based on spreadsheet filenames or related email conversations. However, the applicability and accuracy of existing clustering approaches are limited because the necessary information (e.g., filenames and email conversations) is usually missing. We inspected the versioned spreadsheets in VEnron, which is extracted from the Enron Corporation. In VEnron, the different versions of a spreadsheet are clustered into an evolution group. We observed that the versioned spreadsheets in each evolution group exhibit certain common features (e.g., similar table headers and worksheet names). Based on this observation, we proposed an automatic clustering algorithm, SpreadCluster. SpreadCluster learns the criteria of features from the versioned spreadsheets in VEnron, and then automatically clusters spreadsheets with similar features into the same evolution group. We applied SpreadCluster to all spreadsheets in the Enron corpus. The evaluation results show that SpreadCluster clusters spreadsheets with higher precision and recall than the filename-based approach used by VEnron. Based on the clustering results of SpreadCluster, we further created a new versioned spreadsheet corpus, VEnron2, which is much larger than VEnron. We also applied SpreadCluster to two other spreadsheet corpora, FUSE and EUSES. The results show that SpreadCluster can cluster the versioned spreadsheets in these two corpora with high precision. Comment: 12 pages, MSR 201

    WFIRST Coronagraph Technology Requirements: Status Update and Systems Engineering Approach

    The coronagraphic instrument (CGI) on the Wide-Field Infrared Survey Telescope (WFIRST) will demonstrate technologies and methods for high-contrast direct imaging and spectroscopy of exoplanet systems in reflected light, including polarimetry of circumstellar disks. The WFIRST management and CGI engineering and science investigation teams have developed requirements for the instrument, motivated by the objectives and technology development needs of potential future flagship exoplanet characterization missions such as the NASA Habitable Exoplanet Imaging Mission (HabEx) and the Large UV/Optical/IR Surveyor (LUVOIR). The requirements have been refined to support recommendations from the WFIRST Independent External Technical/Management/Cost Review (WIETR) that the WFIRST CGI be classified as a technology demonstration instrument instead of a science instrument. This paper describes how the CGI requirements flow from the top of the overall WFIRST mission structure through the Level 2 requirements, with a focus on capturing the detailed context and rationales for the CGI Level 2 requirements. The WFIRST requirements flow starts with the top Program Level Requirements Appendix (PLRA), which contains both high-level mission objectives and the CGI-specific baseline technical and data requirements (BTR and BDR, respectively)... We also present the process and collaborative tools used in the L2 requirements development and management, including the collection and organization of science inputs, an open-source approach to managing the requirements database, and automating documentation. The tools created for the CGI L2 requirements have the potential to improve the design and planning of other projects, streamlining requirement management and maintenance. [Abstract Abbreviated] Comment: 16 pages, 4 figure

    Spreadsheet Error Correction Using an Activity Framework and a Cognitive Fit Perspective

    Errors in spreadsheets are a serious concern for organizations and academics alike. There are ongoing efforts to find ways to reduce errors; designing and developing visualization tools to support error correction activities is one of them. In this paper, we propose a framework for classifying activities associated with spreadsheet error correction. The purpose of this framework is to help in understanding the activities that are important for correcting different types of spreadsheet errors and how different visualization tools can help in error correction by effectively supporting these activities. An experiment is designed to test the effectiveness of a visualization tool that supports one of the most important activities from the framework – chaining activity. Two groups of subjects, with and without the visualization tool, are required to correct two types of errors. Our hypotheses are derived from the notion of cognitive fit between problem representation and task, and the results of the experiment support most of the hypotheses. Thus, this study demonstrates the usefulness of the activity-based framework for spreadsheet error correction, and also provides guidelines for designing and developing tools for spreadsheet audit. It also provides empirical evidence for cognitive fit theory by showing that performance is significantly better when visual support tools result in a match between problem representation and the task at hand, as in the case of correcting link errors with the tool used in this study. Theoretical and practical implications of the findings are discussed.