3,043 research outputs found

    Organizing the Technical Debt Landscape

    Get PDF
    To date, several methods and tools for detecting source code and design anomalies have been developed. While each method focuses on identifying certain classes of source code anomalies that potentially relate to technical debt (TD), the overlaps and gaps among these classes and TD have not been rigorously demonstrated. We propose to construct a seminal technical debt landscape as a way to visualize and organize research on the subjec

    Combining Spreadsheet Smells for Improved Fault Prediction

    Full text link
    Spreadsheets are commonly used in organizations as a programming tool for business-related calculations and decision making. Since faults in spreadsheets can have severe business impacts, a number of approaches from general software engineering have been applied to spreadsheets in recent years, among them the concept of code smells. Smells can in particular be used for the task of fault prediction. An analysis of existing spreadsheet smells, however, revealed that the predictive power of individual smells can be limited. In this work we therefore propose a machine learning based approach which combines the predictions of individual smells by using an AdaBoost ensemble classifier. Experiments on two public datasets containing real-world spreadsheet faults show significant improvements in terms of fault prediction accuracy.Comment: 4 pages, 1 figure, to be published in 40th International Conference on Software Engineering: New Ideas and Emerging Results Trac

    SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering

    Full text link
    Version information plays an important role in spreadsheet understanding, maintaining and quality improving. However, end users rarely use version control tools to document spreadsheet version information. Thus, the spreadsheet version information is missing, and different versions of a spreadsheet coexist as individual and similar spreadsheets. Existing approaches try to recover spreadsheet version information through clustering these similar spreadsheets based on spreadsheet filenames or related email conversation. However, the applicability and accuracy of existing clustering approaches are limited due to the necessary information (e.g., filenames and email conversation) is usually missing. We inspected the versioned spreadsheets in VEnron, which is extracted from the Enron Corporation. In VEnron, the different versions of a spreadsheet are clustered into an evolution group. We observed that the versioned spreadsheets in each evolution group exhibit certain common features (e.g., similar table headers and worksheet names). Based on this observation, we proposed an automatic clustering algorithm, SpreadCluster. SpreadCluster learns the criteria of features from the versioned spreadsheets in VEnron, and then automatically clusters spreadsheets with the similar features into the same evolution group. We applied SpreadCluster on all spreadsheets in the Enron corpus. The evaluation result shows that SpreadCluster could cluster spreadsheets with higher precision and recall rate than the filename-based approach used by VEnron. Based on the clustering result by SpreadCluster, we further created a new versioned spreadsheet corpus VEnron2, which is much bigger than VEnron. We also applied SpreadCluster on the other two spreadsheet corpora FUSE and EUSES. The results show that SpreadCluster can cluster the versioned spreadsheets in these two corpora with high precision.Comment: 12 pages, MSR 201

    Enron versus EUSES: A Comparison of Two Spreadsheet Corpora

    Full text link
    Spreadsheets are widely used within companies and often form the basis for business decisions. Numerous cases are known where incorrect information in spreadsheets has lead to incorrect decisions. Such cases underline the relevance of research on the professional use of spreadsheets. Recently a new dataset became available for research, containing over 15.000 business spreadsheets that were extracted from the Enron E-mail Archive. With this dataset, we 1) aim to obtain a thorough understanding of the characteristics of spreadsheets used within companies, and 2) compare the characteristics of the Enron spreadsheets with the EUSES corpus which is the existing state of the art set of spreadsheets that is frequently used in spreadsheet studies. Our analysis shows that 1) the majority of spreadsheets are not large in terms of worksheets and formulas, do not have a high degree of coupling, and their formulas are relatively simple; 2) the spreadsheets from the EUSES corpus are, with respect to the measured characteristics, quite similar to the Enron spreadsheets.Comment: In Proceedings of the 2nd Workshop on Software Engineering Methods in Spreadsheet

    An analysis of techniques and methods for technical debt management: a reflection from the architecture perspective

    Full text link
    Technical debt is a metaphor referring to the consequences of weak software development. Managing technical debt is necessary in order to keep it under control, and several techniques have been developed with the goal of accomplishing this. However, available techniques have grown disperse and managers lack guidance. This paper covers this gap by providing a systematic mapping of available techniques and methods for technical debt management, covering architectural debt, and identifying existing gaps that prevent to manage technical debt efficiently
    • …
    corecore