1,146 research outputs found

    SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering

    Full text link
    Version information plays an important role in spreadsheet understanding, maintaining and quality improving. However, end users rarely use version control tools to document spreadsheet version information. Thus, the spreadsheet version information is missing, and different versions of a spreadsheet coexist as individual and similar spreadsheets. Existing approaches try to recover spreadsheet version information through clustering these similar spreadsheets based on spreadsheet filenames or related email conversation. However, the applicability and accuracy of existing clustering approaches are limited due to the necessary information (e.g., filenames and email conversation) is usually missing. We inspected the versioned spreadsheets in VEnron, which is extracted from the Enron Corporation. In VEnron, the different versions of a spreadsheet are clustered into an evolution group. We observed that the versioned spreadsheets in each evolution group exhibit certain common features (e.g., similar table headers and worksheet names). Based on this observation, we proposed an automatic clustering algorithm, SpreadCluster. SpreadCluster learns the criteria of features from the versioned spreadsheets in VEnron, and then automatically clusters spreadsheets with the similar features into the same evolution group. We applied SpreadCluster on all spreadsheets in the Enron corpus. The evaluation result shows that SpreadCluster could cluster spreadsheets with higher precision and recall rate than the filename-based approach used by VEnron. Based on the clustering result by SpreadCluster, we further created a new versioned spreadsheet corpus VEnron2, which is much bigger than VEnron. We also applied SpreadCluster on the other two spreadsheet corpora FUSE and EUSES. The results show that SpreadCluster can cluster the versioned spreadsheets in these two corpora with high precision.Comment: 12 pages, MSR 201

    Structured Review of the Evidence for Effects of Code Duplication on Software Quality

    Get PDF
    This report presents the detailed steps and results of a structured review of code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only the details of the review for which there is not enough place to include them in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence)

    Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects

    Full text link
    Exaggeration or context changes can render maintainability experience into prejudice. For example, JavaScript is often seen as least elegant language and hence of lowest maintainability. Such prejudice should not guide decisions without prior empirical validation. We formulated 10 hypotheses about maintainability based on prejudices and test them in a large set of open-source projects (6,897 GitHub repositories, 402 million lines, 5 programming languages). We operationalize maintainability with five static analysis metrics. We found that JavaScript code is not worse than other code, Java code shows higher maintainability than C# code and C code has longer methods than other code. The quality of interface documentation is better in Java code than in other code. Code developed by teams is not of higher and large code bases not of lower maintainability. Projects with high maintainability are not more popular or more often forked. Overall, most hypotheses are not supported by open-source data.Comment: 20 page

    JSClassFinder: A Tool to Detect Class-like Structures in JavaScript

    Get PDF
    With the increasing usage of JavaScript in web applications, there is a great demand to write JavaScript code that is reliable and maintainable. To achieve these goals, classes can be emulated in the current JavaScript standard version. In this paper, we propose a reengineering tool to identify such class-like structures and to create an object-oriented model based on JavaScript source code. The tool has a parser that loads the AST (Abstract Syntax Tree) of a JavaScript application to model its structure. It is also integrated with the Moose platform to provide powerful visualization, e.g., UML diagram and Distribution Maps, and well-known metric values for software analysis. We also provide some examples with real JavaScript applications to evaluate the tool.Comment: VI Brazilian Conference on Software: Theory and Practice (Tools Track), p. 1-8, 201

    Code Smell Detection Techniques and Process: A Review

    Get PDF
    A code smell is a hint that something has turned out badly some place in your code. The idea of code smells was introduced to characterize various different types of design shortcomings in code. Code and design smells are poor solutions to recurring implementation and design problems. They may hinder the evolution of a system by making it hard for software engineers to carry out changes. In this paper, we reviewed code smell detection tool like: D�cor, InFusion, JDeodorant, PMD, Stench Blossom, etc. Furthermore, we discussed various code smells detecting techniques. Code clones are indistinguishable fragment of source code which may be embedded deliberately or inadvertently. Reusing code pieces through reordering with or without minor adjustments is general undertaking in programming advancement. We�ve examined several papers to explore various tools and techniques used for code smell. In addition, we reviewed the process of code smell detection
    corecore