1,146 research outputs found
SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering
Version information plays an important role in spreadsheet understanding,
maintaining and quality improving. However, end users rarely use version
control tools to document spreadsheet version information. Thus, the
spreadsheet version information is missing, and different versions of a
spreadsheet coexist as individual and similar spreadsheets. Existing approaches
try to recover spreadsheet version information through clustering these similar
spreadsheets based on spreadsheet filenames or related email conversation.
However, the applicability and accuracy of existing clustering approaches are
limited due to the necessary information (e.g., filenames and email
conversation) is usually missing. We inspected the versioned spreadsheets in
VEnron, which is extracted from the Enron Corporation. In VEnron, the different
versions of a spreadsheet are clustered into an evolution group. We observed
that the versioned spreadsheets in each evolution group exhibit certain common
features (e.g., similar table headers and worksheet names). Based on this
observation, we proposed an automatic clustering algorithm, SpreadCluster.
SpreadCluster learns the criteria of features from the versioned spreadsheets
in VEnron, and then automatically clusters spreadsheets with the similar
features into the same evolution group. We applied SpreadCluster on all
spreadsheets in the Enron corpus. The evaluation result shows that
SpreadCluster could cluster spreadsheets with higher precision and recall rate
than the filename-based approach used by VEnron. Based on the clustering result
by SpreadCluster, we further created a new versioned spreadsheet corpus
VEnron2, which is much bigger than VEnron. We also applied SpreadCluster on the
other two spreadsheet corpora FUSE and EUSES. The results show that
SpreadCluster can cluster the versioned spreadsheets in these two corpora with
high precision.Comment: 12 pages, MSR 201
Structured Review of the Evidence for Effects of Code Duplication on Software Quality
This report presents the detailed steps and results of a structured review of code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only the details of the review for which there is not enough place to include them in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence)
Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects
Exaggeration or context changes can render maintainability experience into
prejudice. For example, JavaScript is often seen as least elegant language and
hence of lowest maintainability. Such prejudice should not guide decisions
without prior empirical validation. We formulated 10 hypotheses about
maintainability based on prejudices and test them in a large set of open-source
projects (6,897 GitHub repositories, 402 million lines, 5 programming
languages). We operationalize maintainability with five static analysis
metrics. We found that JavaScript code is not worse than other code, Java code
shows higher maintainability than C# code and C code has longer methods than
other code. The quality of interface documentation is better in Java code than
in other code. Code developed by teams is not of higher and large code bases
not of lower maintainability. Projects with high maintainability are not more
popular or more often forked. Overall, most hypotheses are not supported by
open-source data.Comment: 20 page
JSClassFinder: A Tool to Detect Class-like Structures in JavaScript
With the increasing usage of JavaScript in web applications, there is a great
demand to write JavaScript code that is reliable and maintainable. To achieve
these goals, classes can be emulated in the current JavaScript standard
version. In this paper, we propose a reengineering tool to identify such
class-like structures and to create an object-oriented model based on
JavaScript source code. The tool has a parser that loads the AST (Abstract
Syntax Tree) of a JavaScript application to model its structure. It is also
integrated with the Moose platform to provide powerful visualization, e.g., UML
diagram and Distribution Maps, and well-known metric values for software
analysis. We also provide some examples with real JavaScript applications to
evaluate the tool.Comment: VI Brazilian Conference on Software: Theory and Practice (Tools
Track), p. 1-8, 201
Code Smell Detection Techniques and Process: A Review
A code smell is a hint that something has turned out badly some place in your code. The idea of code smells was introduced to characterize various different types of design shortcomings in code. Code and design smells are poor solutions to recurring implementation and design problems. They may hinder the evolution of a system by making it hard for software engineers to carry out changes. In this paper, we reviewed code smell detection tool like: D�cor, InFusion, JDeodorant, PMD, Stench Blossom, etc. Furthermore, we discussed various code smells detecting techniques. Code clones are indistinguishable fragment of source code which may be embedded deliberately or inadvertently. Reusing code pieces through reordering with or without minor adjustments is general undertaking in programming advancement. We�ve examined several papers to explore various tools and techniques used for code smell. In addition, we reviewed the process of code smell detection
- …