85 research outputs found

    A novel approach for Software Clone detection using Data Mining in Software

    Get PDF
    The Similar Program structures which recur in variant forms in software systems are code clones. Many techniques are proposed in order to detect similar code fragments in software. The software maintenance is generally helped by maintenance is generally helped by the identification and subsequent unification. When the patterns of simple clones reoccur, it is an indication for the presence of interesting higher-level similarities. They are called as Structural Clones. The structural clones when compared to simple clones show a bigger picture of similarities. The problem of huge number of clones is alleviated by the structural clones, which are part of logical groups of simple clones. In order to understand the design of the system for better maintenance and reengineering for reuse, detection of structural clones is essential. In this paper, a technique which is useful to detect some useful types of structural clones is proposed. The novelty of the present approach comprises the formulation of the structural clone concept and the application of data mining techniques. A novel approach is useful for implementation of the proposed technique is described

    SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering

    Full text link
    Version information plays an important role in spreadsheet understanding, maintaining and quality improving. However, end users rarely use version control tools to document spreadsheet version information. Thus, the spreadsheet version information is missing, and different versions of a spreadsheet coexist as individual and similar spreadsheets. Existing approaches try to recover spreadsheet version information through clustering these similar spreadsheets based on spreadsheet filenames or related email conversation. However, the applicability and accuracy of existing clustering approaches are limited due to the necessary information (e.g., filenames and email conversation) is usually missing. We inspected the versioned spreadsheets in VEnron, which is extracted from the Enron Corporation. In VEnron, the different versions of a spreadsheet are clustered into an evolution group. We observed that the versioned spreadsheets in each evolution group exhibit certain common features (e.g., similar table headers and worksheet names). Based on this observation, we proposed an automatic clustering algorithm, SpreadCluster. SpreadCluster learns the criteria of features from the versioned spreadsheets in VEnron, and then automatically clusters spreadsheets with the similar features into the same evolution group. We applied SpreadCluster on all spreadsheets in the Enron corpus. The evaluation result shows that SpreadCluster could cluster spreadsheets with higher precision and recall rate than the filename-based approach used by VEnron. Based on the clustering result by SpreadCluster, we further created a new versioned spreadsheet corpus VEnron2, which is much bigger than VEnron. We also applied SpreadCluster on the other two spreadsheet corpora FUSE and EUSES. The results show that SpreadCluster can cluster the versioned spreadsheets in these two corpora with high precision.Comment: 12 pages, MSR 201

    Enhancing code clone detection using control flow graphs

    Get PDF
    Code clones are syntactically or semantically equivalent code fragments of source code. Copy-and-paste programming allows software developers to improve development productivity, but it could produce code clones that can introduce non-trivial difficulties in software maintenance. In this paper, a code clone detection framework is presented with a feature extractor and a clone classifier using deep learning. The clone classifier is trained with true and false clones and then is tested with a test dataset to evaluate the performance of the proposed approach to clone detection. In particular, the proposed approach to clone detection uses Control Flow Graphs (CFGs) to extract features of a given code snippet. The selected features are used to compute similarity scores for comparing two code fragments. The clone classifier is trained and tested with similarity scores that quantify the degree of how similar two code fragments are. The experimental results demonstrate that using CFG features is a viable methodology in terms of the effectiveness of clone detection for both syntactic and semantic clones

    Nominal cross recurrence as a generalized lag sequential analysis for behavioral streams

    Get PDF
    We briefly present lag sequential analysis for behavioral streams, a commonly used method in psychology for quantifying the relationships between two nominal time series. Cross recurrence quantification analysis (CRQA) is shown as an extension of this technique, and we exemplify this nominal application of CRQA to eye-movement data in human interaction. In addition, we demonstrate nominal CRQA in a simple coupled logistic map simulation used in previous communication research, permitting the investigation of properties of nonlinear systems such as bifurcation and onset to chaos, even in the streams obtained by coarse-graining a coupled nonlinear model. We end with a summary of the importance of CRQA for exploring the relationship between two behavioral streams, and review a recent theoretical trend in the cognitive sciences that would be usefully informed by this and similar nonlinear methods. We hope this work will encourage scientists interested in general properties of complex, nonlinear dynamical systems to apply emerging methods to coarse-grained, nominal units of measure, as there is an immediate need for their application in the psychological domain
    corecore