281 research outputs found

    Robust Parsing of Cloned Token Sequences

    Token-based clone detection techniques are known for their scalability, high recall, and robustness against syntax errors and incomplete code. They, however, may yield clones that are syntactically incomplete and they know very little about the syntactic structure of their reported clones. Hence, their results cannot immediately be used for automated refactorings or syntactic filters for relevance. This paper explores techniques of robust parsing to parse code fragments reported by token-based clone detectors to determine whether the clones are syntactically complete and what kind of syntactic elements they contain. This knowledge can be used to improve the precision of token-based clone detection.

    An Extended Stable Marriage Problem Algorithm for Clone Detection

    Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of the detection. This is achieved by extending the well-known stable marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is provided using a proper scenario, which shows a noticeable improvement in several features of clone detection such as scalability and accuracy. Comment: 20 pages, 10 figures, 6 tables

    Detecting sequential structure

    Programming by demonstration requires detection and analysis of sequential patterns in a user’s input, and the synthesis of an appropriate structural model that can be used for prediction. This paper describes SEQUITUR, a scheme for inducing a structural description of a sequence from a single example. SEQUITUR integrates several different inference techniques: identification of lexical subsequences or vocabulary elements, hierarchical structuring of such subsequences, identification of elements that have equivalent usage patterns, inference of programming constructs such as looping and branching, generalisation by unifying grammar rules, and the detection of procedural substructure. Although SEQUITUR operates with abstract sequences, a number of concrete illustrations are provided.

    Program Similarity Detection with Checksims

    In response to growing academic dishonesty in low-level computer science and electrical and computer engineering courses, we present checksims, a similarity detector designed to highlight suspicious assignments for instructor review. We report the design rationale for the software, and describe our detection of dozens of previously undetected cases of academic dishonesty in previous classes.

    A Hybrid Graph Neural Network Approach for Detecting PHP Vulnerabilities

    This paper presents DeepTective, a deep learning approach to detect vulnerabilities in PHP source code. Our approach implements a novel hybrid technique that combines Gated Recurrent Units and Graph Convolutional Networks to detect SQLi, XSS and OSCI vulnerabilities leveraging both syntactic and semantic information. We evaluate DeepTective and compare it to the state of the art on an established synthetic dataset and on a novel real-world dataset collected from GitHub. Experimental results show that DeepTective achieves near perfect classification on the synthetic dataset, and an F1 score of 88.12% on the realistic dataset, outperforming related approaches. We validate DeepTective in the wild by discovering 4 novel vulnerabilities in established WordPress plugins. Comment: A poster version of this paper appeared as https://doi.org/10.1145/3412841.344213