281 research outputs found
Robust Parsing of Cloned Token Sequences
Token-based clone detection techniques are known for their scalability, high recall, and robustness against syntax errors and incomplete code. They, however, may yield clones that are syntactically incomplete and they know very little about the syntactic structure of their reported clones. Hence, their results cannot immediately be used for automated refactorings or syntactic filters for relevance. This paper explores techniques of robust parsing to parse code fragments reported by token-based clone detectors to determine whether the clones are syntactically complete and what kind of syntactic elements they contain. This knowledge can be used to improve the precision of token-based clone detection
An Extended Stable Marriage Problem Algorithm for Clone Detection
Code cloning negatively affects industrial software and threatens
intellectual property. This paper presents a novel approach to detecting cloned
software by using a bijective matching technique. The proposed approach focuses
on increasing the range of similarity measures and thus enhancing the precision
of the detection. This is achieved by extending a well-known stable-marriage
problem (SMP) and demonstrating how matches between code fragments of different
files can be expressed. A prototype of the proposed approach is provided using
a proper scenario, which shows a noticeable improvement in several features of
clone detection such as scalability and accuracy. Comment: 20 pages, 10 figures, 6 tables
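For orientation, the classic stable-marriage matching that the abstract above says it extends can be sketched as follows. This is a minimal illustration of the textbook Gale-Shapley procedure, not the paper's extended algorithm. The function name `stable_match`, the square similarity matrix `sim`, and the use of similarity scores as preferences on both sides are assumptions made for this example.

```python
from collections import deque


def stable_match(sim):
    """Pair fragments of file A with fragments of file B (hypothetical helper).

    sim[i][j] is an assumed similarity score between fragment i of file A
    and fragment j of file B; the matrix must be square (n x n).
    Returns a dict mapping each A-fragment index to its B-fragment index.
    """
    n = len(sim)
    # Each A-fragment ranks the B-fragments by descending similarity.
    prefs = [sorted(range(n), key=lambda j: -sim[i][j]) for i in range(n)]
    next_choice = [0] * n       # position in prefs[i] of the next proposal
    partner_of_b = [None] * n   # A-fragment currently held by each B-fragment
    free = deque(range(n))      # A-fragments that are still unmatched

    while free:
        i = free.popleft()
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        current = partner_of_b[j]
        if current is None:
            # B-fragment j is unmatched, so it accepts the proposal.
            partner_of_b[j] = i
        elif sim[i][j] > sim[current][j]:
            # j prefers the new proposer; the old partner becomes free again.
            partner_of_b[j] = i
            free.append(current)
        else:
            # j keeps its current partner; i proposes again later.
            free.append(i)

    return {partner_of_b[j]: j for j in range(n)}
```

The result is a bijection between the two sets of fragments in which no unmatched pair would both prefer each other over their assigned partners. A real detector would also need to handle files with different numbers of fragments and a threshold below which no match is reported; both are left out of this sketch.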
Detecting sequential structure
Programming by demonstration requires detection and analysis of sequential patterns in a user's input, and the synthesis of an appropriate structural model that can be used for prediction. This paper describes SEQUITUR, a scheme for inducing a structural description of a sequence from a single example. SEQUITUR integrates several different inference techniques: identification of lexical subsequences or vocabulary elements, hierarchical structuring of such subsequences, identification of elements that have equivalent usage patterns, inference of programming constructs such as looping and branching, generalisation by unifying grammar rules, and the detection of procedural substructure. Although SEQUITUR operates with abstract sequences, a number of concrete illustrations are provided
Program Similarity Detection with Checksims
In response to growing academic dishonesty in low-level computer science and electrical and computer engineering courses, we present checksims, a similarity detector designed to highlight suspicious assignments for instructor review. We report the design rationale for the software, and describe our detection of dozens of previously undetected cases of academic dishonesty in previous classes
A Hybrid Graph Neural Network Approach for Detecting PHP Vulnerabilities
This paper presents DeepTective, a deep learning approach to detect
vulnerabilities in PHP source code. Our approach implements a novel hybrid
technique that combines Gated Recurrent Units and Graph Convolutional Networks
to detect SQLi, XSS and OSCI vulnerabilities leveraging both syntactic and
semantic information. We evaluate DeepTective and compare it to the state of
the art on an established synthetic dataset and on a novel real-world dataset
collected from GitHub. Experimental results show that DeepTective achieves near
perfect classification on the synthetic dataset, and an F1 score of 88.12% on
the realistic dataset, outperforming related approaches. We validate
DeepTective in the wild by discovering 4 novel vulnerabilities in established
WordPress plugins. Comment: A poster version of this paper appeared as
https://doi.org/10.1145/3412841.344213
Lexical patterns, features and knowledge resources for coreference resolution in clinical notes
Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general-purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA). In addition, a method for generating coreference chains using progressively pruned linked lists is demonstrated that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results show an F-measure for each corpus of 79.2% and 87.5%, respectively, which offers performance at least as good as human annotators, greatly increased performance over general-purpose tools, and improvement on previously reported clinical coreference systems. The system uses a number of open-source components that are available to download