
Robust Parsing of Cloned Token Sequences

Author: Koschke Rainer
Riemann Ole Jan Lars
Publication venue: European Association of Software Science and Technology
Publication date: 01/03/2014

Token-based clone detection techniques are known for their scalability, high recall, and robustness against syntax errors and incomplete code. However, they may yield clones that are syntactically incomplete, and they know very little about the syntactic structure of the clones they report. Hence, their results cannot immediately be used for automated refactorings or for syntactic relevance filters. This paper explores robust parsing techniques for parsing code fragments reported by token-based clone detectors, to determine whether the clones are syntactically complete and what kinds of syntactic elements they contain. This knowledge can be used to improve the precision of token-based clone detection.
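As a rough illustration of what "syntactically incomplete" means for a token-based clone, the sketch below checks only whether a cloned token sequence opens and closes all its brackets within itself, and trims it to its longest bracket-complete prefix. This is a deliberately crude proxy, not the robust parsing the paper studies; the function names and the whitespace-free token list format are assumptions made for illustration.

```python
from typing import List

# Hypothetical sketch: bracket balance as a crude proxy for syntactic completeness.
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}
CLOSE_TO_OPEN = {v: k for k, v in OPEN_TO_CLOSE.items()}


def is_bracket_complete(tokens: List[str]) -> bool:
    """True if every bracket in the fragment is both opened and closed inside it."""
    stack: List[str] = []
    for tok in tokens:
        if tok in OPEN_TO_CLOSE:
            stack.append(tok)
        elif tok in CLOSE_TO_OPEN:
            # A closer with no matching opener means the clone starts mid-construct.
            if not stack or stack[-1] != CLOSE_TO_OPEN[tok]:
                return False
            stack.pop()
    # Leftover openers mean the clone stops mid-construct.
    return not stack


def longest_complete_prefix(tokens: List[str]) -> List[str]:
    """Trim a clone to its longest bracket-complete prefix (possibly empty)."""
    best = 0
    for end in range(1, len(tokens) + 1):
        if is_bracket_complete(tokens[:end]):
            best = end
    return tokens[:best]


clone = ["x", "=", "f", "(", "a", ")", ";", "}", "if", "(", "b", ")", "{"]
print(is_bracket_complete(clone))      # False
print(longest_complete_prefix(clone))  # ['x', '=', 'f', '(', 'a', ')', ';']
```

A real parser would also reject fragments that are bracket-balanced but still not a complete statement or declaration, which is the gap robust parsing is meant to close.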

Electronic Communications of the EASST (European Association of Software Science and Technology)

An Extended Stable Marriage Problem Algorithm for Clone Detection

Author: AlHakami Hosam
Chen Feng
Janicke Helge
Publication venue: Academy and Industry Research Collaboration Center (AIRCC)
Publication date: 01/01/2014

Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of the detection. This is achieved by extending a well-known stable-marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is provided using a proper scenario, which shows a noticeable improvement in several features of clone detection such as scalability and accuracy. Comment: 20 pages, 10 figures, 6 tables
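For readers unfamiliar with the stable-marriage problem, the sketch below shows the classic Gale-Shapley algorithm applied to a one-to-one (bijective) matching of code fragments in file A against fragments in file B. It is the textbook SMP, not the paper's extended version, and the Jaccard similarity over token sets is a hypothetical stand-in for whatever similarity measures the authors use.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Hypothetical similarity measure: Jaccard index of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stable_match(frags_a: List[Set[str]], frags_b: List[Set[str]]) -> Dict[int, int]:
    """Classic Gale-Shapley: fragments of file A propose to fragments of file B.

    Returns a mapping index_in_A -> index_in_B. If A has more fragments than B,
    some A fragments remain unmatched.
    """
    n_a, n_b = len(frags_a), len(frags_b)
    sim = [[jaccard(x, y) for y in frags_b] for x in frags_a]
    # Each A fragment ranks B fragments from most to least similar (ties keep index order).
    prefs = [sorted(range(n_b), key=lambda j: -sim[i][j]) for i in range(n_a)]
    next_choice = [0] * n_a          # next position in prefs[i] to propose to
    engaged_b: Dict[int, int] = {}   # B index -> A index it currently holds
    free = list(range(n_a))
    while free:
        i = free.pop()
        if next_choice[i] >= n_b:
            continue                 # i has proposed to everyone; stays unmatched
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        if j not in engaged_b:
            engaged_b[j] = i
        elif sim[i][j] > sim[engaged_b[j]][j]:
            free.append(engaged_b[j])  # B fragment j trades up; old partner is free again
            engaged_b[j] = i
        else:
            free.append(i)             # rejected; i will try its next choice
    return {a: b for b, a in engaged_b.items()}


file_a = [{"x", "y", "z"}, {"p", "q"}]
file_b = [{"p", "q", "r"}, {"x", "y"}]
print(stable_match(file_a, file_b))  # {0: 1, 1: 0}
```

Stability here means no A fragment and B fragment would both prefer each other over their assigned partners, which is what makes the resulting pairing a defensible one-to-one clone mapping.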

arXiv.org e-Print Archive

CiteSeerX

Detecting sequential structure

Author: Nevill-Manning Craig G.
Witten Ian H.
Publication date: 01/01/1995

Programming by demonstration requires detection and analysis of sequential patterns in a user’s input, and the synthesis of an appropriate structural model that can be used for prediction. This paper describes SEQUITUR, a scheme for inducing a structural description of a sequence from a single example. SEQUITUR integrates several different inference techniques: identification of lexical subsequences or vocabulary elements, hierarchical structuring of such subsequences, identification of elements that have equivalent usage patterns, inference of programming constructs such as looping and branching, generalisation by unifying grammar rules, and the detection of procedural substructure. Although SEQUITUR operates with abstract sequences, a number of concrete illustrations are provided.
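To give a concrete feel for "hierarchical structuring of subsequences", the sketch below repeatedly replaces the most frequent repeated pair of adjacent symbols with a new grammar rule. This is an offline, Re-Pair-style digram replacement, not SEQUITUR itself (SEQUITUR works incrementally and enforces its own constraints); the rule names `R0`, `R1`, ... are hypothetical and assumed not to clash with input symbols.

```python
from collections import Counter
from typing import Dict, List, Tuple


def induce_grammar(seq: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Offline digram replacement (Re-Pair style), NOT the SEQUITUR algorithm.

    Returns the compressed top-level sequence and the rules it introduced.
    Assumes input symbols never look like generated rule names ("R0", "R1", ...).
    """
    rules: Dict[str, List[str]] = {}
    next_id = 0
    while True:
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, n = counts.most_common(1)[0]  # ties go to the first pair encountered
        if n < 2:
            break                            # no repeated digram left to factor out
        name = f"R{next_id}"
        next_id += 1
        rules[name] = list(pair)
        out: List[str] = []
        i = 0
        while i < len(seq):                  # replace non-overlapping occurrences left to right
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol: str, rules: Dict[str, List[str]]) -> List[str]:
    """Recursively expand a symbol back into terminal tokens."""
    if symbol not in rules:
        return [symbol]
    return [tok for part in rules[symbol] for tok in expand(part, rules)]


top, rules = induce_grammar(list("abcabc"))
print(top)    # ['R1', 'R1']
print(rules)  # {'R0': ['a', 'b'], 'R1': ['R0', 'c']}
```

The nested rules (`R1` built from `R0`) are the hierarchical structure; detecting that `R1` occurs twice in a row is the kind of evidence from which a looping construct could be inferred.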

Research Commons@Waikato

Program Similarity Detection with Checksims

Author: Heon Matthew
Murvihill Dolan Patrick
Publication venue: Digital WPI
Publication date: 30/04/2015

In response to growing academic dishonesty in low-level computer science and electrical and computer engineering courses, we present Checksims, a similarity detector designed to highlight suspicious assignments for instructor review. We report the design rationale for the software, and describe our detection of dozens of previously undetected cases of academic dishonesty in previous classes.
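The general idea of a similarity detector that flags assignment pairs for instructor review can be sketched with token shingles: split each submission into overlapping k-token windows and score each pair by shared windows. This is a generic technique shown for illustration, not Checksims's actual algorithm; the names `shingles` and `flag_pairs`, the whitespace tokenizer, `k = 3`, and the 0.5 threshold are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Set of k-token windows of a whitespace-tokenized submission (hypothetical tokenizer)."""
    toks = text.split()
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def flag_pairs(subs: Dict[str, str], threshold: float = 0.5,
               k: int = 3) -> List[Tuple[str, str, float]]:
    """Return (name_a, name_b, score) for pairs whose shingle overlap meets the threshold.

    score = |A & B| / min(|A|, |B|), so a submission copied wholesale into a
    larger one still scores 1.0. Results are sorted by descending score.
    """
    sh = {name: shingles(text, k) for name, text in subs.items()}
    flagged = []
    for a, b in combinations(sorted(sh), 2):
        denom = min(len(sh[a]), len(sh[b]))
        if denom == 0:
            continue  # too short to compare meaningfully
        score = len(sh[a] & sh[b]) / denom
        if score >= threshold:
            flagged.append((a, b, score))
    return sorted(flagged, key=lambda t: -t[2])


submissions = {
    "alice": "int main ( ) { return 0 ; }",
    "bob": "int main ( ) { return 0 ; }",
    "carol": "void f ( int x ) { x ++ ; }",
}
print(flag_pairs(submissions))  # [('alice', 'bob', 1.0)]
```

Normalizing by the smaller shingle set favors recall (catching partial copying) at the cost of flagging short submissions more easily, which suits a tool whose output goes to a human reviewer rather than straight to a verdict.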

DigitalCommons@WPI

A Hybrid Graph Neural Network Approach for Detecting PHP Vulnerabilities

Author: Hanif Hazim
Maffeis Sergio
Rabheru Rishi
Publication date: 16/12/2020

This paper presents DeepTective, a deep learning approach to detect vulnerabilities in PHP source code. Our approach implements a novel hybrid technique that combines Gated Recurrent Units and Graph Convolutional Networks to detect SQLi, XSS and OSCI vulnerabilities, leveraging both syntactic and semantic information. We evaluate DeepTective and compare it to the state of the art on an established synthetic dataset and on a novel real-world dataset collected from GitHub. Experimental results show that DeepTective achieves near-perfect classification on the synthetic dataset, and an F1 score of 88.12% on the realistic dataset, outperforming related approaches. We validate DeepTective in the wild by discovering 4 novel vulnerabilities in established WordPress plugins. Comment: A poster version of this paper appeared as https://doi.org/10.1145/3412841.344213

arXiv.org e-Print Archive


Lexical patterns, features and knowledge resources for coreference resolution in clinical notes

Author: Abdul Roudsari
D’Avolio
Miller
Phil Gooch
Rahman
Recasens
Rosse
Savova
Savova
Uzuner
van Deemter
Zheng
Zheng
Publication venue: Elsevier BV
Publication date: 01/10/2012

Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general-purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA). In addition, a method for generating coreference chains using progressively pruned linked lists is demonstrated that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results show an F-measure of 79.2% and 87.5% for the two corpora, respectively, which offers performance at least as good as human annotators, greatly increased performance over general-purpose tools, and improvement on previously reported clinical coreference systems. The system uses a number of open-source components that are available to download.

City Research Online

Elsevier - Publisher Connector

Crossref