AdaCCD: Adaptive Semantic Contrasts Discovery based Cross Lingual Adaptation for Code Clone Detection
Code Clone Detection, which aims to retrieve functionally similar programs
from large code bases, has been attracting increasing attention. Modern
software often involves a diverse range of programming languages. However,
current code clone detection methods are generally limited to only a few
popular programming languages due to insufficient annotated data as well as
their own model design constraints. To address these issues, we present AdaCCD,
a novel cross-lingual adaptation method that can detect code clones in a new
language without any annotations in that language. AdaCCD leverages
language-agnostic code representations from pre-trained programming language
models and proposes an Adaptively Refined Contrastive Learning framework to
transfer knowledge from resource-rich languages to resource-poor languages. We
evaluate the cross-lingual adaptation results of AdaCCD by constructing a
multilingual code clone detection benchmark consisting of 5 programming
languages. AdaCCD achieves significant improvements over other baselines, and
it is even comparable to supervised fine-tuning.

Comment: 10 pages
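The contrastive transfer idea described above can be sketched with a toy InfoNCE-style objective (an illustrative approximation, not AdaCCD's actual loss; the random vectors below stand in for embeddings from a pre-trained programming language model):

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """Contrastive loss: pull the anchor embedding toward its positive
    (a functional clone) and push it away from negatives (non-clones)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

random.seed(0)
dim = 128
anchor = [random.gauss(0, 1) for _ in range(dim)]
# A near-duplicate of the anchor plays the role of a clone's embedding.
positive = [a + 0.1 * random.gauss(0, 1) for a in anchor]
negatives = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(8)]
loss = info_nce_loss(anchor, positive, negatives)
```

Minimizing such a loss over labeled pairs in a resource-rich language shapes the shared embedding space; cross-lingual transfer then relies on the language-agnostic representations placing clones in an unlabeled language close together as well.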
GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench
With the emergence of Machine Learning, there has been a surge in leveraging
its capabilities for problem-solving across various domains. In the code clone
realm, the identification of type-4 or semantic clones has emerged as a crucial
yet challenging task. Researchers aim to utilize Machine Learning to tackle
this challenge, often relying on the BigCloneBench dataset. However, it's worth
noting that BigCloneBench, originally not designed for semantic clone
detection, presents several limitations that hinder its suitability as a
comprehensive training dataset for this specific purpose. Furthermore, the
CLCDSA dataset suffers from a lack of reusable examples aligning with real-world
software systems, rendering it inadequate for cross-language clone detection
approaches. In this work, we present a comprehensive semantic clone and
cross-language clone benchmark, GPTCloneBench, by exploiting SemanticCloneBench
and OpenAI's GPT-3 model. In particular, using code fragments from
SemanticCloneBench as sample inputs along with appropriate prompt engineering
for the GPT-3 model, we generate semantic and cross-language clones for these
specific fragments and then conduct a combination of extensive manual analysis,
tool-assisted filtering, functionality testing and automated validation in
building the benchmark. From 79,928 clone pairs of GPT-3 output, we created a
benchmark with 37,149 true semantic clone pairs, 19,288 false semantic
pairs (Type-1/Type-2), and 20,770 cross-language clones across four languages
(Java, C, C#, and Python). Our benchmark is 15-fold larger than
SemanticCloneBench, has more functional code examples for software systems and
programming language support than CLCDSA, and overcomes BigCloneBench's
limitations in quality, quantity, and language variety.

Comment: Accepted at the 39th IEEE International Conference on Software
Maintenance and Evolution (ICSME 2023)
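The distinction between syntactic (Type-1/Type-2) and semantic clone pairs that the filtering relies on can be illustrated with a small sketch (assumed logic for illustration, not the benchmark's actual pipeline; Python fragments stand in for the four supported languages). A pair is Type-1 if the fragments match after comment and whitespace normalization, and Type-2 if they match after identifiers are additionally abstracted away:

```python
import io
import keyword
import re
import tokenize

def normalize_type1(code):
    """Type-1 normalization: strip comments and collapse whitespace."""
    code = re.sub(r'#.*', '', code)
    return ' '.join(code.split())

def normalize_type2(code):
    """Type-2 normalization: additionally rename every identifier to 'ID'."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append('ID')  # abstract variable/function names
        elif tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                          tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER):
            continue  # drop layout and comment tokens
        else:
            out.append(tok.string)
    return ' '.join(out)

a = "def add(x, y):\n    return x + y  # sum\n"
b = "def plus(a, b):\n    return a + b\n"
is_type1 = normalize_type1(a) == normalize_type1(b)   # differs: renamed identifiers
is_type2 = normalize_type2(a) == normalize_type2(b)   # matches after abstraction
```

Pairs that still differ after both normalizations are the candidates for true semantic clones; pairs that match are the syntactic (false semantic) cases.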
SLACC: Simion-based Language Agnostic Code Clones
Successful cross-language clone detection could enable researchers and
developers to create robust language migration tools, facilitate learning
additional programming languages once one is mastered, and promote reuse of
code snippets over a broader codebase. However, identifying cross-language
clones presents special challenges to the clone detection problem. A lack of
common underlying representation between arbitrary languages means detecting
clones requires one of the following solutions: 1) a static analysis framework
replicated across each targeted language with annotations matching language
features across all languages, or 2) a dynamic analysis framework that detects
clones based on runtime behavior.
In this work, we demonstrate the feasibility of the latter solution, a
dynamic analysis approach called SLACC for cross-language clone detection. Like
prior clone detection techniques, we use input/output behavior to match clones,
though we overcome limitations of prior work by amplifying the number of inputs
and covering more data types; and as a result, achieve better clusters than
prior attempts. Since clusters are generated based on input/output behavior,
SLACC supports cross-language clone detection. As an added challenge, we target
a statically typed language, Java, and a dynamically typed language, Python. Compared
to HitoshiIO, a recent clone detection tool for Java, SLACC retrieves 6 times
as many clusters and has higher precision (86.7% vs. 30.7%).
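The input/output matching that underlies this approach can be sketched as follows (a minimal single-language illustration; SLACC itself executes snippets across different language runtimes): two snippets count as behavioral clones when they agree on a battery of generated inputs.

```python
import random

def io_signature(fn, inputs):
    """Capture a snippet's observable behavior as its outputs on shared inputs."""
    sig = []
    for x in inputs:
        try:
            sig.append(fn(x))
        except Exception:
            sig.append('ERR')  # crashes are part of the observable behavior too
    return tuple(sig)

# Two snippets with different syntax but identical behavior, simulating
# implementations drawn from different languages.
def snippet_a(n):
    return sum(range(1, n + 1))

def snippet_b(n):
    return n * (n + 1) // 2

random.seed(42)
inputs = [random.randint(0, 100) for _ in range(50)]  # amplified input set
are_clones = io_signature(snippet_a, inputs) == io_signature(snippet_b, inputs)
```

Clustering by identical signatures is what makes the technique language-agnostic: no shared static representation is needed, only comparable inputs and outputs.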
This is the first work to perform clone detection for dynamically typed languages
(precision = 87.3%) and the first to perform clone detection across languages
that lack a common underlying representation (precision = 94.1%). It provides a
first step towards the larger goal of scalable language migration tools.

Comment: 11 pages, 3 figures, accepted at ICSE 2020 technical track
Structured Review of Code Clone Literature
This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts that helps us reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other.
A comparative analysis of web-based GIS applications using usability metrics
With the rapid expansion of the internet, Web-based Geographic Information System (WGIS) applications have gained popularity, even though WGIS interfaces can be difficult to learn and understand because special functions are needed to manipulate the maps. Hence, it is essential to evaluate the usability of WGIS applications. Usability is an important factor in ensuring the development of quality, usable software products. On the other hand, there are a number of standards and models in the literature, each of which describes usability in terms of a different set of attributes. These models are vague and difficult to understand. Therefore, the primary purpose of this study is to compare five common usability models (Shackel, Nielsen, ISO 9241-11, ISO 9126-1 and QUIM) to identify the usability metrics that have been used most frequently in previous models. The questionnaire method and an automated usability evaluation method using the Loop11 tool were employed to evaluate the usability metrics for three case studies of commonly used WGIS applications: Google Maps, Yahoo Maps, and MapQuest. Finally, those case studies were compared and analysed based on the usability metrics that were identified. Based on the comparative study, four usability metrics (Effectiveness, Efficiency, Satisfaction and Learnability) were identified. These metrics are consistent, comprehensive, unambiguous and appropriate for evaluating the usability of WGIS applications. In addition, there was a positive correlation between these usability metrics. The comparative analysis indicates that Effectiveness, Satisfaction and Learnability were higher, and Efficiency was lower, with the Loop11 tool compared to the questionnaire method for the three case studies. In addition, Yahoo Maps and MapQuest have lower usability metric scores than Google Maps under both methods. Therefore, Google Maps is more usable than Yahoo Maps and MapQuest.
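The reported positive correlation between usability metrics can be checked with a standard Pearson coefficient; the sketch below uses hypothetical per-task scores for illustration, not the study's data:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-task scores (NOT the study's data) for two metrics.
effectiveness = [0.9, 0.7, 0.8, 0.6, 0.95]
satisfaction = [0.85, 0.65, 0.75, 0.6, 0.9]
r = pearson_r(effectiveness, satisfaction)  # close to +1 when metrics move together
```

A coefficient near +1 indicates the metrics rise and fall together across tasks, which is the pattern the study describes.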