Using Compilation/Decompilation to Enhance Clone Detection
We study the effects of compilation and decompilation on code clone detection in Java. Compilation and decompilation canonicalise syntactic changes made to source code and can serve as a form of source code normalisation. We used NiCad to detect clones before and after decompilation in three open source software systems: JUnit, JFreeChart, and Tomcat. We filtered and compared the clones in the original and decompiled clone sets and found that 1,201 clone pairs (78.7%) are common between the two sets, while 326 pairs (21.3%) appear in only one of the sets. A manual investigation identified 325 of the 326 pairs as true clones. The 252 original-only clone pairs contain a single false positive, while the 74 decompiled-only clone pairs are all true positives. Many clones in the original source code that are detected only after decompilation are type-3 clones that are difficult to detect due to added or deleted statements, keywords, or package names; flipped if-else statements; or changed loops. We suggest using decompilation as a normalisation step to complement clone detection. By combining the clones found before and after decompilation, one can achieve higher recall without losing precision.
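The set comparison described above can be sketched in a few lines; the clone pairs below are hypothetical stand-ins for the fragment identifiers a real detector such as NiCad would report.

```python
# Sketch of comparing clone sets detected before and after decompilation.
# Each clone pair is a frozenset of two code-fragment identifiers; all
# identifiers here are hypothetical, not taken from the study's data.
original = {
    frozenset({"A.java:10-25", "B.java:40-55"}),
    frozenset({"C.java:5-30", "D.java:8-33"}),
}
decompiled = {
    frozenset({"A.java:10-25", "B.java:40-55"}),
    frozenset({"E.java:1-20", "F.java:2-21"}),
}

common = original & decompiled     # pairs found in both runs
only_one = original ^ decompiled   # pairs found in exactly one run
combined = original | decompiled   # union: the higher-recall clone set

print(len(common), len(only_one), len(combined))  # 1 2 3
```

Modelling pairs as frozensets makes the comparison order-insensitive, so (A, B) and (B, A) count as the same clone pair.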
Ethical Mining – A Case Study on MSR Mining Challenges
Research in Mining Software Repositories (MSR) is research involving human subjects, as the repositories usually contain data about developers’ interactions with the repositories. Therefore, any research in the area needs to consider the ethics implications of the intended activity before starting. This paper presents a discussion of the ethics implications of MSR research, using the mining challenges from the years 2010 to 2019 as a case study to identify the kinds of data used. It highlights problems that one may encounter in creating such datasets, and discusses ethics challenges that may be encountered when using existing datasets, based on a contemporary research ethics framework. We suggest that the MSR community should increase awareness of ethics issues by openly discussing ethics considerations in published articles.
Ethics in the mining of software repositories
Research in Mining Software Repositories (MSR) is research involving human subjects, as the repositories usually contain data about developers’ and users’ interactions with the repositories and with each other. The ethics issues raised by such research therefore need to be considered before beginning. This paper presents a discussion of ethics issues that can arise in MSR research, using the mining challenges from the years 2006 to 2021 as a case study to identify the kinds of data used. On the basis of contemporary research ethics frameworks we discuss ethics challenges that may be encountered in creating and using repositories and associated datasets. We also report some results from a small community survey of approaches to ethics in MSR research. In addition, we present four case studies illustrating typical ethics issues one encounters in projects and how ethics considerations can shape projects before they commence. Based on our experience, we present some guidelines and practices that can help in considering potential ethics issues and reducing risks.
Unions of slices are not slices
Many approaches to slicing rely upon the 'fact' that the union of two static slices is a valid slice. It is known that static slices constructed using program dependence graph algorithms are valid slices (Reps and Yang, 1988). However, this is not true for other forms of slicing. For example, it has been established that the union of two dynamic slices is not necessarily a valid dynamic slice (Hall, 1995). In this paper this result is extended to show that the union of two static slices is not necessarily a valid slice, based on Weiser's definition of a (static) slice. We also analyse the properties that make the union of different forms of slices a valid slice.
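Treating a slice simply as the set of program lines it retains, the union in question is plain set union; the sketch below illustrates the construction only, with hypothetical line numbers, and does not by itself demonstrate Weiser's validity criterion.

```python
# A static slice modelled naively as the set of line numbers retained
# for a slicing criterion. The line numbers here are hypothetical.
slice_on_x = {1, 3, 5, 7}   # slice w.r.t. variable x at line 7
slice_on_y = {1, 2, 4, 7}   # slice w.r.t. variable y at line 7

union_slice = slice_on_x | slice_on_y
# The union contains every line of both slices, but Weiser's definition
# requires the result to behave like the original program for BOTH
# criteria -- and the paper shows the union need not satisfy that.
print(sorted(union_slice))  # [1, 2, 3, 4, 5, 7]
```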
TCTracer: Establishing test-to-code traceability links using dynamic and static techniques
Test-to-code traceability links model the relationships between test artefacts and code artefacts. When utilised during the development process, these links help developers to keep test code in sync with tested code, reducing the rate of test failures and missed faults. Test-to-code traceability links can also help developers to maintain an accurate mental model of the system, reducing the risk of architectural degradation when making changes. However, establishing and maintaining these links manually places an extra burden on developers and is error-prone. This paper presents TCTracer, an approach and implementation for the automatic establishment of test-to-code traceability links. Unlike existing work, TCTracer operates at both the method level and the class level, allowing us to establish links between tests and functions, as well as between test classes and tested classes. We improve over existing techniques by combining an ensemble of new and existing techniques that utilise both dynamic and static information and exploiting a synergistic flow of information between the method and class levels. An evaluation of TCTracer using five large, well-studied open source systems demonstrates that, on average, we can establish test-to-function links with a mean average precision (MAP) of 85% and test-class-to-class links with an MAP of 92%.
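TCTracer's actual ensemble combines dynamic and static signals; as a minimal illustration of one classic static signal such approaches build on, the sketch below links a test to a candidate method by naming convention. The test and method names are hypothetical, and this is only one of many scores a real ensemble would combine.

```python
# Naming-convention link: "testFoo" or "test_foo" -> candidate method "foo".
# A single static heuristic, sketched in isolation for illustration.
def candidate_method(test_name: str) -> str:
    name = test_name
    for prefix in ("test_", "test"):
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    # Lower-case the leading character to match Java method conventions.
    return name[:1].lower() + name[1:] if name else name

print(candidate_method("testAddItem"))      # addItem
print(candidate_method("test_remove_item")) # remove_item
```

A real linker would score such candidates against the methods actually present in the codebase, alongside dynamic evidence such as which methods the test executes.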
A comparison of code similarity analysers
Copying and pasting of source code is a common activity in software engineering. Often, the code is not copied verbatim; it may be modified for various purposes, e.g. refactoring, bug fixing, or even software plagiarism. These code modifications could affect the performance of code similarity analysers, including code clone and plagiarism detectors, to a certain degree. We are interested in two types of code modification in this study: pervasive modifications, i.e. transformations that may have a global effect, and local modifications, i.e. code changes that are contained in a single method or code block. We evaluate 30 code similarity detection techniques and tools using five experimental scenarios for Java source code. These are (1) pervasively modified code, created with tools for source code and bytecode obfuscation, and boiler-plate code, (2) source code normalisation through compilation and decompilation using different decompilers, (3) reuse of optimal configurations over different data sets, (4) tool evaluation using ranked-based measures, and (5) local + global code modifications. Our experimental results show that in the presence of pervasive modifications, some of the general textual similarity measures can offer similar performance to specialised code similarity tools, whilst in the presence of boiler-plate code, highly specialised source code similarity detection techniques and tools outperform textual similarity measures. Our study strongly validates the use of compilation/decompilation as a normalisation technique. Its use reduced false classifications to zero for three of the tools. Moreover, we demonstrate that optimal configurations are very sensitive to a specific data set. After directly applying optimal configurations derived from one data set to another, the tools perform poorly on the new data set.
The code similarity analysers are thoroughly evaluated not only on several well-known pair-based and query-based error measures but also on each specific type of pervasive code modification. This broad, thorough study is the largest in existence and potentially an invaluable guide for future users of similarity detection in source code.
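As a concrete instance of the "general textual similarity measures" mentioned above, the sketch below computes character n-gram Jaccard similarity between two code fragments. The fragments and the choice of n=3 are illustrative assumptions, not the measures or tools evaluated in the study.

```python
# Character n-gram Jaccard similarity: a simple textual measure that
# tolerates identifier renaming better than exact string comparison.
def ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams after whitespace normalisation."""
    s = " ".join(text.split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard(a: str, b: str, n: int = 3) -> float:
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0

# Hypothetical clone pair: the second fragment renames sum -> total, i -> j.
original = "int sum = 0; for (int i = 0; i < n; i++) sum += a[i];"
renamed  = "int total = 0; for (int j = 0; j < n; j++) total += a[j];"
print(round(jaccard(original, renamed), 2))
```

Identical fragments score 1.0, while a renamed clone keeps a high but sub-1.0 score because most of its trigrams survive the renaming.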
Similarity of Source Code in the Presence of Pervasive Modifications
Source code analysis to detect code cloning, code plagiarism, and code reuse suffers from the problem of pervasive code modifications, i.e. transformations that may have a global effect. We compare 30 similarity detection techniques and tools against pervasive code modifications. We evaluate the tools using two experimental scenarios for Java source code. These are (1) pervasive modifications created with tools for source code and bytecode obfuscation and (2) source code normalisation through compilation and decompilation using different decompilers. Our experimental results show that highly specialised source code similarity detection techniques and tools can perform better than more general, textual similarity measures. Our study strongly validates the use of compilation/decompilation as a normalisation technique. Its use reduced false classifications to zero for six of the tools. This broad, thorough study is the largest in existence and potentially an invaluable guide for future users of similarity detection in source code.
A Picture Is Worth a Thousand Words: Code Clone Detection Based on Image Similarity
This paper introduces a new code clone detection technique based on image similarity. The technique captures the visual perception of code as seen by humans in an IDE by applying syntax highlighting and image conversion to raw source code text. We compared two similarity measures, Jaccard and earth mover’s distance (EMD), for our image-based code clone detection technique. Jaccard similarity offered better detection performance than EMD. The F1 score of our technique on detecting Java clones with pervasive code modifications is comparable to that of five well-known code clone detectors: CCFinderX, Deckard, iClones, NiCad, and Simian. A Gaussian blur filter is chosen as a normalisation technique for type-2 and type-3 clones. We found that blurring code images before similarity computation resulted in higher precision and recall. The detection performance after including the blur filter increased by 1 to 6 percent. A manual investigation of clone pairs in three software systems revealed that our technique, while it missed some of the true clones, could also detect additional true clone pairs missed by NiCad.
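The Jaccard comparison of code images can be sketched with tiny binary bitmaps; the 3x3 "images" below are hypothetical stand-ins for rendered, syntax-highlighted code screenshots, which is what the technique actually compares.

```python
# Jaccard similarity of two binary "code images", modelled as sets of
# lit pixel coordinates. Real inputs would be rendered code images.
img_a = {(0, 0), (0, 1), (1, 1), (2, 2)}
img_b = {(0, 0), (1, 1), (2, 2), (2, 1)}

def jaccard_pixels(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

sim = jaccard_pixels(img_a, img_b)
print(round(sim, 2))  # 0.6: 3 shared pixels out of 5 distinct
```

A blur filter, as used in the paper, would spread each lit pixel into its neighbourhood before this comparison, making near-identical layouts overlap more.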
The Influence of Character Education on Positive Behavior in the Classroom
One basic goal of all educational systems should be to prepare students to be effective members of a society. For this reason, it is imperative that school districts and educators across the nation look at developing, implementing, and teaching students basic character education traits. Our team explored the influence character education had on positive behavior in our classrooms. Specifically, our data comes from two English Language Learner classrooms containing 33 total juniors and seniors, and one traditional English class containing 25 total seniors. We collected four types of data, including pre- and post-surveys, PACK Referrals, a daily observational Tally Sheet, and student interviews. Our results indicate a clear relationship between character education and student awareness of character traits. This supported our belief that character education should be developed, implemented, and taught in school districts. As a result, we recommend schools seriously consider implementing a solid character education program.
Establishing Multilevel Test-to-Code Traceability Links
Test-to-code traceability links model the relationships between test artefacts and code artefacts. When utilised during the development process, these links help developers to keep test code in sync with tested code, reducing the rate of test failures and missed faults. Test-to-code traceability links can also help developers to maintain an accurate mental model of the system, reducing the risk of architectural degradation when making changes. However, establishing and maintaining these links manually places an extra burden on developers and is error-prone. This paper presents TCtracer, an approach and implementation for the automatic establishment of test-to-code traceability links. Unlike existing work, TCtracer operates at both the method level and the class level, allowing us to establish links between tests and functions, as well as between test classes and tested classes. We improve over existing techniques by combining an ensemble of new and existing techniques and exploiting a synergistic flow of information between the method and class levels. An evaluation of TCtracer using four large, well-studied open source systems demonstrates that, on average, we can establish test-to-function links with a mean average precision (MAP) of 78% and test-class-to-class links with an MAP of 93%.