Search CORE

74 research outputs found

Natural Language is a Programming Language: Applying Natural Language Processing to Software Development

Author: Ernst Michael D.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 2nd Summit on Advances in Programming Languages (SNAPL 2017)
Publication date: 01/01/2017
Field of study

A powerful, but limited, way to view software is as source code alone. Treating a program as a sequence of instructions enables it to be formalized and makes it amenable to mathematical techniques such as abstract interpretation and model checking. A program consists of much more than a sequence of instructions. Developers make use of test cases, documentation, variable names, program structure, the version control repository, and more. I argue that it is time to take the blinders off of software analysis tools: tools should use all these artifacts to deduce more powerful and useful information about the program. Researchers are beginning to make progress towards this vision. This paper gives, as examples, four results that find bugs and generate code by applying natural language processing techniques to software artifacts. The four techniques use as input error messages, variable names, procedure documentation, and user questions. They use four different NLP techniques: document similarity, word semantics, parse trees, and neural networks. The initial results suggest that this is a promising avenue for future work

Dagstuhl Research Online Publication Server

Structured Review of Code Clone Literature

Author: Hordijk Wiebe
Ponisio María Laura
Wieringa Roel
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2008
Field of study

This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts which helps us to reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other

University of Twente Research Information

Tracking Your Changes: A Language-Independent Approach

Author: Gerardo Canfora
Luigi Cerulo
Massimiliano Di Penta
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Code reviewer intelligent prediction in open source industrial software project

Author: Huang Xuechun
Liao Zhifang
Yu Song
Zhang Bolin
Zhang Yan
Publication venue
Publication date: 23/04/2023
Field of study

ResearchOnline@GCU

Graph Neural Networks For Mapping Variables Between Programs -- Extended Version

Author: Janota Mikoláš
Manquinho Vasco
Orvalho Pedro
Piepenbrock Jelle
Publication venue
Publication date: 24/07/2023
Field of study

Automated program analysis is a pivotal research domain in many areas of Computer Science -- Formal Methods and Artificial Intelligence, in particular. Due to the undecidability of the problem of program equivalence, comparing two programs is highly challenging. Typically, in order to compare two programs, a relation between both programs' sets of variables is required. Thus, mapping variables between two programs is useful for a panoply of tasks such as program equivalence, program analysis, program repair, and clone detection. In this work, we propose using graph neural networks (GNNs) to map the set of variables between two programs based on both programs' abstract syntax trees (ASTs). To demonstrate the strength of variable mappings, we present three use-cases of these mappings on the task of program repair to fix well-studied and recurrent bugs among novice programmers in introductory programming assignments (IPAs). Experimental results on a dataset of 4166 pairs of incorrect/correct programs show that our approach correctly maps 83% of the evaluation dataset. Moreover, our experiments show that the current state-of-the-art on program repair, greatly dependent on the programs' structure, can only repair about 72% of the incorrect programs. In contrast, our approach, which is solely based on variable mappings, can repair around 88.5%.Comment: Extended version of "Graph Neural Networks For Mapping Variables Between Programs", paper accepted at ECAI 2023. Github: https://github.com/pmorvalho/ecai23-GNNs-for-mapping-variables-between-programs. 11 pages, 5 figures, 4 tables and 3 listing

arXiv.org e-Print Archive

An Empirical Study of a Hybrid Code Clone Detection Approach on Java Byte Code

Author: Ghosh Aritra
Lee Young
Publication venue: GSTF Journal on Computing (JoC)
Publication date: 05/01/2017
Field of study

Code clones increase the complexity of the system;therefore the software maintenance costs. Code clonedetection techniques have been proposed and evaluated basedon metric value and runtime evaluations. But in the existingmethods, many false positive clones are detected. In thispaper, we suggest a hybrid approach combining ProgramDependence Graph-based technique with Metric-basedtechnique to improve the precision of clone detection. Weconduct a case study on two open source code Java projectssuch as Eclipse-ant and Eclipse-JDT core to show the effectiveness of our tool. The application of this hybrid technique is then compared with the existing clone detection technique, CloneDR. The result shows that our tool increases the performance in precision, recall, false positive and false negative compared to CloneDR

GSTF Digital Library (GSTF-DL): Open Journal Systems (Global Science and Technology Forum)

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

Author: Bui Nghi D. Q.
Gotmare Akhilesh Deepak
Hoi Steven C. H.
Le Hung
Li Junnan
Wang Yue
Publication venue
Publication date: 31/05/2023
Field of study

Code intelligence plays a key role in transforming modern software engineering. Recently, deep learning-based models, especially Transformer-based large language models (LLMs), have demonstrated remarkable potential in tackling these tasks by leveraging massive open-source code data and programming language features. However, the development and deployment of such models often require expertise in both machine learning and software engineering, creating a barrier for the model adoption. In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. Following the principles of modular design and extensible framework, we design CodeTF with a unified interface to enable rapid access and development across different types of models, datasets and tasks. Our library supports a collection of pretrained Code LLM models and popular code benchmarks, including a standardized interface to train and serve code LLMs efficiently, and data features such as language-specific parsers and utility functions for extracting code attributes. In this paper, we describe the design principles, the architecture, key modules and components, and compare with other related library tools. Finally, we hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering, providing a comprehensive open-source solution for developers, researchers, and practitioners.Comment: Ongoing work - Draft Previe

arXiv.org e-Print Archive

Software Test Case Generation Tools and Techniques: A Review

Author: Abhishek Singh Verma
Ankur Choudhary
Shailesh Tiwari
Publication venue: 'International Journal of Mathematical, Engineering and Management Sciences plus Mangey Ram'
Publication date: 01/04/2023
Field of study

Software Industry is evolving at a very fast pace since last two decades. Many software developments, testing and test case generation approaches have evolved in last two decades to deliver quality products and services. Testing plays a vital role to ensure the quality and reliability of software products. In this paper authors attempted to conduct a systematic study of testing tools and techniques. Six most popular e-resources called IEEE, Springer, Association for Computing Machinery (ACM), Elsevier, Wiley and Google Scholar to download 738 manuscripts out of which 125 were selected to conduct the study. Out of 125 manuscripts selected, a good number approx. 79% are from reputed journals and around 21% are from good conference of repute. Testing tools discussed in this paper have broadly been divided into five different categories: open source, academic and research, commercial, academic and open source, and commercial & open source. The paper also discusses several benchmarked datasets viz. Evosuite 10, SF100 Corpus, Defects4J repository, Neo4j, JSON, Mocha JS, and Node JS to name a few. Aim of this paper is to make the researchers aware of the various test case generation tools and techniques introduced in the last 11 years with their salient features

Directory of Open Access Journals

Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning?

Author: Chan Aaron
Elkamhawy Mohamed
Helyar Alec
Kamal Eslam
Kharkar Anant
Moghaddam Roshanak Zilouchian
Mohylevskyy Yevhen
Sundaresan Neel
Publication venue
Publication date: 22/05/2023
Field of study

Software vulnerabilities bear enterprises significant costs. Despite extensive efforts in research and development of software vulnerability detection methods, uncaught vulnerabilities continue to put software owners and users at risk. Many current vulnerability detection methods require that code snippets can compile and build before attempting detection. This, unfortunately, introduces a long latency between the time a vulnerability is injected to the time it is removed, which can substantially increases the cost of fixing a vulnerability. We recognize that the current advances in machine learning can be used to detect vulnerable code patterns on syntactically incomplete code snippets as the developer is writing the code at EditTime. In this paper we present a practical system that leverages deep learning on a large-scale data set of vulnerable code patterns to learn complex manifestations of more than 250 vulnerability types and detect vulnerable code patterns at EditTime. We discuss zero-shot, few-shot, and fine-tuning approaches on state of the art pre-trained Large Language Models (LLMs). We show that in comparison with state of the art vulnerability detection models our approach improves the state of the art by 10%. We also evaluate our approach to detect vulnerability in auto-generated code by code LLMs. Evaluation on a benchmark of high-risk code scenarios shows a reduction of up to 90% vulnerability reduction

arXiv.org e-Print Archive