Search CORE

8 research outputs found

Memorization and Generalization in Neural Code Intelligence Models

Author: Alipour Mohammad Amin
Hellendoorn Vincent J.
Hussain Aftab
Rabin Md Rafiqul Islam
Publication date: 16/06/2021

Deep Neural Networks (DNNs) are increasingly used in software engineering and code intelligence tasks. These are powerful tools, capable of learning highly generalizable patterns from large datasets through millions of parameters. At the same time, training DNNs means walking a knife's edge, because their large capacity also renders them prone to memorizing data points. While memorization is traditionally thought of as an aspect of over-training, recent work suggests that the risk manifests especially strongly when the training datasets are noisy and memorization is the only recourse. Unfortunately, most code intelligence tasks rely on rather noise-prone and repetitive data sources, such as GitHub, which, due to their sheer size, cannot be manually inspected and evaluated. We evaluate the memorization and generalization tendencies of neural code intelligence models through a case study across several benchmarks and model families, leveraging established approaches from other fields that use DNNs, such as introducing targeted noise into the training dataset. In addition to reinforcing prior general findings about the extent of memorization in DNNs, our results shed light on the impact of noisy datasets in training.
Comment: manuscript in preparation
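The abstract mentions introducing targeted noise into the training dataset as a way to probe memorization. Below is a minimal sketch of one such approach, uniform label flipping, assuming a classification-style task (for example, method-name prediction over a fixed label set). The function `inject_label_noise` and its parameters are hypothetical and not taken from the paper, whose exact noise procedure may differ. The intuition: randomly flipped labels carry no learnable pattern, so a model that still fits them on the training set is likely memorizing those examples.

```python
import random
from typing import Hashable, List, Sequence, Set, Tuple


def inject_label_noise(
    labels: Sequence[Hashable],
    label_space: Sequence[Hashable],
    noise_rate: float,
    seed: int = 0,
) -> Tuple[List[Hashable], Set[int]]:
    """Return a noisy copy of `labels` plus the indices whose label was flipped.

    Hypothetical helper: uniform label flipping is one common way to inject
    noise when probing memorization; it is not necessarily the paper's method.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    if len(set(label_space)) < 2:
        raise ValueError("label_space needs at least two distinct labels")

    rng = random.Random(seed)          # seeded for reproducible experiments
    noisy = list(labels)               # copy so the caller's data is untouched
    n_flip = round(noise_rate * len(noisy))
    flipped = set(rng.sample(range(len(noisy)), n_flip))

    for i in flipped:
        # Replace the label with one drawn uniformly from the *other* labels.
        alternatives = [lab for lab in label_space if lab != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy, flipped
```

For example, `inject_label_noise(labels, label_space, 0.2, seed=1)` flips 20% of the labels; comparing a trained model's accuracy on the returned `flipped` indices with its accuracy on clean held-out data gives a rough signal of how much it memorizes.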

arXiv.org e-Print Archive

Revisiting test smells in automatically generated tests : limitations, pitfalls, and opportunities

Author: Fraser Gordon
Hellendoorn Vincent J.
Panichella Annibale
Panichella Sebastiano
Sawant Anand Ashok
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Test smells attempt to capture design issues in test code that reduce its maintainability. Previous work found such smells to be highly common in automatically generated test cases, but based this result on specific static detection rules; although these rules are based on the original definition of "test smells", a recent empirical study showed that developers perceive them as overly strict and non-representative of the maintainability and quality of test suites. This leads us to investigate how effective such test smell detection tools are on automatically generated test suites. In this paper, we build a dataset of 2,340 test cases automatically generated by EVOSUITE for 100 Java classes. We performed a multi-stage, cross-validated manual analysis to identify six types of test smells and label their instances. We benchmark the performance of two test smell detection tools: one widely used in prior work, and one recently introduced with the express goal of matching developer perceptions of test smells. Our results show that these test smell detection strategies poorly characterized the issues in automatically generated test suites; the older tool’s detection strategies, especially, misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definition of certain test smells, and highlight as yet uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics to match development practice; and (ii) more accurate detection strategies, to be evaluated primarily in industrial contexts.
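The benchmark described above compares each tool's detections against manually labeled instances, where missed smells are false negatives and wrongly flagged tests are false positives. Below is a minimal sketch of that comparison using standard precision and recall, assuming each instance is represented as a (test case, smell type) pair. The function `benchmark_detector` and this representation are hypothetical; the paper's own scoring, including how the "over 70%" misclassification figure is computed, may differ.

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical representation: one labeled instance = (test case id, smell type).
Instance = Tuple[Hashable, str]


def benchmark_detector(predicted: Set[Instance], manual: Set[Instance]) -> Dict[str, float]:
    """Score a tool's detections against a manually labeled oracle."""
    tp = len(predicted & manual)   # instances confirmed by both tool and manual analysis
    fp = len(predicted - manual)   # flagged by the tool but not confirmed manually
    fn = len(manual - predicted)   # real instances the tool missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

For instance, with manual labels `{("t1", "Eager Test"), ("t2", "Assertion Roulette"), ("t3", "Eager Test")}` and tool output `{("t1", "Eager Test"), ("t4", "Assertion Roulette")}`, the sketch reports one true positive, one false positive, and two false negatives, i.e. precision 0.5 and recall 1/3. Keying on (test, smell) pairs rather than on tests alone keeps a test that carries several smells from being counted as a single hit.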

ZHAW digitalcollection

A Systematic Evaluation of Large Language Models of Code

Author: Alon Uri
Hellendoorn Vincent J.
Neubig Graham
Xu Frank F.
Publication date: 04/05/2022

Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not publicly available, leaving many questions about their model and data design decisions. We aim to fill in some of these blanks through a systematic evaluation of the largest existing models: Codex, GPT-J, GPT-Neo, GPT-NeoX-20B, and CodeParrot, across various programming languages. Although Codex itself is not open-source, we find that existing open-source models do achieve close results in some programming languages, despite being targeted mainly at natural language modeling. We further identify an important missing piece in the form of a large open-source model trained exclusively on a multi-lingual corpus of code. We release a new model, PolyCoder, with 2.7B parameters based on the GPT-2 architecture, which was trained on 249GB of code across 12 programming languages on a single machine. In the C programming language, PolyCoder outperforms all models, including Codex. Our trained models are open-source and publicly available at https://github.com/VHellendoorn/Code-LMs, enabling future research and application in this area.
Comment: DL4C@ICLR 2022, and MAPS@PLDI 2022

arXiv.org e-Print Archive

Revisiting Test Smells in Automatically Generated Tests: Limitations, Pitfalls, and Opportunities

Author: Fraser Gordon
Hellendoorn Vincent J.
Panichella A.
Panichella Sebastiano
Sawant Anand Ashok
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Test smells attempt to capture design issues in test code that reduce its maintainability. Previous work found such smells to be highly common in automatically generated test cases, but based this result on specific static detection rules; although these rules are based on the original definition of “test smells”, a recent empirical study showed that developers perceive them as overly strict and non-representative of the maintainability and quality of test suites. This leads us to investigate how effective such test smell detection tools are on automatically generated test suites. In this paper, we build a dataset of 2,340 test cases automatically generated by EVOSUITE for 100 Java classes. We performed a multi-stage, cross-validated manual analysis to identify six types of test smells and label their instances. We benchmark the performance of two test smell detection tools: one widely used in prior work, and one recently introduced with the express goal of matching developer perceptions of test smells. Our results show that these test smell detection strategies poorly characterized the issues in automatically generated test suites; the older tool’s detection strategies, especially, misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definition of certain test smells, and highlight as yet uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics to match development practice; and (ii) more accurate detection strategies, to be evaluated primarily in industrial contexts.
Virtual/online event due to COVID-19
Software Engineering

TU Delft Repository

My NSFW video has partial occlusion: deepfakes and the technological production of non-consensual pornography

Author: Citron Danielle Keats
Demšar Janez
Fahy Thomas
Gousios Georgios
Hellendoorn Vincent J.
Henry Nicola
Leavitt Alex
Lenhart Amanda
Nakamura Lisa
R Core Team
Salter Michael
Serebrenik Alexander
Soto Mauricio
Tsay Jason
Vasilescu Bogdan
Publication venue: 'Informa UK Limited'

Crossref

Learning semantic program embeddings with graph interval neural network

Author: Allamanis Miltiadis
Alon Uri
Bahdanau Dzmitry
Berdine Josh
Cousot P.
Dinella Elizabeth
Fernandes Patrick
Gilmer Justin
Gupta Rahul
Hellendoorn Vincent J.
Jiang L.
Li Yujia
Maddison Chris
Nguyen Tung Thanh
Pawlak Renaud
Saha Ripon
Vasic Marko
Wang Ke
Wei Jiayi
Weiser Mark
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref