MUFIN: Improving Neural Repair Models with Back-Translation
Automated program repair is the task of automatically repairing software
bugs. A promising direction in this field is self-supervised learning, a
learning paradigm in which repair models are trained without commits
representing bug/fix pairs. In self-supervised neural program repair, those
bug/fix pairs are instead generated synthetically. The main problem is to
generate interesting and diverse pairs that maximize the effectiveness of
training. As a
contribution to this problem, we propose to use back-translation, a technique
coming from neural machine translation. We devise and implement MUFIN, a
back-translation training technique for program repair, with specifically
designed code critics to select high-quality training samples. Our results show
that MUFIN's back-translation loop generates valuable training samples in a
fully automated, self-supervised manner, producing more than half a million
bug/fix pairs. The code critic design is key because of a fundamental
trade-off between how restrictive a critic is and how many samples are
available for optimization during back-translation.
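The back-translation loop described above can be sketched in a few lines. The breaker, critic, and corpus below are toy stand-ins invented for illustration (the real MUFIN components are neural seq2seq models and compiler-based critics), so this is a minimal sketch of the loop, not the paper's implementation.

```python
import random

# Toy stand-ins for the "breaker" (fix -> bug) and the code critic; in the
# real system both directions are learned models and the critic uses
# compilation/static checks. All names here are hypothetical.
def breaker(fixed_code):
    """Back-translate: corrupt a correct program to synthesize a bug."""
    tokens = fixed_code.split()
    i = random.randrange(len(tokens))
    tokens[i] = tokens[i][::-1]  # crude corruption: reverse one token
    return " ".join(tokens)

def critic(bug, fix):
    """Keep only pairs where the synthesized bug actually differs from the fix."""
    return bug != fix

def back_translation_round(corpus):
    """One round: generate (bug, fix) pairs and filter them with the critic."""
    pairs = []
    for fix in corpus:
        bug = breaker(fix)
        if critic(bug, fix):
            pairs.append((bug, fix))
    return pairs

random.seed(0)
corpus = ["return a + b", "x = y * 2", "print ( total )"]
pairs = back_translation_round(corpus)
print(len(pairs), "training pairs generated")
```

The trade-off the abstract mentions is visible even here: a stricter `critic` yields cleaner pairs but discards more of the generated data.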
sk_p: a neural program corrector for MOOCs
We present a novel technique for automatic program correction in MOOCs,
capable of fixing both syntactic and semantic errors without manual,
problem-specific correction strategies. Given an incorrect student program, it
generates candidate programs from a distribution of likely corrections, and
checks each candidate for correctness against a test suite.
The key observation is that in MOOCs many programs share similar code
fragments, and the seq2seq neural network model, used in the natural-language
processing task of machine translation, can be modified and trained to recover
these fragments.
Experiments show our scheme can correct 29% of all incorrect submissions and
outperforms a state-of-the-art approach which requires manual, problem-specific
correction strategies.
Towards Neural Decompilation
We address the problem of automatic decompilation, converting a program in
low-level representation back to a higher-level human-readable programming
language. The problem of decompilation is extremely important for security
researchers. Finding vulnerabilities and understanding how malware operates is
much easier when done over source code.
The importance of decompilation has motivated the construction of
hand-crafted rule-based decompilers. Such decompilers have been designed by
experts to detect specific control-flow structures and idioms in low-level code
and lift them to source level. The cost of supporting additional languages or
new language features in these models is very high.
We present a novel approach to decompilation based on neural machine
translation. The main idea is to automatically learn a decompiler from a given
compiler. Given a compiler from a source language S to a target language T,
our approach automatically trains a decompiler that can translate (decompile) T
back to S. We used our framework to decompile both LLVM IR and x86 assembly to
C code with high success rates. Using our LLVM and x86 instantiations, we were
able to successfully decompile over 97% and 88% of our benchmarks, respectively.
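The core data-generation idea, learning the inverse of a given compiler, can be sketched as follows. The "compiler" below is a toy that lowers infix expressions to a stack-machine form; it stands in for LLVM or an x86 assembler, and all names are illustrative:

```python
# Given a compiler S -> T, produce (T, S) pairs so a model can learn the
# inverse mapping T -> S. The toy compiler handles single binary expressions.
def compile_expr(src):
    """Toy compiler: 'a + b' -> ['PUSH a', 'PUSH b', 'ADD']."""
    lhs, op, rhs = src.split()
    opcode = {"+": "ADD", "-": "SUB", "*": "MUL"}[op]
    return [f"PUSH {lhs}", f"PUSH {rhs}", opcode]

sources = ["a + b", "x * y", "n - 1"]
# Each pair maps low-level code (model input) back to its source (target).
training_pairs = [(compile_expr(s), s) for s in sources]
for low, high in training_pairs:
    print(low, "->", high)
```

Because the compiler is deterministic and cheap to run, an unbounded amount of such training data can be generated from any corpus of source programs.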
SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair
This paper presents a novel end-to-end approach to program repair based on
sequence-to-sequence learning. We devise, implement, and evaluate a system,
called SequenceR, for fixing bugs based on sequence-to-sequence learning on
source code. This approach uses the copy mechanism to overcome the unlimited
vocabulary problem that occurs with big code. Our system is data-driven; we
train it on 35,578 samples, carefully curated from commits to open-source
repositories. We evaluate it on 4,711 independent real bug fixes, as well as on
the Defects4J benchmark used in program repair research. SequenceR is able to
perfectly predict the fixed line for 950/4711 testing samples, and find correct
patches for 14 bugs in Defects4J. It captures a wide range of repair operators
without any domain-specific top-down design. Comment: 21 pages, 15 figures.
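The copy mechanism mentioned above matters because identifiers in code form an open vocabulary. A minimal sketch of one decoding step, with hard-coded scores standing in for what the trained model would predict (the vocabulary and scoring here are hypothetical, not SequenceR's actual configuration):

```python
# One decoding step of a copy-augmented decoder: emit either a word from a
# small fixed vocabulary or a token copied from the buggy input line.
VOCAB = ["if", "(", ")", "return", "null", "==", "!="]

def decode_token(vocab_scores, copy_scores, source_tokens):
    """Pick the highest-scoring action: emit a vocab word or copy a source token."""
    best_vocab = max(range(len(VOCAB)), key=lambda i: vocab_scores[i])
    best_copy = max(range(len(source_tokens)), key=lambda i: copy_scores[i])
    if vocab_scores[best_vocab] >= copy_scores[best_copy]:
        return VOCAB[best_vocab]
    return source_tokens[best_copy]  # out-of-vocabulary identifier, copied

source = ["if", "(", "userName", "!=", "null", ")"]
# 'userName' is not in VOCAB, but copying recovers it from the input.
tok = decode_token([0.1] * len(VOCAB), [0.0, 0.0, 0.9, 0.0, 0.0, 0.0], source)
print(tok)
```

Without the copy path, any project-specific identifier like `userName` would have to be replaced by an unknown-word token, making the predicted fix unusable.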
TextKD-GAN: Text Generation using Knowledge Distillation and Generative Adversarial Networks
Text generation is of particular interest in many NLP applications such as
machine translation, language modeling, and text summarization. Generative
adversarial networks (GANs) have achieved remarkable success in high-quality
image generation in computer vision, and recently GANs have gained considerable
interest from the NLP community as well. However, achieving similar success in NLP would
be more challenging due to the discrete nature of text. In this work, we
introduce a method using knowledge distillation to effectively exploit GAN
setup for text generation. We demonstrate how autoencoders (AEs) can be used
for providing a continuous representation of sentences: a smooth
representation that assigns non-zero probabilities to more than one word. We
distill this representation to train the generator to synthesize similar smooth
representations. We perform a number of experiments to validate our idea using
different datasets and show that our proposed approach yields better
performance in terms of the BLEU score and Jensen-Shannon distance (JSD)
measure compared to traditional GAN-based text generation approaches without
pre-training. Comment: arXiv admin note: text overlap with arXiv:1904.0729
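The "smooth representation" idea is the crux: instead of a discrete one-hot word, the autoencoder's decoder yields a softmax over the vocabulary, which is differentiable and assigns non-zero probability to several words. A small numeric illustration (vocabulary and logits are invented for the example):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

vocab = ["cat", "dog", "car", "tree"]
one_hot = [1.0, 0.0, 0.0, 0.0]            # discrete target: hard for GANs
smooth = softmax([2.0, 1.5, -1.0, -1.0])  # AE output: differentiable target

print([round(p, 3) for p in smooth])
nonzero = sum(1 for p in smooth if p > 0)
print(nonzero, "words with non-zero probability")
```

The generator is then trained to produce vectors like `smooth` rather than hard one-hot choices, sidestepping the non-differentiability of discrete text.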
Spatial Phase-Sweep: Increasing temporal resolution of transient imaging using a light source array
Transient imaging or light-in-flight techniques capture the propagation of an
ultra-short pulse of light through a scene, which in effect captures the
optical impulse response of the scene. Recently, it has been shown that we can
capture transient images using commercially available Time-of-Flight (ToF)
systems such as Photonic Mixer Devices (PMD). In this paper, we propose
`spatial phase-sweep', a technique that exploits the speed of light to increase
the temporal resolution beyond the 100 picosecond limit imposed by current
electronics. Spatial phase-sweep uses a linear array of light sources with
spatial separation of about 3 mm between them, thereby resulting in a time
shift of about 10 picoseconds, which translates into 100 Gfps of transient
imaging in theory. We demonstrate a prototype and transient imaging results
using spatial phase-sweep.
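The abstract's numbers are easy to verify from first principles: a 3 mm spacing between sources, divided by the speed of light, gives the claimed 10 ps time shift and hence a 100 Gfps equivalent frame rate.

```python
# Back-of-the-envelope check of the spatial phase-sweep numbers.
c = 3.0e8            # speed of light, m/s
spacing = 3.0e-3     # source separation, m
dt = spacing / c     # time shift contributed by each successive source
frame_rate = 1.0 / dt
print(f"time shift: {dt * 1e12:.0f} ps, frame rate: {frame_rate / 1e9:.0f} Gfps")
```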
Bug Searching in Smart Contract
With the rapid development of smart contracts on the Ethereum platform, their
market value has also climbed. In 2016, people were shocked by the loss of
nearly $50 million in cryptocurrency to the DAO reentrancy attack. Due to
the tremendous amount of money flowing through smart contracts, their security
has attracted much attention from researchers. In this paper, we investigate
several common smart contract vulnerabilities and analyze their possible
scenarios and how they may be exploited. Furthermore, we survey the smart
contract vulnerability detection tools for the Ethereum platform in recent years. We
found that these tools have similar prototypes in software vulnerability
detection technology. Moreover, for the features of public distribution systems
such as Ethereum, we present the new challenges that these software
vulnerability detection technologies face. Comment: 8 pages, 9 figures.
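The DAO-style reentrancy bug mentioned above can be simulated outside the blockchain. The sketch below models, in plain Python, a "contract" that performs the external call before zeroing the caller's balance, so a malicious recipient can re-enter `withdraw` and drain more than its share (class and method names are invented for the simulation):

```python
# Simulation of reentrancy: funds are sent BEFORE the balance is updated.
class VulnerableBank:
    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total += amount

    def withdraw(self, who):
        amount = self.balances.get(who, 0)
        if amount > 0 and self.total >= amount:
            self.total -= amount
            who.receive(amount)       # external call happens first (the bug)...
            self.balances[who] = 0    # ...balance is zeroed only afterwards

class Attacker:
    def __init__(self, bank):
        self.bank = bank
        self.stolen = 0
        self.reentered = False

    def receive(self, amount):
        self.stolen += amount
        if not self.reentered:        # re-enter once during the callback
            self.reentered = True
            self.bank.withdraw(self)

bank = VulnerableBank()
victim = object()
bank.deposit(victim, 100)
attacker = Attacker(bank)
bank.deposit(attacker, 50)
bank.withdraw(attacker)
print("stolen:", attacker.stolen)  # twice the attacker's own deposit of 50
```

Swapping the two commented lines (update the balance, then send) removes the vulnerability; this is the checks-effects-interactions pattern recommended in Solidity's security guidance.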
Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs
In practice, it is common to find oneself with far too little text data to
train a deep neural network. This "Big Data Wall" represents a challenge for
minority language communities on the Internet, organizations, laboratories and
companies that compete with GAFAM (Google, Amazon, Facebook, Apple, Microsoft).
While most of the research effort in text data augmentation aims at the
long-term goal of finding end-to-end learning solutions, which is equivalent to
"using neural networks to feed neural networks", this engineering work focuses
on the use of practical, robust, scalable and easy-to-implement data
augmentation pre-processing techniques similar to those that are successful in
computer vision. Several text augmentation techniques were evaluated.
Some existing ones were tested for comparison purposes, such as noise
injection or the use of regular expressions. Others are modified or improved
techniques, like lexical replacement. Finally, more innovative ones, such as the
generation of paraphrases using back-translation or the transformation of
syntactic trees, are based on robust, scalable, and easy-to-use NLP Cloud APIs.
All the text augmentation techniques studied, with an amplification factor of
only 5, increased the accuracy of the results in a range of 4.3% to 21.6%, with
significant statistical fluctuations, on a standardized task of text polarity
prediction. Some standard deep neural network architectures were tested: the
multilayer perceptron (MLP), the long short-term memory recurrent network
(LSTM) and the bidirectional LSTM (biLSTM). The classical XGBoost algorithm was
also tested, with up to 2.5% improvement. Comment: 33 pages, 25 figures.
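One of the simple pre-processing augmentations discussed, noise injection, can be sketched as below, with each original sentence yielding several noisy copies at the paper's amplification factor of 5 (function names and the drop probability are illustrative choices, not the paper's exact recipe):

```python
import random

def noise_inject(sentence, p_drop=0.1, rng=random):
    """Randomly drop words from a sentence to create a noisy variant."""
    words = sentence.split()
    kept = [w for w in words if rng.random() > p_drop or len(words) == 1]
    return " ".join(kept) if kept else sentence

def augment(corpus, factor=5, seed=0):
    """Expand the corpus by the given amplification factor."""
    rng = random.Random(seed)
    out = []
    for sentence in corpus:
        out.append(sentence)  # keep the original
        out.extend(noise_inject(sentence, rng=rng) for _ in range(factor - 1))
    return out

corpus = ["this movie was great", "the plot made no sense"]
augmented = augment(corpus)
print(len(augmented), "sentences after augmentation")
```

Back-translation and syntax-tree paraphrasing follow the same shape: each replaces `noise_inject` with a call to an external paraphrasing service.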
A Principled Approach Towards Symbolic Geometric Constraint Satisfaction
An important problem in geometric reasoning is to find the configuration of a
collection of geometric bodies so as to satisfy a set of given constraints.
Recently, it has been suggested that this problem can be solved efficiently by
symbolically reasoning about geometry. This approach, called degrees of freedom
analysis, employs a set of specialized routines called plan fragments that
specify how to change the configuration of a set of bodies to satisfy a new
constraint while preserving existing constraints. A potential drawback, which
limits the scalability of this approach, is concerned with the difficulty of
writing plan fragments. In this paper we address this limitation by showing how
these plan fragments can be automatically synthesized using first principles
about geometric bodies, actions, and topology. Comment: See http://www.jair.org/
for an online appendix and other files accompanying this article.
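The degrees-of-freedom bookkeeping behind this approach can be illustrated with standard 2D numbers: a rigid body in the plane has 3 DOF, and each constraint consumes some of them; a plan fragment uses whatever DOF remain to satisfy a new constraint without disturbing the old ones. The constraint table and function below are a made-up illustration, not the paper's representation:

```python
# Illustrative DOF accounting for a single rigid body in the plane.
DOF_2D_BODY = 3  # translation x, translation y, rotation

CONSTRAINT_COST = {     # DOF removed by each constraint type (2D, generic case)
    "coincident-point": 2,
    "distance": 1,
    "parallel": 1,
    "fixed": 3,
}

def remaining_dof(constraints):
    """DOF still available after imposing the given constraints."""
    used = sum(CONSTRAINT_COST[c] for c in constraints)
    return max(DOF_2D_BODY - used, 0)

print(remaining_dof(["distance"]))                      # body can still move
print(remaining_dof(["coincident-point", "parallel"]))  # fully constrained
```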
A Quick Introduction to Functional Verification of Array-Intensive Programs
Array-intensive programs are often amenable to parallelization across many
cores on a single machine as well as scaling across multiple machines and hence
are well explored, especially in the domain of high-performance computing.
These programs typically undergo loop transformations and arithmetic
transformations in addition to parallelizing transformations. Although a lot of
effort has been invested in improving parallelizing compilers, experienced
programmers still resort to hand-optimized transformations, which are typically
followed by careful tuning of the transformed program to finally obtain the
optimized program. Therefore, it is critical to verify that the functional
correctness of an original sequential program is not sacrificed during the
process of optimization. In this paper, we cover important literature on
functional verification of array-intensive programs, which we believe can be a
good starting point for anyone interested in this field.
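A lightweight stand-in for what such verification establishes: the original loop and its hand-optimized version must agree on all inputs. The sketch below only tests this on random inputs, which can refute but never prove equivalence; the surveyed tools establish it symbolically. The two functions are an invented example of an arithmetic transformation:

```python
import random

def original(a):
    """Sum of 2*a[i], computed element by element."""
    s = 0
    for x in a:
        s += 2 * x
    return s

def optimized(a):
    """Arithmetic transformation: factor the constant 2 out of the loop."""
    return 2 * sum(a)

# Random differential testing of the transformation.
rng = random.Random(42)
for _ in range(1000):
    a = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
    assert original(a) == optimized(a)
print("no counterexample found on 1000 random inputs")
```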