MUFIN: Improving Neural Repair Models with Back-Translation
Automated program repair is the task of automatically repairing software
bugs. A promising direction in this field is self-supervised learning, a
learning paradigm in which repair models are trained without commits
representing bug/fix pairs. In self-supervised neural program repair, those
bug/fix pairs are instead generated synthetically. The main problem is to
generate interesting and diverse pairs that maximize the effectiveness of
training. As a
contribution to this problem, we propose to use back-translation, a technique
coming from neural machine translation. We devise and implement MUFIN, a
back-translation training technique for program repair, with specifically
designed code critics to select high-quality training samples. Our results show
that MUFIN's back-translation loop generates valuable training samples in a
fully automated, self-supervised manner, producing more than half a million
bug/fix pairs. The code critic design is key because of a fundamental
trade-off between how restrictive a critic is and how many samples are
available for optimization during back-translation.
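The back-translation loop described above can be sketched in a few lines. The breaker, critic, and corpus below are toy stand-ins invented for illustration (the real MUFIN components are neural seq2seq models and compiler-based critics), so this is a minimal sketch of the loop, not the paper's implementation.

```python
import random

# Toy stand-ins for the "breaker" (fix -> bug) and the code critic; in the
# real system both directions are learned models and the critic uses
# compilation/static checks. All names here are hypothetical.
def breaker(fixed_code):
    """Back-translate: corrupt a correct program to synthesize a bug."""
    tokens = fixed_code.split()
    i = random.randrange(len(tokens))
    tokens[i] = tokens[i][::-1]  # crude corruption: reverse one token
    return " ".join(tokens)

def critic(bug, fix):
    """Keep only pairs where the synthesized bug actually differs from the fix."""
    return bug != fix

def back_translation_round(corpus):
    """One round: generate (bug, fix) pairs and filter them with the critic."""
    pairs = []
    for fix in corpus:
        bug = breaker(fix)
        if critic(bug, fix):
            pairs.append((bug, fix))
    return pairs

random.seed(0)
corpus = ["return a + b", "x = y * 2", "print ( total )"]
pairs = back_translation_round(corpus)
print(len(pairs), "training pairs generated")
```

The trade-off the abstract mentions is visible even here: a stricter `critic` yields cleaner pairs but discards more of the generated data.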
sk_p: a neural program corrector for MOOCs
We present a novel technique for automatic program correction in MOOCs,
capable of fixing both syntactic and semantic errors without manual,
problem-specific correction strategies. Given an incorrect student program, it
generates candidate programs from a distribution of likely corrections, and
checks each candidate for correctness against a test suite.
The key observation is that in MOOCs many programs share similar code
fragments, and the seq2seq neural network model, used in the natural-language
processing task of machine translation, can be modified and trained to recover
these fragments.
Experiments show our scheme can correct 29% of all incorrect submissions and
outperforms a state-of-the-art approach which requires manual, problem-specific
correction strategies.
Towards Neural Decompilation
We address the problem of automatic decompilation, converting a program in
low-level representation back to a higher-level human-readable programming
language. The problem of decompilation is extremely important for security
researchers. Finding vulnerabilities and understanding how malware operates is
much easier when done over source code.
The importance of decompilation has motivated the construction of
hand-crafted rule-based decompilers. Such decompilers have been designed by
experts to detect specific control-flow structures and idioms in low-level code
and lift them to source level. The cost of supporting additional languages or
new language features in these models is very high.
We present a novel approach to decompilation based on neural machine
translation. The main idea is to automatically learn a decompiler from a given
compiler. Given a compiler from a source language S to a target language T,
our approach automatically trains a decompiler that can translate (decompile) T
back to S. We used our framework to decompile both LLVM IR and x86 assembly to
C code with high success rates. Using our LLVM and x86 instantiations, we were
able to successfully decompile over 97% and 88% of our benchmarks, respectively.
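The core data-generation idea, learning the inverse of a given compiler, can be sketched as follows. The "compiler" below is a toy that lowers infix expressions to a stack-machine form; it stands in for LLVM or an x86 assembler, and all names are illustrative:

```python
# Given a compiler S -> T, produce (T, S) pairs so a model can learn the
# inverse mapping T -> S. The toy compiler handles single binary expressions.
def compile_expr(src):
    """Toy compiler: 'a + b' -> ['PUSH a', 'PUSH b', 'ADD']."""
    lhs, op, rhs = src.split()
    opcode = {"+": "ADD", "-": "SUB", "*": "MUL"}[op]
    return [f"PUSH {lhs}", f"PUSH {rhs}", opcode]

sources = ["a + b", "x * y", "n - 1"]
# Each pair maps low-level code (model input) back to its source (target).
training_pairs = [(compile_expr(s), s) for s in sources]
for low, high in training_pairs:
    print(low, "->", high)
```

Because the compiler is deterministic and cheap to run, an unbounded amount of such training data can be generated from any corpus of source programs.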
SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair
This paper presents a novel end-to-end approach to program repair based on
sequence-to-sequence learning. We devise, implement, and evaluate a system,
called SequenceR, for fixing bugs based on sequence-to-sequence learning on
source code. This approach uses the copy mechanism to overcome the unlimited
vocabulary problem that occurs with big code. Our system is data-driven; we
train it on 35,578 samples, carefully curated from commits to open-source
repositories. We evaluate it on 4,711 independent real bug fixes, as well as on
the Defects4J benchmark used in program repair research. SequenceR is able to
perfectly predict the fixed line for 950/4711 testing samples, and find correct
patches for 14 bugs in Defects4J. It captures a wide range of repair operators
without any domain-specific top-down design. Comment: 21 pages, 15 figures.
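The copy mechanism mentioned above matters because identifiers in code form an open vocabulary. A minimal sketch of one decoding step, with hard-coded scores standing in for what the trained model would predict (the vocabulary and scoring here are hypothetical, not SequenceR's actual configuration):

```python
# One decoding step of a copy-augmented decoder: emit either a word from a
# small fixed vocabulary or a token copied from the buggy input line.
VOCAB = ["if", "(", ")", "return", "null", "==", "!="]

def decode_token(vocab_scores, copy_scores, source_tokens):
    """Pick the highest-scoring action: emit a vocab word or copy a source token."""
    best_vocab = max(range(len(VOCAB)), key=lambda i: vocab_scores[i])
    best_copy = max(range(len(source_tokens)), key=lambda i: copy_scores[i])
    if vocab_scores[best_vocab] >= copy_scores[best_copy]:
        return VOCAB[best_vocab]
    return source_tokens[best_copy]  # out-of-vocabulary identifier, copied

source = ["if", "(", "userName", "!=", "null", ")"]
# 'userName' is not in VOCAB, but copying recovers it from the input.
tok = decode_token([0.1] * len(VOCAB), [0.0, 0.0, 0.9, 0.0, 0.0, 0.0], source)
print(tok)
```

Without the copy path, any project-specific identifier like `userName` would have to be replaced by an unknown-word token, making the predicted fix unusable.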
TextKD-GAN: Text Generation using Knowledge Distillation and Generative Adversarial Networks
Text generation is of particular interest in many NLP applications such as
machine translation, language modeling, and text summarization. Generative
adversarial networks (GANs) have achieved remarkable success in high-quality
image generation in computer vision, and recently GANs have gained considerable
interest from the NLP community as well. However, achieving similar success in NLP would
be more challenging due to the discrete nature of text. In this work, we
introduce a method using knowledge distillation to effectively exploit GAN
setup for text generation. We demonstrate how autoencoders (AEs) can be used
for providing a continuous representation of sentences: a smooth
representation that assigns non-zero probabilities to more than one word. We
distill this representation to train the generator to synthesize similar smooth
representations. We perform a number of experiments to validate our idea using
different datasets and show that our proposed approach yields better
performance in terms of the BLEU score and Jensen-Shannon distance (JSD)
measure compared to traditional GAN-based text generation approaches without
pre-training. Comment: arXiv admin note: text overlap with arXiv:1904.0729
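The "smooth representation" idea is the crux: instead of a discrete one-hot word, the autoencoder's decoder yields a softmax over the vocabulary, which is differentiable and assigns non-zero probability to several words. A small numeric illustration (vocabulary and logits are invented for the example):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

vocab = ["cat", "dog", "car", "tree"]
one_hot = [1.0, 0.0, 0.0, 0.0]            # discrete target: hard for GANs
smooth = softmax([2.0, 1.5, -1.0, -1.0])  # AE output: differentiable target

print([round(p, 3) for p in smooth])
nonzero = sum(1 for p in smooth if p > 0)
print(nonzero, "words with non-zero probability")
```

The generator is then trained to produce vectors like `smooth` rather than hard one-hot choices, sidestepping the non-differentiability of discrete text.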
Spatial Phase-Sweep: Increasing temporal resolution of transient imaging using a light source array
Transient imaging or light-in-flight techniques capture the propagation of an
ultra-short pulse of light through a scene, which in effect captures the
optical impulse response of the scene. Recently, it has been shown that we can
capture transient images using commercially available Time-of-Flight (ToF)
systems such as Photonic Mixer Devices (PMD). In this paper, we propose
`spatial phase-sweep', a technique that exploits the speed of light to increase
the temporal resolution beyond the 100 picosecond limit imposed by current
electronics. Spatial phase-sweep uses a linear array of light sources with
spatial separation of about 3 mm between them, thereby resulting in a time
shift of about 10 picoseconds, which translates into 100 Gfps of transient
imaging in theory. We demonstrate a prototype and transient imaging results
using spatial phase-sweep.
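The abstract's numbers are easy to verify from first principles: a 3 mm spacing between sources, divided by the speed of light, gives the claimed 10 ps time shift and hence a 100 Gfps equivalent frame rate.

```python
# Back-of-the-envelope check of the spatial phase-sweep numbers.
c = 3.0e8            # speed of light, m/s
spacing = 3.0e-3     # source separation, m
dt = spacing / c     # time shift contributed by each successive source
frame_rate = 1.0 / dt
print(f"time shift: {dt * 1e12:.0f} ps, frame rate: {frame_rate / 1e9:.0f} Gfps")
```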
Bug Searching in Smart Contract
With the rapid development of smart contracts on the Ethereum platform, their
market value has also climbed. In 2016, people were shocked by the loss of
nearly $50 million in cryptocurrency to the DAO reentrancy attack. Due to
the tremendous amount of money flowing through smart contracts, their security
has attracted much attention from researchers. In this paper, we investigate
several common smart contract vulnerabilities and analyze their possible
scenarios and how they may be exploited. Furthermore, we survey the smart
contract vulnerability detection tools for the Ethereum platform in recent years. We
found that these tools have similar prototypes in software vulnerability
detection technology. Moreover, for the features of public distribution systems
such as Ethereum, we present the new challenges that these software
vulnerability detection technologies face. Comment: 8 pages, 9 figures.
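The DAO-style reentrancy bug mentioned above can be simulated outside the blockchain. The sketch below models, in plain Python, a "contract" that performs the external call before zeroing the caller's balance, so a malicious recipient can re-enter `withdraw` and drain more than its share (class and method names are invented for the simulation):

```python
# Simulation of reentrancy: funds are sent BEFORE the balance is updated.
class VulnerableBank:
    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total += amount

    def withdraw(self, who):
        amount = self.balances.get(who, 0)
        if amount > 0 and self.total >= amount:
            self.total -= amount
            who.receive(amount)       # external call happens first (the bug)...
            self.balances[who] = 0    # ...balance is zeroed only afterwards

class Attacker:
    def __init__(self, bank):
        self.bank = bank
        self.stolen = 0
        self.reentered = False

    def receive(self, amount):
        self.stolen += amount
        if not self.reentered:        # re-enter once during the callback
            self.reentered = True
            self.bank.withdraw(self)

bank = VulnerableBank()
victim = object()
bank.deposit(victim, 100)
attacker = Attacker(bank)
bank.deposit(attacker, 50)
bank.withdraw(attacker)
print("stolen:", attacker.stolen)  # twice the attacker's own deposit of 50
```

Swapping the two commented lines (update the balance, then send) removes the vulnerability; this is the checks-effects-interactions pattern recommended in Solidity's security guidance.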
Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs
In practice, it is common to find oneself with far too little text data to
train a deep neural network. This "Big Data Wall" represents a challenge for
minority language communities on the Internet, organizations, laboratories and
companies that compete with GAFAM (Google, Amazon, Facebook, Apple, Microsoft).
While most of the research effort in text data augmentation aims at the
long-term goal of finding end-to-end learning solutions, which is equivalent to
"using neural networks to feed neural networks", this engineering work focuses
on the use of practical, robust, scalable and easy-to-implement data
augmentation pre-processing techniques similar to those that are successful in
computer vision. Several text augmentation techniques were evaluated.
Some existing ones were tested for comparison purposes, such as noise
injection or the use of regular expressions. Others are modified or improved
techniques, like lexical replacement. Finally, more innovative ones, such as the
generation of paraphrases using back-translation or the transformation of
syntactic trees, are based on robust, scalable, and easy-to-use NLP Cloud APIs.
All the text augmentation techniques studied, with an amplification factor of
only 5, increased the accuracy of the results in a range of 4.3% to 21.6%, with
significant statistical fluctuations, on a standardized task of text polarity
prediction. Some standard deep neural network architectures were tested: the
multilayer perceptron (MLP), the long short-term memory recurrent network
(LSTM) and the bidirectional LSTM (biLSTM). The classical XGBoost algorithm was
also tested, with up to 2.5% improvement. Comment: 33 pages, 25 figures.
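One of the simple pre-processing augmentations discussed, noise injection, can be sketched as below, with each original sentence yielding several noisy copies at the paper's amplification factor of 5 (function names and the drop probability are illustrative choices, not the paper's exact recipe):

```python
import random

def noise_inject(sentence, p_drop=0.1, rng=random):
    """Randomly drop words from a sentence to create a noisy variant."""
    words = sentence.split()
    kept = [w for w in words if rng.random() > p_drop or len(words) == 1]
    return " ".join(kept) if kept else sentence

def augment(corpus, factor=5, seed=0):
    """Expand the corpus by the given amplification factor."""
    rng = random.Random(seed)
    out = []
    for sentence in corpus:
        out.append(sentence)  # keep the original
        out.extend(noise_inject(sentence, rng=rng) for _ in range(factor - 1))
    return out

corpus = ["this movie was great", "the plot made no sense"]
augmented = augment(corpus)
print(len(augmented), "sentences after augmentation")
```

Back-translation and syntax-tree paraphrasing follow the same shape: each replaces `noise_inject` with a call to an external paraphrasing service.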
A Principled Approach Towards Symbolic Geometric Constraint Satisfaction
An important problem in geometric reasoning is to find the configuration of a
collection of geometric bodies so as to satisfy a set of given constraints.
Recently, it has been suggested that this problem can be solved efficiently by
symbolically reasoning about geometry. This approach, called degrees of freedom
analysis, employs a set of specialized routines called plan fragments that
specify how to change the configuration of a set of bodies to satisfy a new
constraint while preserving existing constraints. A potential drawback, which
limits the scalability of this approach, is concerned with the difficulty of
writing plan fragments. In this paper we address this limitation by showing how
these plan fragments can be automatically synthesized using first principles
about geometric bodies, actions, and topology. Comment: See http://www.jair.org/
for an online appendix and other files accompanying this article.
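The degrees-of-freedom bookkeeping behind this approach can be illustrated with standard 2D numbers: a rigid body in the plane has 3 DOF, and each constraint consumes some of them; a plan fragment uses whatever DOF remain to satisfy a new constraint without disturbing the old ones. The constraint table and function below are a made-up illustration, not the paper's representation:

```python
# Illustrative DOF accounting for a single rigid body in the plane.
DOF_2D_BODY = 3  # translation x, translation y, rotation

CONSTRAINT_COST = {     # DOF removed by each constraint type (2D, generic case)
    "coincident-point": 2,
    "distance": 1,
    "parallel": 1,
    "fixed": 3,
}

def remaining_dof(constraints):
    """DOF still available after imposing the given constraints."""
    used = sum(CONSTRAINT_COST[c] for c in constraints)
    return max(DOF_2D_BODY - used, 0)

print(remaining_dof(["distance"]))                      # body can still move
print(remaining_dof(["coincident-point", "parallel"]))  # fully constrained
```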
A Quick Introduction to Functional Verification of Array-Intensive Programs
Array-intensive programs are often amenable to parallelization across many
cores on a single machine as well as scaling across multiple machines and hence
are well explored, especially in the domain of high-performance computing.
These programs typically undergo loop transformations and arithmetic
transformations in addition to parallelizing transformations. Although a lot of
effort has been invested in improving parallelizing compilers, experienced
programmers still resort to hand-optimized transformations, which are typically
followed by careful tuning of the transformed program to finally obtain the
optimized program. Therefore, it is critical to verify that the functional
correctness of an original sequential program is not sacrificed during the
process of optimization. In this paper, we cover important literature on
functional verification of array-intensive programs, which we believe can be a
good starting point for anyone interested in this field.
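A lightweight stand-in for what such verification establishes: the original loop and its hand-optimized version must agree on all inputs. The sketch below only tests this on random inputs, which can refute but never prove equivalence; the surveyed tools establish it symbolically. The two functions are an invented example of an arithmetic transformation:

```python
import random

def original(a):
    """Sum of 2*a[i], computed element by element."""
    s = 0
    for x in a:
        s += 2 * x
    return s

def optimized(a):
    """Arithmetic transformation: factor the constant 2 out of the loop."""
    return 2 * sum(a)

# Random differential testing of the transformation.
rng = random.Random(42)
for _ in range(1000):
    a = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
    assert original(a) == optimized(a)
print("no counterexample found on 1000 random inputs")
```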