Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces
With the rise of machine learning, there is a great deal of interest in
treating programs as data to be fed to learning algorithms. However, programs
do not start off in a form that is immediately amenable to most off-the-shelf
learning techniques. Instead, it is necessary to transform the program into a
suitable representation before a learning technique can be applied.
In this paper, we use abstractions of traces obtained from symbolic execution
of a program as a representation for learning word embeddings. We trained a
variety of word embeddings under hundreds of parameterizations, and evaluated
each learned embedding on a suite of different tasks. In our evaluation, we
obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage
analogies extracted from the Linux kernel. In addition, we show that embeddings
learned from (mainly) semantic abstractions provide nearly triple the accuracy
of those learned from (mainly) syntactic abstractions.
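
To make the idea concrete, the following is a minimal sketch, not the paper's actual pipeline: it assumes each abstracted symbolic-execution trace can be treated as a "sentence" of tokens, uses gensim's Word2Vec as a stand-in for the embedding learner, and answers an API-usage analogy by vector arithmetic. The trace tokens and parameter values are made up for illustration.

```python
# Sketch: learn embeddings over abstracted traces, then query an analogy.
from gensim.models import Word2Vec

# Each trace is a sequence of abstracted events (called functions, abstracted
# return values, etc.); these token names are hypothetical.
traces = [
    ["kmalloc", "$RET_OK", "mutex_lock", "mutex_unlock", "kfree"],
    ["kmalloc", "$RET_NULL", "$ERR", "return_-ENOMEM"],
    ["spin_lock", "list_add", "spin_unlock"],
    # ... many more abstracted traces
]

# Train skip-gram embeddings over the trace corpus (gensim >= 4.0 parameter names).
model = Word2Vec(sentences=traces, vector_size=100, window=5,
                 min_count=1, sg=1, epochs=50)

# Analogy query "mutex_lock is to mutex_unlock as spin_lock is to ?":
# computed as vec(mutex_unlock) - vec(mutex_lock) + vec(spin_lock),
# followed by a nearest-neighbor lookup in the embedding space.
print(model.wv.most_similar(positive=["mutex_unlock", "spin_lock"],
                            negative=["mutex_lock"], topn=1))
```

With a sufficiently large trace corpus, the nearest neighbor of such a query would ideally be the paired unlock operation (here, spin_unlock), which is the style of top-1 analogy evaluation the abstract reports.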