8 research outputs found
Inferring Concise Specifications of APIs
Modern software relies on libraries and uses them via application programming
interfaces (APIs). Correct API usage as well as many software engineering tasks
are enabled when APIs have formal specifications. In this work, we analyze the
implementation of each method in an API to infer a formal postcondition.
Conventional wisdom is that, if one has preconditions, then one can use the
strongest postcondition predicate transformer (SP) to infer postconditions.
However, SP yields postconditions that are exponentially large, which makes
them difficult to use, either by humans or by tools. Our key idea is an
algorithm that converts such exponentially large specifications into a form
that is more concise and thus more usable. This is done by leveraging the
structure of the specifications that result from the use of SP. We applied our
technique to infer postconditions for over 2,300 methods in seven popular Java
libraries. Our technique was able to infer specifications for 75.7% of these
methods, each of which was verified using an Extended Static Checker. We also
found that 84.6% of resulting specifications were less than 1/4 page (20 lines)
in length. Our technique was able to reduce the length of SMT proofs needed for
verifying implementations by 76.7% and reduced prover execution time by 26.7%
Neural-Augmented Static Analysis of Android Communication
We address the problem of discovering communication links between
applications in the popular Android mobile operating system, an important
problem for security and privacy in Android. Any scalable static analysis in
this complex setting is bound to produce an excessive amount of
false-positives, rendering it impractical. To improve precision, we propose to
augment static analysis with a trained neural-network model that estimates the
probability that a communication link truly exists. We describe a
neural-network architecture that encodes abstractions of communicating objects
in two applications and estimates the probability with which a link indeed
exists. At the heart of our architecture are type-directed encoders (TDE), a
general framework for elegantly constructing encoders of a compound data type
by recursively composing encoders for its constituent types. We evaluate our
approach on a large corpus of Android applications, and demonstrate that it
achieves very high accuracy. Further, we conduct thorough interpretability
studies to understand the internals of the learned neural networks.Comment: Appears in Proceedings of the 2018 ACM Joint European Software
Engineering Conference and Symposium on the Foundations of Software
Engineering (ESEC/FSE
Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces
With the rise of machine learning, there is a great deal of interest in
treating programs as data to be fed to learning algorithms. However, programs
do not start off in a form that is immediately amenable to most off-the-shelf
learning techniques. Instead, it is necessary to transform the program to a
suitable representation before a learning technique can be applied.
In this paper, we use abstractions of traces obtained from symbolic execution
of a program as a representation for learning word embeddings. We trained a
variety of word embeddings under hundreds of parameterizations, and evaluated
each learned embedding on a suite of different tasks. In our evaluation, we
obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage
analogies extracted from the Linux kernel. In addition, we show that embeddings
learned from (mainly) semantic abstractions provide nearly triple the accuracy
of those learned from (mainly) syntactic abstractions
Exploiting implicit belief to resolve sparse usage problem in usage-based specification mining
Frameworks and libraries provide application programming interfaces (APIs) that serve as building blocks in modern software development. As APIs present the opportunity of increased productivity, it also calls for correct use to avoid buggy code. The usage-based specification mining technique has shown great promise in solving this problem through a data-driven approach. These techniques leverage the use of the API in large corpora to understand the recurring usages of the APIs and infer behavioral specifications (pre- and post-conditions) from such usages. A challenge for such technique is thus inference in the presence of insufficient usages, in terms of both frequency and richness.We refer to this as a “sparse usage problem. This thesis presents the first technique to solve the sparse usage problem in usage-based precondition mining. Our key insight is to leverage implicit beliefs to overcome sparse usage. An implicit belief (IB) is the knowledge implicitly derived from the fact about the code. An IB about a program is known implicitly to a programmer via the language’s constructs and semantics, and thus not explicitly written or specified in the code. The technical underpinnings of our new precondition mining approach include a technique to analyze the data and control flow in the program leading to API calls to infer preconditions that are implicitly present in the code corpus, a catalog of 35 code elements in total that can be used to derive implicit beliefs from a program, and empirical evaluation of all of these ideas.We have analyzed over 350 millions lines of code and 7 libraries that suffer from the sparse usage problem. Our approach realizes 6 implicit beliefs and we have observed that adding single-level context sensitivity can further improve the result of usage-based precondition mining. The result shows that we achieve overall 60% in precision and 69% in recall and the accuracy is relatively improved by 32% in precision and 78% in recall compared to base usage-based mining approach for these libraries