Neural-Augmented Static Analysis of Android Communication
We address the problem of discovering communication links between
applications in the popular Android mobile operating system, an important
problem for security and privacy in Android. Any scalable static analysis in
this complex setting is bound to produce an excessive number of
false positives, rendering it impractical. To improve precision, we propose to
augment static analysis with a trained neural-network model that estimates the
probability that a communication link truly exists. We describe a
neural-network architecture that encodes abstractions of communicating objects
in two applications and estimates the probability with which a link indeed
exists. At the heart of our architecture are type-directed encoders (TDE), a
general framework for elegantly constructing encoders of a compound data type
by recursively composing encoders for its constituent types. We evaluate our
approach on a large corpus of Android applications, and demonstrate that it
achieves very high accuracy. Further, we conduct thorough interpretability
studies to understand the internals of the learned neural networks.

Comment: Appears in Proceedings of the 2018 ACM Joint European Software
Engineering Conference and Symposium on the Foundations of Software
Engineering (ESEC/FSE 2018).
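
As a hedged illustration of the type-directed-encoder idea named in the
abstract, the following minimal Python/NumPy sketch builds an encoder for a
compound data type by recursively composing encoders for its constituent
types. The class names, dimensions, and the intent-like example are our own
illustrative assumptions, not the paper's implementation.

    # Sketch only: names, shapes, and the example value are illustrative
    # assumptions; the paper's trained architecture is not reproduced here.
    import numpy as np

    DIM = 8                       # embedding dimension (arbitrary)
    rng = np.random.default_rng(0)

    class StringEncoder:
        """Encodes a primitive string value (e.g., an intent action)."""
        def __init__(self):
            self.char_emb = rng.normal(size=(256, DIM))
        def encode(self, value):
            if not value:
                return np.zeros(DIM)
            return np.mean([self.char_emb[ord(c) % 256] for c in value], axis=0)

    class TupleEncoder:
        """Composes one encoder per field and merges their outputs."""
        def __init__(self, *field_encoders):
            self.field_encoders = field_encoders
            self.proj = rng.normal(size=(DIM * len(field_encoders), DIM))
        def encode(self, values):
            parts = [e.encode(v) for e, v in zip(self.field_encoders, values)]
            return np.tanh(np.concatenate(parts) @ self.proj)

    class ListEncoder:
        """Encodes a variable-length list by pooling element encodings."""
        def __init__(self, element_encoder):
            self.element_encoder = element_encoder
        def encode(self, values):
            if not values:
                return np.zeros(DIM)
            return np.max([self.element_encoder.encode(v) for v in values],
                          axis=0)

    # An encoder for a compound abstraction such as
    # (action: str, categories: list[str]) is assembled compositionally:
    intent_encoder = TupleEncoder(StringEncoder(), ListEncoder(StringEncoder()))
    vector = intent_encoder.encode(("android.intent.action.SEND", ["DEFAULT"]))
    print(vector.shape)           # (8,)

In the paper's setting the parameters of such composed encoders would
presumably be trained together with the model that estimates the link
probability; the sketch shows only the recursive composition.
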
Learning Natural Coding Conventions
Every programmer has a characteristic style, ranging from preferences about
identifier naming to preferences about object relationships and design
patterns. Coding conventions define a consistent syntactic style, fostering
readability and hence maintainability. When collaborating, programmers strive
to obey a project's coding conventions. However, one third of reviews of
changes contain feedback about coding conventions, indicating that programmers
do not always follow them and that project members care deeply about adherence.
Unfortunately, programmers are often unaware of coding conventions because
inferring them requires a global view, one that aggregates the many local
decisions programmers make and identifies emergent consensus on style. We
present NATURALIZE, a framework that learns the style of a codebase, and
suggests revisions to improve stylistic consistency. NATURALIZE builds on
recent work in applying statistical natural language processing to source code.
We apply NATURALIZE to suggest natural identifier names and formatting
conventions. We present four tools focused on ensuring natural code during
development and release management, including code review. NATURALIZE achieves
94% accuracy in its top suggestions for identifier names and can even transfer
knowledge about conventions across projects, leveraging a corpus of 10,968 open
source projects. We used NATURALIZE to generate 18 patches for 5 open source
projects: 14 were accepted
- …
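
As a hedged illustration of the statistical-NLP idea the abstract alludes to,
the following minimal Python sketch scores candidate identifier names with an
add-one-smoothed bigram language model over code tokens and suggests the name
most consistent with the surrounding corpus. The toy corpus, tokenizer, and
candidate list are our own assumptions, not NATURALIZE's actual components.

    # Sketch only: toy corpus and candidates, not the tool's real pipeline.
    from collections import Counter
    import math
    import re

    def tokenize(code):
        return re.findall(r"[A-Za-z_]\w*|\S", code)

    def train_bigram(tokens):
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        vocab = len(set(tokens)) + 1
        def logprob(seq):
            # add-one smoothed bigram log-probability of a token sequence
            return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                       for a, b in zip(seq, seq[1:]))
        return logprob

    # Toy "project corpus" establishing a local naming convention.
    corpus = tokenize("for ( int i = 0 ; i < items . size ( ) ; i ++ ) "
                      "{ total += items . get ( i ) ; }")
    score = train_bigram(corpus)

    # Snippet under review, with a placeholder for the identifier to name.
    snippet = "for ( int {id} = 0 ; {id} < items . size ( ) ; {id} ++ )"
    candidates = ["i", "j", "counterVariable"]

    # Rank candidates by how natural the code reads with each name substituted.
    ranked = sorted(candidates,
                    key=lambda n: -score(tokenize(snippet.replace("{id}", n))))
    print(ranked[0])  # "i" scores highest against this corpus

The real framework goes well beyond this single scoring step, e.g., it also
suggests formatting conventions and integrates with code review, as described
above.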
