Probabilistic Naming of Functions in Stripped Binaries
Debugging symbols in binary executables carry the names of functions and global variables. When present, they greatly simplify the process of reverse engineering, but they are almost always removed (stripped) for deployment. We present the design and implementation of punstrip, a tool which combines a probabilistic fingerprint of binary code based on high-level features with a probabilistic graphical model to learn the relationship between function names and program structure. As there are many naming conventions and developer styles, functions from different applications do not necessarily have the exact same name, even if they implement the exact same functionality. We therefore evaluate punstrip across three levels of name matching: exact matching; an approach based on natural language processing of name components; and Symbol2Vec, a new embedding of function names based on random walks of function call graphs. We show that our approach is able to recognize functions compiled across different compilers and optimization levels and then demonstrate that punstrip can predict semantically similar function names based on code structure. We evaluate our approach over open source C binaries from the Debian Linux distribution and compare against the state of the art.
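The second matching level, comparing names by their components rather than as whole strings, can be illustrated with a minimal sketch. This is not punstrip's implementation: the tokenizer, the Jaccard score, and the helper names `name_components` and `component_similarity` are assumptions chosen only to show why `read_file` and `readFile` should count as a match even though they differ as strings.

```python
import re


def name_components(name):
    """Split a function name into a set of lowercase word-like components.

    Hypothetical tokenizer: breaks on underscores and digits, then on
    camelCase boundaries, keeping acronym runs such as "HTTP" together.
    """
    tokens = []
    for part in re.split(r"[_\d]+", name):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return {token.lower() for token in tokens}


def component_similarity(a, b):
    """Jaccard similarity between the component sets of two names."""
    ca, cb = name_components(a), name_components(b)
    if not ca and not cb:
        return 1.0  # two names with no components are treated as identical
    return len(ca & cb) / len(ca | cb)


if __name__ == "__main__":
    print(component_similarity("read_file", "readFile"))
    print(component_similarity("read_file", "write_file"))
```

An exact string comparison would reject both pairs above, whereas a component-based score separates the naming-convention difference (first pair) from a real difference in meaning (second pair).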
Typilus: Neural Type Hints
Type inference over partial contexts in dynamically typed languages is
challenging. In this work, we present a graph neural network model that
predicts types by probabilistically reasoning over a program's structure,
names, and patterns. The network uses deep similarity learning to learn a
TypeSpace -- a continuous relaxation of the discrete space of types -- and how
to embed the type properties of a symbol (i.e. identifier) into it.
Importantly, our model can employ one-shot learning to predict an open
vocabulary of types, including rare and user-defined ones. We realise our
approach in Typilus for Python that combines the TypeSpace with an optional
type checker. We show that Typilus accurately predicts types. Typilus
confidently predicts types for 70% of all annotatable symbols; when it predicts
a type, that type optionally type checks 95% of the time. Typilus can also find
incorrect type annotations; two important and popular open source libraries,
fairseq and allennlp, accepted our pull requests that fixed the annotation
errors Typilus discovered.
Comment: Accepted to PLDI 202
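To make the TypeSpace idea concrete, the following sketch shows how a type might be predicted once symbols have been embedded as points in a continuous space. It is an illustration under stated assumptions, not Typilus's code: the two-dimensional vectors are made up, the graph neural network that would produce them is omitted, and the distance-weighted nearest-neighbour vote in the hypothetical `predict_type` function is one plausible way to read a type and a confidence score from such a space.

```python
import math
from collections import defaultdict


def predict_type(query, type_map, k=3, eps=1e-9):
    """Return (type_name, confidence) by distance-weighted k-NN voting.

    `type_map` is a list of (embedding, type_name) pairs for symbols whose
    types are already known. The embeddings here are hand-written
    placeholders, not the output of a trained model.
    """
    neighbours = sorted(
        (math.dist(query, vector), type_name) for vector, type_name in type_map
    )[:k]
    votes = defaultdict(float)
    for distance, type_name in neighbours:
        votes[type_name] += 1.0 / (distance + eps)  # closer points count more
    best = max(votes, key=votes.get)
    return best, votes[best] / sum(votes.values())


# Hypothetical annotated symbols. A rare user-defined type needs only a
# single example to become predictable, since it is just another point.
type_map = [
    ((0.0, 0.0), "int"),
    ((0.1, 0.0), "int"),
    ((1.0, 1.0), "str"),
    ((5.0, 5.0), "mylib.Matrix"),
]

if __name__ == "__main__":
    print(predict_type((0.05, 0.0), type_map))
    print(predict_type((5.0, 5.1), type_map, k=1))
```

Because prediction is a lookup against stored points rather than a choice from a fixed list of output classes, adding a new type requires no retraining in this sketch. A confidence threshold on the returned score could then decide which predictions to surface and pass to a type checker.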