Search CORE

Probabilistic Naming of Functions in Stripped Binaries

Authors: Abadi Martín
Bao Tiffany
Bourquin Martial
Dai Hanjun
DeFreez Daniel
Egele Manuel
Farhadi Mohammad Reza
Flake Halvar
Gulwani Sumit
Hu Yikun
Kim Soomin
Livshits Benjamin
Nagarajan Vijayanand
Ng Beng Heng
Nguyen Anh Quynh (COSEINC)
Pewny Jannik
Rosenblum E.
Shin Eui Chul Richard
UC Santa Barbara Computer Security Lab and Arizona State University SEFCOM
Publication venue: ACSAC '20: Annual Computer Security Applications Conference
Publication date: 07/12/2020

Debugging symbols in binary executables carry the names of functions and global variables. When present, they greatly simplify the process of reverse engineering, but they are almost always removed (stripped) for deployment. We present the design and implementation of punstrip, a tool which combines a probabilistic fingerprint of binary code based on high-level features with a probabilistic graphical model to learn the relationship between function names and program structure. As there are many naming conventions and developer styles, functions from different applications do not necessarily have the exact same name, even if they implement the exact same functionality. We therefore evaluate punstrip across three levels of name matching: exact; an approach based on natural language processing of name components; and using Symbol2Vec, a new embedding of function names based on random walks of function call graphs. We show that our approach is able to recognize functions compiled across different compilers and optimization levels and then demonstrate that punstrip can predict semantically similar function names based on code structure. We evaluate our approach over open source C binaries from the Debian Linux distribution and compare against the state of the art.
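To illustrate the random-walk idea behind Symbol2Vec, the following Python snippet is a minimal sketch that generates walk sequences over a toy call graph. It is not punstrip's implementation. The graph, the function `random_walks`, the choice of following only caller-to-callee edges, and the uniform choice among callees are all assumptions made here for illustration. Walks like these would typically be fed to a word2vec-style model to obtain name embeddings; that training step is omitted.

```python
import random
from typing import Dict, List

# Hypothetical call graph type: caller name -> names of the functions it calls.
CallGraph = Dict[str, List[str]]


def random_walks(graph: CallGraph, walks_per_node: int, walk_length: int,
                 seed: int = 0) -> List[List[str]]:
    """Return uniform random walks that follow caller -> callee edges.

    Each walk is a list of function names (a "sentence" for an embedding
    model). This is an illustrative assumption, not punstrip's procedure.
    """
    rng = random.Random(seed)  # seeded so the generated corpus is reproducible
    walks: List[List[str]] = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                callees = graph.get(walk[-1], [])
                if not callees:  # the function calls nothing: end this walk early
                    break
                walk.append(rng.choice(callees))
            walks.append(walk)
    return walks


# Toy call graph with made-up function names.
graph: CallGraph = {
    "main": ["parse_args", "run"],
    "run": ["read_file", "write_file"],
    "read_file": ["xmalloc"],
    "write_file": ["xmalloc"],
    "parse_args": [],
    "xmalloc": [],
}

for walk in random_walks(graph, walks_per_node=1, walk_length=4):
    print(" ".join(walk))
```

Functions that are called from similar contexts (here `read_file` and `write_file`, which both sit between `run` and `xmalloc`) tend to appear in similar walks. This is what lets an embedding trained on the walks place them close together.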

Crossref

UCL Discovery

Typilus: Neural Type Hints

Authors: Allamanis Miltiadis
Alon Uri
Bahdanau Dzmitry
Bavishi Rohan
Bielik Pavol
Bracha Gilad
Brockschmidt Marc
Cho Kyunghyun
Cvitkovic Milan
David Yaniv
DeFreez Daniel
DIRE
Goodfellow Ian
Hadsell Raia
Karampatsis Rafael-Michael
Kim Yoon
Kipf Thomas N
Lacomis Jeremy
Li Yujia
Maddison Chris
Mangal Ravi
Miceli Barone Antonio Valerio
Python Software Foundation
Spotify Contributors
Stack Overflow
Vasic Marko
Publication venue: Association for Computing Machinery (ACM)
Publication date: 06/04/2020

Type inference over partial contexts in dynamically typed languages is challenging. In this work, we present a graph neural network model that predicts types by probabilistically reasoning over a program's structure, names, and patterns. The network uses deep similarity learning to learn a TypeSpace -- a continuous relaxation of the discrete space of types -- and how to embed the type properties of a symbol (i.e. identifier) into it. Importantly, our model can employ one-shot learning to predict an open vocabulary of types, including rare and user-defined ones. We realise our approach in Typilus for Python that combines the TypeSpace with an optional type checker. We show that Typilus accurately predicts types. Typilus confidently predicts types for 70% of all annotatable symbols; when it predicts a type, that type optionally type checks 95% of the time. Typilus can also find incorrect type annotations; two important and popular open source libraries, fairseq and allennlp, accepted our pull requests that fixed the annotation errors Typilus discovered.

Comment: Accepted to PLDI 202
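To make the TypeSpace idea more concrete, the following Python snippet is a hypothetical sketch of nearest-neighbour type prediction over symbol embeddings that have already been computed. The graph neural network that Typilus uses to produce those embeddings is not shown. The class `TypeSpaceIndex`, the 2-D vectors, and the inverse-distance weighting of neighbours are illustrative assumptions, not Typilus's code. The sketch shows why such a space can support an open vocabulary: a single embedded example of a new type is enough for that type to become predictable, with no retraining.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class TypeSpaceIndex:
    """Hypothetical k-nearest-neighbour type predictor over precomputed embeddings."""

    def __init__(self) -> None:
        self._points: List[Tuple[Vector, str]] = []

    def add(self, embedding: Vector, type_name: str) -> None:
        # Store one embedded symbol with its known type; no retraining is needed.
        self._points.append((embedding, type_name))

    def predict(self, query: Vector, k: int = 3) -> str:
        if not self._points:
            raise ValueError("index is empty")
        # Take the k stored symbols closest to the query (Euclidean distance).
        nearest = sorted(self._points, key=lambda p: math.dist(query, p[0]))[:k]
        # Each neighbour votes for its type with weight 1 / distance (assumed scheme).
        scores: Dict[str, float] = {}
        for embedding, type_name in nearest:
            weight = 1.0 / (math.dist(query, embedding) + 1e-9)
            scores[type_name] = scores.get(type_name, 0.0) + weight
        return max(scores, key=lambda t: scores[t])


index = TypeSpaceIndex()
index.add([0.0, 0.0], "int")
index.add([0.1, 0.0], "int")
index.add([5.0, 5.0], "str")
index.add([5.1, 5.0], "str")
print(index.predict([0.05, 0.02]))  # int

# One-shot: a single example of a user-defined type is enough.
index.add([10.0, 0.0], "MyConfig")
print(index.predict([9.9, 0.1]))  # MyConfig
```

Weighting votes by inverse distance lets one very close example outvote several distant ones. Without this weighting, a plain majority vote over k = 3 neighbours could drown out a type that has been seen only once.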

arXiv.org e-Print Archive

Crossref

UCL Discovery