A Neural-based Program Decompiler
Reverse engineering of binary executables is a critical problem in the
computer security domain. On the one hand, malicious parties may recover
interpretable source code from software products to gain commercial
advantages. On the other hand, binary decompilation can be leveraged for code
vulnerability analysis and malware detection. However, efficient binary
decompilation is challenging. Conventional decompilers have three major
limitations: (i) each applies only to a specific source-target language pair
and hence incurs undesired development costs for every new language task; (ii)
their high-level output does not reliably preserve the functionality of the
input binary; (iii) their output does not capture the semantics of the input,
so the recovered program is hard to interpret. To
address the above problems, we propose Coda, the first end-to-end neural-based
framework for code decompilation. Coda decomposes the decompilation task into
two key phases. First, in the code sketch generation stage, Coda employs an
instruction type-aware encoder and a tree decoder with attention feeding to
generate an abstract syntax tree (AST). Second, Coda refines this code sketch
using an iterative error correction machine guided by an ensembled neural error
predictor. By first finding a good approximate candidate and then fixing it
until it is correct, Coda achieves superior performance compared to baseline
approaches. We assess Coda's performance with extensive experiments on various
benchmarks. Evaluation results show that Coda achieves an average of 82%
program recovery accuracy on unseen binary samples, whereas state-of-the-art
decompilers achieve 0% accuracy. Furthermore, Coda outperforms the
sequence-to-sequence model with attention by a margin of 70% program accuracy.
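The second phase, repairing an approximate sketch until it compiles back to the input binary, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not Coda's implementation: `toy_compile` and `OPCODES` are a hypothetical compiler that lowers one token to one instruction, the two hand-written scorers stand in for the ensembled neural error predictor, and the first phase (encoder and tree decoder) is not modelled, so the initial sketch is supplied directly.

```python
from typing import Callable, Optional

# Hypothetical toy "compiler": each source token lowers to exactly one instruction.
OPCODES = {"load": "mov", "add": "add", "sub": "sub", "mul": "imul", "ret": "ret"}

def toy_compile(tokens: list[str]) -> list[str]:
    """Stand-in for a real compiler; unknown tokens lower to 'nop'."""
    return [OPCODES.get(t, "nop") for t in tokens]

# A scorer maps (candidate, target binary) to one error score per token position.
Scorer = Callable[[list[str], list[str]], list[float]]

def binary_diff_scorer(cand: list[str], target_bin: list[str]) -> list[float]:
    """Flags positions whose compiled instruction differs from the target."""
    return [float(a != b) for a, b in zip(toy_compile(cand), target_bin)]

def vocab_scorer(cand: list[str], target_bin: list[str]) -> list[float]:
    """Weaker signal: flags tokens the toy compiler does not recognise."""
    return [0.5 * float(t not in OPCODES) for t in cand]

def ensemble_predict(cand: list[str], target_bin: list[str],
                     scorers: list[Scorer]) -> Optional[int]:
    """Sum per-position scores across scorers; return the most suspicious index."""
    totals = [sum(s) for s in zip(*(f(cand, target_bin) for f in scorers))]
    if not totals:
        return None
    best = max(range(len(totals)), key=totals.__getitem__)
    return best if totals[best] > 0 else None

def mismatch_count(cand: list[str], target_bin: list[str]) -> int:
    """Number of instructions that differ (assumes equal lengths in this toy)."""
    return sum(a != b for a, b in zip(toy_compile(cand), target_bin))

def fix_at(cand: list[str], idx: int, target_bin: list[str]) -> list[str]:
    """Try each vocabulary token at idx; keep the one that most reduces mismatches."""
    best, best_err = cand, mismatch_count(cand, target_bin)
    for tok in OPCODES:
        trial = cand[:idx] + [tok] + cand[idx + 1:]
        err = mismatch_count(trial, target_bin)
        if err < best_err:
            best, best_err = trial, err
    return best

def iterative_correct(sketch: list[str], target_bin: list[str],
                      scorers: list[Scorer], max_iters: int = 20):
    """Predict an error location, repair it, and repeat until the binary matches."""
    cand = list(sketch)
    for _ in range(max_iters):
        if toy_compile(cand) == target_bin:
            return cand, True
        idx = ensemble_predict(cand, target_bin, scorers)
        if idx is None:
            break
        fixed = fix_at(cand, idx, target_bin)
        if fixed == cand:
            break  # predicted location could not be repaired
        cand = fixed
    return cand, toy_compile(cand) == target_bin

if __name__ == "__main__":
    truth = ["load", "add", "mul", "ret"]
    target = toy_compile(truth)          # the "input binary"
    sketch = ["load", "sub", "mul", "rett"]  # approximate first-phase output
    fixed, ok = iterative_correct(sketch, target, [binary_diff_scorer, vocab_scorer])
    print(fixed, ok)  # -> ['load', 'add', 'mul', 'ret'] True
```

Summing per-position scores is only one simple way to combine predictors; averaging the outputs of several trained networks would be a closer analogue of a neural ensemble. Because a fix is accepted only when it strictly reduces the mismatch count, the loop cannot cycle between candidates.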