4 research outputs found

    Probabilistic Naming of Functions in Stripped Binaries

    Debugging symbols in binary executables carry the names of functions and global variables. When present, they greatly simplify the process of reverse engineering, but they are almost always removed (stripped) for deployment. We present the design and implementation of punstrip, a tool that combines a probabilistic fingerprint of binary code based on high-level features with a probabilistic graphical model to learn the relationship between function names and program structure. As there are many naming conventions and developer styles, functions from different applications do not necessarily have the exact same name, even if they implement the exact same functionality. We therefore evaluate punstrip across three levels of name matching: exact; an approach based on natural language processing of name components; and using Symbol2Vec, a new embedding of function names based on random walks of function call graphs. We show that our approach is able to recognize functions compiled across different compilers and optimization levels and then demonstrate that punstrip can predict semantically similar function names based on code structure. We evaluate our approach over open source C binaries from the Debian Linux distribution and compare against the state of the art.
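The difference between the first two levels of name matching can be illustrated with a small sketch. It is not punstrip's implementation: the abstract does not say how name components are processed, so the tokenizer and the Jaccard score below are assumptions chosen for illustration, and the helper names `name_tokens`, `exact_match`, and `component_similarity` are hypothetical.

```python
import re


def name_tokens(name):
    """Split a function name into a set of lowercase word components.

    Assumption: components are separated by non-alphanumeric characters
    (e.g. underscores) or by camelCase boundaries.
    """
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", name):
        # Acronyms ("XML"), capitalized or lowercase words, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part))
    return {t.lower() for t in tokens}


def exact_match(a, b):
    """Level 1: the two names must be identical strings."""
    return a == b


def component_similarity(a, b):
    """Illustrative stand-in for level 2: Jaccard overlap of name components."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    # Same words, different naming conventions: no exact match,
    # but the component sets are identical.
    print(exact_match("readFile", "file_read"))
    print(component_similarity("readFile", "file_read"))
```

Two names that differ only in convention fail the exact test but score highly on components, which is the situation the abstract describes for functions taken from different applications.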

    Structural Comparison of Executable Objects

    Abstract: A method to heuristically construct an isomorphism between the sets of functions in two similar but differing versions of the same executable file is presented. Such an isomorphism has multiple practical applications, specifically the ability to detect programmatic changes between the two executable versions. Moreover, information (function names) which is available for one of the two versions can also be made available for the other. A framework implementing the described methods is presented, along with empirical data about its performance when used to analyze patches to recent security vulnerabilities. As a more practical example, a security update which fixes a critical vulnerability in an H.323 parsing component is analyzed, the relevant vulnerability extracted and the implications of the vulnerability and the fix discussed.
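One simple way to start building such a mapping between function sets is sketched below. The abstract does not specify its heuristics, so this is an assumption-laden illustration: each function is summarized by a hypothetical structural signature (here a tuple of basic-block count, edge count, and call count), and two functions are paired only when their signature is unique in both versions. The function name `match_by_unique_signature` and all sample data are invented for the example.

```python
from collections import defaultdict


def match_by_unique_signature(funcs_a, funcs_b):
    """Pair functions whose signature occurs exactly once in each version.

    funcs_a, funcs_b: dicts mapping a function name to a hashable signature.
    Returns a dict mapping names in funcs_a to names in funcs_b.
    """
    def index(funcs):
        by_sig = defaultdict(list)
        for name, sig in funcs.items():
            by_sig[sig].append(name)
        return by_sig

    idx_a, idx_b = index(funcs_a), index(funcs_b)
    matches = {}
    for sig, names_a in idx_a.items():
        names_b = idx_b.get(sig, [])
        # Ambiguous signatures (shared by several functions) are skipped.
        if len(names_a) == 1 and len(names_b) == 1:
            matches[names_a[0]] = names_b[0]
    return matches


if __name__ == "__main__":
    # Hypothetical data: a version with symbols and a stripped, patched version.
    with_symbols = {
        "parse_header": (12, 17, 3),
        "checksum": (4, 5, 0),
        "helper_a": (1, 0, 0),
        "helper_b": (1, 0, 0),
    }
    stripped = {
        "sub_401000": (12, 17, 3),
        "sub_401200": (4, 5, 0),
        "sub_401300": (1, 0, 0),
        "sub_401400": (1, 0, 0),
        "sub_401500": (9, 11, 2),
    }
    print(match_by_unique_signature(with_symbols, stripped))
```

In this sketch, matched pairs let names from the version with symbols be carried over to the stripped one, while functions left unmatched on either side are candidates for changed or newly added code.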

    Towards Paving the Way for Large-Scale Windows Malware Analysis: Generic Binary Unpacking with Orders-of-Magnitude Performance Boost

    Binary packing, encoding binary code prior to execution and decoding it at run time, is the most common obfuscation adopted by malware authors to camouflage malicious code. In particular, most packers recover the original code by going through a set of "written-then-executed" layers, which makes determining the end of unpacking increasingly difficult. Many generic binary unpacking approaches have been proposed to extract packed binaries without prior knowledge of the packers. However, high runtime overhead and a lack of anti-analysis resistance have severely limited their adoption. Over the past two decades, packed malware has remained a veritable challenge to the anti-malware landscape. This paper revisits the long-standing binary unpacking problem from a new angle: packers consistently obfuscate the standard use of API calls. Our in-depth study of an enormous variety of current Windows malware packers reveals a common property: the malware's Import Address Table (IAT), which acts as a lookup table for dynamically linked API calls, is typically erased by packers for further obfuscation; the unpacking routine, like a custom dynamic loader, then reconstructs the IAT before the original code resumes execution. During the execution of packed malware, if an API is invoked by looking up a rebuilt IAT, this indicates that the original payload has been restored. This insight motivates us to design an efficient unpacking approach, called BinUnpack. Compared to previous methods that suffer from multiple "written-then-executed" unpacking layers, BinUnpack is free from tedious memory access monitoring and therefore introduces very small runtime overhead. To defeat a variety of ever-evolving evasion tricks, we design BinUnpack's API monitor module via a novel kernel-level DLL hijacking technique. We have evaluated BinUnpack's efficacy extensively with more than 238K packed malware samples and multiple Windows utilities. BinUnpack's success rate is significantly better than that of existing tools, with a performance boost of several orders of magnitude. Our study demonstrates that BinUnpack can be applied to speed up large-scale malware analysis.