1,296 research outputs found

    Linear Obfuscation to Combat Symbolic Execution

    Get PDF

    Hiding in the Particles: When Return-Oriented Programming Meets Program Obfuscation

    Full text link
    Largely known for attack scenarios, code reuse techniques at a closer look reveal properties that are appealing also for program obfuscation. We explore the popular return-oriented programming paradigm under this light, transforming program functions into ROP chains that coexist seamlessly with the surrounding software stack. We show how to build chains that can withstand popular static and dynamic deobfuscation approaches, evaluating the robustness and overheads of the design over common programs. The results suggest a significant amount of computational resources would be required to carry a deobfuscation attack for secret finding and code coverage goals.Comment: Published in the proceedings of DSN'21 (51st IEEE/IFIP Int. Conf. on Dependable Systems and Networks). Code and BibTeX entry available at https://github.com/pietroborrello/raindro

    DETECTION, DIAGNOSIS AND MITIGATION OF MALICIOUS JAVASCRIPT WITH ENRICHED JAVASCRIPT EXECUTIONS

    Get PDF
    Malicious JavaScript has become an important attack vector for software exploitation attacks and imposes a severe threat to computer security. In particular, three major class of problems, malware detection, exploit diagnosis, and exploits mitigation, bring considerable challenges to security researchers. Although a lot of research efforts have been made to address these threats, they have fundamental limitations and thus cannot solve the problems. Existing analysis techniques fall into two general categories: static analysis and dynamic analysis. Static analysis tends to produce inaccurate results (both false positive and false negative) and is vulnerable to a wide series of obfuscation techniques. Thus, dynamic analysis is constantly gaining popularity for exposing the typical features of malicious JavaScript. However, existing dynamic analysis techniques possess limitations such as limited code coverage and incomplete environment setup, leaving a broad attack surface for evading the detection. Once a zero-day exploit is captured, it is critical to quickly pinpoint the JavaScript statements that uniquely characterize the exploit and the payload location in the exploit. However, the current diagnosis techniques are inadequate because they approach the problem either from a JavaScript perspective and fail to account for “implicit” data flow invisible at JavaScript level, or from a binary execution perspective and fail to present the JavaScript level view of exploit. Although software vendors have deployed techniques like ASLR, sandbox, etc. to mitigate JavaScript exploits, hacking contests (e.g.,PWN2OWN, GeekPWN) have demonstrated that the latest software (e.g., Chrome, IE, Edge, Safari) can still be exploited. An ideal JavaScript exploit mitigation solution should be flexible and allow for deployment without requiring code changes. To combat malicious JavaScript, this dissertation addresses these problems through enriched executions, which explore arbitrary paths for detection, preserve JS-binary semantics for diagnosis, and perturbs memory with chaff code for mitigation. Firstly, JSForce, a forced execution engine for JavaScript, is proposed and developed to improve the detection results of current malicious JavaScript detection techniques. It drives an arbitrary JavaScript snippet to execute along different paths without any input or environment setup. While increasing code coverage, JSForce can tolerate invalid object accesses while introducing no runtime errors during execution. Secondly, JScalpel, a system that utilizes the JavaScript context information from the JavaScript level to perform context-aware binary analysis, is presented for JavaScript exploit diagnosis. In essence, it performs JS-Binary analysis to (1) generate a minimized exploit script, which in turn helps to generate a signature for the exploit, and (2) precisely locate the payload within the exploit. It replaces the malicious payload with a friendly payload and generates a PoV for the exploit. Thirdly, ChaffyScript, a vulnerability-agnostic mitigation system, is introduced to block JavaScript exploits via undermining the memory preparation stage. Specifically, given suspicious JavaScript, ChaffyScript rewrites the code to insert memory perturbation code, and then generates semantically-equivalent code. JavaScript exploits will fail as a result of unexpected memory states introduced by memory perturbation code, while the benign JavaScript still behaves as expected since the memory perturbation code does not change the JavaScript’s original semantics

    The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files

    Get PDF
    In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done around analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult, time consuming, and in some cases, may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input into a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare prediction accuracies with prediction accuracies gathered from this new, dynamic analysis based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated

    Program Similarity Analysis for Malware Classification and its Pitfalls

    Get PDF
    Malware classification, specifically the task of grouping malware samples into families according to their behaviour, is vital in order to understand the threat they pose and how to protect against them. Recognizing whether one program shares behaviors with another is a task that requires semantic reasoning, meaning that it needs to consider what a program actually does. This is a famously uncomputable problem, due to Rice\u2019s theorem. As there is no one-size-fits-all solution, determining program similarity in the context of malware classification requires different tools and methods depending on what is available to the malware defender. When the malware source code is readily available (or at least, easy to retrieve), most approaches employ semantic \u201cabstractions\u201d, which are computable approximations of the semantics of the program. We consider this the first scenario for this thesis: malware classification using semantic abstractions extracted from the source code in an open system. Structural features, such as the control flow graphs of programs, can be used to classify malware reasonably well. To demonstrate this, we build a tool for malware analysis, R.E.H.A. which targets the Android system and leverages its openness to extract a structural feature from the source code of malware samples. This tool is first successfully evaluated against a state of the art malware dataset and then on a newly collected dataset. We show that R.E.H.A. is able to classify the new samples into their respective families, often outperforming commercial antivirus software. However, abstractions have limitations by virtue of being approximations. We show that by increasing the granularity of the abstractions used to produce more fine-grained features, we can improve the accuracy of the results as in our second tool, StranDroid, which generates fewer false positives on the same datasets. The source code of malware samples is not often available or easily retrievable. For this reason, we introduce a second scenario in which the classification must be carried out with only the compiled binaries of malware samples on hand. Program similarity in this context cannot be done using semantic abstractions as before, since it is difficult to create meaningful abstractions from zeros and ones. Instead, by treating the compiled programs as raw data, we transform them into images and build upon common image classification algorithms using machine learning. This led us to develop novel deep learning models, a convolutional neural network and a long short-term memory, to classify the samples into their respective families. To overcome the usual obstacle of deep learning of lacking sufficiently large and balanced datasets, we utilize obfuscations as a data augmentation tool to generate semantically equivalent variants of existing samples and expand the dataset as needed. Finally, to lower the computational cost of the training process, we use transfer learning and show that a model trained on one dataset can be used to successfully classify samples in different malware datasets. The third scenario explored in this thesis assumes that even the binary itself cannot be accessed for analysis, but it can be executed, and the execution traces can then be used to extract semantic properties. However, dynamic analysis lacks the formal tools and frameworks that exist in static analysis to allow proving the effectiveness of obfuscations. For this reason, the focus shifts to building a novel formal framework that is able to assess the potency of obfuscations against dynamic analysis. We validate the new framework by using it to encode known analyses and obfuscations, and show how these obfuscations actually hinder the dynamic analysis process

    Integrated software fingerprinting via neural-network-based control flow obfuscation

    Get PDF
    • …
    corecore