
    Code obfuscation and malware detection by abstract interpretation

    An obfuscating transformation aims at confusing a program in order to make it more difficult to understand while preserving its functionality. Software protection and malware detection are two major applications of code obfuscation. Software developers use code obfuscation to defend their programs against attacks on intellectual property, usually called malicious host attacks. In fact, by making programs more difficult to understand it is possible to obstruct malicious reverse engineering, a typical attack on the intellectual property of programs. On the other side, malware writers usually obfuscate their malicious code in order to avoid detection. In this setting, the ability of code obfuscation to foil most existing detection techniques, such as misuse detection algorithms, lies in their purely syntactic nature, which makes malware detection sensitive to slight modifications of program syntax. In the software protection scenario, researchers try to develop sophisticated obfuscating techniques that are able to resist as many attacks as possible. In the malware detection scenario, on the other hand, it is important to design advanced detection algorithms in order to detect as many variants of obfuscated malware as possible. Both malicious host and malicious code attacks clearly represent harmful threats to the security of computer networks. In this dissertation, we are interested in both security issues described above. In particular, we describe a formal approach to code obfuscation and malware detection based on program semantics and abstract interpretation. This theoretical framework is useful for countering some well-known drawbacks of software protection through code obfuscation, as well as for improving existing malware detection schemes.
In fact, the lack of a rigorous theoretical basis for code obfuscation makes it impossible to formally study and certify its effectiveness in protecting proprietary programs. Moreover, in order to design malware detection schemes that are resilient to obfuscation we have to focus on program semantics rather than on program syntax. Our formal framework for code obfuscation relies on a semantics-based definition of code obfuscation that characterizes each program transformation T as a potential obfuscation in terms of the most concrete property preserved by T on program semantics. Deobfuscating techniques, and reverse engineering in general, usually begin with some sort of static program analysis, which can be specified as an abstraction of program semantics. In the software protection scenario, this observation naturally leads to modeling attackers as abstractions of program semantics. In fact, the abstraction modeling the attacker expresses the amount of information, namely the semantic properties, that the attacker is able to observe. It follows that, by comparing the degree of abstraction of an attacker A with that of the most concrete property preserved by an obfuscating transformation T, it is possible to understand whether obfuscation T defeats attack A. Following the same reasoning it is possible to compare the efficiency of different obfuscating transformations, as well as the ability of different attackers to counter a given obfuscation. We apply our semantics-based framework to a known control code obfuscation technique that aims at confusing the control flow of the original program by inserting opaque predicates. As argued above, an obfuscating transformation modifies a program while preserving an abstraction of its semantics. This means that different obfuscated versions of the same malware have to share (at least) the malicious intent, namely the maliciousness of their semantics, even if they may express it through different syntactic forms.
The basic idea of our formal approach to malware detection is to use program semantics to model both malware and program behaviour, and semantic abstractions to hide the details changed by the obfuscation. Thus, given an obfuscation T, we are interested in defining an abstraction of program semantics that does not distinguish between the semantics of malware M and the semantics of its obfuscated version T(M). In particular, we provide this suitable abstraction for an interesting class of commonly used obfuscating transformations. It is clear that, given a malware detector D, it is always possible to define its semantic counterpart by analyzing how D works on program semantics. At this point, by translating both malware detectors and obfuscating transformations into the semantic world, we are able to certify which obfuscations a detector is able to handle. This means that our semantics-based framework provides a formal setting where malware detector designers can prove the efficiency of their algorithms.
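
The idea of an abstraction that does not distinguish M from T(M) can be illustrated with a toy model. The sketch below is not the dissertation's formalism: the instruction encoding, the names `run`, `obfuscate` and `alpha`, and the particular opaque predicate are all assumptions made for illustration. It inserts an always-true opaque predicate around each instruction, then shows that the concrete traces of M and T(M) differ while an abstraction that forgets branch events maps both to the same result.

```python
# Toy model (hypothetical): a program is a list of instructions.
#   ("assign", var, fn)                     var := fn(env)
#   ("if", pred, then_block, else_block)    conditional

def run(program, env=None):
    """Concrete trace semantics: every assignment and branch decision is recorded."""
    env = dict(env or {})
    trace = []

    def step(block):
        for ins in block:
            if ins[0] == "assign":
                _, var, fn = ins
                env[var] = fn(env)
                trace.append(("assign", var, env[var]))
            else:  # "if"
                _, pred, then_block, else_block = ins
                taken = pred(env)
                trace.append(("branch", taken))
                step(then_block if taken else else_block)

    step(program)
    return trace

def opaque_true(env):
    """Opaque predicate: x * (x + 1) is even for every integer x, so this is always True."""
    x = env.get("x", 0)
    return (x * (x + 1)) % 2 == 0

def obfuscate(program):
    """T: guard each instruction with the opaque predicate; the else branch is dead code."""
    dead = [("assign", "junk", lambda env: 0)]
    return [("if", opaque_true, [ins], dead) for ins in program]

def alpha(trace):
    """Abstraction: forget control-flow events, keep only the assignments performed."""
    return [event for event in trace if event[0] == "assign"]

# Stand-in for a malware M (here just two assignments).
M = [
    ("assign", "x", lambda env: 7),
    ("assign", "y", lambda env: env["x"] * 2),
]

trace_m = run(M)
trace_tm = run(obfuscate(M))

print(trace_m != trace_tm)                # the concrete semantics differ
print(alpha(trace_m) == alpha(trace_tm))  # the abstract semantics coincide
```

In this toy setting a detector that compares `alpha` of the traces is unaffected by the opaque-predicate insertion, whereas one that compares the raw traces (or the program text) is not.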

    CryptoKnight:generating and modelling compiled cryptographic primitives

    Cryptovirological augmentations present an immediate, incomparable threat. Over the last decade, the substantial proliferation of crypto-ransomware has had widespread consequences for consumers and organisations alike. Established preventive measures perform well; however, the problem has not ceased. Reverse engineering potentially malicious software is a cumbersome task due to platform eccentricities and obfuscated transmutation mechanisms, hence requiring smarter, more efficient detection strategies. The following manuscript presents a novel approach for the classification of cryptographic primitives in compiled binary executables using deep learning. The model blueprint, a Dynamic Convolutional Neural Network (DCNN), is fittingly configured to learn from variable-length control flow diagnostics output from a dynamic trace. To rival the size and variability of equivalent datasets, and to adequately train our model without risking adverse exposure, a methodology for the procedural generation of synthetic cryptographic binaries is defined, using core primitives from OpenSSL with multivariate obfuscation, to draw a vastly scalable distribution. The library, CryptoKnight, rendered an algorithmic pool of AES, RC4, Blowfish, MD5 and RSA to synthesise combinable variants which automatically fed into its core model. Converging at 96% accuracy, CryptoKnight was successfully able to classify the sample pool with minimal loss and correctly identified the algorithm in a real-world crypto-ransomware application.
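
DCNN architectures are commonly built around k-max pooling, which is one way a network can accept variable-length input. The abstract does not say how CryptoKnight handles this, so the following is only a sketch of the general technique in plain Python; the function name and the sample values are hypothetical.

```python
def k_max_pooling(values, k):
    """Keep the k largest values of a sequence, preserving their original order.

    Sequences with k or fewer elements are returned unchanged (a real model
    would typically pad them to length k).
    """
    if k >= len(values):
        return list(values)
    # Indices of the k largest values (ties resolved in favour of earlier positions).
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    # Re-sort the selected indices so the output keeps the input ordering.
    return [values[i] for i in sorted(top)]

# Two feature sequences of different lengths reduce to the same fixed length.
short_trace = [1, 5, 2, 9, 3]
long_trace = [4, 0, 7, 7, 1, 8, 2, 6]

print(k_max_pooling(short_trace, 3))
print(k_max_pooling(long_trace, 3))
```

Because the selected values keep their relative order, the pooled output still reflects where in the sequence the strongest activations occurred, only not their exact positions.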

    Chaotic Compilation for Encrypted Computing: Obfuscation but Not in Name

    An 'obfuscation' for encrypted computing is quantified exactly here, leading to an argument that security against polynomial-time attacks has been achieved for user data via the deliberately 'chaotic' compilation required for security properties in that environment. Encrypted computing is the emerging science and technology of processors that take encrypted inputs to encrypted outputs via encrypted intermediate values (at nearly conventional speeds). The aim is to make user data in general-purpose computing secure against the operator and operating system as potential adversaries. A stumbling block has always been that memory addresses are data and good encryption means the encrypted value varies randomly, and that makes hitting any target in memory problematic without address decryption, yet decryption anywhere on the memory path would open up many easily exploitable vulnerabilities. This paper 'solves (chaotic) compilation' for processors without address decryption, covering all of ANSI C while satisfying the required security properties and opening up the field for the standard software tool-chain and infrastructure. That produces the argument referred to above, which may also hold without encryption.
    Comment: 31 pages. Version update adds "Chaotic" in title and throughout paper, and recasts abstract and Intro and other sections of the text for better access by cryptologists. To the same end it introduces the polynomial time defense argument explicitly in the final section, having now set that denouement out in the abstract and intr