3,583 research outputs found
Code obfuscation and malware detection by abstract interpretation
Non disponibileAn obfuscating transformation aims at confusing a program in order to make it
more difficult to understand while preserving its functionality. Software protection
and malware detection are two major applications of code obfuscation. Software
developers use code obfuscation in order to defend their programs against
attacks to the intellectual property, usually called malicious host attacks. In fact,
by making the programs more difficult to understand it is possible to obstruct
malicious reverse engineering \u2013 a typical attack to the intellectual property of
programs. On the other side, malware writers usually obfuscate their malicious
code in order to avoid detection. In this setting, the ability of code obfuscation
to foil most of the existing detection techniques, such as misuse detection algorithms,
relies in their purely syntactic nature that makes malware detection
sensitive to slight modifications of programs syntax. In the software protection
scenario, researchers try to develop sophisticated obfuscating techniques that are
able to resist as many attacks as possible. In the malware detection scenario,
on the other hand, it is important to design advanced detection algorithms in
order to detect as many variants of obfuscated malware as possible. It is clear
how both malicious host and malicious code attacks represent harmful threats
to the security of computer networks.
In this dissertation, we are interested in both security issues described above.
In particular, we describe a formal approach to code obfuscation and malware
detection based on program semantics and abstract interpretation. This theoretical
framework is useful in contrasting some well known drawbacks of software
protection through code obfuscation, as well as for improving existing malware
detection schemes. In fact, the lack of rigorous theoretical bases for code obfuscation
prevents any possibility to formally study and certify their effectiveness
in protecting proprietary programs. Moreover, in order to design malware detection
schemes that are resilient to obfuscation we have to focus on program
semantics rather than on program syntax.
Our formal framework for code obfuscation relies on a semantics-based definition
of code obfuscation that characterizes each program transformation T
as a potential obfuscation in terms of the most concrete property preserved by
T on program semantics. Deobfuscating techniques, and reverse engineering in
general, usually begin with some sort of static program analysis, which can be
specified as an abstraction of program semantics. In the software protection
scenario, this observation naturally leads to model attackers as abstractions of
program semantics. In fact, the abstraction modeling the attacker expresses the
amount of information, namely the semantic properties, that the attacker is able
to observe. It follows that, comparing the degree of abstraction of an attacker A
with the one of the most concrete property preserved by an obfuscating transformation
T, it is possible to understand whether obfuscation T defeats attack
A. Following the same reasoning it is possible to compare the efficiency of different
obfuscating transformations, as well as the ability of different attackers
in contrasting a given obfuscation. We apply our semantics-based framework to
a known control code obfuscation technique that aims at confusing the control
flow of the original program by inserting opaque predicates.
As argued above, an obfuscating transformation modifies a program while
preserving an abstraction of its semantics. This means that different obfuscated
versions of the same malware have to share (at least) the malicious intent,
namely the maliciousness of their semantics, even if they may express it
through different syntactic forms. The basic idea of our formal approach to
malware detection is to use program semantics to model both malware and program
behaviour, and semantic abstractions to hide the details changed by the
obfuscation. Thus, given an obfuscation T, we are interested in defining an abstraction
of program semantics that does not distinguish between the semantics
of malware M and the semantics of its obfuscated version T(M). In particular,
we provide this suitable abstraction for an interesting class of commonly used
obfuscating transformations. It is clear that, given a malware detector D, it is
always possible to define its semantic counterpart by analyzing how D works
on program semantics. At this point, by translating both malware detectors
and obfuscating transformations in the semantic world, we are able to certify
which obfuscations a detector is able to handle. This means that our semanticsbased
framework provides a formal setting where malware detectors designers
can prove the efficiency of their algorithms
CryptoKnight:generating and modelling compiled cryptographic primitives
Cryptovirological augmentations present an immediate, incomparable threat. Over the last decade, the substantial proliferation of crypto-ransomware has had widespread consequences for consumers and organisations alike. Established preventive measures perform well, however, the problem has not ceased. Reverse engineering potentially malicious software is a cumbersome task due to platform eccentricities and obfuscated transmutation mechanisms, hence requiring smarter, more efficient detection strategies. The following manuscript presents a novel approach for the classification of cryptographic primitives in compiled binary executables using deep learning. The model blueprint, a Dynamic Convolutional Neural Network (DCNN), is fittingly configured to learn from variable-length control flow diagnostics output from a dynamic trace. To rival the size and variability of equivalent datasets, and to adequately train our model without risking adverse exposure, a methodology for the procedural generation of synthetic cryptographic binaries is defined, using core primitives from OpenSSL with multivariate obfuscation, to draw a vastly scalable distribution. The library, CryptoKnight, rendered an algorithmic pool of AES, RC4, Blowfish, MD5 and RSA to synthesise combinable variants which automatically fed into its core model. Converging at 96% accuracy, CryptoKnight was successfully able to classify the sample pool with minimal loss and correctly identified the algorithm in a real-world crypto-ransomware applicatio
Chaotic Compilation for Encrypted Computing: Obfuscation but Not in Name
An `obfuscation' for encrypted computing is quantified exactly here, leading
to an argument that security against polynomial-time attacks has been achieved
for user data via the deliberately `chaotic' compilation required for security
properties in that environment. Encrypted computing is the emerging science and
technology of processors that take encrypted inputs to encrypted outputs via
encrypted intermediate values (at nearly conventional speeds). The aim is to
make user data in general-purpose computing secure against the operator and
operating system as potential adversaries. A stumbling block has always been
that memory addresses are data and good encryption means the encrypted value
varies randomly, and that makes hitting any target in memory problematic
without address decryption, yet decryption anywhere on the memory path would
open up many easily exploitable vulnerabilities. This paper `solves (chaotic)
compilation' for processors without address decryption, covering all of ANSI C
while satisfying the required security properties and opening up the field for
the standard software tool-chain and infrastructure. That produces the argument
referred to above, which may also hold without encryption.Comment: 31 pages. Version update adds "Chaotic" in title and throughout
paper, and recasts abstract and Intro and other sections of the text for
better access by cryptologists. To the same end it introduces the polynomial
time defense argument explicitly in the final section, having now set that
denouement out in the abstract and intr
- …