466 research outputs found

Mining Malware Specifications through Static Reachability Analysis

Author: Macedo Hugo
Touili Tayssir
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The amount of malicious software (malware) is growing out of control. Syntactic signature-based detection cannot cope with such growth, and manual construction of malware signature databases needs to be replaced by computer learning-based approaches. Currently, a single modern signature capturing the semantics of a malicious behavior can be used to replace an arbitrarily large number of old-fashioned syntactic signatures. However, teaching computers to learn such behaviors is a challenge. Existing work relies on dynamic analysis to extract malicious behaviors, but such a technique does not guarantee the coverage of all behaviors. To sidestep this limitation, we show how to learn malware signatures using static reachability analysis. The idea is to model binary programs using pushdown systems (which can model the stack operations occurring during binary code execution), use reachability analysis to extract behaviors in the form of trees, and use subtrees that are common among the trees extracted from a training set of malware files as signatures. To detect malware, we propose to use a tree automaton to compactly store malicious behavior trees and check whether any of the subtrees extracted from the file under analysis is malicious. Experimental data shows that our approach can be used to learn signatures from a training set of malware files and use them to detect a test set of malware that is 5 times the size of the training set.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot
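
The signature-learning idea in the Macedo and Touili abstract (subtrees shared across the behavior trees of a training set become signatures) can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: trees are nested tuples `(label, children)`, only complete subtrees are compared, a plain set stands in for the tree automaton, and the names `learn_signatures`, `is_malicious`, `min_support` and `min_size` are hypothetical.

```python
from collections import Counter

# A behavior tree is (label, (child, child, ...)). This encoding is an
# assumption made for the sketch; the paper extracts trees from pushdown systems.

def subtrees(tree):
    """Yield the complete subtree rooted at every node of the tree."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)

def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1])

def learn_signatures(training_samples, min_support, min_size=2):
    """Return subtrees that occur in at least min_support training samples.

    training_samples holds one list of behavior trees per malware file.
    """
    counts = Counter()
    for trees in training_samples:
        seen = set()  # count each subtree at most once per sample
        for tree in trees:
            seen.update(s for s in subtrees(tree) if size(s) >= min_size)
        counts.update(seen)
    return {s for s, n in counts.items() if n >= min_support}

def is_malicious(file_trees, signatures):
    """Flag a file if any of its subtrees matches a stored signature."""
    return any(s in signatures for tree in file_trees for s in subtrees(tree))
```

A set lookup is used here only for brevity; the abstract proposes a tree automaton as the compact store for malicious behavior trees.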

Static Behavioral Malware Detection over LLVM IR

Author: Surovič Marek
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2016

In this thesis we study methods for behavioral malware detection that use techniques of formal analysis and verification. In particular, we build on work that infers tree automata from syscall dependency graphs obtained by static analysis of LLVM IR. We design and implement a prototype detector using the LLVM compiler framework. For experiments with the detector, we use an obfuscating C/C++ compiler capable of generating mutations of malware by means of obfuscating transformations. We discuss preliminary experiments that show the capabilities of the detector, as well as possible future extensions to it.

Digital library of Brno University of Technology

National Repository of Grey Literature

Analyzing program dependences for malware detection.

Author: DALLA PREDA Mila
Giacobazzi Roberto
Mastroeni Isabella
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Metamorphic malware continuously modifies its code, while preserving its functionality, in order to foil misuse detection. The key to defeating metamorphism lies in a semantic characterization of the embedding of the malware into the target program. Indeed, a behavioral model of program infection that does not rely on syntactic program features should be able to defeat metamorphism. Moreover, a general model of infection should be able to express dependences and interactions between the malicious code and the target program. ANI is a general theory for the analysis of dependences of data in a program. We propose a higher-order theory for ANI, later called HOANI, that allows the study of program dependences. Our idea is then to formalize and study the malware detection problem in terms of HOANI.

Crossref

Catalogo dei prodotti della ricerca

Behavioral Clustering of Non-Stationary IP Flow Record Data

Author: Hammerschmidt Christian
Marchal Samuel
State Radu
Verwer Sicco
Publication venue
Publication date: 01/10/2016

Crossref

Open Repository and Bibliography - Luxembourg

Unveiling metamorphism by abstract interpretation of code properties

Author: DALLA PREDA Mila
Giacobazzi Roberto
Saumya K. Debray
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Metamorphic code includes self-modifying semantics-preserving transformations to exploit code diversification. The impact of metamorphism is growing in security and code protection technologies, both for preventing malicious host attacks, e.g., in software diversification for IP and integrity protection, and in malicious software attacks, e.g., in metamorphic malware that modifies its own code in order to foil detection systems based on signature matching. In this paper we consider the problem of automatically extracting metamorphic signatures from metamorphic code. We introduce a semantics for self-modifying code, later called phase semantics, and prove its correctness by showing that it is an abstract interpretation of the standard trace semantics. Phase semantics precisely models metamorphic code behavior by providing a set of program traces that correspond to the possible evolutions of the metamorphic code during execution. We show that metamorphic signatures can be automatically extracted by abstract interpretation of the phase semantics. In particular, we introduce the notion of regular metamorphism, where the invariants of the phase semantics can be modeled as finite state automata representing the code structure of all possible metamorphic changes of a metamorphic code, and we provide a static signature extraction algorithm for metamorphic code where metamorphic signatures are approximated in regular metamorphism.

Crossref

Catalogo dei prodotti della ricerca

Prospex: Protocol Specification Extraction

Author: Christopher Kruegel
Engin Kirda
Gilbert Wondracek
Paolo Milani Comparetti
Publication venue
Publication date

Protocol reverse engineering is the process of extracting application-level specifications for network protocols. Such specifications are very useful in a number of security-related contexts, for example, to perform deep packet inspection and black-box fuzzing, or to quickly understand custom botnet command and control (C&C) channels. Since manual reverse engineering is a time-consuming and tedious process, a number of systems have been proposed that aim to automate this task. These systems either analyze network traffic directly or monitor the execution of the application that receives the protocol messages. While previous systems show that precise message formats can be extracted automatically, they do not provide a protocol specification. The reason is that they do not reverse engineer the protocol state machine. In this paper, we focus on closing this gap by presenting a system that is capable of automatically inferring state machines. This greatly enhances the results of automatic protocol reverse engineering, while further reducing the need for human interaction. We extend previous work that focuses on behavior-based message format extraction, and introduce techniques for identifying and clustering different types of messages not only based on their structure, but also according to the impact of each message on server behavior. Moreover, we present an algorithm for extracting the state machine. We have applied our techniques to a number of real-world protocols, including the command and control protocol used by a malicious bot. Our results demonstrate that we are able to extract format specifications for different types of messages and meaningful protocol state machines. We use these protocol specifications to automatically generate input for a stateful fuzzer, allowing us to discover security vulnerabilities in real-world applications.

CiteSeerX

Symbolic Model Learning: New Algorithms and Applications

Author: Argyros Georgios
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

In this thesis, we study algorithms which can be used to extract, or learn, formal mathematical models from software systems and then use these models to test whether the given software systems satisfy certain security properties such as robustness against code injection attacks. Specifically, we focus on studying learning algorithms for automata and transducers and the symbolic extensions of these models, namely symbolic finite automata (SFAs). At a high level, this thesis contributes the following results: 1. In the first part of the thesis, we present a unified treatment of many common variations of the seminal L* algorithm for learning deterministic finite automata (DFAs) as a congruence learning algorithm for the underlying Nerode congruence which forms the basis of automata theory. Under this formulation, the basic data structures used by different variations are unified as different ways to implement the Nerode congruence using queries. 2. Next, building on the new formulation of L*-style algorithms, we proceed to develop new algorithms for learning transducer models. First, we present the first algorithm for learning deterministic partial transducers. Furthermore, we extend our algorithm to non-deterministic models by introducing a novel, generalized congruence relation over string transformations which is able to capture a subclass of string transformations with regular lookahead. We demonstrate that this class is able to capture many practical string transformations from the domain of string sanitizers in Web applications. 3. Classical learning algorithms for automata and transducers operate over finite alphabets and have a query complexity that scales linearly with the size of the alphabet. In practice, however, this dependence on the alphabet size hinders the performance of the algorithms. To address this issue, we develop the MAT* algorithm for learning symbolic finite state automata (SFAs), which operate over infinite alphabets. In practice, the MAT* learning algorithm allows us to plug in custom transition learning algorithms which efficiently infer the predicates in the transitions of the SFA without querying the whole alphabet set. 4. Finally, we use our learning algorithm toolbox as the basis for the development of a set of black-box testing algorithms. More specifically, we present Grammar Oriented Filter Auditing (GOFA), a novel technique which allows one to utilize our learning algorithms to evaluate the robustness of a string sanitizer or filter against a set of attack strings given as a context-free grammar. Furthermore, because such grammars are often unavailable, we developed sfadiff, a differential testing technique based on symbolic automata learning which can be used to perform differential testing of two different parser implementations using SFA learning algorithms, and we demonstrate how our algorithm can be used to develop program fingerprints. We evaluate our algorithms against state-of-the-art Web Application Firewalls and discover over 15 previously unknown vulnerabilities which result in evading the firewalls and performing code injection attacks in the backend Web application. Finally, we show how our learning algorithms can uncover vulnerabilities which are missed by other black-box methods such as fuzzing and grammar-based testing.

Columbia University Academic Commons
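
To make the notion of a symbolic finite automaton in the Argyros abstract concrete, here is a minimal sketch of the model itself, not of the MAT* learning algorithm. Transitions carry predicates over characters instead of individual symbols, so the alphabet never has to be enumerated. The class name `SFA`, its constructor arguments, and the example filter `no_angle` are assumptions made for illustration.

```python
class SFA:
    """A deterministic symbolic finite automaton (illustrative sketch).

    transitions maps a state to a list of (predicate, next_state) pairs;
    the first predicate that holds for the input character is taken.
    """

    def __init__(self, initial, accepting, transitions):
        self.initial = initial
        self.accepting = set(accepting)
        self.transitions = transitions

    def accepts(self, word):
        state = self.initial
        for ch in word:
            for predicate, next_state in self.transitions.get(state, []):
                if predicate(ch):
                    state = next_state
                    break
            else:
                # No transition is enabled for this character: reject.
                return False
        return state in self.accepting


# Hypothetical string filter: accept only inputs that contain no '<'.
no_angle = SFA(
    initial=0,
    accepting={0},
    transitions={
        0: [(lambda c: c != "<", 0), (lambda c: c == "<", 1)],
        1: [(lambda c: True, 1)],  # sink state: stay rejected
    },
)
```

Two predicates cover every possible character in state 0, which is the property the abstract points to when it says transition predicates can be inferred without querying the whole alphabet.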

Fuzzy Automaton as a Detection Mechanism for the Multi-Step Attack

Author: Al-Kasassbeh Mouhammd
Almseidin Mohammad
Kovacs Szilveszter
Piller Imre
Publication venue: 'Insight Society'
Publication date: 16/03/2019

The integration of a fuzzy system and automaton theory forms the concept of a fuzzy automaton. This integration allows a discretely defined state machine to act on continuous universes and handle uncertainty in applications such as Intrusion Detection Systems (IDS). Typical IDS detection mechanisms are designed to detect and prevent single-stage attacks. These types of attacks can be detected using either a common convincing threshold or pre-defined rules. However, attack techniques have changed in recent years. Currently, the largest proportion of attacks performed are multi-step attacks. The goal of this paper is to introduce a novel detection mechanism for multi-step attacks built upon a Fuzzy Rule Interpolation (FRI) based fuzzy automaton. In that respect, the FRI method enables the fuzzy automaton to act on a not fully defined state transition rule-base by offering interpolated conclusions even for situations which are not explicitly defined. In the suggested model, the intrusion definition state transition rule-base is defined using an open source fuzzy declarative language. On the multi-step attack benchmark dataset introduced in this paper, the proposed detection mechanism was able to achieve a 97.836% detection rate. Furthermore, in the studied examples, the suggested method was able not only to detect the multi-step attack but also to detect it early, in stages where the planned attack is not fully elaborated and hence less harmful. According to these results, an IDS built upon the FRI based fuzzy automaton could be a useful device for detecting multi-step attacks, even in cases when the intrusion state transition rule-base is incomplete. The early detection of multi-step attacks also allows the administrator to take the necessary actions in time to mitigate the potential threats.

International Journal on Advanced Science, Engineering and Information Technology

Detecting malicious activities with user-agent-based profiles

Author: Lee Sung-Ju
Mekky Hesham
Mellia Marco
Tongaonkar Alok
Torres Ruben
Zhang Yang
Zhang Zhi-Li
Publication venue: 'Wiley'
Publication date: 01/01/2015

Hypertext transfer protocol (HTTP) has become the main protocol to carry out malicious activities. Attackers typically use HTTP for communication with command-and-control servers, click fraud, phishing and other malicious activities, as they can easily hide among the large amount of benign HTTP traffic. The user-agent (UA) field in the HTTP header carries information on the application, operating system (OS), device, and so on, and adversaries fake UA strings as a way to evade detection. Motivated by this, we propose a novel grammar-guided UA string classification method in HTTP flows. We leverage the fact that a number of ‘standard’ applications, such as web browsers and iOS mobile apps, have well-defined syntaxes that can be specified using context-free grammars, and we extract OS, device and other relevant information from them. We develop association heuristics to classify UA strings that are generated by ‘non-standard’ applications that do not contain OS or device information. We provide a proof-of-concept system that demonstrates how our approach can be used to identify malicious applications that generate fake UA strings to engage in fraudulent activities

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A hybrid intrusion detection system

Author: Wang Yanxin
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2004

Anomaly intrusion detection normally has high false alarm rates, and a high volume of false alarms prevents system administrators from identifying the real attacks. Machine learning methods provide an effective way to decrease the false alarm rate and improve the detection rate of anomaly intrusion detection. In this research, we propose a novel approach using kernel methods and Support Vector Machines (SVM) for improving anomaly intrusion detectors' accuracy. Two kernels, the STIDE kernel and the Markov Chain kernel, are developed specially for intrusion detection applications. The experiments show that the two-class SVM anomaly detectors based on the STIDE and Markov Chain kernels have a better accuracy rate than the original STIDE and Markov Chain anomaly detectors. Generally, anomaly intrusion detection approaches build normal profiles from labeled training data. However, labeled training data for intrusion detection is expensive and not easy to obtain. We propose an anomaly detection approach, using a one-class SVM based on the STIDE kernel and the Markov Chain kernel, that does not need labeled training data. To further increase the detection rate and lower the false alarm rate, an approach of integrating specification-based intrusion detection with anomaly intrusion detection is also proposed. This research also establishes a platform which automatically generates both misuse and anomaly intrusion detection software agents. In our method, a SIFT representing an intrusion is automatically converted to a Colored Petri Net (CPN) representing an intrusion detection template. Subsequently, the CPN is compiled into code for misuse intrusion detection software agents using a compiler, then dynamically loaded and launched for misuse intrusion detection. On the other hand, a model representing a normal profile is automatically generated from training data. Subsequently, an anomaly intrusion detection agent which carries this model is generated and launched for anomaly intrusion detection. By engaging both misuse and anomaly intrusion detection agents, our system can detect known attacks as well as novel unknown attacks.

Digital Repository @ Iowa State University (ISU)
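
The Wang abstract compares its kernel-based detectors against the original STIDE anomaly detector. As background, the following is a minimal sketch of that baseline idea only: record every length-n window seen in normal traces, then score a new trace by the fraction of its windows that were never seen. It does not implement the STIDE kernel or the SVM from the thesis, and the function names and the scoring rule are assumptions made for illustration.

```python
def windows(trace, n):
    """Return all contiguous length-n windows of a trace as tuples."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def train_stide(normal_traces, n):
    """Build the normal profile: the set of windows seen in normal traces."""
    profile = set()
    for trace in normal_traces:
        profile.update(windows(trace, n))
    return profile

def anomaly_score(trace, profile, n):
    """Fraction of the trace's windows that are absent from the profile."""
    ws = windows(trace, n)
    if not ws:
        return 0.0  # trace shorter than the window: nothing to compare
    return sum(w not in profile for w in ws) / len(ws)
```

For example, with a profile trained on the single trace `open, read, write, close` and `n = 2`, the trace `open, read, exec, close` has three windows, two of which are unseen. A threshold on this score would then separate normal from anomalous traces; choosing that threshold is left open here.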