119 research outputs found
Polygraph: Automatically generating signatures for polymorphic worms
It is widely believed that content-signature-based intrusion detection systems (IDSes) are easily evaded by polymorphic worms, which vary their payload on every infection attempt. In this paper, we present Polygraph, a signature generation system that successfully produces signatures that match polymorphic worms. Polygraph generates signatures that consist of multiple disjoint content sub-strings. In doing so, Polygraph leverages our insight that for a real-world exploit to function properly, multiple invariant substrings must often be present in all variants of a payload; these substrings typically correspond to protocol framing, return addresses, and in some cases, poorly obfuscated code. We contribute a definition of the polymorphic signature generation problem; propose classes of signature suited for matching polymorphic worm payloads; and present algorithms for automatic generation of signatures in these classes. Our evaluation of these algorithms on a range of polymorphic worms demonstrates that Polygraph produces signatures for polymorphic worms that exhibit low false negatives and false positives. © 2005 IEEE
Detecting and Modeling Polymorphic Shellcode
In this thesis, we address the problem of modeling and detecting polymorphic engines shellcode. By polymorphic engines, we mean programs having the ability to transform any piece of malware into many instances consisting of different code but having the same functionality as the original malware. Typically, polymorphic engines work by encrypting the target malware using various encryption techniques and providing a decryption module in order to execute the newly encrypted instance. Moreover, those engines have the ability to mutate their decryption routine making them unique from one instance to another and hard to detect. Our analysis focuses on polymorphic shellcode, which is shellcode that uses a polymorphic engine to mutate while keeping the original function of the code the same. We propose a new concept of signatures, shape signatures, which cope with the highly mutated nature of those engines. Those signatures try to identify the constant part as well as the mutated part of the deciphering routines. This combination is able to cope with the highly mutated nature of those engines in a much more efficient way compared to traditional signatures used in most intrusion detection systems. The second part of the thesis aims at modeling those polymorphic engines by showing that they exhibit commo
Detecting Malicious Software By Dynamicexecution
Traditional way to detect malicious software is based on signature matching. However, signature matching only detects known malicious software. In order to detect unknown malicious software, it is necessary to analyze the software for its impact on the system when the software is executed. In one approach, the software code can be statically analyzed for any malicious patterns. Another approach is to execute the program and determine the nature of the program dynamically. Since the execution of malicious code may have negative impact on the system, the code must be executed in a controlled environment. For that purpose, we have developed a sandbox to protect the system. Potential malicious behavior is intercepted by hooking Win32 system calls. Using the developed sandbox, we detect unknown virus using dynamic instruction sequences mining techniques. By collecting runtime instruction sequences in basic blocks, we extract instruction sequence patterns based on instruction associations. We build classification models with these patterns. By applying this classification model, we predict the nature of an unknown program. We compare our approach with several other approaches such as simple heuristics, NGram and static instruction sequences. We have also developed a method to identify a family of malicious software utilizing the system call trace. We construct a structural system call diagram from captured dynamic system call traces. We generate smart system call signature using profile hidden Markov model (PHMM) based on modularized system call block. Smart system call signature weakly identifies a family of malicious software
Malware Resistant Data Protection in Hyper-connected Networks: A survey
Data protection is the process of securing sensitive information from being
corrupted, compromised, or lost. A hyperconnected network, on the other hand,
is a computer networking trend in which communication occurs over a network.
However, what about malware. Malware is malicious software meant to penetrate
private data, threaten a computer system, or gain unauthorised network access
without the users consent. Due to the increasing applications of computers and
dependency on electronically saved private data, malware attacks on sensitive
information have become a dangerous issue for individuals and organizations
across the world. Hence, malware defense is critical for keeping our computer
systems and data protected. Many recent survey articles have focused on either
malware detection systems or single attacking strategies variously. To the best
of our knowledge, no survey paper demonstrates malware attack patterns and
defense strategies combinedly. Through this survey, this paper aims to address
this issue by merging diverse malicious attack patterns and machine learning
(ML) based detection models for modern and sophisticated malware. In doing so,
we focus on the taxonomy of malware attack patterns based on four fundamental
dimensions the primary goal of the attack, method of attack, targeted exposure
and execution process, and types of malware that perform each attack. Detailed
information on malware analysis approaches is also investigated. In addition,
existing malware detection techniques employing feature extraction and ML
algorithms are discussed extensively. Finally, it discusses research
difficulties and unsolved problems, including future research directions.Comment: 30 pages, 9 figures, 7 tables, no where submitted ye
Multimodal Approach for Malware Detection
Although malware detection is a very active area of research, few works were focused on using physical properties (e.g., power consumption) and multimodal features for malware detection. We designed an experimental testbed that allowed us to run samples of malware and non-malicious software applications and to collect power consumption, network traffic, and system logs data, and subsequently to extract dynamic behavioral-based features. We also extracted code-based static features of both malware and non-malicious software applications. These features were used for malware detection based on: feature level fusion using power consumption and network traffic data, feature level fusion using network traffic data and system logs, and multimodal feature level and decision level fusion.
The contributions when using feature level fusion of power consumption and network traffic data are: (1) We focused on detecting real malware using the extracted dynamic behavioral features (both power-based and network traffic-based) and supervised machine learning algorithms, which has not been done by any of the prior works. (2) We ran a large number of machine learning experiments, which allowed us to identify the best performing learner, DC voltage rails that led to the best malware detection performance, and the subset of features that are the best predictors for malware detection. (3) The comparison of malware detection performance was done using a comprehensive set of metrics that reflect different aspects of the quality of malware detection.
In the case of the feature level fusion using network traffic data and system logs, the contributions are: (1) Most of the previous works that have used network flows-based features have done classification of the network traffic, while our focus was on classifying the software running in a machine as malware and non-malicious software using the extracted dynamic behavioral features. (2) We experimented with different sizes of the training set (i.e., 90%, 75%, 50%, and 25% of the data) and found that smaller training sets produced very good classification results. This aspect of our work has a practical value because the manual labeling of the training set is a tedious and time consuming process.
In this dissertation we present a multimodal deep learning neural network that integrates different modalities (i.e., power consumption, system logs, network traffic, and code-based static data) using decision level fusion. We evaluated the performance of each modality individually, when using feature level fusion, and when using decision level fusion. The contributions of our multimodal approach are as follow: (1) Collecting data from different modalities allowed us to develop a multimodal approach to malware detection, which has not been widely explored by prior works. Even more, none of the previous works compared the performance of feature level fusion with decision level fusion, which is explored in this dissertation. (2) We proposed a multimodal decision level fusion malware detection approach using a deep neural network and compared its performance with the performance of feature level fusion approaches based on deep neural network and standard supervised machine learning algorithms (i.e., Random Forest, J48, JRip, PART, Naive Bayes, and SMO)
Generating content-based signatures for detecting bot-infected machines
Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2008.Thesis (Master's) -- Bilkent University, 2008.Includes bibliographical references leaves 76-79.A botnet is a network of compromised machines that are remotely controlled and
commanded by an attacker, who is often called the botmaster. Such botnets are
often abused as platforms to launch distributed denial of service attacks, send
spam mails or perform identity theft. In recent years, the basic motivations
for malicious activity have shifted from script kiddie vandalism in the hacker
community, to more organized attacks and intrusions for financial gain. This shift
explains the reason for the rise of botnets that have capabilities to perform more
sophisticated malicious activities. Recently, researchers have tried to develop
botnet detection mechanisms. The botnet detection mechanisms proposed to date
have serious limitations, since they either can handle only certain types of botnets
or focus on only specific botnet attributes, such as the spreading mechanism, the
attack mechanism, etc., in order to constitute their detection models.
We present a system that monitors network traffic to identify bot-infected
hosts. Our goal is to develop a more general detection model that identifies
single infected machines without relying on the bot propagation vector. To this
end, we leverage the insight that all of the bots get a command and perform an
action as a response, since the command and response behavior is the unique
characteristic that distinguishes the bots from other malware. Thus, we examine
the network traffic generated by bots to locate command and response behaviors.
Afterwards, we generate signatures from the similar commands that are followed
by similar bot responses without any explicit knowledge about the command
and control protocol. The signatures are deployed to an IDS that monitors the
network traffic of a university. Finally, the experiments showed that our system
is capable of detecting bot-infected machines with a low false positive rate.Bilge, LeylaM.S
Dynamic Application Level Security Sensors
The battle for cyber supremacy is a cat and mouse game: evolving threats from internal and external sources make it difficult to protect critical systems. With the diverse and high risk nature of these threats, there is a need for robust techniques that can quickly adapt and address this evolution. Existing tools such as Splunk, Snort, and Bro help IT administrators defend their networks by actively parsing through network traffic or system log data. These tools have been thoroughly developed and have proven to be a formidable defense against many cyberattacks. However, they are vulnerable to zero-day attacks, slow attacks, and attacks that originate from within. Should an attacker or some form of malware make it through these barriers and onto a system, the next layer of defense lies on the host. Host level defenses include system integrity verifiers, virus scanners, and event log parsers. Many of these tools work by seeking specific attack signatures or looking for anomalous events. The defenses at the network and host level are similar in nature. First, sensors collect data from the security domain. Second, the data is processed, and third, a response is crafted based on the processing. The application level security domain lacks this three step process. Application level defenses focus on secure coding practices and vulnerability patching, which is ineffective. The work presented in this thesis uses a technique that is commonly employed by malware, dynamic-link library (DLL) injection, to develop dynamic application level security sensors that can extract fine-grain data at runtime. This data can then be processed to provide stronger application level defense by shrinking the vulnerability window. Chapters 5 and 6 give proof of concept sensors and describe the process of developing the sensors in detail
- …