Generic Black-Box End-to-End Attack Against State of the Art API Call Based Malware Classifiers
In this paper, we present a black-box attack against API call based machine
learning malware classifiers, focusing on generating adversarial sequences
combining API calls and static features (e.g., printable strings) that will be
misclassified by the classifier without affecting the malware functionality. We
show that this attack is effective against many classifiers due to the
transferability principle between RNN variants, feed forward DNNs, and
traditional machine learning classifiers such as SVM. We also implement GADGET,
a software framework to convert any malware binary to a binary undetected by
malware classifiers, using the proposed attack, without access to the malware
source code.
Comment: Accepted as a conference paper at RAID 201
Automatic Malware Description via Attribute Tagging and Similarity Embedding
With the rapid proliferation and increased sophistication of malicious
software (malware), detection methods no longer rely only on manually generated
signatures but have also incorporated more general approaches like machine
learning detection. Although powerful for conviction of malicious artifacts,
these methods do not produce any further information about the type of threat
that has been detected, nor do they allow relationships to be identified between
malware samples. In this work, we address the information gap between machine
learning and signature-based detection methods by learning a representation
space for malware samples in which files with similar malicious behaviors
appear close to each other. We do so by introducing a deep learning based
tagging model trained to generate human-interpretable semantic descriptions of
malicious software, which at the same time provide potentially more useful
and flexible information than malware family names.
We show that the malware descriptions generated with the proposed approach
correctly identify more than 95% of eleven possible tag descriptions for a
given sample, at a deployable false positive rate of 1% per tag. Furthermore,
we use the learned representation space to introduce a similarity index between
malware files and empirically demonstrate, using dynamic traces from files'
execution, that it is not only more effective at identifying samples from the same
families, but also 32 times smaller than those based on raw feature vectors.
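A minimal sketch of how a similarity lookup in a learned representation space might work. The abstract does not say which similarity measure is used; cosine similarity is an assumption here, and the embeddings, sample names, and helper functions are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, index):
    """Return (name, score) of the indexed embedding closest to the query."""
    scored = ((name, cosine_similarity(query, vec)) for name, vec in index.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical low-dimensional embeddings; a real model would produce these.
index = {
    "sample_a": [0.9, 0.1, 0.0],
    "sample_b": [0.0, 0.2, 0.9],
}
name, score = most_similar([1.0, 0.0, 0.0], index)
```

Because the comparison runs on compact embeddings rather than raw feature vectors, the stored index stays small, which is the kind of size advantage the abstract reports.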
Query-Efficient Black-Box Attack Against Sequence-Based Malware Classifiers
In this paper, we present a generic, query-efficient black-box attack against
API call-based machine learning malware classifiers. We generate adversarial
examples by modifying the malware's API call sequences and non-sequential
features (printable strings), and these adversarial examples will be
misclassified by the target malware classifier without affecting the malware's
functionality. In contrast to previous studies, our attack minimizes the number
of malware classifier queries required. In addition, in our attack, the
attacker must only know the class predicted by the malware classifier; attacker
knowledge of the malware classifier's confidence score is optional. We evaluate
the attack effectiveness when attacks are performed against a variety of
malware classifier architectures, including recurrent neural network (RNN)
variants, deep neural networks, support vector machines, and gradient boosted
decision trees. Our attack success rate is around 98% when the classifier's
confidence score is known and 64% when just the classifier's predicted class is
known. We implement four state-of-the-art query-efficient attacks and show that
our attack requires fewer queries and less knowledge about the attacked model's
architecture than other existing query-efficient attacks, making it practical
for attacking cloud-based malware classifiers at a minimal cost.
Comment: Accepted as a conference paper at ACSAC 202
DeepOrigin: End-to-End Deep Learning for Detection of New Malware Families
In this paper, we present a novel method of differentiating known from
previously unseen malware families. We utilize transfer learning by learning
compact file representations that are used for a new classification task
between previously seen malware families and novel ones. The learned file
representations are composed of static and dynamic features of malware and are
invariant to small modifications that do not change their malicious
functionality. Using an extensive dataset that consists of thousands of
variants of malicious files, we were able to achieve 97.7% accuracy when
classifying between seen and unseen malware families. Our method provides an
important focalizing tool for cybersecurity researchers and greatly improves
the overall ability to adapt to the fast-moving pace of the current threat
landscape.
Automated Poisoning Attacks and Defenses in Malware Detection Systems: An Adversarial Machine Learning Approach
The evolution of mobile malware poses a serious threat to smartphone
security. Today, sophisticated attackers can adapt by maximally sabotaging
machine-learning classifiers via polluting training data, rendering most recent
machine learning-based malware detection tools (such as Drebin, DroidAPIMiner,
and MaMaDroid) ineffective. In this paper, we explore the feasibility of
constructing crafted malware samples; examine how machine-learning classifiers
can be misled under three different threat models; then conclude that injecting
carefully crafted data into training data can significantly reduce detection
accuracy. To tackle the problem, we propose KuafuDet, a two-phase learning
enhancing approach that learns mobile malware by adversarial detection.
KuafuDet includes an offline training phase that selects and extracts features
from the training set, and an online detection phase that utilizes the
classifier trained by the first phase. To further address the adversarial
environment, these two phases are intertwined through a self-adaptive learning
scheme, wherein an automated camouflage detector is introduced to filter the
suspicious false negatives and feed them back into the training phase. We
finally show that KuafuDet can significantly reduce false negatives and boost
the detection accuracy by at least 15%. Experiments on more than 250,000 mobile
applications demonstrate that KuafuDet is scalable and can be highly effective
as a standalone system.
HADES-IoT: A Practical Host-Based Anomaly Detection System for IoT Devices (Extended Version)
Internet of Things (IoT) devices have become ubiquitous and are spread across
many application domains, including industry, transportation, healthcare,
and households. However, the proliferation of IoT devices has raised
concerns about their security, especially given that many
manufacturers focus only on the core functionality of their products due to
short time to market and low-cost pressures, while neglecting security aspects.
Moreover, there is no established or standardized method for
measuring and ensuring the security of IoT devices. Consequently,
vulnerabilities are left untreated, allowing attackers to exploit IoT devices
for various purposes, such as compromising privacy, recruiting devices into a
botnet, or misusing devices to perform cryptocurrency mining.
In this paper, we present a practical Host-based Anomaly DEtection System for
IoT (HADES-IoT) that represents the last line of defense. HADES-IoT has
proactive detection capabilities, provides tamper resistance, and can
be deployed on a wide range of Linux-based IoT devices. The main advantage of
HADES-IoT is its low performance overhead, which makes it suitable for the IoT
domain, where state-of-the-art approaches cannot be applied due to their
high performance demands. We deployed HADES-IoT on seven IoT devices to
evaluate its effectiveness and performance overhead. Our experiments show that
HADES-IoT achieved 100% effectiveness in the detection of current IoT malware
such as VPNFilter and IoTReaper, while requiring on average only 5.5% of
available memory and causing only a low CPU load.
eXpose: A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys
For years, security machine learning research has promised to obviate the need
for signature-based detection by automatically learning to detect indicators of
attack. Unfortunately, this vision hasn't come to fruition: in fact, developing
and maintaining today's security machine learning systems can require
engineering resources that are comparable to those of signature-based detection
systems, due in part to the need to develop and continuously tune the
"features" these machine learning systems look at as attacks evolve. Deep
learning, a subfield of machine learning, promises to change this by operating
on raw input signals and automating the process of feature design and
extraction. In this paper we propose the eXpose neural network, which uses a
deep learning approach we have developed to take generic, raw short character
strings as input (a common case for security inputs, which include artifacts
like potentially malicious URLs, file paths, named pipes, named mutexes, and
registry keys), and learns to simultaneously extract features and classify
using character-level embeddings and a convolutional neural network. In addition
to completely automating the feature design and extraction process, eXpose
outperforms manual feature extraction based baselines on all of the intrusion
detection problems we tested it on, yielding a 5%-10% detection rate gain at
0.1% false positive rate compared to these baselines.
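To make the idea of character-level embeddings followed by convolution concrete, here is a toy forward pass. It is only a sketch under stated assumptions: the alphabet, dimensions, random weights, and function names are all hypothetical, and nothing here reproduces the eXpose architecture or its trained parameters.

```python
import random

# Hypothetical settings; the real model's alphabet and sizes are not given in the abstract.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789:/.\\-_"
EMBED_DIM = 4
KERNEL_WIDTH = 3

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible
# Row 0 is reserved for characters outside the alphabet.
embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
              for _ in range(len(ALPHABET) + 1)]
kernel = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
          for _ in range(KERNEL_WIDTH)]

def embed(text):
    """Map each character to its embedding vector (unknown characters use row 0)."""
    return [embeddings[ALPHABET.find(ch) + 1] for ch in text.lower()]

def conv_max_feature(text):
    """Slide one convolution kernel over the embedded string, apply ReLU, max-pool."""
    vectors = embed(text)
    if len(vectors) < KERNEL_WIDTH:
        # Zero-pad short inputs so at least one window exists.
        vectors += [[0.0] * EMBED_DIM] * (KERNEL_WIDTH - len(vectors))
    activations = []
    for start in range(len(vectors) - KERNEL_WIDTH + 1):
        window = vectors[start:start + KERNEL_WIDTH]
        total = sum(w * x
                    for k_row, v_row in zip(kernel, window)
                    for w, x in zip(k_row, v_row))
        activations.append(max(0.0, total))
    return max(activations)

feature = conv_max_feature("http://example.com/login")
```

A trained network would learn the embeddings and many such kernels jointly, then feed the pooled features to a classifier; this sketch shows only how a raw string becomes a single pooled feature without manual feature design.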
Malware Lineage in the Wild
Malware lineage studies the evolutionary relationships among malware and has
important applications for malware analysis. A persistent limitation of prior
malware lineage approaches is to consider every input sample a separate malware
version. This is problematic since a majority of malware are packed and the
packing process produces many polymorphic variants (i.e., executables with
different file hashes) of the same malware version. Thus, many samples correspond
to the same malware version and it is challenging to identify distinct malware
versions from polymorphic variants. This problem does not manifest in prior
malware lineage approaches because they work on synthetic malware, malware that
are not packed, or packed malware for which unpackers are available. In this
work, we propose a novel malware lineage approach that works on malware samples
collected in the wild. Given a set of malware executables from the same family,
for which no source code is available and which may be packed, our approach
produces a lineage graph where nodes are versions of the family and edges
describe the relationships between versions. To enable our malware lineage
approach, we propose the first technique to identify the versions of a malware
family and a scalable code indexing technique for determining shared functions
between any pair of input samples. We have evaluated the accuracy of our
approach on 13 open-source programs and have applied it to produce lineage
graphs for 10 popular malware families. Our malware lineage graphs achieve, on
average, a 26-fold reduction from the number of input samples to the number of
versions.
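One simple way to find shared functions between every pair of samples without comparing all pairs directly is an inverted index from function hash to samples. The sketch below assumes each sample has already been reduced to a set of function hashes; that preprocessing, the sample names, and the helper functions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations

def build_index(samples):
    """Map each function hash to the set of samples that contain it."""
    index = defaultdict(set)
    for name, functions in samples.items():
        for function_hash in functions:
            index[function_hash].add(name)
    return index

def shared_function_counts(samples):
    """Count shared functions for every pair of samples that share at least one."""
    counts = defaultdict(int)
    for names in build_index(samples).values():
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return dict(counts)

# Hypothetical samples, each described by a set of function hashes.
samples = {
    "s1": {"f1", "f2", "f3"},
    "s2": {"f2", "f3"},
    "s3": {"f9"},
}
shared = shared_function_counts(samples)
```

Only pairs that actually co-occur under some function hash are visited, so samples with nothing in common (such as `s3` here) cost nothing beyond indexing.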
Towards Generic Deobfuscation of Windows API Calls
A common way to get insight into a malicious program's functionality is to
look at which API functions it calls. To complicate the reverse engineering of
their programs, malware authors deploy API obfuscation techniques, hiding the calls
from analysts' eyes and anti-malware scanners. This problem can be partially
addressed by using dynamic analysis; that is, by executing a malware sample in
a controlled environment and logging the API calls. However, malware that is
aware of virtual machines and sandboxes might terminate without showing any
signs of malicious behavior. In this paper, we introduce a static analysis
technique allowing generic deobfuscation of Windows API calls. The technique
utilizes symbolic execution and hidden Markov models to predict API names from
the arguments passed to the API functions. Our best prediction model can
correctly identify API names with 87.60% accuracy.
Comment: To be published in the 2018 Network and Distributed Systems Security
(NDSS) Symposium via its 2018 Workshop on Binary Analysis Research (BAR
Hijacking .NET to Defend PowerShell
With the rise of attacks using PowerShell in recent months, there has not
been a comprehensive solution for monitoring or prevention. Microsoft recently
released the AMSI solution for PowerShell v5; however, this can also be
bypassed. This paper focuses on repurposing various stealthy runtime .NET
hijacking techniques implemented for PowerShell attacks for defensive
monitoring of PowerShell. It begins with a brief introduction to .NET and
PowerShell, followed by a deeper explanation of various attacker techniques,
which are explained from the perspective of the defender, including assembly
modification, class and method injection, compiler profiling, and C-based
function hooking. Of the four attacker techniques that are repurposed for
defensive real-time monitoring of PowerShell execution, intermediate language
binary modification, JIT hooking, and machine code manipulation provide the
best results for stealthy run-time interfaces for PowerShell scripting
analysis.
Comment: 13 pages, 41 figures, CanSecWest 2017, BSidesSF 2017, Powershell,
.NE