2,652 research outputs found

    Generic Black-Box End-to-End Attack Against State of the Art API Call Based Malware Classifiers

    Full text link
    In this paper, we present a black-box attack against API call based machine learning malware classifiers, focusing on generating adversarial sequences combining API calls and static features (e.g., printable strings) that will be misclassified by the classifier without affecting the malware functionality. We show that this attack is effective against many classifiers due to the transferability principle between RNN variants, feed forward DNNs, and traditional machine learning classifiers such as SVM. We also implement GADGET, a software framework to convert any malware binary to a binary undetected by malware classifiers, using the proposed attack, without access to the malware source code.Comment: Accepted as a conference paper at RAID 201

    Automatic Malware Description via Attribute Tagging and Similarity Embedding

    Full text link
    With the rapid proliferation and increased sophistication of malicious software (malware), detection methods no longer rely only on manually generated signatures but have also incorporated more general approaches like machine learning detection. Although powerful for conviction of malicious artifacts, these methods do not produce any further information about the type of threat that has been detected neither allows for identifying relationships between malware samples. In this work, we address the information gap between machine learning and signature-based detection methods by learning a representation space for malware samples in which files with similar malicious behaviors appear close to each other. We do so by introducing a deep learning based tagging model trained to generate human-interpretable semantic descriptions of malicious software, which, at the same time provides potentially more useful and flexible information than malware family names. We show that the malware descriptions generated with the proposed approach correctly identify more than 95% of eleven possible tag descriptions for a given sample, at a deployable false positive rate of 1% per tag. Furthermore, we use the learned representation space to introduce a similarity index between malware files, and empirically demonstrate using dynamic traces from files' execution, that is not only more effective at identifying samples from the same families, but also 32 times smaller than those based on raw feature vectors

    Query-Efficient Black-Box Attack Against Sequence-Based Malware Classifiers

    Full text link
    In this paper, we present a generic, query-efficient black-box attack against API call-based machine learning malware classifiers. We generate adversarial examples by modifying the malware's API call sequences and non-sequential features (printable strings), and these adversarial examples will be misclassified by the target malware classifier without affecting the malware's functionality. In contrast to previous studies, our attack minimizes the number of malware classifier queries required. In addition, in our attack, the attacker must only know the class predicted by the malware classifier; attacker knowledge of the malware classifier's confidence score is optional. We evaluate the attack effectiveness when attacks are performed against a variety of malware classifier architectures, including recurrent neural network (RNN) variants, deep neural networks, support vector machines, and gradient boosted decision trees. Our attack success rate is around 98% when the classifier's confidence score is known and 64% when just the classifier's predicted class is known. We implement four state-of-the-art query-efficient attacks and show that our attack requires fewer queries and less knowledge about the attacked model's architecture than other existing query-efficient attacks, making it practical for attacking cloud-based malware classifiers at a minimal cost.Comment: Accepted as a conference paper at ACSAC 202

    DeepOrigin: End-to-End Deep Learning for Detection of New Malware Families

    Full text link
    In this paper, we present a novel method of differentiating known from previously unseen malware families. We utilize transfer learning by learning compact file representations that are used for a new classification task between previously seen malware families and novel ones. The learned file representations are composed of static and dynamic features of malware and are invariant to small modifications that do not change their malicious functionality. Using an extensive dataset that consists of thousands of variants of malicious files, we were able to achieve 97.7% accuracy when classifying between seen and unseen malware families. Our method provides an important focalizing tool for cybersecurity researchers and greatly improves the overall ability to adapt to the fast-moving pace of the current threat landscape

    Automated Poisoning Attacks and Defenses in Malware Detection Systems: An Adversarial Machine Learning Approach

    Full text link
    The evolution of mobile malware poses a serious threat to smartphone security. Today, sophisticated attackers can adapt by maximally sabotaging machine-learning classifiers via polluting training data, rendering most recent machine learning-based malware detection tools (such as Drebin, DroidAPIMiner, and MaMaDroid) ineffective. In this paper, we explore the feasibility of constructing crafted malware samples; examine how machine-learning classifiers can be misled under three different threat models; then conclude that injecting carefully crafted data into training data can significantly reduce detection accuracy. To tackle the problem, we propose KuafuDet, a two-phase learning enhancing approach that learns mobile malware by adversarial detection. KuafuDet includes an offline training phase that selects and extracts features from the training set, and an online detection phase that utilizes the classifier trained by the first phase. To further address the adversarial environment, these two phases are intertwined through a self-adaptive learning scheme, wherein an automated camouflage detector is introduced to filter the suspicious false negatives and feed them back into the training phase. We finally show that KuafuDet can significantly reduce false negatives and boost the detection accuracy by at least 15%. Experiments on more than 250,000 mobile applications demonstrate that KuafuDet is scalable and can be highly effective as a standalone system

    HADES-IoT: A Practical Host-Based Anomaly Detection System for IoT Devices (Extended Version)

    Full text link
    Internet of Things (IoT) devices have become ubiquitous and are spread across many application domains including the industry, transportation, healthcare, and households. However, the proliferation of the IoT devices has raised the concerns about their security, especially when observing that many manufacturers focus only on the core functionality of their products due to short time to market and low-cost pressures, while neglecting security aspects. Moreover, it does not exist any established or standardized method for measuring and ensuring the security of IoT devices. Consequently, vulnerabilities are left untreated, allowing attackers to exploit IoT devices for various purposes, such as compromising privacy, recruiting devices into a botnet, or misusing devices to perform cryptocurrency mining. In this paper, we present a practical Host-based Anomaly DEtection System for IoT (HADES-IoT) that represents the last line of defense. HADES-IoT has proactive detection capabilities, provides tamper-proof resistance, and it can be deployed on a wide range of Linux-based IoT devices. The main advantage of HADES-IoT is its low performance overhead, which makes it suitable for the IoT domain, where state-of-the-art approaches cannot be applied due to their high-performance demands. We deployed HADES-IoT on seven IoT devices to evaluate its effectiveness and performance overhead. Our experiments show that HADES-IoT achieved 100% effectiveness in the detection of current IoT malware such as VPNFilter and IoTReaper; while on average, requiring only 5.5% of available memory and causing only a low CPU load

    eXpose: A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys

    Full text link
    For years security machine learning research has promised to obviate the need for signature based detection by automatically learning to detect indicators of attack. Unfortunately, this vision hasn't come to fruition: in fact, developing and maintaining today's security machine learning systems can require engineering resources that are comparable to that of signature-based detection systems, due in part to the need to develop and continuously tune the "features" these machine learning systems look at as attacks evolve. Deep learning, a subfield of machine learning, promises to change this by operating on raw input signals and automating the process of feature design and extraction. In this paper we propose the eXpose neural network, which uses a deep learning approach we have developed to take generic, raw short character strings as input (a common case for security inputs, which include artifacts like potentially malicious URLs, file paths, named pipes, named mutexes, and registry keys), and learns to simultaneously extract features and classify using character-level embeddings and convolutional neural network. In addition to completely automating the feature design and extraction process, eXpose outperforms manual feature extraction based baselines on all of the intrusion detection problems we tested it on, yielding a 5%-10% detection rate gain at 0.1% false positive rate compared to these baselines

    Malware Lineage in the Wild

    Full text link
    Malware lineage studies the evolutionary relationships among malware and has important applications for malware analysis. A persistent limitation of prior malware lineage approaches is to consider every input sample a separate malware version. This is problematic since a majority of malware are packed and the packing process produces many polymorphic variants (i.e., executables with different file hash) of the same malware version. Thus, many samples correspond to the same malware version and it is challenging to identify distinct malware versions from polymorphic variants. This problem does not manifest in prior malware lineage approaches because they work on synthetic malware, malware that are not packed, or packed malware for which unpackers are available. In this work, we propose a novel malware lineage approach that works on malware samples collected in the wild. Given a set of malware executables from the same family, for which no source code is available and which may be packed, our approach produces a lineage graph where nodes are versions of the family and edges describe the relationships between versions. To enable our malware lineage approach, we propose the first technique to identify the versions of a malware family and a scalable code indexing technique for determining shared functions between any pair of input samples. We have evaluated the accuracy of our approach on 13 open-source programs and have applied it to produce lineage graphs for 10 popular malware families. Our malware lineage graphs achieve on average a 26 times reduction from number of input samples to number of versions

    Towards Generic Deobfuscation of Windows API Calls

    Full text link
    A common way to get insight into a malicious program's functionality is to look at which API functions it calls. To complicate the reverse engineering of their programs, malware authors deploy API obfuscation techniques, hiding them from analysts' eyes and anti-malware scanners. This problem can be partially addressed by using dynamic analysis; that is, by executing a malware sample in a controlled environment and logging the API calls. However, malware that is aware of virtual machines and sandboxes might terminate without showing any signs of malicious behavior. In this paper, we introduce a static analysis technique allowing generic deobfuscation of Windows API calls. The technique utilizes symbolic execution and hidden Markov models to predict API names from the arguments passed to the API functions. Our best prediction model can correctly identify API names with 87.60% accuracy.Comment: To be published in the 2018 Network and Distributed Systems Security (NDSS) Symposium via its 2018 Workshop on Binary Analysis Research (BAR

    Hijacking .NET to Defend PowerShell

    Full text link
    With the rise of attacks using PowerShell in the recent months, there has not been a comprehensive solution for monitoring or prevention. Microsoft recently released the AMSI solution for PowerShell v5, however this can also be bypassed. This paper focuses on repurposing various stealthy runtime .NET hijacking techniques implemented for PowerShell attacks for defensive monitoring of PowerShell. It begins with a brief introduction to .NET and PowerShell, followed by a deeper explanation of various attacker techniques, which is explained from the perspective of the defender, including assembly modification, class and method injection, compiler profiling, and C based function hooking. Of the four attacker techniques that are repurposed for defensive real-time monitoring of PowerShell execution, intermediate language binary modification, JIT hooking, and machine code manipulation provide the best results for stealthy run-time interfaces for PowerShell scripting analysis.Comment: 13 pages, 41 figures, CanSecWest 2017, BSidesSF 2017, Powershell, .NE
    • …
    corecore