302 research outputs found

    Neural malware detection

    Get PDF
    At the heart of today’s malware problem lies theoretically infinite diversity created by metamorphism. The majority of conventional machine learning techniques tackle the problem with the assumptions that a sufficiently large number of training samples exist and that the training set is independent and identically distributed. However, the lack of semantic features combined with the models under these wrong assumptions result largely in overfitting with many false positives against real world samples, resulting in systems being left vulnerable to various adversarial attacks. A key observation is that modern malware authors write a script that automatically generates an arbitrarily large number of diverse samples that share similar characteristics in program logic, which is a very cost-effective way to evade detection with minimum effort. Given that many malware campaigns follow this paradigm of economic malware manufacturing model, the samples within a campaign are likely to share coherent semantic characteristics. This opens up a possibility of one-to-many detection. Therefore, it is crucial to capture this non-linear metamorphic pattern unique to the campaign in order to detect these seemingly diverse but identically rooted variants. To address these issues, this dissertation proposes novel deep learning models, including generative static malware outbreak detection model, generative dynamic malware detection model using spatio-temporal isomorphic dynamic features, and instruction cognitive malware detection. A comparative study on metamorphic threats is also conducted as part of the thesis. Generative adversarial autoencoder (AAE) over convolutional network with global average pooling is introduced as a fundamental deep learning framework for malware detection, which captures highly complex non-linear metamorphism through translation invariancy and local variation insensitivity. Generative Adversarial Network (GAN) used as a part of the framework enables oneshot training where semantically isomorphic malware campaigns are identified by a single malware instance sampled from the very initial outbreak. This is a major innovation because, to the best of our knowledge, no approach has been found to this challenging training objective against the malware distribution that consists of a large number of very sparse groups artificially driven by arms race between attackers and defenders. In addition, we propose a novel method that extracts instruction cognitive representation from uninterpreted raw binary executables, which can be used for oneto- many malware detection via one-shot training against frequency spectrum of the Transformer’s encoded latent representation. The method works regardless of the presence of diverse malware variations while remaining resilient to adversarial attacks that mostly use random perturbation against raw binaries. Comprehensive performance analyses including mathematical formulations and experimental evaluations are provided, with the proposed deep learning framework for malware detection exhibiting a superior performance over conventional machine learning methods. The methods proposed in this thesis are applicable to a variety of threat environments here artificially formed sparse distributions arise at the cyber battle fronts.Doctor of Philosoph

    MDFRCNN: Malware Detection using Faster Region Proposals Convolution Neural Network

    Get PDF
    Technological advancement of smart devices has opened up a new trend: Internet of Everything (IoE), where all devices are connected to the web. Large scale networking benefits the community by increasing connectivity and giving control of physical devices. On the other hand, there exists an increased ‘Threat’ of an ‘Attack’. Attackers are targeting these devices, as it may provide an easier ‘backdoor entry to the users’ network’.MALicious softWARE (MalWare) is a major threat to user security. Fast and accurate detection of malware attacks are the sine qua non of IoE, where large scale networking is involved. The paper proposes use of a visualization technique where the disassembled malware code is converted into gray images, as well as use of Image Similarity based Statistical Parameters (ISSP) such as Normalized Cross correlation (NCC), Average difference (AD), Maximum difference (MaxD), Singular Structural Similarity Index Module (SSIM), Laplacian Mean Square Error (LMSE), MSE and PSNR. A vector consisting of gray image with statistical parameters is trained using a Faster Region proposals Convolution Neural Network (F-RCNN) classifier. The experiment results are promising as the proposed method includes ISSP with F-RCNN training. Overall training time of learning the semantics of higher-level malicious behaviors is less. Identification of malware (testing phase) is also performed in less time. The fusion of image and statistical parameter enhances system performance with greater accuracy. The benchmark database from Microsoft Malware Classification challenge has been used to analyze system performance, which is available on the Kaggle website. An overall average classification accuracy of 98.12% is achieved by the proposed method

    An analysis of android malware classification services

    Get PDF
    The increasing number of Android malware forced antivirus (AV) companies to rely on automated classification techniques to determine the family and class of suspicious samples. The research community relies heavily on such labels to carry out prevalence studies of the threat ecosystem and to build datasets that are used to validate and benchmark novel detection and classification methods. In this work, we carry out an extensive study of the Android malware ecosystem by surveying white papers and reports from 6 key players in the industry, as well as 81 papers from 8 top security conferences, to understand how malware datasets are used by both. We, then, explore the limitations associated with the use of available malware classification services, namely VirusTotal (VT) engines, for determining the family of an Android sample. Using a dataset of 2.47 M Android malware samples, we find that the detection coverage of VT's AVs is generally very low, that the percentage of samples flagged by any 2 AV engines does not go beyond 52%, and that common families between any pair of AV engines is at best 29%. We rely on clustering to determine the extent to which different AV engine pairs agree upon which samples belong to the same family (regardless of the actual family name) and find that there are discrepancies that can introduce noise in automatic label unification schemes. We also observe the usage of generic labels and inconsistencies within the labels of top AV engines, suggesting that their efforts are directed towards accurate detection rather than classification. Our results contribute to a better understanding of the limitations of using Android malware family labels as supplied by common AV engines.This work has been supported by the “Ramon y Cajal” Fellowship RYC-2020-029401

    SYSTEMATIC DISCOVERY OF ANDROID CUSTOMIZATION HAZARDS

    Get PDF
    The open nature of Android ecosystem has naturally laid the foundation for a highly fragmented operating system. In fact, the official AOSP versions have been aggressively customized into thousands of system images by everyone in the customization chain, such as device manufacturers, vendors, carriers, etc. If not well thought-out, the customization process could result in serious security problems. This dissertation performs a systematic investigation of Android customization’ inconsistencies with regards to security aspects at various Android layers. It brings to light new vulnerabilities, never investigated before, caused by the under-regulated and complex Android customization. It first describes a novel vulnerability Hare and proves that it is security critical and extensive affecting devices from major vendors. A new tool is proposed to detect the Hare problem and to protect affected devices. This dissertation further discovers security configuration changes through a systematic differential analysis among custom devices from different vendors and demonstrates that they could lead to severe vulnerabilities if introduced unintentionally

    MACHINE LEARNING MODELS INTERPRETABILITY FOR MALWARE DETECTION USING MODEL AGNOSTIC LANGUAGE FOR EXPLORATION AND EXPLANATION

    Get PDF
    The adoption of the internet as a global platform has birthed a significant rise in cyber-attacks of various forms ranging from Trojans, worms, spyware, ransomware, botnet malware, rootkit, etc. In order to tackle the issue of all these forms of malware, there is a need to understand and detect them. There are various methods of detecting malware which include signature, behavioral, and machine learning. Machine learning methods have proven to be the most efficient of all for malware detection. In this thesis, a system that utilizes both the signature and dynamic behavior-based detection techniques, with the added layer of the machine learning algorithm with model explainability capability is proposed. This hybrid system provides not only predictions but also their interpretation and explanation for a malware detection task. The layer of a machine learning algorithm can be Logistic Regression, Random Forest, Naive Bayes, Decision Tree, or Support Vector Machine. Empirical performance evaluation results on publicly available datasets and manually acquired samples (both benign and malicious) are used to compare the five machine learning algorithms. DALEX (moDel Agnostic Language for Exploration and explanation) is integrated into the proposed hybrid approach to support the interpretation and understanding of the prediction to improve the trust of cyber security stakeholders in complex machine learning predictive models

    Mustererkennungsbasierte Verteidgung gegen gezielte Angriffe

    Get PDF
    The speed at which everything and everyone is being connected considerably outstrips the rate at which effective security mechanisms are introduced to protect them. This has created an opportunity for resourceful threat actors which have specialized in conducting low-volume persistent attacks through sophisticated techniques that are tailored to specific valuable targets. Consequently, traditional approaches are rendered ineffective against targeted attacks, creating an acute need for innovative defense mechanisms. This thesis aims at supporting the security practitioner in bridging this gap by introducing a holistic strategy against targeted attacks that addresses key challenges encountered during the phases of detection, analysis and response. The structure of this thesis is therefore aligned to these three phases, with each one of its central chapters taking on a particular problem and proposing a solution built on a strong foundation on pattern recognition and machine learning. In particular, we propose a detection approach that, in the absence of additional authentication mechanisms, allows to identify spear-phishing emails without relying on their content. Next, we introduce an analysis approach for malware triage based on the structural characterization of malicious code. Finally, we introduce MANTIS, an open-source platform for authoring, sharing and collecting threat intelligence, whose data model is based on an innovative unified representation for threat intelligence standards based on attributed graphs. As a whole, these ideas open new avenues for research on defense mechanisms and represent an attempt to counteract the imbalance between resourceful actors and society at large.In unserer heutigen Welt sind alle und alles miteinander vernetzt. Dies bietet mĂ€chtigen Angreifern die Möglichkeit, komplexe Verfahren zu entwickeln, die auf spezifische Ziele angepasst sind. Traditionelle AnsĂ€tze zur BekĂ€mpfung solcher Angriffe werden damit ineffektiv, was die Entwicklung innovativer Methoden unabdingbar macht. Die vorliegende Dissertation verfolgt das Ziel, den Sicherheitsanalysten durch eine umfassende Strategie gegen gezielte Angriffe zu unterstĂŒtzen. Diese Strategie beschĂ€ftigt sich mit den hauptsĂ€chlichen Herausforderungen in den drei Phasen der Erkennung und Analyse von sowie der Reaktion auf gezielte Angriffe. Der Aufbau dieser Arbeit orientiert sich daher an den genannten drei Phasen. In jedem Kapitel wird ein Problem aufgegriffen und eine entsprechende Lösung vorgeschlagen, die stark auf maschinellem Lernen und Mustererkennung basiert. Insbesondere schlagen wir einen Ansatz vor, der eine Identifizierung von Spear-Phishing-Emails ermöglicht, ohne ihren Inhalt zu betrachten. Anschliessend stellen wir einen Analyseansatz fĂŒr Malware Triage vor, der auf der strukturierten Darstellung von Code basiert. Zum Schluss stellen wir MANTIS vor, eine Open-Source-Plattform fĂŒr Authoring, Verteilung und Sammlung von Threat Intelligence, deren Datenmodell auf einer innovativen konsolidierten Graphen-Darstellung fĂŒr Threat Intelligence Stardards basiert. Wir evaluieren unsere AnsĂ€tze in verschiedenen Experimenten, die ihren potentiellen Nutzen in echten Szenarien beweisen. Insgesamt bereiten diese Ideen neue Wege fĂŒr die Forschung zu Abwehrmechanismen und erstreben, das Ungleichgewicht zwischen mĂ€chtigen Angreifern und der Gesellschaft zu minimieren
    • 

    corecore