159 research outputs found

    Adapting Text Categorization for Manifest based Android Malware Detection

    Get PDF
    There are mainly three different approaches to detect malwares: i) static, ii) dynamic, and iii) hybrid. Static approach uses static source of the program without executing it. Dynamic approach, on the other hand, executes the program in a controlled environment and obtains information from operating system during runtime. Hybrid approach, as its name implies, is the combination of these two approaches. Although static approach may seem to have some disadvantages, it is highly preferred because of its lower cost. In this paper, we assume that obfuscated malware is processed by dynamic analysis and perform static malware detection based on text categorization methods. To reach our goal, we apply text mining techniques like feature extraction by using bag-of-words, n-grams, etc. from \texttt{manifest content} of programs to investigate the effectiveness of the malware detection. Our experimental results revealed that our approach is capable of detecting malicious applications with an accuracy between 94.0% and 99.3%

    Robust Mobile Malware Detection

    Get PDF
    The increasing popularity and use of smartphones and hand-held devices have made them the most popular target for malware attackers. Researchers have proposed machine learning-based models to automatically detect malware attacks on these devices. Since these models learn application behaviors solely from the extracted features, choosing an appropriate and meaningful feature set is one of the most crucial steps for designing an effective mobile malware detection system. There are four categories of features for mobile applications. Previous works have taken arbitrary combinations of these categories to design models, resulting in sub-optimal performance. This thesis systematically investigates the individual impact of these feature categories on mobile malware detection systems. Feature categories that complement each other are investigated and categories that add redundancy to the feature space (thereby degrading the performance) are analyzed. In the process, the combination of feature categories that provides the best detection results is identified. Ensuring reliability and robustness of the above-mentioned malware detection systems is of utmost importance as newer techniques to break down such systems continue to surface. Adversarial attack is one such evasive attack that can bypass a detection system by carefully morphing a malicious sample even though the sample was originally correctly identified by the same system. Self-crafted adversarial samples can be used to retrain a model to defend against such attacks. However, randomly using too many such samples, as is currently done in the literature, can further degrade detection performance. This work proposed two intelligent approaches to retrain a classifier through the intelligent selection of adversarial samples. The first approach adopts a distance-based scheme where the samples are chosen based on their distance from malware and benign cluster centers while the second selects the samples based on a probability measure derived from a kernel-based learning method. The second method achieved a 6% improvement in terms of accuracy. To ensure practical deployment of malware detection systems, it is necessary to keep the real-world data characteristics in mind. For example, the benign applications deployed in the market greatly outnumber malware applications. However, most studies have assumed a balanced data distribution. Also, techniques to handle imbalanced data in other domains cannot be applied directly to mobile malware detection since they generate synthetic samples with broken functionality, making them invalid. In this regard, this thesis introduces a novel synthetic over-sampling technique that ensures valid sample generation. This technique is subsequently combined with a dynamic cost function in the learning scheme that automatically adjusts minority class weight during model training which counters the bias towards the majority class and stabilizes the model. This hybrid method provided a 9% improvement in terms of F1-score. Aiming to design a robust malware detection system, this thesis extensively studies machine learning-based mobile malware detection in terms of best feature category combination, resilience against evasive attacks, and practical deployment of detection models. Given the increasing technological advancements in mobile and hand-held devices, this study will be very useful for designing robust cybersecurity systems to ensure safe usage of these devices.Doctor of Philosoph

    Few-Shot Malware Detection Using A Novel Adversarial Reprogramming Model

    Get PDF
    The increasing sophistication of malware has made detecting and defending against new strains a major challenge for cybersecurity. One promising approach to this problem is using machine learning techniques that extract representative features and train classification models to detect malware in an early stage. However, training such machine learning-based malware detection models represents a significant challenge that requires a large number of high-quality labeled data samples while it is very costly to obtain them in real-world scenarios. In other words, training machine learning models for malware detection requires the capability to learn from only a few labeled examples. To address this challenge, in this thesis, we propose a novel adversarial reprogramming model for few-shot malware detection. Our model is based on the idea to re-purpose high-performance ImageNet classification model to perform malware detection using the features of malicious and benign files. We first embed the features of software files and a small perturbation to a host image chosen randomly from ImageNet, and then create an image dataset to train and test the model; after that, the model transforms the output into malware and benign classes. We evaluate the effectiveness of our model on a dataset of real-world malware and show that it significantly outperforms baseline few-shot learning methods. Additionally, we evaluate the impact of different pre-trained models, different data sizes, and different parameter values. Overall, our results suggest that the proposed adversarial reprogramming model is a promising direction for improving few-shot malware detection

    Intelligent Malware Detection Using File-to-file Relations and Enhancing its Security against Adversarial Attacks

    Get PDF
    With computing devices and the Internet being indispensable in people\u27s everyday life, malware has posed serious threats to their security, making its detection of utmost concern. To protect legitimate users from the evolving malware attacks, machine learning-based systems have been successfully deployed and offer unparalleled flexibility in automatic malware detection. In most of these systems, resting on the analysis of different content-based features either statically or dynamically extracted from the file samples, various kinds of classifiers are constructed to detect malware. However, besides content-based features, file-to-file relations, such as file co-existence, can provide valuable information in malware detection and make evasion harder. To better understand the properties of file-to-file relations, we construct the file co-existence graph. Resting on the constructed graph, we investigate the semantic relatedness among files, and leverage graph inference, active learning and graph representation learning for malware detection. Comprehensive experimental results on the real sample collections from Comodo Cloud Security Center demonstrate the effectiveness of our proposed learning paradigms. As machine learning-based detection systems become more widely deployed, the incentive for defeating them increases. Therefore, we go further insight into the arms race between adversarial malware attack and defense, and aim to enhance the security of machine learning-based malware detection systems. In particular, we first explore the adversarial attacks under different scenarios (i.e., different levels of knowledge the attackers might have about the targeted learning system), and define a general attack strategy to thoroughly assess the adversarial behaviors. Then, considering different skills and capabilities of the attackers, we propose the corresponding secure-learning paradigms to counter the adversarial attacks and enhance the security of the learning systems while not compromising the detection accuracy. We conduct a series of comprehensive experimental studies based on the real sample collections from Comodo Cloud Security Center and the promising results demonstrate the effectiveness of our proposed secure-learning models, which can be readily applied to other detection tasks

    Machine Learning and other Computational-Intelligence Techniques for Security Applications

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Malware Analysis with Machine Learning

    Get PDF
    Tese de mestrado, Segurança Informática, Universidade de Lisboa, Faculdade de Ciências, 2022Malware attacks have been one of the most serious cyber risks in recent years. Almost every week, the number of vulnerability reports is increasing in the security communities. One of the key causes for the exponential growth is the fact that malware authors started introducing mutations to avoid detection. This means that malicious files from the same malware family, with the same malicious behaviour, are constantly modified or obfuscated using a variety of technics to make them appear to be different. Characteristics retrieved from raw binary files or disassembled code are used in existing machine learning-based malware categorization algorithms. The variety of such attributes has made it difficult to develop generic malware categorization methods that operate well in a variety of operating scenarios. To be effective in evaluating and categorizing such enormous volumes of data, it is necessary to divide them into groups and identify their respective families based on their behaviour. Malicious software is converted to a greyscale image representation, due to the possibility to capture subtle changes while keeping the global structure helps to detect variations. Motivated by the Machine Learning results achieved in the ImageNet challenge, this dissertation proposes an agnostic deep learning solution, for efficiently classifying malware into families based on a collection of discriminant patterns retrieved from its visualization as images. In this thesis, we present Malwizard, an adaptable Python solution suited for companies or end users, that allows them to automatically obtain a fast malware analysis. The solution was implemented as an Outlook add-in and an API service for the SOAR platforms, as emails are the first vector for this type of attack, with companies being the most attractive targets. The Microsoft Classification Challenge dataset was used in the evaluation of the noble approach. Therefore, its image representation was ciphered and generated the correspondent ciphered image to evaluate if the same patterns could be identified using traditional machine learning techniques. Thus, allowing the privacy concerns to be addressed, maintaining the data analysed by neural networks secure to unauthorized parties. Experimental comparison demonstrates the noble approach performed close to the best analysed model on a plain text dataset, completing the task in one-third of the time. Regarding the encrypted dataset, classical techniques need to be adapted in order to be efficient

    Mustererkennungsbasierte Verteidgung gegen gezielte Angriffe

    Get PDF
    The speed at which everything and everyone is being connected considerably outstrips the rate at which effective security mechanisms are introduced to protect them. This has created an opportunity for resourceful threat actors which have specialized in conducting low-volume persistent attacks through sophisticated techniques that are tailored to specific valuable targets. Consequently, traditional approaches are rendered ineffective against targeted attacks, creating an acute need for innovative defense mechanisms. This thesis aims at supporting the security practitioner in bridging this gap by introducing a holistic strategy against targeted attacks that addresses key challenges encountered during the phases of detection, analysis and response. The structure of this thesis is therefore aligned to these three phases, with each one of its central chapters taking on a particular problem and proposing a solution built on a strong foundation on pattern recognition and machine learning. In particular, we propose a detection approach that, in the absence of additional authentication mechanisms, allows to identify spear-phishing emails without relying on their content. Next, we introduce an analysis approach for malware triage based on the structural characterization of malicious code. Finally, we introduce MANTIS, an open-source platform for authoring, sharing and collecting threat intelligence, whose data model is based on an innovative unified representation for threat intelligence standards based on attributed graphs. As a whole, these ideas open new avenues for research on defense mechanisms and represent an attempt to counteract the imbalance between resourceful actors and society at large.In unserer heutigen Welt sind alle und alles miteinander vernetzt. Dies bietet mächtigen Angreifern die Möglichkeit, komplexe Verfahren zu entwickeln, die auf spezifische Ziele angepasst sind. Traditionelle Ansätze zur Bekämpfung solcher Angriffe werden damit ineffektiv, was die Entwicklung innovativer Methoden unabdingbar macht. Die vorliegende Dissertation verfolgt das Ziel, den Sicherheitsanalysten durch eine umfassende Strategie gegen gezielte Angriffe zu unterstützen. Diese Strategie beschäftigt sich mit den hauptsächlichen Herausforderungen in den drei Phasen der Erkennung und Analyse von sowie der Reaktion auf gezielte Angriffe. Der Aufbau dieser Arbeit orientiert sich daher an den genannten drei Phasen. In jedem Kapitel wird ein Problem aufgegriffen und eine entsprechende Lösung vorgeschlagen, die stark auf maschinellem Lernen und Mustererkennung basiert. Insbesondere schlagen wir einen Ansatz vor, der eine Identifizierung von Spear-Phishing-Emails ermöglicht, ohne ihren Inhalt zu betrachten. Anschliessend stellen wir einen Analyseansatz für Malware Triage vor, der auf der strukturierten Darstellung von Code basiert. Zum Schluss stellen wir MANTIS vor, eine Open-Source-Plattform für Authoring, Verteilung und Sammlung von Threat Intelligence, deren Datenmodell auf einer innovativen konsolidierten Graphen-Darstellung für Threat Intelligence Stardards basiert. Wir evaluieren unsere Ansätze in verschiedenen Experimenten, die ihren potentiellen Nutzen in echten Szenarien beweisen. Insgesamt bereiten diese Ideen neue Wege für die Forschung zu Abwehrmechanismen und erstreben, das Ungleichgewicht zwischen mächtigen Angreifern und der Gesellschaft zu minimieren
    corecore