15 research outputs found

    Multi-level analysis of Malware using Machine Learning


    Fuzzy pattern tree for edge malware detection and categorization in IoT

    The rapid pace of Internet of Things (IoT) development and its applications has resulted in very large amounts of data (commonly known as big data) being communicated and processed across IoT networks. While cloud computing offers several ways to address this computational challenge, it also carries security risks and concerns. Edge computing is a state-of-the-art topic in IoT that attempts to decentralize, distribute and transfer computation to IoT nodes. Furthermore, IoT nodes that run applications are the primary attack vectors through which cybercriminals threaten an IoT network. Hence, providing practical and robust methods to detect malicious activity on nodes is a big step toward protecting the whole network. In this study, we transform programs' OpCodes into a vector space and employ fuzzy and fast fuzzy pattern tree methods for malware detection and categorization, obtaining a high degree of accuracy within reasonable run-times, especially for the fast fuzzy pattern tree. Both the feature extraction and the fuzzy classification were robust and led to a more powerful edge computing malware detection and categorization method.
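
The abstract above mentions mapping program OpCodes into a vector space but does not say how. One common way to do this is with relative frequencies of opcode n-grams; the sketch below illustrates that idea only. The n-gram representation, the function names, and the toy opcode sequences are all assumptions made for illustration, not the authors' feature extraction.

```python
from collections import Counter
from typing import Iterable, Sequence


def opcode_ngrams(opcodes: Sequence[str], n: int = 2) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def build_vocabulary(samples: Iterable[Sequence[str]], n: int = 2) -> list[tuple[str, ...]]:
    """Collect every n-gram seen in the corpus, in a stable sorted order."""
    vocab: set[tuple[str, ...]] = set()
    for opcodes in samples:
        vocab.update(opcode_ngrams(opcodes, n))
    return sorted(vocab)


def vectorize(opcodes: Sequence[str],
              vocabulary: Sequence[tuple[str, ...]],
              n: int = 2) -> list[float]:
    """Map one opcode sequence to a relative-frequency vector over the vocabulary."""
    counts = Counter(opcode_ngrams(opcodes, n))
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than n: no n-grams, so return the zero vector.
        return [0.0] * len(vocabulary)
    return [counts[gram] / total for gram in vocabulary]


# Hypothetical toy corpus: two short opcode sequences.
samples = [["mov", "add", "mov", "add"], ["push", "call"]]
vocab = build_vocabulary(samples)
vectors = [vectorize(s, vocab) for s in samples]
```

Each resulting vector has one component per n-gram in the vocabulary, so any classifier that accepts fixed-length numeric input could consume it.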

    Machine learning approaches for malware classification based on hybrid artefacts

    Malware can be developed and transformed into various forms to deceive users and evade antivirus and endpoint security detection. Furthermore, if one machine in the network is compromised, it can be used for lateral movement, in which malware spreads stealthily without raising an alarm in monitoring systems. Malware attacks pose security threats to modern enterprises and can cause massive financial, reputational, and data loss. Therefore, it is important to detect these attacks effectively to keep losses to a minimum. Current research uses different approaches, including static and dynamic analysis, to detect and analyze malware categories using distinct feature sets, such as imported modules, opcodes, and API calls, which can improve performance in binary and multi-class classification problems. This thesis proposes a method for identifying and analyzing malware samples via static and dynamic approaches, including memory analysis and consecutive application operation sequences performed in a Windows 10 virtual environment. Standard classifiers and frequently used sequence models are utilized to expose malware characteristics and improve predictive capability. The features used in these algorithms are extracted from static and dynamic analysis of malware samples, such as the rich header feature, debug information, temporary files, prefetch files, and event logs. Classifier performance is measured using accuracy, F1-score, Mean Absolute Error (MAE), the confusion matrix, and Area Under the ROC Curve (AUC). Combining the two feature sets, static file properties and dynamic analysis results, provides the best classification performance with or without feature selection, achieving an accuracy and F1-score of 97% on the integrated datasets.
For consecutive sequences, concatenating the Gated Recurrent Unit (GRU) and Transformer models yields the highest accuracy, 97%, for Noriben operations, while GRU alone achieves the highest accuracy for opcode sequences, 89%.

    Resilient and Scalable Android Malware Fingerprinting and Detection

    Malicious software (malware) proliferates at a rate of hundreds of thousands of samples daily. Manual analysis of such a large volume of malware is daunting and time-consuming. The diversity of targeted systems in terms of architecture and platforms compounds the challenges of detecting Android malware, and malware in general. This highlights the need to design and implement new scalable and robust methods, techniques, and tools to detect Android malware. In this thesis, we develop a malware fingerprinting framework for accurate Android malware detection and family attribution. In this context, we emphasize the following: (i) scalability over a large malware corpus; (ii) resiliency to common obfuscation techniques; (iii) portability across different platforms and architectures. In the context of bulk and offline detection at the laboratory/vendor level: First, we propose an approximate fingerprinting technique for Android packaging that captures the underlying static structure of Android apps. We also propose a malware clustering framework on top of this fingerprinting technique to perform unsupervised malware detection and grouping by building and partitioning a similarity network of malicious apps. Second, we propose an approximate fingerprinting technique for the behavior reports of Android malware generated by dynamic analysis, leveraging natural language processing techniques. Based on this fingerprinting technique, we propose a portable malware detection and family threat attribution framework employing supervised machine learning techniques. Third, we design an automatic framework to produce intelligence about the underlying malicious cyber-infrastructures of Android malware. We leverage graph analysis techniques to generate relevant, actionable, and granular intelligence that can be used to identify the threat effects induced by malicious Internet activity associated with malicious Android apps.
In the context of single-app and online detection at the mobile device level, we further propose the following: Fourth, we design a portable and effective Android malware detection system that is suitable for deployment on mobile and resource-constrained devices, using machine learning classification on raw method call sequences. Fifth, we elaborate a framework for Android malware detection that is resilient to common code obfuscation techniques and adaptive to changes in operating systems and malware over time, using natural language processing and deep learning techniques. We also evaluate the portability of the proposed techniques and methods beyond Android malware, as follows: Sixth, we leverage the previously elaborated techniques to build a framework for cross-platform ransomware fingerprinting that relies on raw hybrid features in conjunction with advanced deep learning techniques.

    Crafting Adversarial Examples using Particle Swarm Optimization

    Machine learning models have been found to be vulnerable to adversarial attacks that apply small perturbations to input samples to get them misclassified. Attacks that search for and apply the perturbations are performed in both white-box and black-box settings, depending on the information available to the attacker about the target. In black-box attacks, the attacker can only query the target with specially crafted inputs and observe the outputs returned by the model. These outputs are used to guide the perturbations and create adversarial examples that are then misclassified. Current black-box attacks on API-based malware classifiers rely solely on feature insertion when applying perturbations. This restriction is in place to ensure that no changes are introduced to the malware's originally intended functionality. Additionally, the API calls inserted into the malware are null or no-op APIs that have no functional effect, to avoid any unintentional impact on malware behavior. Due to the nature of these API calls, they can be easily detected through non-ML techniques by analyzing their arguments and return values. In this dissertation, we explore other attacks on API-based malware detection models that are not restricted to feature addition. Specifically, we explore feature replacement as a possible avenue for creating adversarial malware examples. To retain the malware's original functionality, we replace API calls with other functionally equivalent API calls. We find the API alternatives by using a hierarchical unsupervised learning approach on the APIs' documentation. Our attack, which we call AdversarialPSO, uses Particle Swarm Optimization to guide the perturbations according to the available function alternatives. Results show that creating adversarial malware examples by feature replacement is possible even under the more restrictive search space of limited function alternatives.
Unlike the malware domain, which lacks benchmark datasets and publicly available classification models, image classification has multiple benchmarks on which to test new attacks. Therefore, to evaluate the efficacy and wide applicability of AdversarialPSO, we re-implement the attack in the image classification domain, where we create adversarial examples from images by adding small, often unrecognizable perturbations to the inputs. As a result of these perturbations, highly accurate models misclassify the inputs, resulting in a drastic drop in their accuracy. We evaluate this attack against both defended and undefended models and show that AdversarialPSO performs comparably to state-of-the-art adversarial attacks.

    Machine learning techniques for android malware detection and classification

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defense date: 15-03-2019. This thesis would not have been possible without the funding provided by the project CIBERDINE: Cybersecurity, Data and Risks (S2013/ICE3095), awarded by the Comunidad de Madrid.

    A Deep-dive into Cryptojacking Malware: From an Empirical Analysis to a Detection Method for Computationally Weak Devices

    Cryptojacking is the act of using a victim's computational power without their consent. Unauthorized mining incurs extra electricity consumption and dramatically decreases the victim host's computational efficiency. In this thesis, we perform extensive research on cryptojacking malware, covering every aspect. First, we present a systematic overview of cryptojacking malware based on information obtained from a combination of academic research papers, two large cryptojacking sample datasets, and numerous major attack instances. Second, we create a dataset of 6269 websites containing cryptomining scripts in their source code to characterize the in-browser cryptomining ecosystem by differentiating permissioned and permissionless cryptomining samples. Third, we introduce an accurate and efficient IoT cryptojacking detection mechanism based on network traffic features that achieves an accuracy of 99%. Finally, we believe this thesis will greatly expand the scope of research and facilitate other novel solutions in the cryptojacking domain.

    Malware detection at runtime for resource-constrained mobile devices: data-driven approach

    The number of smart, connected mobile devices is increasing, bringing enormous possibilities to users in various domains and making everything we come into contact with smart. Thus, we have smart watches, smart phones, smart homes, and even smart cities. Increased smartness of mobile devices means that they contain more valuable information about their users, more decision-making capabilities, and more control over systems that are sometimes even life-critical. Although all of this is necessary for mobile devices to fulfill their main purpose of helping and supporting people, it also opens new vulnerabilities. Namely, as the number and volume of smart devices increase, so does attackers' interest in abusing them, making their security one of the main challenges. The main means attackers use to abuse mobile devices is malicious software, or malware for short. One way to protect against malware is static analysis, which investigates the nature of software by analyzing its static features. However, this technique detects only known malware well and is vulnerable to obfuscation, which means that it is relatively easy to create a new malicious sample that slips under the radar. Thus, on its own, it is not powerful enough to protect users against increasing malicious attacks. The other way to cope with malware is dynamic analysis, where the nature of the software is decided based on its behavior during execution on a device. This is a promising solution because, while the code of the software can easily be changed to appear new, the same cannot easily be done with its behavior when executed. However, to achieve high accuracy, dynamic analysis usually requires computational resources beyond what is suitable for battery-operated mobile devices.
This is further complicated if, in addition to detecting the presence of malware, we also want to understand which type of malware it is, in order to trigger suitable countermeasures. Finally, decisions on potential infections have to happen early enough to guarantee minimal exposure to the attacks. Fulfilling these requirements in a mobile, battery-operated environment is a challenging task for which, to the best of our knowledge, no suitable solution has yet been proposed. In this thesis, we pave the way towards such a solution by proposing a dynamic malware detection system that detects malware early when it appears at runtime and provides useful information for discriminating between diverse types of malware, while taking into account the limited resources of mobile devices. On a mobile device, we monitor a set of features representative of the presence of malware and, based on them, trigger an alarm if a software infection is observed. When this happens, we analyze a set of previously stored information relevant to malware classification in order to understand what type of malware is being executed. To make detection efficient and suitable for the resource-constrained environments of mobile devices, we reduce the set of observed system parameters to only those most informative for both detection and classification. Additionally, since the sampling period of the monitoring infrastructure is directly connected to power consumption, we treat it as an important parameter in the development of the detection system. To make detection effective, we use dynamic features related to memory, CPU, system calls and network, as they reflect the behavior of a system well. Our experiments show that monitoring with a sampling period of eight seconds gives a good trade-off between detection accuracy, detection time and consumed power.
Using this sampling period and monitoring a set of only seven dynamic features (six related to memory behavior and one to CPU), we provide a detection solution that satisfies the initial requirements and detects malware at runtime with an F-measure of 0.85, within 85.52 seconds of its execution, and with an average power consumption of 20 mW. Besides containing enough information to discriminate between malicious and benign applications, the observed features can, according to our results, also be used to discriminate between the diverse behavior of malware, as reflected in different malware families. Using a small number of features, we are able to identify the presence of malicious records from the considered family with a precision of up to 99.8%. In addition to the standalone use of the proposed detection solution, we have also used it in a hybrid scenario where the applications were first analyzed by a static method; it correctly detected all the malware previously undetected by static analysis, with a false positive rate of 3.81% and an average detection time of 44.72 s. The method we have designed, tested and validated has been applied on a smartphone running the Android operating system. However, since efficient usage of available computational resources was one of our main design criteria, we are confident that the method can also be applied to other battery-operated mobile devices in the Internet of Things, in order to provide an effective and efficient system able to counter the ever-increasing and ever-evolving number and variety of malicious attacks.
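
For readers less familiar with the metrics quoted in the abstract above (F-measure, precision, false positive rate), the sketch below shows how they are conventionally derived from confusion-matrix counts. The function name and the counts in the usage example are hypothetical and are not taken from the thesis; the F-measure here is assumed to be the usual balanced F1.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute standard binary detection metrics from confusion-matrix counts.

    tp: malicious records flagged as malicious
    fp: benign records flagged as malicious
    fn: malicious records missed
    tn: benign records correctly left alone
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "false_positive_rate": false_positive_rate,
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = detection_metrics(tp=8, fp=2, fn=2, tn=88)
```

Because the F-measure is a harmonic mean, it stays low unless both precision and recall are high.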

    Cybersecurity applications of Blockchain technologies

    With the increase in connectivity, the popularization of cloud services, and the rise of the Internet of Things (IoT), decentralized approaches to trust management are gaining momentum. Since blockchain technologies provide a distributed ledger, they are receiving massive attention from the research community in different application fields. However, this technology does not provide cybersecurity by itself. Thus, this thesis first aims to provide a comprehensive review of the techniques and elements that have been proposed to achieve cybersecurity in blockchain-based systems. The analysis is aimed at researchers in the area, cybersecurity specialists and blockchain developers. We also present a series of lessons learned, one of which is the rise of Ethereum as one of the most used technologies. Furthermore, some intrinsic characteristics of the blockchain, such as permanent availability and immutability, make it interesting for other ends, namely as a covert channel and for malicious purposes. On the one hand, the use of blockchains by malware has not yet been characterized. Therefore, this thesis also analyzes the current state of the art in this area. One of the lessons learned is that covert communications have received little attention. On the other hand, although previous works have analyzed the feasibility of covert channels in one particular blockchain technology, Bitcoin, no previous work has explored the use of Ethereum to establish a covert channel considering all transaction fields and smart contracts. To foster further defence-oriented research, two novel mechanisms are presented in this thesis. First, Zephyrus takes advantage of all Ethereum fields and smart contract bytecode. Second, Smart-Zephyrus is built to complement Zephyrus by leveraging smart contracts written in Solidity. We also assess the mechanisms' feasibility and cost. Our experiments show that Zephyrus, in the best case, can embed 40 Kbits in 0.57 s for US$ 1.64, and retrieve them in 2.8 s. Smart-Zephyrus, however, is able to hide a 4 Kb secret in 41 s. While it is expensive (around US$ 1.82 per bit), the stealthiness it provides might be worth the price for attackers. Furthermore, these two mechanisms can be combined to increase capacity and reduce costs.
    Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. Committee: President: José Manuel Estévez Tapiador; Secretary: Jorge Blasco Alís; Member: Luis Hernández Encina