256 research outputs found

    Detection of Crypto-Ransomware Attack Using Deep Learning

    Get PDF
    The number one threat to the digital world is the exponential increase in ransomware attacks. Ransomware is malware that prevents victims from accessing their resources by locking or encrypting the data until a ransom is paid. With individuals and businesses growing dependencies on technology and the Internet, researchers in the cyber security field are looking for different measures to prevent malicious attackers from having a successful campaign. A new ransomware variant is being introduced daily, thus behavior-based analysis of detecting ransomware attacks is more effective than the traditional static analysis. This paper proposes a multi-variant classification to detect ransomware I/O operations from benign applications. The deep learning models implemented in the proposed approach are Bi-directional Long Short-Term Memory (Bi-LSTM) and Convolutional Neural Networks (CNN). The deep learning models are compared against a classic machine learning model such as Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF). The ransomware samples contain 70 binaries from 30 different ransomware extracted during the encryption of an extensive network shared directory. The benign samples came from network traffic traces recorded in a campus LAN where staff users access files from shared servers. A sample contains I/O operations (short Control Commands, bytes being read, and written) per second over a period of T seconds. The proposed deep learning models are tested with Zero-day ransomware samples as well. Both Bi-LSTM and CNN achieved above 98% in accurately classifying ransomware and benign samples

    Malicious code detection in android : the role of sequence characteristics and disassembling methods

    Get PDF
    The acceptance and widespread use of the Android operating system drew the attention of both legitimate developers and malware authors, which resulted in a significant number of benign and malicious applications available on various online markets. Since the signature-based methods fall short for detecting malicious software effectively considering the vast number of applications, machine learning techniques in this field have also become widespread. In this context, stating the acquired accuracy values in the contingency tables in malware detection studies has become a popular and efficient method and enabled researchers to evaluate their methodologies comparatively. In this study, we wanted to investigate and emphasize the factors that may affect the accuracy values of the models managed by researchers, particularly the disassembly method and the input data characteristics. Firstly, we developed a model that tackles the malware detection problem from a Natural Language Processing (NLP) perspective using Long Short-Term Memory (LSTM). Then, we experimented with different base units (instruction, basic block, method, and class) and representations of source code obtained from three commonly used disassembling tools (JEB, IDA, and Apktool) and examined the results. Our findings exhibit that the disassembly method and different input representations affect the model results. More specifically, the datasets collected by the Apktool achieved better results compared to the other two disassemblers

    Automatic Detection of Insecure Codes in Stack Overflow

    Get PDF
    As the popularity of modern social coding paradigm such as Stack Overflow grows, its potential security risks increase as well (e.g., insecure codes could be easily embedded and distributed). To address this largely overlooked issue, we bring a new insight to exploit social coding properties in addition to code content for automatic detection of insecure code snippets in Stack Overflow. To determine if the given code snippets are insecure, we not only analyze the code content, but also utilize various kinds of relations among users, badges, questions, answers, code snippets and keywords in Stack Overflow. To model the rich semantic relationships, we first introduce a structured heterogeneous information network (HIN) for representation and then use meta-path based approach to incorporate higher-level semantics to build up relatedness over code snippets. Later, we propose two different novel network embedding models named Snippet2vec and CodeHin2Vec for representation learning in HIN to automate the insecure code snippet detection in Stack Overflow. More specifically, Snippet2vec learns the low dimensional representations for the nodes (i.e., code snippets) in the HIN where both the HIN structures and semantics are maximally preserved, while CodeHin2Vec utilizes HIN to depict relatedness over code snippets to generate code-to-code sequences, based on which sequence-to-sequence (seq2seq) concept in machine translation is further leveraged to learn representations of code snippets. Accordingly, we developed systems ICSD and iTrustSO which integrate our proposed methods respectively in insecure code snippet detection in Stack Overflow. Comprehensive experiments on the data collections from Stack Overflow are conducted to validate the effectiveness of our developed systems by comparisons with the state-of-the-art baseline methods

    Neural malware detection

    Get PDF
    At the heart of today’s malware problem lies theoretically infinite diversity created by metamorphism. The majority of conventional machine learning techniques tackle the problem with the assumptions that a sufficiently large number of training samples exist and that the training set is independent and identically distributed. However, the lack of semantic features combined with the models under these wrong assumptions result largely in overfitting with many false positives against real world samples, resulting in systems being left vulnerable to various adversarial attacks. A key observation is that modern malware authors write a script that automatically generates an arbitrarily large number of diverse samples that share similar characteristics in program logic, which is a very cost-effective way to evade detection with minimum effort. Given that many malware campaigns follow this paradigm of economic malware manufacturing model, the samples within a campaign are likely to share coherent semantic characteristics. This opens up a possibility of one-to-many detection. Therefore, it is crucial to capture this non-linear metamorphic pattern unique to the campaign in order to detect these seemingly diverse but identically rooted variants. To address these issues, this dissertation proposes novel deep learning models, including generative static malware outbreak detection model, generative dynamic malware detection model using spatio-temporal isomorphic dynamic features, and instruction cognitive malware detection. A comparative study on metamorphic threats is also conducted as part of the thesis. Generative adversarial autoencoder (AAE) over convolutional network with global average pooling is introduced as a fundamental deep learning framework for malware detection, which captures highly complex non-linear metamorphism through translation invariancy and local variation insensitivity. Generative Adversarial Network (GAN) used as a part of the framework enables oneshot training where semantically isomorphic malware campaigns are identified by a single malware instance sampled from the very initial outbreak. This is a major innovation because, to the best of our knowledge, no approach has been found to this challenging training objective against the malware distribution that consists of a large number of very sparse groups artificially driven by arms race between attackers and defenders. In addition, we propose a novel method that extracts instruction cognitive representation from uninterpreted raw binary executables, which can be used for oneto- many malware detection via one-shot training against frequency spectrum of the Transformer’s encoded latent representation. The method works regardless of the presence of diverse malware variations while remaining resilient to adversarial attacks that mostly use random perturbation against raw binaries. Comprehensive performance analyses including mathematical formulations and experimental evaluations are provided, with the proposed deep learning framework for malware detection exhibiting a superior performance over conventional machine learning methods. The methods proposed in this thesis are applicable to a variety of threat environments here artificially formed sparse distributions arise at the cyber battle fronts.Doctor of Philosoph

    Malware detection based on call graph similarities

    Get PDF
    S rostoucím množstvím škodlivých souborů se stalo využití strojového učení pro jejich detekci nezbytností. Autoři škodlivých souborů vytváří důmyslnější programy, aby překonali stále se zlepšující antivirovou ochranu. Windows OS zůstává nejčastějším cílem útoků. Viry se často šíří ve formátu Portable Executable (PE). PE soubory mohou být zkoumány pomocí metod statické analýzy, které se hodí pro zpracovávání velkého množství dat. Mnoho antivirových systémů disassembluje soubory a zkoumá jejich kód, který nabízí vhled do funkcionality souboru. Assembly kód je členěn do funkcí. Vztahy mezi funkcemi zachycuje graf volání funkcí (GVF). Tento graf byl zkoumán v literatuře a jeho struktura byla využita k hledání podobností mezi soubory. V poslední době začaly být úspěšně využívány grafové neuronové sítě (GNN) ke zpracování těchto grafů. V naší práci zkoumáme různé druhy a architektury GNN a vzájemně je porovnáváme. Po tom, co vybereme nejlepší GNN model, ho srovnáme s modelem, který nevyužívá grafovou strukturu GVF, abychom zjistili zda tato struktura zlepšuje klasifikační modely. Naši studii provádíme na velkém datasetu o více než 5 milionech PE souborů.Machine learning-powered malware detection systems became a necessity to fight the rising volume of malware. Malware authors create more sophisticated programs to overcome always improving antivirus engines. Windows OS remains the most targeted system, and the malicious payload commonly comes in the Portable executable (PE) file format. PE files can be analyzed with the static analysis methods, which are suitable for processing large amounts of data. Many engines disassemble binaries and study the code, which carries valuable insight into binary behavior. The assembly code is divided into functions that carry the functionality. The relations between functions form a Function Call Graph (FCG). FCG has been studied in the literature, and the graph structure was employed to find similarities between files. Recently, Graph Neural Networks (GNNs) have been adapted to work upon FCGs and are claimed to be performing well. In this work, we study and compare different GNN models and their architectures. After selecting the best GNN model, we compare it with a non-structural model to verify if an FCG structure improves classification models. We perform our empirical study on a large dataset of more than 5 million PE files
    corecore