3,793 research outputs found

    A Novel Malware Detection System Based On Machine Learning and Binary Visualization

    Full text link
    The continued evolution and diversity of malware constitutes a major threat in modern systems. It is well proven that security defenses currently available are ineffective to mitigate the skills and imagination of cyber-criminals necessitating the development of novel solutions. Deep learning algorithms and artificial intelligence (AI) are rapidly evolving with remarkable results in many application areas. Following the advances of AI and recognizing the need for efficient malware detection methods, this paper presents a new approach for malware detection based on binary visualization and self-organizing incremental neural networks. The proposed method's performance in detecting malicious payloads in various file types was investigated and the experimental results showed that a detection accuracy of 91.7% and 94.1% was achieved for ransomware in .pdf and .doc files respectively. With respect to other formats of malicious code and other file types, including binaries, the proposed method behaved well with an incremental detection rate that allows efficiently detecting unknown malware at real-time

    Deep Image: A precious image based deep learning method for online malware detection in IoT Environment

    Full text link
    The volume of malware and the number of attacks in IoT devices are rising everyday, which encourages security professionals to continually enhance their malware analysis tools. Researchers in the field of cyber security have extensively explored the usage of sophisticated analytics and the efficiency of malware detection. With the introduction of new malware kinds and attack routes, security experts confront considerable challenges in developing efficient malware detection and analysis solutions. In this paper, a different view of malware analysis is considered and the risk level of each sample feature is computed, and based on that the risk level of that sample is calculated. In this way, a criterion is introduced that is used together with accuracy and FPR criteria for malware analysis in IoT environment. In this paper, three malware detection methods based on visualization techniques called the clustering approach, the probabilistic approach, and the deep learning approach are proposed. Then, in addition to the usual machine learning criteria namely accuracy and FPR, a proposed criterion based on the risk of samples has also been used for comparison, with the results showing that the deep learning approach performed better in detecting malwareComment: 10 pages, 17 figures, SUBMITTED TO IEEE INTERNET OF THINGS JOURNAL, MARCH 202

    Malicious Software Detection and Classification utilizing Temporal-Graphs of System-call Group Relations

    Full text link
    In this work we propose a graph-based model that, utilizing relations between groups of System-calls, distinguishes malicious from benign software samples and classifies the detected malicious samples to one of a set of known malware families. More precisely, given a System-call Dependency Graph (ScDG) that depicts the malware's behavior, we first transform it to a more abstract representation, utilizing the indexing of System-calls to a set of groups of similar functionality, constructing thus an abstract and mutation-tolerant graph that we call Group Relation Graph (GrG); then, we construct another graph representation, which we call Coverage Graph (CvG), that depicts the dominating relations between the nodes of a GrG graph. Based on the research so far in the field, we pointed out that behavior-based graph representations had not leveraged the aspect of the temporal evolution of the graph. Hence, the novelty of our work is that, preserving the initial representations of GrG and CvG graphs, we focus on augmenting the potentials of theses graphs by adding further features that enhance its abilities on detecting and further classifying to a known malware family an unknown malware sample. To that end, we construct periodical instances of the graph that represent its temporal evolution concerning its structural modifications, creating another graph representation that we call Temporal Graphs. In this paper, we present the theoretical background behind our approach, discuss the current technological status on malware detection and classification and demonstrate the overall architecture of our proposed detection and classification model alongside with its underlying main principles and its structural key-components.Comment: 23 pages, 15 figures, 1 tabl

    Malytics: A Malware Detection Scheme

    Full text link
    An important problem of cyber-security is malware analysis. Besides good precision and recognition rate, a malware detection scheme needs to be able to generalize well for novel malware families (a.k.a zero-day attacks). It is important that the system does not require excessive computation particularly for deployment on the mobile devices. In this paper, we propose a novel scheme to detect malware which we call Malytics. It is not dependent on any particular tool or operating system. It extracts static features of any given binary file to distinguish malware from benign. Malytics consists of three stages: feature extraction, similarity measurement and classification. The three phases are implemented by a neural network with two hidden layers and an output layer. We show feature extraction, which is performed by tf -simhashing, is equivalent to the first layer of a particular neural network. We evaluate Malytics performance on both Android and Windows platforms. Malytics outperforms a wide range of learning-based techniques and also individual state-of-the-art models on both platforms. We also show Malytics is resilient and robust in addressing zero-day malware samples. The F1-score of Malytics is 97.21% and 99.45% on Android dex file and Windows PE files respectively, in the applied datasets. The speed and efficiency of Malytics are also evaluated

    Microsoft Malware Classification Challenge

    Full text link
    The Microsoft Malware Classification Challenge was announced in 2015 along with a publication of a huge dataset of nearly 0.5 terabytes, consisting of disassembly and bytecode of more than 20K malware samples. Apart from serving in the Kaggle competition, the dataset has become a standard benchmark for research on modeling malware behaviour. To date, the dataset has been cited in more than 50 research papers. Here we provide a high-level comparison of the publications citing the dataset. The comparison simplifies finding potential research directions in this field and future performance evaluation of the dataset

    CrowdSource: Automated Inference of High Level Malware Functionality from Low-Level Symbols Using a Crowd Trained Machine Learning Model

    Full text link
    In this paper we introduce CrowdSource, a statistical natural language processing system designed to make rapid inferences about malware functionality based on printable character strings extracted from malware binaries. CrowdSource "learns" a mapping between low-level language and high-level software functionality by leveraging millions of web technical documents from StackExchange, a popular network of technical question and answer sites, using this mapping to infer malware capabilities. This paper describes our approach and provides an evaluation of its accuracy and performance, demonstrating that it can detect at least 14 high-level malware capabilities in unpacked malware binaries with an average per-capability f-score of 0.86 and at a rate of tens of thousands of binaries per day on commodity hardware

    Grouping the executables to detect malware with high accuracy

    Full text link
    The metamorphic malware variants with the same malicious behavior (family), can obfuscate themselves to look different from each other. This variation in structure leads to a huge signature database for traditional signature matching techniques to detect them. In order to effective and efficient detection of malware in large amounts of executables, we need to partition these files into groups which can identify their respective families. In addition, the grouping criteria should be chosen such a way that, it can also be applied to unknown files encounter on computers for classification. This paper discusses the study of malware and benign executables in groups to detect unknown malware with high accuracy. We studied sizes of malware generated by three popular second generation malware (metamorphic malware) creator kits viz. G2, PS-MPC and NGVCK, and observed that the size variation in any two generated malware from same kit is not much. Hence, we grouped the executables on the basis of malware sizes by using Optimal k-Means Clustering algorithm and used these obtained groups to select promising features for training (Random forest, J48, LMT, FT and NBT) classifiers to detect variants of malware or unknown malware. We find that detection of malware on the basis of their respected file sizes gives accuracy up to 99.11% from the classifiers.Comment: 8 Pages, 13 Figures. arXiv admin note: text overlap with arXiv:1606.0689

    Malware Detection at the Microarchitecture Level using Machine Learning Techniques

    Full text link
    Detection of malware cyber-attacks at the processor microarchitecture level has recently emerged as a promising solution to enhance the security of computer systems. Security mechanisms, such as hardware-based malware detection, use machine learning algorithms to classify and detect malware with the aid of Hardware Performance Counters (HPCs) information. The ML classifiers are fed microarchitectural data extracted from Hardware Performance Counters (HPCs), which contain behavioral data about a software program. These HPCs are captured at run-time to model the program's behavior. Since the amount of HPCs are limited per processor, many techniques employ feature reduction to reduce the amount of HPCs down to the most essential attributes. Previous studies have already used binary classification to implement their malware detection after doing extensive feature reduction. This results in a simple identification of software being either malware or benign. This research comprehensively analyzes different hardware-based malware detectors by comparing different machine learning algorithms' accuracy with binary and multi-class classification models. Our experimental results indicate that when compared to complex machine learning models (e. g. Neural Network and Logistic), light-weight J48 and JRip algorithms perform better in detecting the malicious patterns even with the introduction of multiple types of malware. Although their detection accuracy slightly lowers, their robustness (Area Under the Curve) is still high enough that they deliver a reasonable false positive rate.Comment: 28 pages, 7 figures, 4 table

    DeepOrigin: End-to-End Deep Learning for Detection of New Malware Families

    Full text link
    In this paper, we present a novel method of differentiating known from previously unseen malware families. We utilize transfer learning by learning compact file representations that are used for a new classification task between previously seen malware families and novel ones. The learned file representations are composed of static and dynamic features of malware and are invariant to small modifications that do not change their malicious functionality. Using an extensive dataset that consists of thousands of variants of malicious files, we were able to achieve 97.7% accuracy when classifying between seen and unseen malware families. Our method provides an important focalizing tool for cybersecurity researchers and greatly improves the overall ability to adapt to the fast-moving pace of the current threat landscape

    Deep learning at the shallow end: Malware classification for non-domain experts

    Full text link
    Current malware detection and classification approaches generally rely on time consuming and knowledge intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data whilst ignoring their context in relation to each other and throughout the malware file as a whole. We present a Deep Learning based malware classification approach that requires no expert domain knowledge and is based on a purely data driven approach for complex pattern and feature identification
    • …
    corecore