12 research outputs found

    Nonnegative matrix factorization and metamorphic malware detection

    Get PDF
    Metamorphic malware change their internal code structure by adopting code obfuscation technique while maintaining their malicious functionality during each infection. This causes change of their signature pattern across each infection and makes signature based detection particularly difficult. In this paper, through static analysis, we use similarity score from matrix factorization technique called Nonnegative Matrix Factorization for detecting challenging metamorphic malware. We apply this technique using structural compression ratio and entropy features and compare our results with previous eigenvector-based techniques. Experimental results from three malware datasets show this is a promising technique as the accuracy detection is more than 95%

    Structural features with nonnegative matrix factorization for metamorphic malware detection

    Get PDF
    Metamorphic malware is well known for evading signature-based detection by exploiting various code obfuscation techniques. Current metamorphic malware detection approaches require some prior knowledge during feature engineering stage to extract patterns and behaviors from malware. In this paper, we attempt to complement and extend previous techniques by proposing a metamorphic malware detection approach based on structure analysis by using information theoretic measures and statistical metrics with machine learning model. In particular, compression ratio, entropy, Jaccard coefficient and Chi-square tests are used as feature representations to reveal the byte information existing in malware binary file. Furthermore, by using Nonnegative Matrix Factorization, feature dimension can be reduced. The experimental results show the Jaccard coefficient on hexadecimal byte as feature representation is effective for Windows metamorphic malware detection with an accuracy rate and F-score as high as 0.9972 and 0.9958, respectively. Whereas for Linux morphed malware detection, the Chi-square statistic test shows as effective feature representation with an accuracy rate and F-score as high as 0.9878 and 0.9901, respectively. Overall, the proposed feature representations and the technique of dimension reduction can be useful for detecting metamorphic malware

    Metamorphic malware detection using structural features and nonnegative matrix factorization with hidden markov model

    Get PDF
    Metamorphic malware modifies its code structure using a morphing engine to evade traditional signature-based detection. Previous research has shown the use of opcode instructions as feature representation with Hidden Markov Model in the context of metamorphic malware detection. However, it would be more feasible to extract a file feature at fine-grained level. In this paper, we propose a novel detection approach by generating structural features through computing a stream of byte chunks using compression ratio, entropy, Jaccard similarity coefficient and Chi-square statistic test. Nonnegative Matrix Factorization is also considered to reduce the feature dimensions. We then use the coefficient vectors from the reduced space to train Hidden Markov Model. Experimental results show there is different performance between malware detection and classification among the proposed structural features

    Semi-supervised Classification of Malware Families Under Extreme Class Imbalance via Hierarchical Non-Negative Matrix Factorization with Automatic Model Selection

    Full text link
    Identification of the family to which a malware specimen belongs is essential in understanding the behavior of the malware and developing mitigation strategies. Solutions proposed by prior work, however, are often not practicable due to the lack of realistic evaluation factors. These factors include learning under class imbalance, the ability to identify new malware, and the cost of production-quality labeled data. In practice, deployed models face prominent, rare, and new malware families. At the same time, obtaining a large quantity of up-to-date labeled malware for training a model can be expensive. In this paper, we address these problems and propose a novel hierarchical semi-supervised algorithm, which we call the HNMFk Classifier, that can be used in the early stages of the malware family labeling process. Our method is based on non-negative matrix factorization with automatic model selection, that is, with an estimation of the number of clusters. With HNMFk Classifier, we exploit the hierarchical structure of the malware data together with a semi-supervised setup, which enables us to classify malware families under conditions of extreme class imbalance. Our solution can perform abstaining predictions, or rejection option, which yields promising results in the identification of novel malware families and helps with maintaining the performance of the model when a low quantity of labeled data is used. We perform bulk classification of nearly 2,900 both rare and prominent malware families, through static analysis, using nearly 388,000 samples from the EMBER-2018 corpus. In our experiments, we surpass both supervised and semi-supervised baseline models with an F1 score of 0.80.Comment: Accepted at ACM TOP

    Malware Pattern of Life Analysis

    Get PDF
    Many malware classifications include viruses, worms, trojans, ransomware, bots, adware, spyware, rootkits, file-less downloaders, malvertising, and many more. Each type may share unique behavioral characteristics with its methods of operations (MO), a pattern of behavior so distinctive that it could be recognized as having the same creator. The research shows the extraction of malware methods of operation using the step-by-step process of Artificial-Based Intelligence (ABI) with built-in Density-based spatial clustering of applications with noise (DBSCAN) machine learning to quantify the actions for their similarities, differences, baseline behaviors, and anomalies. The collected data of the research is from the ransomware sample repositories of Malware Bazaar and Virus Share, totaling 1300 live malicious codes ingested into the CAPEv2 malware sandbox, allowing the capture of traces of static, dynamic, and network behavior features. The ransomware features have shown significant activity of varying identified functions used in encryption, file application programming interface (API), and network function calls. During the machine learning categorization phase, there are eight identified clusters that have similar and different features regarding function-call sequencing events and file access manipulation for dropping file notes and writing encryption. Having compared all the clusters using a “supervenn” pictorial diagram, the characteristics of the static and dynamic behavior of the ransomware give the initial baselines for comparison with other variants that may have been added to the collected data for intelligence gathering. The findings provide a novel practical approach for intelligence gathering to address ransomware or any other malware variants’ activity patterns to discern similarities, anomalies, and differences between malware actions under study

    Malware detection issues, future trends and challenges: a survey

    Get PDF
    This paper focuses on the challenges and issues of detecting malware in to-day's world where cyberattacks continue to grow in number and complexity. The paper reviews current trends and technologies in malware detection and the limitations of existing detection methods such as signature-based detection and heuristic analysis. The emergence of new types of malware, such as file-less malware, is also discussed, along with the need for real-time detection and response. The research methodology used in this paper is presented, which includes a literature review of recent papers on the topic, keyword searches, and analysis and representation methods used in each study. In this paper, the authors aim to address the key issues and challenges in detecting malware today, the current trends and technologies in malware detection, and the limitations of existing methods. They also explore emerging threats and trends in malware attacks and highlight future directions for research and development in the field. To achieve this, the authors use a research methodology that involves a literature review of recent papers related to the topic. They focus on detecting and analyzing methods, as well as representation and extraction methods used in each study. Finally, they classify the literature re-view, and through reading and criticism, highlight future trends and problems in the field of malware detection

    Malware detection issues, challenges, and future directions: A survey

    Get PDF
    The evolution of recent malicious software with the rising use of digital services has increased the probability of corrupting data, stealing information, or other cybercrimes by malware attacks. Therefore, malicious software must be detected before it impacts a large number of computers. Recently, many malware detection solutions have been proposed by researchers. However, many challenges limit these solutions to effectively detecting several types of malware, especially zero-day attacks due to obfuscation and evasion techniques, as well as the diversity of malicious behavior caused by the rapid rate of new malware and malware variants being produced every day. Several review papers have explored the issues and challenges of malware detection from various viewpoints. However, there is a lack of a deep review article that associates each analysis and detection approach with the data type. Such an association is imperative for the research community as it helps to determine the suitable mitigation approach. In addition, the current survey articles stopped at a generic detection approach taxonomy. Moreover, some review papers presented the feature extraction methods as static, dynamic, and hybrid based on the utilized analysis approach and neglected the feature representation methods taxonomy, which is considered essential in developing the malware detection model. This survey bridges the gap by providing a comprehensive state-of-the-art review of malware detection model research. This survey introduces a feature representation taxonomy in addition to the deeper taxonomy of malware analysis and detection approaches and links each approach with the most commonly used data types. The feature extraction method is introduced according to the techniques used instead of the analysis approach. The survey ends with a discussion of the challenges and future research directions

    Exploiting loop transformations for the protection of software

    Get PDF
    Il software conserva la maggior parte del know-how che occorre per svilupparlo. Poich\ue9 oggigiorno il software pu\uf2 essere facilmente duplicato e ridistribuito ovunque, il rischio che la propriet\ue0 intellettuale venga violata su scala globale \ue8 elevato. Una delle pi\uf9 interessanti soluzioni a questo problema \ue8 dotare il software di un watermark. Ai watermark si richiede non solo di certificare in modo univoco il proprietario del software, ma anche di essere resistenti e pervasivi. In questa tesi riformuliamo i concetti di robustezza e pervasivit\ue0 a partire dalla semantica delle tracce. Evidenziamo i cicli quali costrutti di programmazione pervasivi e introduciamo le trasformazioni di ciclo come mattone di costruzione per schemi di watermarking pervasivo. Passiamo in rassegna alcune fra tali trasformazioni, studiando i loro principi di base. Infine, sfruttiamo tali principi per costruire una tecnica di watermarking pervasivo. La robustezza rimane una difficile, quanto affascinante, questione ancora da risolvere.Software retains most of the know-how required fot its development. Because nowadays software can be easily cloned and spread worldwide, the risk of intellectual property infringement on a global scale is high. One of the most viable solutions to this problem is to endow software with a watermark. Good watermarks are required not only to state unambiguously the owner of software, but also to be resilient and pervasive. In this thesis we base resiliency and pervasiveness on trace semantics. We point out loops as pervasive programming constructs and we introduce loop transformations as the basic block of pervasive watermarking schemes. We survey several loop transformations, outlining their underlying principles. Then we exploit these principles to build some pervasive watermarking techniques. Resiliency still remains a big and challenging open issue

    Malware Detection Using DNS Traffic Analysis

    Get PDF
    Tato diplomová práce se zabýva návrhem a implementací nástroje pro detekci malwaru pomocí analýzy DNS provozu. Text práce je rozdělen na teoretickou a praktickou část. V teoretické části bude nejdříve čtenář obeznámen s problematikou malwaru, botnetů a jejich detekce. Následně budou popsány jednotlivé možnosti a metody detekce malwaru a bezpečnostních incidentů. Praktická část obsahuje popis architektury výsledného nástroje na detekci malwaru a klíčových aspektů jeho implementace. Velký důraz byl kladen především na výsledné testovaní a experimenty. Výsledkem práce je nástroj, skript v jazyce python, pro detekci malwaru ze zachycených DNS dat, který využívá kombinaci více metod detekce.This master thesis deals with the design and implementation of a tool for malware detection using DNS traffic analysis. Text of the thesis is divided into theoretical and practical part. In theoretical part the reader will be acknowledged with the domain of malware and botnet detection. Consequently, various options and methods of malware detection will be described. Practical part of the thesis contains description of malware detection tool architecture as well as key aspects of its implementation. Moreover, the emphasis is being placed on testing and experiments. The result of the thesis is a tool, written in python, for malware detection using DNS traffic analysis, that uses a combination of several methods of detection.
    corecore