23 research outputs found

    Differentiating malware from cleanware using behavioural analysis

    Full text link
    This paper proposes a scalable approach for distinguishing malicious files from clean files by investigating the behavioural features using logs of various API calls. We also propose, as an alternative to the traditional method of manually identifying malware files, an automated classification system using runtime features of malware files. For both projects, we use an automated tool running in a virtual environment to extract API call features from executables and apply pattern recognition algorithms and statistical methods to differentiate between files. Our experimental results, based on a dataset of 1368 malware and 456 cleanware files, provide an accuracy of over 97% in distinguishing malware from cleanware. Our techniques provide a similar accuracy for classifying malware into families. In both cases, our results outperform comparable previously published techniques

    Feature selection and machine learning classification for malware detection

    Get PDF
    Malware is a computer security problem that can morph to evade traditional detection methods based on known signature matching. Since new malware variants contain patterns that are similar to those in observed malware, machine learning techniques can be used to identify new malware. This work presents a comparative study of several feature selection methods with four different machine learning classifiers in the context of static malware detection based on n-grams analysis. The result shows that the use of Principal Component Analysis (PCA) feature selection and Support Vector Machines (SVM) classification gives the best classification accuracy using a minimum number of feature

    An Evaluation Of N-gram System Call Sequence In Mobile Malware Detection

    Get PDF
    The rapid growth of Android-based mobile devices technology in recent years has increased the proliferation of mobile devices throughout the community at large. The ability of Android mobile devices has become similar to its desktop environment; users can do more than just a phone call and short text messaging. These days, Android mobile devices are used for various applications such as web browsing, ubiquitous services, social networking, MMS and many more. However, the rapid growth of Android mobile devices technology has also triggered the malware author to start exploiting the vulnerabilities of the devices. Based on this reason, this paper explores mobile malware detection through an n-gram system call sequence which uses a sequence of system call invoked by the mobile application as the feature in classifying a benign and malicious mobile application. Several n-gram values are evaluated with Linear-SVM classifier to determine the best n system call sequence that produces the highest detection accuracy and highest True Positive Rate (TPR) with low False Positive Rate (FPR)

    Machine-Learning Classifiers for Malware Detection Using Data Features

    Get PDF
    The spread of ransomware has risen exponentially over the past decade, causing huge financial damage to multiple organizations. Various anti-ransomware firms have suggested methods for preventing malware threats. The growing pace, scale and sophistication of malware provide the anti-malware industry with more challenges. Recent literature indicates that academics and anti-virus organizations have begun to use artificial learning as well as fundamental modeling techniques for the research and identification of malware. Orthodox signature-based anti-virus programs struggle to identify unfamiliar malware and track new forms of malware. In this study, a malware evaluation framework focused on machine learning was adopted that consists of several modules: dataset compiling in two separate classes (malicious and benign software), file disassembly, data processing, decision making, and updated malware identification. The data processing module uses grey images, functions for importing and Opcode n-gram to remove malware functionality. The decision making module detects malware and recognizes suspected malware. Different classifiers were considered in the research methodology for the detection and classification of malware. Its effectiveness was validated on the basis of the accuracy of the complete process

    Intra-procedural Path-insensitive Grams (i-grams) and Disassembly Based Features for Packer Tool Classification and Detection

    Get PDF
    The DoD relies on over seven million computing devices worldwide to accomplish a wide range of goals and missions. Malicious software, or malware, jeopardizes these goals and missions. However, determining whether an arbitrary software executable is malicious can be difficult. Obfuscation tools, called packers, are often used to hide the malicious intent of malware from anti-virus programs. Therefore detecting whether or not an arbitrary executable file is packed is a critical step in software security. This research uses machine learning methods to build a system, the Polymorphic and Non-Polymorphic Packer Detection (PNPD) system, that detects whether an executable is packed using both sequences of instructions, called i-grams, and disassembly information as features for machine learning. Both i-grams and disassembly features successfully detect packed executables with top configurations achieving average accuracies above 99.5\%, average true positive rates above 0.977, and average false positive rates below 1.6e-3 when detecting polymorphic packers

    A Comparative Study on Feature Selection Method for N-gram Mobile Malware Detection

    Get PDF
    Abstract In recent years, mobile device technology has become an important necessity in our community at large. The ability of the mobile technology today has become more similar to its desktop environment. Despite the advancement of the mobile devices technology provide, it has also exposes the mobile devices to the similar threat it predecessor possess. One of the anomaly based detection methods used in detecting mobile malware is the n-gram system call sequence. However, with the limited storage, memory and CPU processing power, mobile devices that provide this approach can exhaust the mobile device resources. This is due to the huge amount of system call to be collected and processed for the detection approach. To overcome the issues, this paper investigates the use of several different feature selection methods in optimizing the n-gram system call sequence feature in classifying benign and malicious mobile application. Several filter and wrapper feature selection methods are selected and their performance analyzed. The feature selection methods are evaluated based on the number of feature selected and the contribution it made to improve the True Positive Rate (TPR), False Positive Rate (FPR) and Accuracy of the Linear-SVM classifier in classifying benign and malicious mobile malware application

    Sec-Lib: Protecting Scholarly Digital Libraries From Infected Papers Using Active Machine Learning Framework

    Get PDF
    Researchers from academia and the corporate-sector rely on scholarly digital libraries to access articles. Attackers take advantage of innocent users who consider the articles' files safe and thus open PDF-files with little concern. In addition, researchers consider scholarly libraries a reliable, trusted, and untainted corpus of papers. For these reasons, scholarly digital libraries are an attractive-target and inadvertently support the proliferation of cyber-attacks launched via malicious PDF-files. In this study, we present related vulnerabilities and malware distribution approaches that exploit the vulnerabilities of scholarly digital libraries. We evaluated over two-million scholarly papers in the CiteSeerX library and found the library to be contaminated with a surprisingly large number (0.3-2%) of malicious PDF documents (over 55% were crawled from the IPs of US-universities). We developed a two layered detection framework aimed at enhancing the detection of malicious PDF documents, Sec-Lib, which offers a security solution for large digital libraries. Sec-Lib includes a deterministic layer for detecting known malware, and a machine learning based layer for detecting unknown malware. Our evaluation showed that scholarly digital libraries can detect 96.9% of malware with Sec-Lib, while minimizing the number of PDF-files requiring labeling, and thus reducing the manual inspection efforts of security-experts by 98%

    Sec-Lib: Protecting Scholarly Digital Libraries From Infected Papers Using Active Machine Learning Framework

    Get PDF
    Researchers from academia and the corporate-sector rely on scholarly digital libraries to access articles. Attackers take advantage of innocent users who consider the articles\u27 files safe and thus open PDF-files with little concern. In addition, researchers consider scholarly libraries a reliable, trusted, and untainted corpus of papers. For these reasons, scholarly digital libraries are an attractive-target and inadvertently support the proliferation of cyber-attacks launched via malicious PDF-files. In this study, we present related vulnerabilities and malware distribution approaches that exploit the vulnerabilities of scholarly digital libraries. We evaluated over two-million scholarly papers in the CiteSeerX library and found the library to be contaminated with a surprisingly large number (0.3-2%) of malicious PDF documents (over 55% were crawled from the IPs of US-universities). We developed a two layered detection framework aimed at enhancing the detection of malicious PDF documents, Sec-Lib, which offers a security solution for large digital libraries. Sec-Lib includes a deterministic layer for detecting known malware, and a machine learning based layer for detecting unknown malware. Our evaluation showed that scholarly digital libraries can detect 96.9% of malware with Sec-Lib, while minimizing the number of PDF-files requiring labeling, and thus reducing the manual inspection efforts of security-experts by 98%
    corecore