402 research outputs found

    Multi-level analysis of Malware using Machine Learning

    Malware: the never-ending arm race

    "Antivirus is dead", and so, probably, is every detection system that focuses on a single strategy for indicators of compromise. This famous quote, stated in 2014 by Brian Dye, Symantec's senior vice president, is the best representation of the current situation with malware detection and mitigation. Concealment strategies have evolved significantly in recent years: not only the classical ones based on polymorphic and metamorphic methodologies, which killed the signature-based detection that antiviruses use, but also fileless malware, i.e. malware resident only in volatile memory, which makes every disk analysis senseless. This review provides a historical background of the different concealment strategies introduced to protect malicious (and not necessarily malicious) software from different detection or analysis techniques. It covers binary, static and dynamic analysis, as well as new strategies based on machine learning, from the perspectives of both attackers and defenders.

    Malware Analysis and Detection with Explainable Machine Learning

    Malware detection is one of the areas where machine learning is successfully employed, owing to its high discriminating power and its capability of identifying novel variants of malware samples. Typically, the problem formulation is strictly correlated to the use of a wide variety of features covering several characteristics of the entities to classify. This practice apparently achieves considerable detection performance. However, it hardly permits us to gain insights into the knowledge extracted by the learning algorithm, which causes two main issues. First, detectors might learn spurious patterns, undermining their effectiveness in real environments. Second, they might be particularly vulnerable to adversarial attacks, weakening their security. These concerns give rise to the need to develop systems that are tailored to the specific peculiarities of the attacks to detect. Within malware detection, Android ransomware represents a challenging yet illustrative domain for assessing the relevance of this issue. Ransomware is a serious threat that acts by locking the compromised device or encrypting its data, then forcing the device owner to pay a ransom in order to restore the device's functionality. Attackers typically develop such dangerous apps so that normally legitimate components and functionalities perform malicious behaviour, making them harder to distinguish from genuine applications. In this sense, adopting a well-defined variety of features and relying on some kind of explanation of the logic behind such detectors could improve their design process, since it could reveal truly characterising features and hence guide the human expert towards an understanding of the most relevant attack patterns. Given this context, the goal of the thesis is to explore strategies that may improve the design process of malware detectors.
In particular, the thesis proposes to evaluate and integrate approaches based on emerging research on Explainable Machine Learning. To this end, the work follows two pathways. The first and main one focuses on identifying the main traits that prove to be characterising and effective for Android ransomware detection. Then, explainability techniques are used to propose methods to assess the validity of the considered features. The second pathway broadens the view by exploring the relationship between explainable machine learning and adversarial attacks. In this regard, the contribution consists of pointing out metrics extracted from explainability techniques that can reveal models' robustness to adversarial attacks, together with an assessment of the practical feasibility for attackers to alter the features that affect models' output the most. Ultimately, this work highlights the need to adopt a design process that is aware of the weaknesses of, and attacks against, machine learning-based detectors, and proposes explainability techniques as one of the tools to counteract them.

    A Deep-Learning Based Robust Framework Against Adversarial P.E. and Cryptojacking Malware

    This graduate thesis introduces novel, deep-learning based frameworks that are resilient to adversarial P.E. and cryptojacking malware. We propose a method that uses a convolutional neural network (CNN) to classify image representations of malware and that provides robustness against numerous adversarial attacks. Our evaluation concludes that the image-based malware classifier is significantly more robust to adversarial attacks than a state-of-the-art ML-based malware classifier, and that it reduces the evasion rate of adversarial samples to 0% in certain attacks. Further, we develop MINOS, a novel, lightweight cryptojacking detection system that accurately detects the presence of unwarranted mining activity in real time. MINOS can detect mining activity with a low TNR and FPR, in an average of 25.9 milliseconds, while using a maximum of 4% of CPU and 6.5% of RAM. Therefore, it can be concluded that the frameworks presented in this thesis attain high accuracy, are computationally inexpensive, and are resistant to adversarial perturbations.
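The "image representation of malware" mentioned above can be illustrated with a minimal sketch. It assumes the common convention of mapping each byte of a binary to one grayscale pixel; the abstract does not state the exact encoding used in the thesis, and the function name `bytes_to_grayscale` and the `width` parameter are hypothetical.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Map each byte to one pixel (0-255) and wrap rows at `width`.

    Assumption: one byte per pixel, zero padding for the last row.
    This is a generic encoding, not necessarily the one used in the thesis.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad so that the final row is complete.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Toy input: a made-up byte string that starts with the "MZ" magic bytes.
image = bytes_to_grayscale(b"MZ\x90\x00" + bytes(range(30)), width=8)
print(len(image), "rows of", len(image[0]), "pixels")
```

The resulting 2-D array could then be fed to any image classifier, such as a CNN.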

    Android malware detection: An eigenspace analysis approach

    The file attached to this record is the author's final peer-reviewed version. The Publisher's final version can be found by following the DOI link. The battle to mitigate Android malware has become more critical with the emergence of new strains incorporating increasingly sophisticated evasion techniques, in turn necessitating more advanced detection capabilities. Hence, in this paper we propose and evaluate a machine learning approach based on eigenspace analysis for Android malware detection, using features derived from static analysis characterization of Android applications. Empirical evaluation with a dataset of real malware and benign samples shows that a detection rate of over 96% with a very low false positive rate is achievable using the proposed method.
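The abstract does not describe how the eigenspace is built, so the following is only a generic sketch of one building block that eigenspace methods typically rely on: extracting the dominant eigenvector of a feature covariance matrix (here by power iteration) and projecting a sample onto it. All function names and the toy feature vectors are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (divides by n - 1)."""
    mu = mean_vector(rows)
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    d, n = len(mu), len(rows)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=200):
    """Dominant eigenvector of a symmetric matrix via power iteration."""
    d = len(matrix)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            return v
        v = [x / norm for x in w]
    return v


def project(sample, mu, axis):
    """Coordinate of a sample along the principal axis."""
    return sum((x - m) * a for x, m, a in zip(sample, mu, axis))


# Toy static-analysis feature vectors (made up for illustration only).
features = [[0.0, 0.0], [2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
axis = top_eigenvector(covariance(features))
score = project([6.0, 3.0], mean_vector(features), axis)
print(axis, score)
```

A detector built on such a projection would compare the coordinates of an unknown app with those of known benign and malicious samples; how the paper does this is not stated in the abstract.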

    An ensemble-based anomaly-behavioural crypto-ransomware pre-encryption detection model

    Crypto-ransomware is malware that leverages cryptography to encrypt files for extortion purposes. Even after such attacks are neutralized, the targeted files remain encrypted. This irreversible effect on the target is what distinguishes crypto-ransomware attacks from traditional malware. Thus, it is imperative to detect such attacks during the pre-encryption phase. However, existing crypto-ransomware early detection solutions are not effective, owing to an inaccurate definition of the pre-encryption phase boundaries, insufficient data at that phase, and the misuse-based approach that the solutions employ, which is not suitable for detecting new (zero-day) attacks. Consequently, those solutions suffer from low detection accuracy and high false alarm rates. Therefore, this research addressed these issues and developed an Ensemble-Based Anomaly-Behavioural Pre-encryption Detection Model (EABDM) to overcome data insufficiency and improve the detection accuracy of known and novel crypto-ransomware attacks. In this research, three phases were used in the development of EABDM. In the first phase, a Dynamic Pre-encryption Boundary Definition and Features Extraction (DPBD-FE) scheme was developed by incorporating Rocchio feedback and the vector space model to build a pre-encryption boundary vector. Then, an improved term frequency-inverse document frequency technique was utilized to extract the features from runtime data generated during the pre-encryption phase of the crypto-ransomware attack lifecycle. In the second phase, a Maximum of Minimum-Based Enhanced Mutual Information Feature Selection (MM-EMIFS) technique was used to select the informative feature set and prevent overfitting caused by high-dimensional data. The MM-EMIFS utilized the developed Redundancy Coefficient Gradual Upweighting (RCGU) technique to overcome data insufficiency during the pre-encryption phase and improve feature significance estimation.
In the final phase, an improved technique called incremental bagging (iBagging) built incremental data subsets for the anomaly-based and behavioural-based detection ensembles. The enhanced semi-random subspace selection (ESRS) technique was then utilized to build noise-free and diverse subspaces for each of these incremental data subsets. Based on the subspaces, the base classifiers were trained for each ensemble. Both ensembles employed majority voting to combine the decisions of the base classifiers. After that, the decision of the anomaly ensemble was combined into the behavioural ensemble, which gave the final decision. The experimental evaluation showed that the DPBD-FE scheme reduced the ratio of crypto-ransomware samples whose pre-encryption boundaries were missed from 18% to 8% as compared to existing works. Additionally, the features selected by the MM-EMIFS technique improved the detection accuracy from 89% to 96% as compared to existing techniques. Likewise, on average, the EABDM model increased detection accuracy from 85% to 97.88% and reduced false positive alarms from 12% to 1% in comparison to existing early detection models. These results demonstrated the ability of the EABDM to improve the detection accuracy of crypto-ransomware attacks early, before the encryption takes place, to protect files from being held to ransom.
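The voting step described above can be sketched as follows. Majority voting is stated in the abstract; the way the anomaly ensemble's decision is "combined into" the behavioural ensemble is not, so this sketch assumes it is counted as one additional vote. That rule, the tie-breaking choice, and the function names are all hypothetical.

```python
def majority_vote(decisions):
    """Return True (ransomware) if strictly more than half of the votes are True.

    Assumption: ties resolve to False (benign); the abstract does not say.
    """
    votes = list(decisions)
    if not votes:
        raise ValueError("at least one decision is required")
    return 2 * sum(votes) > len(votes)


def final_decision(anomaly_votes, behavioural_votes):
    """Combine the two ensembles.

    Assumption: the anomaly ensemble's majority decision is added as one
    extra vote to the behavioural ensemble, which then decides.
    """
    anomaly_decision = majority_vote(anomaly_votes)
    return majority_vote(list(behavioural_votes) + [anomaly_decision])


# Toy base-classifier outputs (True = flagged as crypto-ransomware).
flagged = final_decision(anomaly_votes=[True, True, False],
                         behavioural_votes=[True, False])
cleared = final_decision(anomaly_votes=[False, False, True],
                         behavioural_votes=[True, False])
print(flagged, cleared)
```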

    XMD: An Expansive Hardware-telemetry based Mobile Malware Detector to enhance Endpoint Detection

    Hardware-based Malware Detectors (HMDs) have shown promise in detecting malicious workloads. However, current HMDs focus solely on the CPU core of a System-on-Chip (SoC) and, therefore, do not exploit the full potential of the hardware telemetry. In this paper, we propose XMD, an HMD that uses an expansive set of telemetry channels extracted from the different subsystems of the SoC. XMD exploits the thread-level profiling power of the CPU-core telemetry and the global profiling power of non-core telemetry channels to achieve significantly better detection performance than currently used Hardware Performance Counter (HPC) based detectors. We leverage the manifold hypothesis to analytically prove that adding non-core telemetry channels improves the separability of the benign and malware classes, resulting in performance gains. We train and evaluate XMD using hardware telemetry collected from 723 benign applications and 1033 malware samples on a commodity Android Operating System (OS)-based mobile device. XMD improves over currently used HPC-based detectors by 32.91% on the in-distribution test data. XMD achieves a best detection performance of 86.54% with a false positive rate of 2.9%, compared to the detection rate of 80% offered by the best-performing signature-based Anti-Virus (AV) on VirusTotal on the same set of malware samples. Comment: Revised version based on peer review feedback. Manuscript to appear in IEEE Transactions on Information Forensics and Security.
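For readers comparing the figures quoted above, the sketch below shows how a detection rate (true positive rate) and a false positive rate are conventionally computed from confusion-matrix counts. The counts are made up for illustration and are not taken from the paper; the abstract also does not say which exact metric "detection performance" refers to.

```python
def detection_rates(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Standard definitions: TPR = TP / (TP + FN), FPR = FP / (FP + TN)."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one malware and one benign sample")
    return {"tpr": tp / (tp + fn), "fpr": fp / (fp + tn)}


# Hypothetical counts: 100 malware samples and 100 benign samples.
rates = detection_rates(tp=90, fn=10, fp=5, tn=95)
print(f"detection rate {rates['tpr']:.1%}, false positive rate {rates['fpr']:.1%}")
```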