11 research outputs found

    Defending Your Mobile Fortress: An In-Depth Look at on-Device Trojan Detection in Machine Learning: Systematic Literature Review

    Mobile app trojans are an increasingly serious threat to personal information security. They can cause severe damage by exposing sensitive and personally identifying information to malicious actors, so such attacks must be prevented. This paper's contribution is a comprehensive review of the attack vectors for trojan attacks and of ways to eliminate the risks posed by those vectors and generate settlement automatically. In this study, we explore in detail how trojan attacks can be detected using machine learning. The review of state-of-the-art methods follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We review literature from several publications and analyze the use of machine learning for on-device trojan detection. This review provides evidence for the effectiveness of machine learning in detecting such threats. The current trend shows that signature-based analysis using various metadata, such as permissions, intents, API and system calls, and network analysis, is capable of detecting trojan attacks before and after the initial infection

    An Investigation on Fragility of Machine Learning Classifiers in Android Malware Detection

    Machine learning (ML) classifiers have been increasingly used in Android malware detection and countermeasures over the past decade. However, ML-based solutions are vulnerable to adversarial evasion attacks: an attacker can carefully craft a malicious sample to fool an underlying pre-trained classifier. In this paper, we highlight the fragility of ML classifiers against adversarial evasion attacks. We perform mimicry attacks based on Oracle and Generative Adversarial Network (GAN) against these classifiers using our proposed methodology. We use static analysis on Android applications to extract API-based features from a balanced excerpt of a well-known public dataset. The empirical results demonstrate that, among ML classifiers, the detection capability of linear classifiers can be reduced to as low as 0 by perturbing only up to 4 out of 315 extracted API features. As a countermeasure, we propose TrickDroid, a cumulative adversarial training scheme based on Oracle and GAN-based adversarial data to improve evasion detection. In our experiments, cumulative adversarial training achieves a detection accuracy of up to 99.46 against adversarial samples
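To make the fragility of linear classifiers concrete, here is a minimal sketch of a greedy feature-addition attack on a linear score. It is not the Oracle- or GAN-based mimicry attack described in the abstract. It is a simpler white-box stand-in, and the weights, bias, and feature vector are hypothetical values chosen only for illustration.

```python
def greedy_evasion(weights, bias, x, budget):
    """Add up to `budget` absent binary features (0 -> 1) to push a
    linear score below zero (score < 0 is treated as 'benign').

    Only additions are allowed, on the assumption that removing API
    calls could break the app's functionality.
    """
    x = list(x)
    score = bias + sum(w * v for w, v in zip(weights, x))
    # Absent features with negative weight, most negative first.
    candidates = sorted(
        (i for i, v in enumerate(x) if v == 0 and weights[i] < 0),
        key=lambda i: weights[i],
    )
    flipped = []
    for i in candidates:
        if score < 0 or len(flipped) >= budget:
            break
        x[i] = 1
        score += weights[i]
        flipped.append(i)
    return x, flipped, score


# Hypothetical 5-feature classifier and a sample it flags as malicious.
weights = [2.0, -1.5, -3.0, 0.5, -0.2]
bias = 0.5
sample = [1, 0, 0, 1, 0]
adv, flipped, score = greedy_evasion(weights, bias, sample, budget=4)
```

With these made-up numbers, two added features are enough to move the score from positive to negative, which is the kind of small perturbation the abstract reports at a larger scale.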

    Obfuscation-resilient Android Malware Analysis Based on Contrastive Learning

    Due to its open-source nature, the Android operating system has been a main target for attackers. Malware creators routinely apply different code obfuscations to their apps to hide malicious activities. Features extracted from these obfuscated samples through program analysis contain many useless and disguised features, which leads to many false negatives. To address this issue, we demonstrate in this paper that obfuscation-resilient malware analysis can be achieved through contrastive learning. We take Android malware classification as an example to demonstrate our analysis. The key insight behind our analysis is that contrastive learning can be used to reduce the difference introduced by obfuscation while amplifying the difference between malware and benign apps (or other types of malware). Based on the proposed analysis, we design a system that can achieve robust and interpretable classification of Android malware. To achieve robust classification, we perform contrastive learning on malware samples to learn an encoder that can automatically extract robust features from malware samples. To achieve interpretable classification, we transform the function call graph of a sample into an image by centrality analysis. The corresponding heatmaps are then obtained by visualization techniques. These heatmaps can help users understand why a malware sample is assigned to a given family. We implement IFDroid and perform extensive evaluations on two widely used datasets. Experimental results show that IFDroid is superior to state-of-the-art Android malware familial classification systems. Moreover, IFDroid is capable of maintaining a 98.2% true positive rate when classifying 8,112 obfuscated malware samples

    Obfuscated Malware Detection in IoT Android Applications Using Markov Images and CNN

    The threat of malware in the Internet of Things (IoT) is ever-present given that many IoT systems today rely on the Android operating system. There has been a consistent rise in Android malware recently, with new variants adopting sophisticated detection avoidance techniques, including various forms of obfuscation. Hence, there is a need to improve the effectiveness of Android malware detection as obfuscation becomes more prevalent in the wild. In this article, we present a novel approach for obfuscated malware detection in IoT Android applications based on the visualization of app executables with Markov images. The app images are trained using a convolutional neural network (CNN) to detect obfuscated malware and for the identification of the obfuscation type. We evaluate the performance of the proposed system by experimenting with four different classification models using 12000 Android applications. The CNN model created to distinguish between malware and benign apps obtained an accuracy of 99.41%. The model for identifying obfuscated malware from benign applications obtained 99.65% accuracy while the model created to identify obfuscated malware from non-obfuscated malware yielded an accuracy of 99.81%. The model for classifying obfuscated malware into 14 different obfuscation categories obtained an accuracy of 99.67%. These results show that CNN models trained from Markov images generated using application byte code can be highly effective for obfuscated malware detection and classification. Moreover, our proposed system provides a more sustainable and cost-effective method for obfuscated malware detection compared to the manual feature-engineering-based approaches that are more prevalent in the current literature
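The abstract does not spell out how a Markov image is built. A common formulation, assumed here, treats the byte code as a sequence and records the probability that byte value `j` follows byte value `i`, giving a 256x256 matrix that can be fed to a CNN as a grayscale image. The sketch below follows that assumption and may differ from the paper's exact construction.

```python
def markov_image(data: bytes) -> list[list[float]]:
    """Return a 256x256 matrix whose entry [i][j] estimates
    P(next byte == j | current byte == i) over `data`."""
    counts = [[0] * 256 for _ in range(256)]
    # Count every adjacent byte pair (current, next).
    for cur, nxt in zip(data, data[1:]):
        counts[cur][nxt] += 1
    image = []
    for row in counts:
        total = sum(row)
        # Rows for byte values that never occur as 'current' stay all-zero.
        image.append([c / total if total else 0.0 for c in row])
    return image


# Tiny illustrative input; a real input would be an app's byte code.
img = markov_image(b"\x00\x01\x00\x02")
```

Because the matrix has a fixed size regardless of file length, apps of very different sizes map to images of the same shape, which is convenient for a CNN.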

    Malware: the never-ending arm race

    "Antivirus is dead", and so, probably, is every detection system that relies on a single strategy for finding indicators of compromise. This famous quote, made in 2014 by Brian Dye, Symantec's senior vice president, is the best representation of the current situation in malware detection and mitigation. Concealment strategies have evolved significantly in recent years: not only the classical ones based on polymorphic and metamorphic methodologies, which killed the signature-based detection that antiviruses use, but also the capabilities of fileless malware, i.e. malware resident only in volatile memory, which makes every disk analysis senseless. This review provides a historical background of the different concealment strategies introduced to protect malicious (and not necessarily malicious) software from different detection or analysis techniques. It covers binary, static and dynamic analysis, and also new strategies based on machine learning, from the perspectives of both attackers and defenders

    Proposed Framework to Improving Performance of Familial Classification in Android Malware

    Because of recent developments in hardware and software technologies for mobile phones, people depend on their smartphones more than ever before. Today, people conduct a variety of business, health, and financial transactions on their mobile devices. This trend has caused an influx of mobile applications that require users' sensitive information. As the number of these applications has increased, so too has the number of malicious applications that may compromise users' sensitive information. Among all smartphone platforms, Android receives major attention from security practitioners and researchers due to the large number of malicious applications. For the past twelve years, malicious Android applications have been clustered into groups for better identification. Characterizing malware families can improve the detection process and help in understanding malware patterns. However, detecting new malware families remains a challenge for the research community. In this research, a framework is proposed to improve the performance of familial classification in Android malware. The framework is named the Reverse Engineering Framework (RevEng). Within RevEng, applications' permissions were selected and then fed into machine learning algorithms. We created a reduced set of permissions using the Extremely Randomized Trees algorithm, which achieved high accuracy and a shorter execution time. Furthermore, we applied two approaches to the extracted information. The first approach used a binary representation of the permissions. The second approach used feature importance: each selected permission was represented by its weight value instead of the binary value used in the first approach. We compared the results of our two approaches with other relevant works. Our approaches achieved better results in both accuracy and time performance with a reduced number of permissions
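The two representations described in this abstract can be illustrated with a short sketch. The selected permissions and their weights below are hypothetical placeholders. In the paper they come from an Extremely Randomized Trees model, which is not reproduced here.

```python
def encode_permissions(app_permissions, selected, weights=None):
    """Encode an app as a vector over the `selected` permissions.

    weights=None  -> binary representation (1.0 if requested, else 0.0)
    weights=dict  -> each requested permission is replaced by its weight
    """
    present = set(app_permissions)
    vector = []
    for perm in selected:
        if perm not in present:
            vector.append(0.0)
        elif weights is None:
            vector.append(1.0)
        else:
            vector.append(weights[perm])
    return vector


# Hypothetical reduced permission set and importance weights.
SELECTED = [
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.INTERNET",
]
WEIGHTS = {
    "android.permission.SEND_SMS": 0.5,
    "android.permission.READ_CONTACTS": 0.3,
    "android.permission.INTERNET": 0.2,
}

app = ["android.permission.INTERNET", "android.permission.SEND_SMS"]
binary_vec = encode_permissions(app, SELECTED)
weighted_vec = encode_permissions(app, SELECTED, WEIGHTS)
```

The weighted form carries the same presence information as the binary form, but lets a downstream classifier see at a glance which requested permissions were judged more informative.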

    Un modèle de détection de logiciels malveillants pour terminaux mobiles

    A malware detection model for mobile devices contributes to the field of computer security. Cybersecurity is a major current concern, driven mainly by the growing number of cyberattacks. Indeed, data losses due to security breaches cost 45 billion Canadian dollars in 2018. Beyond the financial impact, ethical problems also arise when the personal information of customers and users is disclosed. Due to the popularity of smartphones and tablets, mobile devices are becoming targets of cyberattacks. It is therefore essential to explore new ways to prevent, detect and counter such attacks. In these detection mechanisms, machine learning is used to create classifiers that determine whether an application is dangerous or not. The advantage of a neural network is that it can adapt to previously unseen situations. Unlike a system of fixed security rules, this technology can identify types of malicious behavior and generalize them to future malicious programs. The goal of this research is to propose a hybrid malware detection model for Android based on two classification neural networks trained on sets of static and dynamic features. We first conducted a literature review of existing detection techniques. This review is not exhaustive, but it identifies the main challenges encountered and the solutions proposed by the scientific community. These solutions fall into two groups: static methods examine the code of the mobile application, while dynamic methods analyze the behavior of an application while it runs on a mobile device. Our goal is to use both methods in order to benefit from their respective advantages. To do this, we chose the hybrid dataset Omnidroid, composed of 25,999 static features and 5,932 dynamic features. We show that 22,636 static features and 2,210 dynamic features of the Omnidroid dataset are empty. We also carry out an experimental plan consisting of hundreds of training runs in order to tune the hyperparameters that improve learning on this dataset and to select the most relevant remaining features
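Filtering out empty features, as reported for Omnidroid above, can be sketched as follows. The sketch assumes that "empty" means a column that is zero for every sample. The thesis may use a different criterion, and the toy data and feature names are hypothetical.

```python
def drop_empty_features(rows, names):
    """Remove feature columns that are zero in every row.

    rows  : list of equal-length numeric feature vectors
    names : feature names, one per column
    Returns (filtered_rows, kept_names).
    """
    keep = [
        j for j in range(len(names))
        if any(row[j] != 0 for row in rows)
    ]
    kept_names = [names[j] for j in keep]
    filtered_rows = [[row[j] for j in keep] for row in rows]
    return filtered_rows, kept_names


# Toy dataset: the middle feature is never set, so it is dropped.
rows = [[1, 0, 0], [0, 0, 2]]
names = ["perm_a", "api_b", "syscall_c"]
filtered, kept = drop_empty_features(rows, names)
```

Running such a filter before training shrinks the input dimension without discarding any information, since an all-zero column cannot help a classifier separate the classes.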