311 research outputs found

    Android malware detection based on image-based features and machine learning techniques

    Get PDF
    Bakour, Khaled/0000-0003-3327-2822WOS:000545934700001In this paper, a malware classification model has been proposed for detecting malware samples in the Android environment. The proposed model is based on converting some files from the source of the Android applications into grayscale images. Some image-based local features and global features, including four different types of local features and three different types of global features, have been extracted from the constructed grayscale image datasets and used for training the proposed model. To the best of our knowledge, this type of features is used for the first time in the Android malware detection domain. Moreover, the bag of visual words algorithm has been used to construct one feature vector from the descriptors of the local feature extracted from each image. The extracted local and global features have been used for training multiple machine learning classifiers including Random forest, k-nearest neighbors, Decision Tree, Bagging, AdaBoost and Gradient Boost. The proposed method obtained a very high classification accuracy reached 98.75% with a typical computational time does not exceed 0.018 s for each sample. The results of the proposed model outperformed the results of all compared state-of-art models in term of both classification accuracy and computational time

    Detection of app collusion potential using logic programming

    Get PDF
    Mobile devices pose a particular security risk because they hold personal details (accounts, locations, contacts, photos) and have capabilities potentially exploitable for eavesdropping (cameras/microphone, wireless connections). The Android operating system is designed with a number of built-in security features such as application sandboxing and permission-based access control. Unfortunately, these restrictions can be bypassed, without the user noticing, by colluding apps whose combined permissions allow them to carry out attacks that neither app is able to execute by itself. While the possibility of app collusion was first warned in 2011, it has been unclear if collusion is used by malware in the wild due to a lack of suitable detection methods and tools. This paper describes how we found the first collusion in the wild. We also present a strategy for detecting collusions and its implementation in Prolog that allowed us to make this discovery. Our detection strategy is grounded in concise definitions of collusion and the concept of ASR (Access-Send-Receive) signatures. The methodology is supported by statistical evidence. Our approach scales and is applicable to inclusion into professional malware detection systems: we applied it to a set of more than 50,000 apps collected in the wild. Code samples of our tool as well as of the detected malware are available

    DroidDetectMW: A Hybrid Intelligent Model for Android Malware Detection

    Get PDF
    Malicious apps specifically aimed at the Android platform have increased in tandem with the proliferation of mobile devices. Malware is now so carefully written that it is difficult to detect. Due to the exponential growth in malware, manual methods of malware are increasingly ineffective. Although prior writers have proposed numerous high-quality approaches, static and dynamic assessments inherently necessitate intricate procedures. The obfuscation methods used by modern malware are incredibly complex and clever. As a result, it cannot be detected using only static malware analysis. As a result, this work presents a hybrid analysis approach, partially tailored for multiple-feature data, for identifying Android malware and classifying malware families to improve Android malware detection and classification. This paper offers a hybrid method that combines static and dynamic malware analysis to give a full view of the threat. Three distinct phases make up the framework proposed in this research. Normalization and feature extraction procedures are used in the first phase of pre-processing. Both static and dynamic features undergo feature selection in the second phase. Two feature selection strategies are proposed to choose the best subset of features to use for both static and dynamic features. The third phase involves applying a newly proposed detection model to classify android apps; this model uses a neural network optimized with an improved version of HHO. Application of binary and multi-class classification is used, with binary classification for benign and malware apps and multi-class classification for detecting malware categories and families. By utilizing the features gleaned from static and dynamic malware analysis, several machine-learning methods are used for malware classification. According to the results of the experiments, the hybrid approach improves the accuracy of detection and classification of Android malware compared to the scenario when considering static and dynamic information separately

    Explainable Malware Detection System Using Transformers-Based Transfer Learning and Multi-Model Visual Representation

    Get PDF
    Android has become the leading mobile ecosystem because of its accessibility and adaptability. It has also become the primary target of widespread malicious apps. This situation needs the immediate implementation of an effective malware detection system. In this study, an explainable malware detection system was proposed using transfer learning and malware visual features. For effective malware detection, our technique leverages both textual and visual features. First, a pre-trained model called the Bidirectional Encoder Representations from Transformers (BERT) model was designed to extract the trained textual features. Second, the malware-to-image conversion algorithm was proposed to transform the network byte streams into a visual representation. In addition, the FAST (Features from Accelerated Segment Test) extractor and BRIEF (Binary Robust Independent Elementary Features) descriptor were used to efficiently extract and mark important features. Third, the trained and texture features were combined and balanced using the Synthetic Minority Over-Sampling (SMOTE) method; then, the CNN network was used to mine the deep features. The balanced features were then input into the ensemble model for efficient malware classification and detection. The proposed method was analyzed extensively using two public datasets, CICMalDroid 2020 and CIC-InvesAndMal2019. To explain and validate the proposed methodology, an interpretable artificial intelligence (AI) experiment was conducted

    Familial Clustering For Weakly-labeled Android Malware Using Hybrid Representation Learning

    Full text link
    IEEE Labeling malware or malware clustering is important for identifying new security threats, triaging and building reference datasets. The state-of-the-art Android malware clustering approaches rely heavily on the raw labels from commercial AntiVirus (AV) vendors, which causes misclustering for a substantial number of weakly-labeled malware due to the inconsistent, incomplete and overly generic labels reported by these closed-source AV engines, whose capabilities vary greatly and whose internal mechanisms are opaque (i.e., intermediate detection results are unavailable for clustering). The raw labels are thus often used as the only important source of information for clustering. To address the limitations of the existing approaches, this paper presents ANDRE, a new ANDroid Hybrid REpresentation Learning approach to clustering weakly-labeled Android malware by preserving heterogeneous information from multiple sources (including the results of static code analysis, the metainformation of an app, and the raw-labels of the AV vendors) to jointly learn a hybrid representation for accurate clustering. The learned representation is then fed into our outlieraware clustering to partition the weakly-labeled malware into known and unknown families. The malware whose malicious behaviours are close to those of the existing families on the network, are further classified using a three-layer Deep Neural Network (DNN). The unknown malware are clustered using a standard density-based clustering algorithm. We have evaluated our approach using 5,416 ground-truth malware from Drebin and 9,000 malware from VIRUSSHARE (uploaded between Mar. 2017 and Feb. 2018), consisting of 3324 weakly-labeled malware. The evaluation shows that ANDRE effectively clusters weaklylabeled malware which cannot be clustered by the state-of-theart approaches, while achieving comparable accuracy with those approaches for clustering ground-truth samples

    Malware detection based on dynamic analysis features

    Get PDF
    The widespread usage of mobile devices and their seamless adaptation to each users' needs by the means of useful applications (Apps), makes them a prime target for malware developers to get access to sensitive user data, such as banking details, or to hold data hostage and block user access. These apps are distributed in marketplaces that host millions and therefore have their own forms of automated malware detection in place in order to deter malware developers and keep their app store (and reputation) trustworthy, but there are still a number of apps that are able to bypass these detectors and remain available in the marketplace for any user to download. Current malware detection strategies rely mostly on using features extracted statically, dynamically or a conjunction of both, and making them suitable for machine learning applications, in order to scale detection to cover the number of apps that are submited to the marketplace. In this article, the main focus is the study of the effectiveness of these automated malware detection methods and their ability to keep up with the proliferation of new malware and its ever-shifting trends. By analising the performance of ML algorithms trained, with real world data, on diferent time periods and time scales with features extracted statically, dynamically and from user-feedback, we are able to identify the optimal setup to maximise malware detection.O uso generalizado de dispositivos móveis e sua adaptação perfeita às necessidades de cada utilizador por meio de aplicativos úteis (Apps) tornam-os um alvo principal para que criadores de malware obtenham acesso a dados confidenciais do usuário, como detalhes bancários, ou para reter dados e bloquear o acesso do utilizador. Estas apps são distribuídas em mercados que alojam milhões, e portanto, têm as suas próprias formas de detecção automatizada de malware, a fim de dissuadir os desenvolvedores de malware e manter sua loja de apps (e reputação) confiável, mas ainda existem várias apps capazes de ignorar esses detectores e permanecerem disponíveis no mercado para qualquer utilizador fazer o download. As estratégias atuais de detecção de malware dependem principalmente do uso de recursos extraídos estaticamente, dinamicamente ou de uma conjunção de ambos, e de torná-los adequados para aplicações de aprendizagem automática, a fim de dimensionar a detecção para cobrir o número de apps que são enviadas ao mercado. Neste artigo, o foco principal é o estudo da eficácia dos métodos automáticos de detecção de malware e as suas capacidades de acompanhar a popularidade de novo malware, bem como as suas tendências em constante mudança. Analisando o desempenho de algoritmos de ML treinados, com dados do mundo real, em diferentes períodos e escalas de tempo com recursos extraídos estaticamente, dinamicamente e com feedback do utilizador, é possível identificar a configuração ideal para maximizar a detecção de malware
    corecore