235 research outputs found

    A Study of Android Malware Detection Techniques and Machine Learning

    Get PDF
    Android OS is one of the widely used mobile Operating Systems. The number of malicious applications and adwares are increasing constantly on par with the number of mobile devices. A great number of commercial signature based tools are available on the market which prevent to an extent the penetration and distribution of malicious applications. Numerous researches have been conducted which claims that traditional signature based detection system work well up to certain level and malware authors use numerous techniques to evade these tools. So given this state of affairs, there is an increasing need for an alternative, really tough malware detection system to complement and rectify the signature based system. Recent substantial research focused on machine learning algorithms that analyze features from malicious application and use those features to classify and detect unknown malicious applications. This study summarizes the evolution of malware detection techniques based on machine learning algorithms focused on the Android OS

    Prescience:Probabilistic Guidance on the Retraining Conundrum for Malware Detection

    Get PDF
    Malware evolves perpetually and relies on increasingly sophisticatedattacks to supersede defense strategies. Datadrivenapproaches to malware detection run the risk of becomingrapidly antiquated. Keeping pace with malwarerequires models that are periodically enriched with freshknowledge, commonly known as retraining. In this work,we propose the use of Venn-Abers predictors for assessingthe quality of binary classification tasks as a first step towardsidentifying antiquated models. One of the key bene-fits behind the use of Venn-Abers predictors is that they areautomatically well calibrated and offer probabilistic guidanceon the identification of nonstationary populations ofmalware. Our framework is agnostic to the underlying classificationalgorithm and can then be used for building betterretraining strategies in the presence of concept drift. Resultsobtained over a timeline-based evaluation with about 90Ksamples show that our framework can identify when modelstend to become obsolete

    Analyzing Android Adware

    Get PDF
    Most Android smartphone apps are free; in order to generate revenue, the app developers embed ad libraries so that advertisements are displayed when the app is being used. Billions of dollars are lost annually due to ad fraud. In this research, we propose a machine learning based scheme to detect Android adware based on static and dynamic features. We collect static features from the manifest file, while dynamic features are obtained from network traffic. Using these features, we initially classify Android applications into broad categories (e.g., adware and benign) and then further classify each application into a more specific family. We employ a variety of machine learning techniques including neural networks, random forests, adaboost and support vector machines

    Android malware detection based on image-based features and machine learning techniques

    Get PDF
    Bakour, Khaled/0000-0003-3327-2822WOS:000545934700001In this paper, a malware classification model has been proposed for detecting malware samples in the Android environment. The proposed model is based on converting some files from the source of the Android applications into grayscale images. Some image-based local features and global features, including four different types of local features and three different types of global features, have been extracted from the constructed grayscale image datasets and used for training the proposed model. To the best of our knowledge, this type of features is used for the first time in the Android malware detection domain. Moreover, the bag of visual words algorithm has been used to construct one feature vector from the descriptors of the local feature extracted from each image. The extracted local and global features have been used for training multiple machine learning classifiers including Random forest, k-nearest neighbors, Decision Tree, Bagging, AdaBoost and Gradient Boost. The proposed method obtained a very high classification accuracy reached 98.75% with a typical computational time does not exceed 0.018 s for each sample. The results of the proposed model outperformed the results of all compared state-of-art models in term of both classification accuracy and computational time

    Explainable machine learning for malware detection on Android applications

    Get PDF
    The presence of malicious software (malware), for example, in Android applications (apps), has harmful or irreparable consequences to the user and/or the device. Despite the protections app stores provide to avoid malware, it keeps growing in sophistication and diffusion. In this paper, we explore the use of machine learning (ML) techniques to detect malware in Android apps. The focus is on the study of different data pre-processing, dimensionality reduction, and classification techniques, assessing the generalization ability of the learned models using public domain datasets and specifically developed apps. We find that the classifiers that achieve better performance for this task are support vector machines (SVM) and random forests (RF). We emphasize the use of feature selection (FS) techniques to reduce the data dimensionality and to identify the most relevant features in Android malware classification, leading to explainability on this task. Our approach can identify the most relevant features to classify an app as malware. Namely, we conclude that permissions play a prominent role in Android malware detection. The proposed approach reduces the data dimensionality while achieving high accuracy in identifying malware in Android apps.info:eu-repo/semantics/publishedVersio

    A Study of Android Malware Detection Techniques and Machine Learning

    Get PDF
    Abstract Android OS is one of the widely used mobile Operating Systems. The number of malicious applications and adwares are increasing constantly on par with the number of mobile devices. A great number of commercial signature based tools are available on the market which prevent to an extent the penetration and distribution of malicious applications. Numerous researches have been conducted which claims that traditional signature based detection system work well up to certain level and malware authors use numerous techniques to evade these tools. So given this state of affairs, there is an increasing need for an alternative, really tough malware detection system to complement and rectify the signature based system. Recent substantial research focused on machine learning algorithms that analyze features from malicious application and use those features to classify and detect unknown malicious applications. This study summarizes the evolution of malware detection techniques based on machine learning algorithms focused on the Android OS

    Feature Selection on Permissions, Intents and APIs for Android Malware Detection

    Get PDF
    Malicious applications pose an enormous security threat to mobile computing devices. Currently 85% of all smartphones run Android, Google’s open-source operating system, making that platform the primary threat vector for malware attacks. Android is a platform that hosts roughly 99% of known malware to date, and is the focus of most research efforts in mobile malware detection due to its open source nature. One of the main tools used in this effort is supervised machine learning. While a decade of work has made a lot of progress in detection accuracy, there is an obstacle that each stream of research is forced to overcome, feature selection, i.e., determining which attributes of Android are most effective as inputs into machine learning models. This dissertation aims to address that problem by providing the community with an exhaustive analysis of the three primary types of Android features used by researchers: Permissions, Intents and API Calls. The intent of the report is not to describe a best performing feature set or a best performing machine learning model, nor to explain why certain Permissions, Intents or API Calls get selected above others, but rather to provide a holistic methodology to help guide feature selection for Android malware detection. The experiments used eleven different feature selection techniques covering filter methods, wrapper methods and embedded methods. Each feature selection technique was applied to seven different datasets based on the seven combinations available of Permissions, Intents and API Calls. Each of those seven datasets are from a base set of 119k Android apps. All of the result sets were then validated against three different machine learning models, Random Forest, SVM and a Neural Net, to test applicability across algorithm type. The experiments show that using a combination of Permissions, Intents and API Calls produced higher accuracy than using any of those alone or in any other combination and that feature selection should be performed on the combined dataset, not by feature type and then combined. The data also shows that, in general, a feature set size of 200 or more attributes is required for optimal results. Finally, the feature selection methods Relief, Correlation-based Feature Selection (CFS) and Recursive Feature Elimination (RFE) using a Neural Net are not satisfactory approaches for Android malware detection work. Based on the proposed methodology and experiments, this research provided insights into feature selection – a significant but often overlooked issue in Android malware detection. We believe the results reported herein is an important step for effective feature evaluation and selection in assisting malware detection especially for datasets with a large number of features. The methodology also has the potential to be applied to similar malware detection tasks or even in broader domains such as pattern recognition

    Integrated information gain with extra tree algorithm for feature permission analysis in android malware classification

    Get PDF
    The rapid growth of free applications in the android market has led to the fast spread of malware apps since users store their sensitive personal information on their mobile devices when using those apps. The permission mechanism is designed as a security layer to protect the android operating system by restricting access to local resources of the system at installation time and run time for updated versions of the android operating system. Even though permissions provide a secure layer to users, they can be exploited by attackers to threaten user privacy. Consequently, exploring the patterns of those permissions becomes necessary to find the relevant permission features that contribute to classifying android apps. However, with the era of big data and the rapid explosion of malware along with many unnecessary requested permissions, it has become a challenge to recognize the patterns of permissions from these data due to the irrelevant and redundant features that affect the classification performance and increase the complexity cost overhead. Ensemble-based Extra Tree - Feature Selection (FS-EX) algorithm was proposed in this study to explore the permission patterns by selecting a minimal-sized subset of highly discriminant permission features capable of discriminating against malware samples from nonmalware samples. The integrated Information Gain with Ensemble-based Extra Tree - Feature Selection (FS-IGEX) algorithm is proposed to assign weight values to permission features instead of binary values to determine the impact of weighted attribute variables on the classification performance. The two proposed methods based on Ensemble Extra Tree Feature Selection were evaluated on five datasets with various sample sizes and feature space using nine machine learning classifiers. Comparison studies were carried out between FS-EX subsets and the dataset of Full Permission features (FP) and the two approaches of the FS-IGEX method - the Permission-Binary (PB) approach and the Permission-Weighted (PW) approach. The permissions with PB were represented with binary values, whereas permissions with PW were represented with weighted values. The results demonstrated that the approach with the FS-EX was promising in obtaining the most prominent permission features related to the class target and attaining the same or close classification results in terms of accuracy with the highest accuracy mean of 96%, as compared to the FP. In addition, the PW approach of the FS-IGEX method had highly influential weighted permission features that could classify apps as malware and non-malware with the highest accuracy mean of 93%, compared to the PB approach of the FS-IGEX method and the FP

    Malware threats and detection for industrial mobile-IoT networks

    Full text link
    Industrial IoT networks deploy heterogeneous IoT devices to meet a wide range of user requirements. These devices are usually pooled from private or public IoT cloud providers. A significant number of IoT cloud providers integrate smartphones to overcome the latency of IoT devices and low computational power problems. However, the integration of mobile devices with industrial IoT networks exposes the IoT devices to significant malware threats. Mobile malware is the highest threat to the security of IoT data, user\u27s personal information, identity, and corporate/financial information. This paper analyzes the efforts regarding malware threats aimed at the devices deployed in industrial mobile-IoT networks and related detection techniques. We considered static, dynamic, and hybrid detection analysis. In this performance analysis, we compared static, dynamic, and hybrid analyses on the basis of data set, feature extraction techniques, feature selection techniques, detection methods, and the accuracy achieved by these methods. Therefore, we identify suspicious API calls, system calls, and the permissions that are extracted and selected as features to detect mobile malware. This will assist application developers in the safe use of APIs when developing applications for industrial IoT networks
    corecore