
    Assessing Modality Selection Heuristics to Improve Multimodal Deep Learning for Malware Detection

    With the growing use of Android devices, security threats are also increasing. While there are some existing malware detection methods, cybercriminals continue to develop ways to evade these security mechanisms. Thus, malware detection systems also need to evolve to meet this challenge. This work is a step towards achieving that goal. Malware detection methods need as much information as possible about the potential malware, and a multimodal approach can help in this regard by combining different aspects of an Android application. Using multimodal deep learning, it is possible to automatically learn a hierarchical representation for each modality and to give more weight to the more reliable modalities. Multiple modalities can improve classification by providing complementary information; however, the use of all available modalities does not necessarily maximize performance. Multimodal machine learning could benefit from a mechanism to guide the selection of modalities to include in a multimodal model. This work uses a malware detection problem to compare multiple heuristics for this selection process. We have used three different heuristic approaches for selecting the modalities at each step: the maxDifference heuristic, the maxSimilarity heuristic, and the maxAccuracy heuristic. Our experiments show that selecting modalities with low predictive correlation works better than the other examined heuristics. Our results suggest we do not need to combine highly accurate unimodal models, but rather we need models that make different kinds of errors. This method is designed to improve the stability and accuracy of our malware detection algorithms while reducing the overall cost.
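
    The abstract does not define the three heuristics precisely, so the following is only a minimal sketch of one possible reading of a maxDifference-style selection: greedily add the modality whose unimodal predictions disagree most with those already chosen. The function names, the disagreement measure, and the toy predictions are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Fraction of samples on which two unimodal models predict different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def select_by_max_difference(predictions, k):
    """Greedily pick k >= 2 modalities whose predictions differ most.

    predictions: dict mapping a modality name to its list of predicted labels.
    """
    names = sorted(predictions)
    # Seed with the pair of modalities that disagree most often.
    first, second = max(
        combinations(names, 2),
        key=lambda pair: disagreement(predictions[pair[0]], predictions[pair[1]]),
    )
    chosen = [first, second]
    while len(chosen) < k:
        remaining = [m for m in names if m not in chosen]
        # Add the modality with the highest mean disagreement against the chosen set.
        best = max(
            remaining,
            key=lambda m: sum(
                disagreement(predictions[m], predictions[c]) for c in chosen
            ) / len(chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical unimodal predictions (1 = malware, 0 = benign) on six apps.
predictions = {
    "permissions": [1, 1, 0, 0, 1, 0],
    "api_calls": [1, 1, 0, 0, 1, 1],
    "opcodes": [0, 1, 1, 0, 1, 0],
    "strings": [1, 0, 0, 1, 1, 0],
}
selected = select_by_max_difference(predictions, 3)
print(selected)
```

    A maxAccuracy-style variant would rank modalities by validation accuracy instead; the abstract's finding is that diversity of errors mattered more than individual accuracy.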

    apk2vec: Semi-supervised multi-view representation learning for profiling Android applications

    Building behavior profiles of Android applications (apps) with holistic, rich and multi-view information (e.g., incorporating several semantic views of an app such as API sequences, system calls, etc.) would serve downstream analytics tasks such as app categorization, recommendation and malware analysis significantly better. Towards this goal, we design a semi-supervised Representation Learning (RL) framework named apk2vec to automatically generate a compact representation (aka profile/embedding) for a given app. More specifically, apk2vec has the following three unique characteristics which make it an excellent choice for large-scale app profiling: (1) it encompasses information from multiple semantic views such as API sequences, permissions, etc., (2) being a semi-supervised embedding technique, it can make use of labels associated with apps (e.g., malware family or app category labels) to build high quality app profiles, and (3) it combines RL and feature hashing, which allows it to efficiently build profiles of apps that stream over time (i.e., online learning). The resulting semi-supervised multi-view hash embeddings of apps could then be used for a wide variety of downstream tasks such as the ones mentioned above. Our extensive evaluations with more than 42,000 apps demonstrate that apk2vec's app profiles could significantly outperform state-of-the-art techniques in four app analytics tasks, namely malware detection, familial clustering, app clone detection and app recommendation. Comment: International Conference on Data Mining, 201
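
    To illustrate why feature hashing suits apps that stream over time, here is a minimal sketch of the generic signed hashing trick applied to tokens from several views. It is not apk2vec's embedding method; the view names, tokens, and vector size are hypothetical. Because the index comes from a hash, no vocabulary has to be built or updated when new apps introduce unseen tokens.

```python
import hashlib


def hash_features(tokens, dim=16):
    """Map a bag of string tokens to a fixed-length signed count vector."""
    vec = [0] * dim
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # First four bytes choose the bucket; the fifth byte chooses the sign.
        index = int.from_bytes(digest[:4], "big") % dim
        sign = 1 if digest[4] % 2 == 0 else -1
        vec[index] += sign
    return vec


# Hypothetical multi-view description of one app.
app_views = {
    "api": ["getDeviceId", "sendTextMessage", "openConnection"],
    "permission": ["READ_SMS", "INTERNET"],
}
# Prefix each token with its view so identical strings from different views stay distinct.
tokens = [f"{view}:{tok}" for view, toks in sorted(app_views.items()) for tok in toks]
profile = hash_features(tokens)
print(profile)
```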

    Multimodal Approach for Malware Detection

    Although malware detection is a very active area of research, few works have focused on using physical properties (e.g., power consumption) and multimodal features for malware detection. We designed an experimental testbed that allowed us to run samples of malware and non-malicious software applications and to collect power consumption, network traffic, and system log data, and subsequently to extract dynamic behavior-based features. We also extracted code-based static features of both malware and non-malicious software applications. These features were used for malware detection based on: feature level fusion using power consumption and network traffic data, feature level fusion using network traffic data and system logs, and multimodal feature level and decision level fusion. The contributions when using feature level fusion of power consumption and network traffic data are: (1) We focused on detecting real malware using the extracted dynamic behavioral features (both power-based and network traffic-based) and supervised machine learning algorithms, which has not been done by any of the prior works. (2) We ran a large number of machine learning experiments, which allowed us to identify the best performing learner, the DC voltage rails that led to the best malware detection performance, and the subset of features that are the best predictors for malware detection. (3) The comparison of malware detection performance was done using a comprehensive set of metrics that reflect different aspects of the quality of malware detection. In the case of the feature level fusion using network traffic data and system logs, the contributions are: (1) Most of the previous works that have used network flow-based features have classified the network traffic, while our focus was on classifying the software running on a machine as malware or non-malicious software using the extracted dynamic behavioral features. (2) We experimented with different sizes of the training set (i.e., 90%, 75%, 50%, and 25% of the data) and found that smaller training sets produced very good classification results. This aspect of our work has practical value because the manual labeling of the training set is a tedious and time-consuming process. In this dissertation we present a multimodal deep learning neural network that integrates different modalities (i.e., power consumption, system logs, network traffic, and code-based static data) using decision level fusion. We evaluated the performance of each modality individually, when using feature level fusion, and when using decision level fusion. The contributions of our multimodal approach are as follows: (1) Collecting data from different modalities allowed us to develop a multimodal approach to malware detection, which has not been widely explored by prior works. Moreover, none of the previous works compared the performance of feature level fusion with decision level fusion, which is explored in this dissertation. (2) We proposed a multimodal decision level fusion malware detection approach using a deep neural network and compared its performance with the performance of feature level fusion approaches based on a deep neural network and standard supervised machine learning algorithms (i.e., Random Forest, J48, JRip, PART, Naive Bayes, and SMO).
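
    The difference between the two fusion levels can be sketched as follows. Feature level fusion concatenates per-modality features before a single classifier sees them, while decision level fusion combines the outputs of per-modality classifiers. The dissertation performs decision level fusion with a deep neural network; this sketch swaps in a plain weighted average to keep the idea visible, and all names and numbers are hypothetical.

```python
def feature_level_fusion(feature_vectors):
    """Concatenate per-modality feature vectors into one input for a single classifier."""
    fused = []
    for name in sorted(feature_vectors):
        fused.extend(feature_vectors[name])
    return fused


def decision_level_fusion(probabilities, weights=None, threshold=0.5):
    """Combine per-modality malware probabilities by weighted average, then threshold."""
    names = sorted(probabilities)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    score = sum(weights[name] * probabilities[name] for name in names) / total
    return score, score >= threshold


# Hypothetical features for one sample (feature level fusion input).
features = {"power": [0.2, 0.7], "network": [13.0], "logs": [4.0, 1.0]}
fused_vector = feature_level_fusion(features)

# Hypothetical outputs of four unimodal classifiers (decision level fusion input).
probabilities = {"power": 0.9, "network": 0.6, "logs": 0.3, "static": 0.8}
score, is_malware = decision_level_fusion(probabilities)
print(fused_vector, round(score, 2), is_malware)
```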

    Android Malware detection using predictive analytics.

    The growth of Android applications poses a serious threat to Android's security. The number of malware samples targeting the Android operating system is increasing daily. As a result, the traditional ways of detecting malware can no longer defend on their own against the rapid development of hackers' attack techniques and novel malware. This capstone project focuses on using predictive analytics to detect malware in network traffic. We aim to train and test on our data to find the machine learning model with the highest accuracy in detecting malware in network traffic. Among a variety of machine learning algorithms and models, we focused on five, starting with logistic regression, which predicted malware at 67%. The decision tree predicted malware at 69%, exactly equal to the random forest. AdaBoost reached about 84%, and KNN achieved the highest result of all the models at 86%. This shows the advantage of adopting predictive analytics in malware detection alongside the traditional approaches to build a strong and defensible Android operating system against malware.

    A Hybrid Approach for Android Malware Detection and Family Classification

    With the increase in the popularity of mobile devices, malicious applications targeting the Android platform have greatly increased. Malware is now coded so carefully that it has become very difficult to identify. The large daily increase in malware has made manual approaches inadequate for detecting it. Nowadays, new malware is characterized by sophisticated and complex obfuscation techniques. Thus, static malware analysis alone is not enough for detecting it. Dynamic malware analysis is appropriate for tackling evasion techniques, but it cannot investigate all execution paths and is very time consuming. So, for better detection and classification of Android malware, we propose a hybrid approach which integrates the features obtained after performing static and dynamic malware analysis. This approach tackles the problem of analyzing, detecting and classifying Android malware in a more efficient manner. In this paper, we have used a robust set of features from static and dynamic malware analysis for creating two datasets, i.e., binary and multiclass (family) classification datasets. These are made publicly available on GitHub and Kaggle with the aim of helping researchers and anti-malware tool creators to enhance or develop new techniques and tools for detecting and classifying Android malware. Various machine learning algorithms are employed to detect and classify malware using the features extracted after performing static and dynamic malware analysis. The experimental outcomes indicate that the hybrid approach enhances the accuracy of detection and classification of Android malware compared to the case when static and dynamic features are considered alone.

    HMCMA: Design of an Efficient Model with Hybrid Machine Learning in Cyber security for Enhanced Detection of Malicious Activities

    In the rapidly evolving landscape of cyber security, the incessant advancement of malicious activities presents a formidable challenge, necessitating a paradigm shift in detection methodologies. Traditional methods, primarily reliant on static rule-based systems, exhibit palpable limitations in grappling with the dynamic and sophisticated nature of modern cyber threats. This inadequacy underscores the urgent need for innovative approaches that can adeptly adapt and respond to the ever-changing threat environment. Addressing this exigency, the present research introduces a novel hybrid machine learning model, crafted to transcend the constraints of existing malicious activity detection frameworks. The proposed model combines the strengths of diverse machine learning strategies, including anomaly detection techniques such as Isolation Forest and One-Class SVM, and validates the results of these classifiers using Random Forest and Gradient Boosting operations. The validated malware instances are classified into malware types using a fusion of Convolutional Neural Networks (CNNs) and Long Short Term Memory (LSTM) based Recurrent Neural Networks (RNNs) under real-time network configuration sets. This combination not only leverages the unique capabilities of each algorithm but also harmonizes them to forge a more robust and precise detection mechanism. The strategic integration of these algorithms facilitates a comprehensive analysis of network traffic and system logs, thereby significantly enhancing the detection accuracy. Furthermore, the model's adaptive learning component ensures its relevance and efficacy in the face of evolving cyber threats, a quintessential feature for contemporary cyber security solutions. Empirical evaluations, conducted using multiple malware datasets and samples, substantiate the model's superiority over existing methods. It exhibited a 10.4% improvement in precision, an 8.5% increase in accuracy, a 4.9% enhancement in recall, an 8.3% rise in AUC, a 4.5% boost in specificity, and a 2.5% reduction in detection delay. These results underscore the model's potential in revolutionizing malicious activity detection, providing organizations with a more effective and resilient defense mechanism against a spectrum of cyber threats. The research culminates in a significant stride forward in cyber security, offering a robust, adaptive, and comprehensive solution that addresses the pressing need for advanced malicious activity detection, thereby bolstering the overall cyber security posture of organizations in the digital age.

    Protecting Android Devices from Malware Attacks: A State-of-the-Art Report of Concepts, Modern Learning Models and Challenges

    Advancements in microelectronics have increased the popularity of mobile devices like cellphones, tablets, e-readers, and PDAs. Android, with its open-source platform, broad device support, customizability, and integration with the Google ecosystem, has become the leading operating system for mobile devices. While Android's openness brings benefits, it has downsides like a lack of official support, fragmentation, complexity, and security risks if not maintained. Malware exploits these vulnerabilities for unauthorized actions and data theft. To enhance device security, static and dynamic analysis techniques can be employed. However, current attackers are becoming increasingly sophisticated, and they are employing packaging, code obfuscation, and encryption techniques to evade detection models. Researchers prefer flexible artificial intelligence methods, particularly deep learning models, for detecting and classifying malware on Android systems. In this survey study, a detailed literature review was conducted to investigate and analyze how deep learning approaches have been applied to malware detection on Android systems. The study also provides an overview of the Android architecture, datasets used for deep learning-based detection, and open issues that will be studied in the future

    A Hybrid Model for Android Malware Detection using Decision Tree and KNN

    Malware is becoming a major problem worldwide for Android operating systems. Malware is a piece of software developed to harm or exploit other hardware or software. The term malware, short for malicious software, covers Trojans, viruses, and various kinds of spyware. Many techniques have been developed over the last decade to protect Android operating systems from malware. However, the existing techniques have numerous drawbacks, such as limited accuracy in detecting the type of malware quickly in real time. In this article, the authors developed a hybrid model for Android malware detection using a decision tree and the KNN (k-nearest neighbours) technique. First, Dalvik opcodes, as well as real opcodes, were extracted by reverse engineering the Android software. Second, eigenvectors of the samples were produced using the n-gram model. Our suggested hybrid model efficiently combines KNN with the decision tree for effective detection of Android malware in real time. The outcome of the proposed scheme shows that the hybrid model is better at detecting any kind of malware on the Android operating system quickly and accurately. In this experiment, 815 normal samples and 3268 malicious samples were selected. Our proposed hybrid model achieves precision, accuracy (ACC), recall, and F1 values of 0.93, 0.98, 0.96, and 0.99, along with 0.94, 0.99, 0.93, and 0.99, respectively. In the future, there are vital possibilities to carry out more research in this field to develop new methods for Android malware detection.
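
    The n-gram step and the KNN vote described above can be sketched as follows. This covers only the n-gram and KNN parts, not the decision tree or the way the authors combine the two; the opcode sequences, the Manhattan distance, and the value of k are assumptions chosen for illustration.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Count contiguous n-grams in an opcode sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def distance(a, b):
    """Manhattan distance between two sparse n-gram count vectors."""
    return sum(abs(a[g] - b[g]) for g in set(a) | set(b))


def knn_predict(train, query, k=3):
    """Label a query by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, heavily simplified opcode sequences.
train = [
    (opcode_ngrams(["const", "invoke", "move", "return"]), "benign"),
    (opcode_ngrams(["const", "invoke", "move", "invoke", "return"]), "benign"),
    (opcode_ngrams(["invoke", "goto", "invoke", "goto", "const"]), "malware"),
    (opcode_ngrams(["goto", "invoke", "goto", "invoke", "goto"]), "malware"),
]
query = opcode_ngrams(["invoke", "goto", "invoke", "goto", "invoke"])
prediction = knn_predict(train, query, k=3)
print(prediction)
```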

    Classification and Analysis of Android Malware Images Using Feature Fusion Technique

    Feature-packed functionality and artificial intelligence (AI)-powered applications have made the Android operating system a big player in the market. Android smartphones have become an integral part of life, and users rely on their smart devices for making calls, sending text messages, navigation, games, and financial transactions, to name a few. This evolution of the smartphone community has opened new horizons for malware developers. As malware variants are growing at a tremendous rate every year, there is an urgent need to combat stealth malware techniques. This paper proposes a visualization and machine learning-based framework for classifying Android malware. Android malware applications from the DREBIN dataset were converted into grayscale images. In the first phase of the experiment, the proposed framework transforms Android malware into fifteen different image sections and identifies malware files by exploiting handcrafted features associated with Android malware images. Algorithms such as Gray Level Co-occurrence Matrix (GLCM), Global Image deScripTors (GIST), and Local Binary Pattern (LBP) are used to extract the handcrafted features from the image sections. The extracted features were further classified using machine learning algorithms like K-Nearest Neighbors, Support Vector Machines, and Random Forests. In the second phase of the experiment, handcrafted features were fused with CNN features to form the feature fusion strategy. The classification performance was evaluated against every malware image file section. The results obtained using the Feature Fusion strategy are compared with the handcrafted feature results. The experimental results indicate that the Feature Fusion-SVM model is best suited for the identification and classification of Android malware using the certificate and Android Manifest (CR + AM) malware images. It attained a high accuracy of 93.24%.
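
    As an illustration of one of the handcrafted descriptors named above, the following sketch computes a basic 3x3 Local Binary Pattern histogram over a grayscale image built from raw bytes. The byte-to-image conversion with a fixed row width is an assumption (the abstract does not state how sections are laid out), and this is the plain LBP variant rather than whichever configuration the authors used.

```python
def bytes_to_image(data, width):
    """Reshape a byte string into rows of grayscale pixel values (0 to 255)."""
    rows = len(data) // width
    return [list(data[r * width:(r + 1) * width]) for r in range(rows)]


def lbp_histogram(image):
    """Return a 256-bin histogram of basic 3x3 local binary pattern codes."""
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist


# Hypothetical 16-byte "file section" laid out as a 4x4 image.
image = bytes_to_image(bytes(range(16)), width=4)
hist = lbp_histogram(image)
print([(code, count) for code, count in enumerate(hist) if count])
```

    The resulting histogram is the kind of fixed-length vector that could be concatenated with CNN features before being passed to a classifier such as an SVM.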