885 research outputs found

    R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections

    Full text link
    The influence of Deep Learning on image identification and natural language processing has attracted enormous attention globally. The convolution neural network that can learn without prior extraction of features fits well in response to the rapid iteration of Android malware. The traditional solution for detecting Android malware requires continuous learning through pre-extracted features to maintain high performance of identifying the malware. In order to reduce the manpower of feature engineering prior to the condition of not to extract pre-selected features, we have developed a coloR-inspired convolutional neuRal networks (CNN)-based AndroiD malware Detection (R2-D2) system. The system can convert the bytecode of classes.dex from Android archive file to rgb color code and store it as a color image with fixed size. The color image is input to the convolutional neural network for automatic feature extraction and training. The data was collected from Jan. 2017 to Aug 2017. During the period of time, we have collected approximately 2 million of benign and malicious Android apps for our experiments with the help from our research partner Leopard Mobile Inc. Our experiment results demonstrate that the proposed system has accurate security analysis on contracts. Furthermore, we keep our research results and experiment materials on http://R2D2.TWMAN.ORG.Comment: Verison 2018/11/15, IEEE BigData 2018, Seattle, WA, USA, Dec 10-13, 2018. (Accepted

    Android HIV: A Study of Repackaging Malware for Evading Machine-Learning Detection

    Full text link
    Machine learning based solutions have been successfully employed for automatic detection of malware in Android applications. However, machine learning models are known to lack robustness against inputs crafted by an adversary. So far, the adversarial examples can only deceive Android malware detectors that rely on syntactic features, and the perturbations can only be implemented by simply modifying Android manifest. While recent Android malware detectors rely more on semantic features from Dalvik bytecode rather than manifest, existing attacking/defending methods are no longer effective. In this paper, we introduce a new highly-effective attack that generates adversarial examples of Android malware and evades being detected by the current models. To this end, we propose a method of applying optimal perturbations onto Android APK using a substitute model. Based on the transferability concept, the perturbations that successfully deceive the substitute model are likely to deceive the original models as well. We develop an automated tool to generate the adversarial examples without human intervention to apply the attacks. In contrast to existing works, the adversarial examples crafted by our method can also deceive recent machine learning based detectors that rely on semantic features such as control-flow-graph. The perturbations can also be implemented directly onto APK's Dalvik bytecode rather than Android manifest to evade from recent detectors. We evaluated the proposed manipulation methods for adversarial examples by using the same datasets that Drebin and MaMadroid (5879 malware samples) used. Our results show that, the malware detection rates decreased from 96% to 1% in MaMaDroid, and from 97% to 1% in Drebin, with just a small distortion generated by our adversarial examples manipulation method.Comment: 15 pages, 11 figure

    A Pre-Trained BERT Model for Android Applications

    Full text link
    The automation of an increasingly large number of software engineering tasks is becoming possible thanks to Machine Learning (ML). One foundational building block in the application of ML to software artifacts is the representation of these artifacts (e.g., source code or executable code) into a form that is suitable for learning. Many studies have leveraged representation learning, delegating to ML itself the job of automatically devising suitable representations. Yet, in the context of Android problems, existing models are either limited to coarse-grained whole-app level (e.g., apk2vec) or conducted for one specific downstream task (e.g., smali2vec). Our work is part of a new line of research that investigates effective, task-agnostic, and fine-grained universal representations of bytecode to mitigate both of these two limitations. Such representations aim to capture information relevant to various low-level downstream tasks (e.g., at the class-level). We are inspired by the field of Natural Language Processing, where the problem of universal representation was addressed by building Universal Language Models, such as BERT, whose goal is to capture abstract semantic information about sentences, in a way that is reusable for a variety of tasks. We propose DexBERT, a BERT-like Language Model dedicated to representing chunks of DEX bytecode, the main binary format used in Android applications. We empirically assess whether DexBERT is able to model the DEX language and evaluate the suitability of our model in two distinct class-level software engineering tasks: Malicious Code Localization and Defect Prediction. We also experiment with strategies to deal with the problem of catering to apps having vastly different sizes, and we demonstrate one example of using our technique to investigate what information is relevant to a given task

    A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization

    Full text link
    Existing Android malware detection approaches use a variety of features such as security sensitive APIs, system calls, control-flow structures and information flows in conjunction with Machine Learning classifiers to achieve accurate detection. Each of these feature sets provides a unique semantic perspective (or view) of apps' behaviours with inherent strengths and limitations. Meaning, some views are more amenable to detect certain attacks but may not be suitable to characterise several other attacks. Most of the existing malware detection approaches use only one (or a selected few) of the aforementioned feature sets which prevent them from detecting a vast majority of attacks. Addressing this limitation, we propose MKLDroid, a unified framework that systematically integrates multiple views of apps for performing comprehensive malware detection and malicious code localisation. The rationale is that, while a malware app can disguise itself in some views, disguising in every view while maintaining malicious intent will be much harder. MKLDroid uses a graph kernel to capture structural and contextual information from apps' dependency graphs and identify malice code patterns in each view. Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted combination of the views which yields the best detection accuracy. Besides multi-view learning, MKLDroid's unique and salient trait is its ability to locate fine-grained malice code portions in dependency graphs (e.g., methods/classes). Through our large-scale experiments on several datasets (incl. wild apps), we demonstrate that MKLDroid outperforms three state-of-the-art techniques consistently, in terms of accuracy while maintaining comparable efficiency. In our malicious code localisation experiments on a dataset of repackaged malware, MKLDroid was able to identify all the malice classes with 94% average recall

    EavesDroid: Eavesdropping User Behaviors via OS Side-Channels on Smartphones

    Full text link
    As the Internet of Things (IoT) continues to evolve, smartphones have become essential components of IoT systems. However, with the increasing amount of personal information stored on smartphones, user privacy is at risk of being compromised by malicious attackers. Although malware detection engines are commonly installed on smartphones against these attacks, attacks that can evade these defenses may still emerge. In this paper, we analyze the return values of system calls on Android smartphones and find two never-disclosed vulnerable return values that can leak fine-grained user behaviors. Based on this observation, we present EavesDroid, an application-embedded side-channel attack on Android smartphones that allows unprivileged attackers to accurately identify fine-grained user behaviors (e.g., viewing messages and playing videos) via on-screen operations. Our attack relies on the correlation between user behaviors and the return values associated with hardware and system resources. While this attack is challenging since these return values are susceptible to fluctuation and misalignment caused by many factors, we show that attackers can eavesdrop on fine-grained user behaviors using a CNN-GRU classification model that adopts min-max normalization and multiple return value fusion. Our experiments on different models and versions of Android smartphones demonstrate that EavesDroid can achieve 98% and 86% inference accuracy for 17 classes of user behaviors in the test set and real-world settings, highlighting the risk of our attack on user privacy. Finally, we recommend effective malware detection, carefully designed obfuscation methods, or restrictions on reading vulnerable return values to mitigate this attack.Comment: 15 pages, 25 figure
    corecore