3,278 research outputs found
Understanding Android Obfuscation Techniques: A Large-Scale Investigation in the Wild
In this paper, we seek to better understand Android obfuscation and depict a
holistic view of the usage of obfuscation through a large-scale investigation
in the wild. In particular, we focus on four popular obfuscation approaches:
identifier renaming, string encryption, Java reflection, and packing. To obtain
the meaningful statistical results, we designed efficient and lightweight
detection models for each obfuscation technique and applied them to our massive
APK datasets (collected from Google Play, multiple third-party markets, and
malware databases). We have learned several interesting facts from the result.
For example, malware authors use string encryption more frequently, and more
apps on third-party markets than Google Play are packed. We are also interested
in the explanation of each finding. Therefore we carry out in-depth code
analysis on some Android apps after sampling. We believe our study will help
developers select the most suitable obfuscation approach, and in the meantime
help researchers improve code analysis systems in the right direction
Mal-Netminer: Malware Classification Approach based on Social Network Analysis of System Call Graph
As the security landscape evolves over time, where thousands of species of
malicious codes are seen every day, antivirus vendors strive to detect and
classify malware families for efficient and effective responses against malware
campaigns. To enrich this effort, and by capitalizing on ideas from the social
network analysis domain, we build a tool that can help classify malware
families using features driven from the graph structure of their system calls.
To achieve that, we first construct a system call graph that consists of system
calls found in the execution of the individual malware families. To explore
distinguishing features of various malware species, we study social network
properties as applied to the call graph, including the degree distribution,
degree centrality, average distance, clustering coefficient, network density,
and component ratio. We utilize features driven from those properties to build
a classifier for malware families. Our experimental results show that
influence-based graph metrics such as the degree centrality are effective for
classifying malware, whereas the general structural metrics of malware are less
effective for classifying malware. Our experiments demonstrate that the
proposed system performs well in detecting and classifying malware families
within each malware class with accuracy greater than 96%.Comment: Mathematical Problems in Engineering, Vol 201
Android HIV: A Study of Repackaging Malware for Evading Machine-Learning Detection
Machine learning based solutions have been successfully employed for
automatic detection of malware in Android applications. However, machine
learning models are known to lack robustness against inputs crafted by an
adversary. So far, the adversarial examples can only deceive Android malware
detectors that rely on syntactic features, and the perturbations can only be
implemented by simply modifying Android manifest. While recent Android malware
detectors rely more on semantic features from Dalvik bytecode rather than
manifest, existing attacking/defending methods are no longer effective. In this
paper, we introduce a new highly-effective attack that generates adversarial
examples of Android malware and evades being detected by the current models. To
this end, we propose a method of applying optimal perturbations onto Android
APK using a substitute model. Based on the transferability concept, the
perturbations that successfully deceive the substitute model are likely to
deceive the original models as well. We develop an automated tool to generate
the adversarial examples without human intervention to apply the attacks. In
contrast to existing works, the adversarial examples crafted by our method can
also deceive recent machine learning based detectors that rely on semantic
features such as control-flow-graph. The perturbations can also be implemented
directly onto APK's Dalvik bytecode rather than Android manifest to evade from
recent detectors. We evaluated the proposed manipulation methods for
adversarial examples by using the same datasets that Drebin and MaMadroid (5879
malware samples) used. Our results show that, the malware detection rates
decreased from 96% to 1% in MaMaDroid, and from 97% to 1% in Drebin, with just
a small distortion generated by our adversarial examples manipulation method.Comment: 15 pages, 11 figure
MaMaDroid: Detecting Android malware by building markov chains of behavioral models (extended version)
As Android has become increasingly popular, so has malware targeting it, thus motivating the research community
to propose different detection techniques. However, the constant evolution of the Android ecosystem,
and of malware itself, makes it hard to design robust tools that can operate for long periods of time without
the need for modifications or costly re-training. Aiming to address this issue, we set to detect malware from
a behavioral point of view, modeled as the sequence of abstracted API calls. We introduce MaMaDroid, a
static-analysis based system that abstracts app’s API calls to their class, package, or family, and builds a model
from their sequences obtained from the call graph of an app as Markov chains. This ensures that the model is
more resilient to API changes and the features set is of manageable size. We evaluate MaMaDroid using a
dataset of 8.5K benign and 35.5K malicious apps collected over a period of six years, showing that it effectively
detects malware (with up to 0.99 F-measure) and keeps its detection capabilities for long periods of time
(up to 0.87 F-measure two years after training). We also show that MaMaDroid remarkably overperforms
DroidAPIMiner, a state-of-the-art detection system that relies on the frequency of (raw) API calls. Aiming to
assess whether MaMaDroid’s effectiveness mainly stems from the API abstraction or from the sequencing
modeling, we also evaluate a variant of it that uses frequency (instead of sequences), of abstracted API calls.
We find that it is not as accurate, failing to capture maliciousness when trained on malware samples that
include API calls that are equally or more frequently used by benign apps
Analysis and evaluation of SafeDroid v2.0, a framework for detecting malicious Android applications
Android smartphones have become a vital component of the daily routine of millions of people, running a plethora of applications available in the official and alternative marketplaces. Although there are many security mechanisms to scan and filter malicious applications, malware is still able to reach the devices of many end-users. In this paper, we introduce the SafeDroid v2.0 framework, that is a flexible, robust, and versatile open-source solution for statically analysing Android applications, based on machine learning techniques. The main goal of our work, besides the automated production of fully sufficient prediction and classification models in terms of maximum accuracy scores and minimum negative errors, is to offer an out-of-the-box framework that can be employed by the Android security researchers to efficiently experiment to find effective solutions: the SafeDroid v2.0 framework makes it possible to test many different combinations of machine learning classifiers, with a high degree of freedom and flexibility in the choice of features to consider, such as dataset balance and dataset selection. The framework also provides a server, for generating experiment reports, and an Android application, for the verification of the produced models in real-life scenarios. An extensive campaign of experiments is also presented to show how it is possible to efficiently find competitive solutions: the results of our experiments confirm that SafeDroid v2.0 can reach very good performances, even with highly unbalanced dataset inputs and always with a very limited overhead
MaMaDroid: Detecting Android malware by building Markov chains of behavioral models (extended version)
As Android has become increasingly popular, so has malware targeting it, thus motivating the research community to propose different detection techniques. However, the constant evolution of the Android ecosystem, and of malware itself, makes it hard to design robust tools that can operate for long periods of time without the need for modifications or costly re-training. Aiming to address this issue, we set to detect malware from a behavioral point of view, modeled as the sequence of abstracted API calls. We introduce MaMaDroid, a static-analysis-based system that abstracts app's API calls to their class, package, or family, and builds a model from their sequences obtained from the call graph of an app as Markov chains. This ensures that the model is more resilient to API changes and the features set is of manageable size. We evaluate MaMaDroid using a dataset of 8.5K benign and 35.5K malicious apps collected over a period of 6 years, showing that it effectively detects malware (with up to 0.99 F-measure) and keeps its detection capabilities for long periods of time (up to 0.87 F-measure 2 years after training). We also show that MaMaDroid remarkably overperforms DroidAPIMiner, a state-of-the-art detection system that relies on the frequency of (raw) API calls. Aiming to assess whether MaMaDroid's effectiveness mainly stems from the API abstraction or from the sequencing modeling, we also evaluate a variant of it that uses frequency (instead of sequences), of abstracted API calls. We find that it is not as accurate, failing to capture maliciousness when trained on malware samples that include API calls that are equally or more frequently used by benign apps
- …