Semi-supervised classification for dynamic Android malware detection
A growing number of threats to Android phones creates challenges for malware
detection. Manually labeling the samples into benign or different malicious
families requires tremendous human efforts, while it is comparably easy and
cheap to obtain a large amount of unlabeled APKs from various sources.
Moreover, the fast-paced evolution of Android malware continuously generates
derivative malware families. These families often contain new signatures, which
can escape detection when using static analysis. These practical challenges can
also cause traditional supervised machine learning algorithms to degrade in
performance.
In this paper, we propose a framework that uses a model-based semi-supervised
(MBSS) classification scheme on the dynamic Android API call logs. The
semi-supervised approach efficiently uses the labeled and unlabeled APKs to
estimate a finite mixture model of Gaussian distributions via conditional
expectation-maximization and efficiently detects malware during out-of-sample
testing. We compare MBSS with popular malware detection classifiers such as
support vector machine (SVM), k-nearest neighbor (kNN) and linear
discriminant analysis (LDA). Under the ideal classification setting, MBSS has
competitive performance with 98% accuracy and a very low false positive rate for
in-sample classification. In out-of-sample testing, the test data exhibit
behavior similar to that of the in-sample training set: retrieving phone
information and sending it to the network. When this similarity is
strong, MBSS and SVM with linear kernel maintain a 90% detection rate while
kNN and LDA suffer great performance degradation. When this similarity is
slightly weaker, all classifiers degrade in performance, but MBSS still
performs significantly better than the other classifiers.
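The semi-supervised mixture idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it uses ordinary expectation-maximization rather than the conditional EM named in the abstract, one-dimensional features instead of API call logs, and one Gaussian per class. All function names and the toy data are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_semi_supervised_gmm(labeled, unlabeled, n_classes, n_iter=50):
    """labeled: list of (x, class_index); unlabeled: list of x (assumed toy inputs)."""
    # Initialise each class component from its labeled samples only.
    weights, means, variances = [], [], []
    for c in range(n_classes):
        xs = [x for x, y in labeled if y == c]
        m = sum(xs) / len(xs)
        weights.append(len(xs) / len(labeled))
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in xs) / len(xs) + 1e-6)

    xs_all = [x for x, _ in labeled] + list(unlabeled)
    for _ in range(n_iter):
        # E-step: labeled points keep one-hot responsibilities,
        # unlabeled points get posterior class probabilities.
        resp = [[1.0 if y == c else 0.0 for c in range(n_classes)] for _, y in labeled]
        for x in unlabeled:
            joint = [weights[c] * gaussian_pdf(x, means[c], variances[c])
                     for c in range(n_classes)]
            total = sum(joint)
            if total == 0.0:  # numerical underflow: fall back to uniform
                resp.append([1.0 / n_classes] * n_classes)
            else:
                resp.append([j / total for j in joint])
        # M-step: weighted updates over labeled and unlabeled points together.
        for c in range(n_classes):
            rc = [r[c] for r in resp]
            nc = sum(rc)
            means[c] = sum(r * x for r, x in zip(rc, xs_all)) / nc
            variances[c] = sum(r * (x - means[c]) ** 2
                               for r, x in zip(rc, xs_all)) / nc + 1e-6
            weights[c] = nc / len(xs_all)
    return weights, means, variances


def predict(x, weights, means, variances):
    """Return the class index with the largest weighted density."""
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch the unlabeled points only influence the model through their soft responsibilities, which is what lets a handful of labels be combined with many unlabeled samples.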
Automated Poisoning Attacks and Defenses in Malware Detection Systems: An Adversarial Machine Learning Approach
The evolution of mobile malware poses a serious threat to smartphone
security. Today, sophisticated attackers can adapt by maximally sabotaging
machine-learning classifiers via polluting training data, rendering most recent
machine learning-based malware detection tools (such as Drebin, DroidAPIMiner,
and MaMaDroid) ineffective. In this paper, we explore the feasibility of
constructing crafted malware samples; examine how machine-learning classifiers
can be misled under three different threat models; then conclude that injecting
carefully crafted data into training data can significantly reduce detection
accuracy. To tackle the problem, we propose KuafuDet, a two-phase learning
enhancing approach that learns mobile malware by adversarial detection.
KuafuDet includes an offline training phase that selects and extracts features
from the training set, and an online detection phase that utilizes the
classifier trained by the first phase. To further address the adversarial
environment, these two phases are intertwined through a self-adaptive learning
scheme, wherein an automated camouflage detector is introduced to filter the
suspicious false negatives and feed them back into the training phase. We
finally show that KuafuDet can significantly reduce false negatives and boost
the detection accuracy by at least 15%. Experiments on more than 250,000 mobile
applications demonstrate that KuafuDet is scalable and can be highly effective
as a standalone system.
Android HIV: A Study of Repackaging Malware for Evading Machine-Learning Detection
Machine learning based solutions have been successfully employed for
automatic detection of malware in Android applications. However, machine
learning models are known to lack robustness against inputs crafted by an
adversary. So far, the adversarial examples can only deceive Android malware
detectors that rely on syntactic features, and the perturbations can only be
implemented by simply modifying the Android manifest. As recent Android malware
detectors rely more on semantic features from Dalvik bytecode rather than the
manifest, existing attacking/defending methods are no longer effective. In this
paper, we introduce a new highly-effective attack that generates adversarial
examples of Android malware and evades being detected by the current models. To
this end, we propose a method of applying optimal perturbations onto Android
APK using a substitute model. Based on the transferability concept, the
perturbations that successfully deceive the substitute model are likely to
deceive the original models as well. We develop an automated tool to generate
the adversarial examples without human intervention to apply the attacks. In
contrast to existing works, the adversarial examples crafted by our method can
also deceive recent machine learning based detectors that rely on semantic
features such as control-flow-graph. The perturbations can also be implemented
directly onto the APK's Dalvik bytecode rather than the Android manifest to evade
recent detectors. We evaluated the proposed manipulation methods for
adversarial examples by using the same datasets that Drebin and MaMaDroid (5879
malware samples) used. Our results show that the malware detection rates
decreased from 96% to 1% in MaMaDroid, and from 97% to 1% in Drebin, with just
a small distortion generated by our adversarial examples manipulation method.
Comment: 15 pages, 11 figures
Using Deep Neural Network for Android Malware Detection
The Android operating system is pervasive, and applications for almost
everything are readily accessible in the official Google Play store or a
dozen alternative third-party markets. Additionally, the vital role of
smartphones in modern life leads users to store significant information
on their devices, not only personal but also corporate information, which
attracts malware developers to build applications that infiltrate users'
devices to steal information and perform harmful tasks. This is compounded by
the limitations of current defense techniques, such as ineffective screening
in the Google Play store, weak or no screening in third-party markets, and
antivirus software that still relies on a signature-based database effective
only in identifying known malware. To contend with malicious applications that
are increasing in volume and sophistication, we propose an Android malware
detection system that applies deep learning techniques to face the threats of
Android malware. Extensive experiments on a real-world dataset containing
benign and malicious applications show that the proposed system reaches an
accuracy of 95.31%.
Comment: 9 pages, 5 figures, 6 tables
Android Malware Characterization using Metadata and Machine Learning Techniques
Android Malware has emerged as a consequence of the increasing popularity of
smartphones and tablets. While most previous work focuses on inherent
characteristics of Android apps to detect malware, this study analyses indirect
features and meta-data to identify patterns in malware applications. Our
experiments show that: (1) the permissions used by an application offer only
moderate performance results; (2) other features publicly available at Android
Markets are more relevant in detecting malware, such as the application
developer and certificate issuer, and (3) compact and efficient classifiers can
be constructed for the early detection of malware applications prior to code
inspection or sandboxing.
Comment: 4 figures, 2 tables and 8 pages
Android Malware Detection based on Factorization Machine
As the popularity of Android smart phones has increased in recent years, so
too has the number of malicious applications. Due to the potential for data
theft mobile phone users face, the detection of malware on Android devices has
become an increasingly important issue in cyber security. Traditional methods
like signature-based routines are unable to protect users from the
ever-increasing sophistication and rapid behavior changes in new types of
Android malware. Therefore, a great deal of effort has been made recently to
use machine learning models and methods to characterize and generalize the
malicious behavior patterns of mobile apps for malware detection.
In this paper, we propose a novel and highly reliable classifier for Android
Malware detection based on a Factorization Machine architecture and the
extraction of Android app features from manifest files and source code. Our
results indicate that the numerical feature representation of an app typically
results in a long and highly sparse vector and that the interactions among
different features are critical to revealing malicious behavior patterns. After
performing an extensive performance evaluation, our proposed method achieved a
test result of 100.00% precision score on the DREBIN dataset and 99.22%
precision score with only 1.10% false positive rate on the AMD dataset. These
metrics match the performance of state-of-the-art machine-learning-based
Android malware detection methods and several commercial antivirus engines with
the benefit of training up to 50 times faster.
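To make the role of feature interactions concrete, here is a minimal sketch of how a standard second-order factorization machine scores a sparse feature vector. It is not the paper's code: the weights, latent vectors, and feature indices below are made up for illustration, and the function names are hypothetical. The score is w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, and the pairwise term is computed with the usual linear-time identity instead of a double loop.

```python
def fm_score(x, w0, w, v):
    """Score a sparse vector x = {feature_index: value} with a factorization machine.

    w0: global bias, w: list of linear weights, v: list of latent vectors
    (one per feature, all of the same length k).
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(v[0])
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #         = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_score_naive(x, w0, w, v):
    """Reference version with an explicit loop over feature pairs."""
    idx = sorted(x)
    score = w0 + sum(w[i] * x[i] for i in idx)
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            dot = sum(vi * vj for vi, vj in zip(v[i], v[j]))
            score += dot * x[i] * x[j]
    return score
```

Because only the non-zero entries of `x` are visited, the cost grows with the number of active features rather than the full vector length, which suits the long, sparse vectors the abstract describes.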
Android Malware Family Classification Based on Resource Consumption over Time
The vast majority of today's mobile malware targets Android devices. This has
pushed the research effort in Android malware analysis in recent years. An
important task of malware analysis is the classification of malware samples
into known families. Static malware analysis is known to fall short against
techniques that change static characteristics of the malware (e.g. code
obfuscation), while dynamic analysis has proven effective against such
techniques. To the best of our knowledge, the most notable work on Android
malware family classification purely based on dynamic analysis is DroidScribe.
With respect to DroidScribe, our approach is easier to reproduce. Our
methodology only employs publicly available tools, does not require any
modification to the emulated environment or Android OS, and can collect data
from physical devices. The latter is a key factor, since modern mobile malware
can detect the emulated environment and hide their malicious behavior. Our
approach relies on resource consumption metrics available from the proc file
system. Features are extracted through detrended fluctuation analysis and
correlation. Finally, an SVM is employed to classify malware into families. We
provide an experimental evaluation on malware samples from the Drebin dataset,
where we obtain a classification accuracy of 82%, proving that our methodology
achieves an accuracy comparable to that of DroidScribe. Furthermore, we make
the software we developed publicly available, to ease the reproducibility of
our results.
Comment: Extended Version
NtMalDetect: A Machine Learning Approach to Malware Detection Using Native API System Calls
As computing systems become increasingly advanced and as users increasingly
engage themselves in technology, security has never been a greater concern. In
malware detection, static analysis, the method of analyzing potentially
malicious files, has been the prominent approach. This approach, however,
quickly falls short as malicious programs become more advanced and adopt the
capability of obfuscating their binaries to execute the same malicious
functions, making static analysis extremely difficult for newer variants. The
approach assessed in this paper is a novel dynamic malware analysis method,
which may generalize better than static analysis to newer variants. Inspired by
recent successes in Natural Language Processing (NLP), widely used document
classification techniques were assessed in detecting malware by doing such
analysis on system calls, which contain useful information about the operation
of a program as requests that the program makes of the kernel. Features
considered are extracted from system call traces of benign and malicious
programs, and the task to classify these traces is treated as a binary document
classification task of system call traces. The system call traces were
processed to remove the parameters to only leave the system call function
names. The features were grouped into various n-grams and weighted with Term
Frequency-Inverse Document Frequency. This paper shows that Linear Support
Vector Machines (SVM) optimized by Stochastic Gradient Descent and the
traditional Coordinate Descent on the Wolfe Dual form of the SVM are effective
in this approach, achieving up to 96% accuracy with a 95% recall score.
Additional contributions include the identification of significant system call
sequences that could be avenues for further research.
Comment: 8 pages, Intel International Science and Engineering Fair Project - SOFT006
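The feature pipeline described in the abstract above (strip parameters, form n-grams of call names, weight with TF-IDF) can be sketched as follows. This is an illustrative sketch, not the project's code: the trace line format `Name(args)` is an assumption, the helper names are hypothetical, and the IDF is the plain log(N/df) form, whereas libraries often use smoothed variants. The SVM training step is omitted.

```python
import math
from collections import Counter


def strip_parameters(trace_lines):
    """Keep only the call name from lines assumed to look like 'NtOpenFile(args)'."""
    return [line.split("(", 1)[0].strip() for line in trace_lines if line.strip()]


def ngrams(calls, n):
    """Return the contiguous n-grams of a call-name sequence as space-joined strings."""
    return [" ".join(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def tfidf(documents):
    """documents: list of n-gram lists. Returns one {ngram: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each n-gram once per document
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return vectors
```

With this weighting, an n-gram that occurs in every trace gets weight zero, so the classifier's input emphasizes call sequences that distinguish some traces from others.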
The Dark Side(-Channel) of Mobile Devices: A Survey on Network Traffic Analysis
In recent years, mobile devices (e.g., smartphones and tablets) have met
increasing commercial success and have become a fundamental element of
everyday life for billions of people all around the world. Mobile devices are
used not only for traditional communication activities (e.g., voice calls and
messages) but also for more advanced tasks made possible by an enormous amount
of multi-purpose applications (e.g., finance, gaming, and shopping). As a
result, those devices generate significant network traffic (a consistent part
of the overall Internet traffic). For this reason, the research community has
been investigating security and privacy issues that are related to the network
traffic generated by mobile devices, which could be analyzed to obtain
information useful for a variety of goals (ranging from device security and
network optimization, to fine-grained user profiling).
In this paper, we review the works that contributed to the state of the art
of network traffic analysis targeting mobile devices. In particular, we present
a systematic classification of the works in the literature according to three
criteria: (i) the goal of the analysis; (ii) the point where the network
traffic is captured; and (iii) the targeted mobile platforms. In this survey,
we consider points of capturing such as Wi-Fi Access Points, software
simulation, and inside real mobile devices or emulators. For the surveyed
works, we review and compare analysis techniques, validation methods, and
achieved results. We also discuss possible countermeasures, challenges and
possible directions for future research on mobile traffic analysis and other
emerging domains (e.g., Internet of Things). We believe our survey will be a
reference work for researchers and practitioners in this research field.
Comment: 55 pages
Understanding the efficacy, reliability and resiliency of computer vision techniques for malware detection and future research directions
My research lies in the intersection of security and machine learning. This
overview summarizes one component of my research: combining computer vision
with malware exploit detection for enhanced security solutions. I will present
the perspectives of efficacy, reliability and resiliency to formulate threat
detection as computer vision problems and develop state-of-the-art image-based
malware classification. Representing malware binaries as images provides a direct
visualization of data samples, reduces the effort needed for feature extraction, and
consumes the whole binary for holistic structural analysis. Applying transfer
learning from deep neural networks that are effective for large-scale image
classification to malware classification demonstrates superior classification efficacy
compared with classical machine learning algorithms. To enhance the reliability of
these vision-based malware detectors, interpretation frameworks can be
constructed on the malware visual representations and used to extract
faithful explanations, so that security practitioners have confidence in the
model before deployment. In cyber-security applications, we should always
assume that a malware writer constantly modifies code to bypass detection.
Addressing the resiliency of the malware detectors is as important as
efficacy and reliability. By understanding the attack surfaces of machine
learning models used for malware detection, we can greatly improve the
robustness of the algorithms to combat malware adversaries in the wild. Finally,
I will discuss future research directions worth pursuing in this research
community.
Comment: Report