2,845 research outputs found
A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization
Existing Android malware detection approaches use a variety of features such
as security sensitive APIs, system calls, control-flow structures and
information flows in conjunction with Machine Learning classifiers to achieve
accurate detection. Each of these feature sets provides a unique semantic
perspective (or view) of apps' behaviours with inherent strengths and
limitations. Meaning, some views are more amenable to detect certain attacks
but may not be suitable to characterise several other attacks. Most of the
existing malware detection approaches use only one (or a selected few) of the
aforementioned feature sets which prevent them from detecting a vast majority
of attacks. Addressing this limitation, we propose MKLDroid, a unified
framework that systematically integrates multiple views of apps for performing
comprehensive malware detection and malicious code localisation. The rationale
is that, while a malware app can disguise itself in some views, disguising in
every view while maintaining malicious intent will be much harder.
MKLDroid uses a graph kernel to capture structural and contextual information
from apps' dependency graphs and identify malice code patterns in each view.
Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted
combination of the views which yields the best detection accuracy. Besides
multi-view learning, MKLDroid's unique and salient trait is its ability to
locate fine-grained malice code portions in dependency graphs (e.g.,
methods/classes). Through our large-scale experiments on several datasets
(incl. wild apps), we demonstrate that MKLDroid outperforms three
state-of-the-art techniques consistently, in terms of accuracy while
maintaining comparable efficiency. In our malicious code localisation
experiments on a dataset of repackaged malware, MKLDroid was able to identify
all the malice classes with 94% average recall
Understanding Android Obfuscation Techniques: A Large-Scale Investigation in the Wild
In this paper, we seek to better understand Android obfuscation and depict a
holistic view of the usage of obfuscation through a large-scale investigation
in the wild. In particular, we focus on four popular obfuscation approaches:
identifier renaming, string encryption, Java reflection, and packing. To obtain
the meaningful statistical results, we designed efficient and lightweight
detection models for each obfuscation technique and applied them to our massive
APK datasets (collected from Google Play, multiple third-party markets, and
malware databases). We have learned several interesting facts from the result.
For example, malware authors use string encryption more frequently, and more
apps on third-party markets than Google Play are packed. We are also interested
in the explanation of each finding. Therefore we carry out in-depth code
analysis on some Android apps after sampling. We believe our study will help
developers select the most suitable obfuscation approach, and in the meantime
help researchers improve code analysis systems in the right direction
Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning
Learning-based pattern classifiers, including deep networks, have shown
impressive performance in several application domains, ranging from computer
vision to cybersecurity. However, it has also been shown that adversarial input
perturbations carefully crafted either at training or at test time can easily
subvert their predictions. The vulnerability of machine learning to such wild
patterns (also referred to as adversarial examples), along with the design of
suitable countermeasures, have been investigated in the research field of
adversarial machine learning. In this work, we provide a thorough overview of
the evolution of this research area over the last ten years and beyond,
starting from pioneering, earlier work on the security of non-deep learning
algorithms up to more recent work aimed to understand the security properties
of deep learning algorithms, in the context of computer vision and
cybersecurity tasks. We report interesting connections between these
apparently-different lines of work, highlighting common misconceptions related
to the security evaluation of machine-learning algorithms. We review the main
threat models and attacks defined to this end, and discuss the main limitations
of current work, along with the corresponding future challenges towards the
design of more secure learning algorithms.Comment: Accepted for publication on Pattern Recognition, 201
Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks
Malware still constitutes a major threat in the cybersecurity landscape, also
due to the widespread use of infection vectors such as documents. These
infection vectors hide embedded malicious code to the victim users,
facilitating the use of social engineering techniques to infect their machines.
Research showed that machine-learning algorithms provide effective detection
mechanisms against such threats, but the existence of an arms race in
adversarial settings has recently challenged such systems. In this work, we
focus on malware embedded in PDF files as a representative case of such an arms
race. We start by providing a comprehensive taxonomy of the different
approaches used to generate PDF malware, and of the corresponding
learning-based detection systems. We then categorize threats specifically
targeted against learning-based PDF malware detectors, using a well-established
framework in the field of adversarial machine learning. This framework allows
us to categorize known vulnerabilities of learning-based PDF malware detectors
and to identify novel attacks that may threaten such systems, along with the
potential defense mechanisms that can mitigate the impact of such threats. We
conclude the paper by discussing how such findings highlight promising research
directions towards tackling the more general challenge of designing robust
malware detectors in adversarial settings
Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology
and framework for efficient and effective real-time malware detection,
leveraging the best of conventional machine learning (ML) and deep learning
(DL) algorithms. In PROPEDEUTICA, all software processes in the system start
execution subjected to a conventional ML detector for fast classification. If a
piece of software receives a borderline classification, it is subjected to
further analysis via more performance expensive and more accurate DL methods,
via our newly proposed DL algorithm DEEPMALWARE. Further, we introduce delays
to the execution of software subjected to deep learning analysis as a way to
"buy time" for DL analysis and to rate-limit the impact of possible malware in
the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and
877 commonly used benign software samples from various categories for the
Windows OS. Our results show that the false positive rate for conventional ML
methods can reach 20%, and for modern DL methods it is usually below 6%.
However, the classification time for DL can be 100X longer than conventional ML
methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional
ML method) to 90.25%, and reduced the detection time by 54.86%. Further, the
percentage of software subjected to DL analysis was approximately 40% on
average. Further, the application of delays in software subjected to ML reduced
the detection time by approximately 10%. Finally, we found and discussed a
discrepancy between the detection accuracy offline (analysis after all traces
are collected) and on-the-fly (analysis in tandem with trace collection). Our
insights show that conventional ML and modern DL-based malware detectors in
isolation cannot meet the needs of efficient and effective malware detection:
high accuracy, low false positive rate, and short classification time.Comment: 17 pages, 7 figure
- …