2,008 research outputs found
Intelligent Malware Detection Using File-to-file Relations and Enhancing its Security against Adversarial Attacks
With computing devices and the Internet being indispensable in people\u27s everyday life, malware has posed serious threats to their security, making its detection of utmost concern. To protect legitimate users from the evolving malware attacks, machine learning-based systems have been successfully deployed and offer unparalleled flexibility in automatic malware detection. In most of these systems, resting on the analysis of different content-based features either statically or dynamically extracted from the file samples, various kinds of classifiers are constructed to detect malware. However, besides content-based features, file-to-file relations, such as file co-existence, can provide valuable information in malware detection and make evasion harder. To better understand the properties of file-to-file relations, we construct the file co-existence graph. Resting on the constructed graph, we investigate the semantic relatedness among files, and leverage graph inference, active learning and graph representation learning for malware detection. Comprehensive experimental results on the real sample collections from Comodo Cloud Security Center demonstrate the effectiveness of our proposed learning paradigms.
As machine learning-based detection systems become more widely deployed, the incentive for defeating them increases. Therefore, we go further insight into the arms race between adversarial malware attack and defense, and aim to enhance the security of machine learning-based malware detection systems. In particular, we first explore the adversarial attacks under different scenarios (i.e., different levels of knowledge the attackers might have about the targeted learning system), and define a general attack strategy to thoroughly assess the adversarial behaviors. Then, considering different skills and capabilities of the attackers, we propose the corresponding secure-learning paradigms to counter the adversarial attacks and enhance the security of the learning systems while not compromising the detection accuracy. We conduct a series of comprehensive experimental studies based on the real sample collections from Comodo Cloud Security Center and the promising results demonstrate the effectiveness of our proposed secure-learning models, which can be readily applied to other detection tasks
Survey of Machine Learning Techniques for Malware Analysis
Coping with malware is getting more and more challenging, given their
relentless growth in complexity and volume. One of the most common approaches
in literature is using machine learning techniques, to automatically learn
models and patterns behind such complexity, and to develop technologies for
keeping pace with the speed of development of novel malware. This survey aims
at providing an overview on the way machine learning has been used so far in
the context of malware analysis. We systematize surveyed papers according to
their objectives (i.e., the expected output, what the analysis aims to), what
information about malware they specifically use (i.e., the features), and what
machine learning techniques they employ (i.e., what algorithm is used to
process the input and produce the output). We also outline a number of problems
concerning the datasets used in considered works, and finally introduce the
novel concept of malware analysis economics, regarding the study of existing
tradeoffs among key metrics, such as analysis accuracy and economical costs
Graph-based Security and Privacy Analytics via Collective Classification with Joint Weight Learning and Propagation
Many security and privacy problems can be modeled as a graph classification
problem, where nodes in the graph are classified by collective classification
simultaneously. State-of-the-art collective classification methods for such
graph-based security and privacy analytics follow the following paradigm:
assign weights to edges of the graph, iteratively propagate reputation scores
of nodes among the weighted graph, and use the final reputation scores to
classify nodes in the graph. The key challenge is to assign edge weights such
that an edge has a large weight if the two corresponding nodes have the same
label, and a small weight otherwise. Although collective classification has
been studied and applied for security and privacy problems for more than a
decade, how to address this challenge is still an open question. In this work,
we propose a novel collective classification framework to address this
long-standing challenge. We first formulate learning edge weights as an
optimization problem, which quantifies the goals about the final reputation
scores that we aim to achieve. However, it is computationally hard to solve the
optimization problem because the final reputation scores depend on the edge
weights in a very complex way. To address the computational challenge, we
propose to jointly learn the edge weights and propagate the reputation scores,
which is essentially an approximate solution to the optimization problem. We
compare our framework with state-of-the-art methods for graph-based security
and privacy analytics using four large-scale real-world datasets from various
application scenarios such as Sybil detection in social networks, fake review
detection in Yelp, and attribute inference attacks. Our results demonstrate
that our framework achieves higher accuracies than state-of-the-art methods
with an acceptable computational overhead.Comment: Network and Distributed System Security Symposium (NDSS), 2019.
Dataset link: http://gonglab.pratt.duke.edu/code-dat
A Review on Malware Analysis by using an Approach of Machine Learning Techniques
In the Internet age, malware (such as viruses, trojans, ransomware, and bots) has posed serious andevolving security threats to Internet users. To protect legitimate users from these threats, anti-malware softwareproducts from different companies, including Comodo, Kaspersky, Kingsoft, and Symantec, provide the majordefense against malware. Unfortunately, driven by the economic benefits, the number of new malware sampleshas explosively increased: anti-malware vendors are now confronted with millions of potential malware samplesper year. In order to keep on combating the increase in malware samples, there is an urgent need to developintelligent methods for effective and efficient malware detection from the real and large daily sample collection.One of the most common approaches in literature is using machine learning techniques, to automatically learnmodels and patterns behind such complexity, and to develop technologies to keep pace with malware evolution.This survey aims at providing an overview on the way machine learning has been used so far in the context ofmalware analysis in Windows environments. This paper gives an survey on the features related to malware filesor documents and what machine learning techniques they employ (i.e., what algorithm is used to process the inputand produce the output). Different issues and challenges are also discussed
R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections
The influence of Deep Learning on image identification and natural language
processing has attracted enormous attention globally. The convolution neural
network that can learn without prior extraction of features fits well in
response to the rapid iteration of Android malware. The traditional solution
for detecting Android malware requires continuous learning through
pre-extracted features to maintain high performance of identifying the malware.
In order to reduce the manpower of feature engineering prior to the condition
of not to extract pre-selected features, we have developed a coloR-inspired
convolutional neuRal networks (CNN)-based AndroiD malware Detection (R2-D2)
system. The system can convert the bytecode of classes.dex from Android archive
file to rgb color code and store it as a color image with fixed size. The color
image is input to the convolutional neural network for automatic feature
extraction and training. The data was collected from Jan. 2017 to Aug 2017.
During the period of time, we have collected approximately 2 million of benign
and malicious Android apps for our experiments with the help from our research
partner Leopard Mobile Inc. Our experiment results demonstrate that the
proposed system has accurate security analysis on contracts. Furthermore, we
keep our research results and experiment materials on http://R2D2.TWMAN.ORG.Comment: Verison 2018/11/15, IEEE BigData 2018, Seattle, WA, USA, Dec 10-13,
2018. (Accepted
- …