438 research outputs found
PhishDef: URL Names Say It All
Phishing is an increasingly sophisticated method to steal personal user
information using sites that pretend to be legitimate. In this paper, we take
the following steps to identify phishing URLs. First, we carefully select
lexical features of the URLs that are resistant to obfuscation techniques used
by attackers. Second, we evaluate the classification accuracy when using only
lexical features, both automatically and hand-selected, vs. when using
additional features. We show that lexical features are sufficient for all
practical purposes. Third, we thoroughly compare several classification
algorithms, and we propose to use an online method (AROW) that is able to
overcome noisy training data. Based on the insights gained from our analysis,
we propose PhishDef, a phishing detection system that uses only URL names and
combines the above three elements. PhishDef is a highly accurate method (when
compared to state-of-the-art approaches over real datasets), lightweight (thus
appropriate for online and client-side deployment), proactive (based on online
classification rather than blacklists), and resilient to training data
inaccuracies (thus enabling the use of large noisy training data).Comment: 9 pages, submitted to IEEE INFOCOM 201
R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections
The influence of Deep Learning on image identification and natural language
processing has attracted enormous attention globally. The convolution neural
network that can learn without prior extraction of features fits well in
response to the rapid iteration of Android malware. The traditional solution
for detecting Android malware requires continuous learning through
pre-extracted features to maintain high performance of identifying the malware.
In order to reduce the manpower of feature engineering prior to the condition
of not to extract pre-selected features, we have developed a coloR-inspired
convolutional neuRal networks (CNN)-based AndroiD malware Detection (R2-D2)
system. The system can convert the bytecode of classes.dex from Android archive
file to rgb color code and store it as a color image with fixed size. The color
image is input to the convolutional neural network for automatic feature
extraction and training. The data was collected from Jan. 2017 to Aug 2017.
During the period of time, we have collected approximately 2 million of benign
and malicious Android apps for our experiments with the help from our research
partner Leopard Mobile Inc. Our experiment results demonstrate that the
proposed system has accurate security analysis on contracts. Furthermore, we
keep our research results and experiment materials on http://R2D2.TWMAN.ORG.Comment: Verison 2018/11/15, IEEE BigData 2018, Seattle, WA, USA, Dec 10-13,
2018. (Accepted
A Study of Android Malware Detection Techniques and Machine Learning
Android OS is one of the widely used mobile Operating Systems. The number of malicious applications and adwares are increasing constantly on par with the number of mobile devices. A great number of commercial signature based tools are available on the market which prevent to an extent the penetration and distribution of malicious applications. Numerous researches have been conducted which claims that traditional signature based detection system work well up to certain level and malware authors use numerous techniques to evade these tools. So given this state of affairs, there is an increasing need for an alternative, really tough malware detection system to complement and rectify the signature based system. Recent substantial research focused on machine learning algorithms that analyze features from malicious application and use those features to classify and detect unknown malicious applications. This study summarizes the evolution of malware detection techniques based on machine learning algorithms focused on the Android OS
Improving malware detection with neuroevolution : a study with the semantic learning machine
Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business IntelligenceMachine learning has become more attractive over the years due to its remarkable adaptation and
problem-solving abilities. Algorithms compete amongst each other to claim the best possible results
for every problem, being one of the most valued characteristics their generalization ability.
A recently proposed methodology of Genetic Programming (GP), called Geometric Semantic Genetic
Programming (GSGP), has seen its popularity rise over the last few years, achieving great results
compared to other state-of-the-art algorithms, due to its remarkable feature of inducing a fitness
landscape with no local optima solutions. To any supervised learning problem, where a metric is used
as an error function, GSGP’s landscape will be unimodal, therefore allowing for genetic algorithms to
behave much more efficiently and effectively.
Inspired by GSGP’s features, Gonçalves developed a new mutation operator to be applied to the Neural
Networks (NN) domain, creating the Semantic Learning Machine (SLM). Despite GSGP’s good results
already proven, there are still research opportunities for improvement, that need to be performed to
empirically prove GSGP as a state-of-the-art framework.
In this case, the study focused on applying SLM to NNs with multiple hidden layers and compare its
outputs to a very popular algorithm, Multilayer Perceptron (MLP), on a considerably large classification
dataset about Android malware. Findings proved that SLM, sharing common parametrization with
MLP, in order to have a fair comparison, is able to outperform it, with statistical significance
A federated approach to Android malware classification through Perm-Maps
In the last decades, mobile-based apps have been increasingly used in several application fields for many purposes involving a high number of human activities. Unfortunately, in addition to this, the number of cyber-attacks related to mobile platforms is increasing day-by-day. However, although advances in Artificial Intelligence science have allowed addressing many aspects of the problem, malware classification tasks are still challenging. For this reason, the following paper aims to propose new special features, called permission maps (Perm-Maps), which combine information related to the Android permissions and their corresponding severity levels. Such features have proven to be very effective in classifying different malware families through the usage of a convolutional neural network. Also, the advantages introduced by the Perm-Maps have been enhanced by a training process based on a federated logic. Experimental results show that the proposed approach achieves up to a 3% improvement in average accuracy with respect to J48 trees and Naive Bayes classifier, and up to 16% compared to multi-layer perceptron classifier. Furthermore, the combined use of Perm-Maps and federated logic allows dealing with unbalanced training datasets with low computational efforts
Applications in security and evasions in machine learning : a survey
In recent years, machine learning (ML) has become an important part to yield security and privacy in various applications. ML is used to address serious issues such as real-time attack detection, data leakage vulnerability assessments and many more. ML extensively supports the demanding requirements of the current scenario of security and privacy across a range of areas such as real-time decision-making, big data processing, reduced cycle time for learning, cost-efficiency and error-free processing. Therefore, in this paper, we review the state of the art approaches where ML is applicable more effectively to fulfill current real-world requirements in security. We examine different security applications' perspectives where ML models play an essential role and compare, with different possible dimensions, their accuracy results. By analyzing ML algorithms in security application it provides a blueprint for an interdisciplinary research area. Even with the use of current sophisticated technology and tools, attackers can evade the ML models by committing adversarial attacks. Therefore, requirements rise to assess the vulnerability in the ML models to cope up with the adversarial attacks at the time of development. Accordingly, as a supplement to this point, we also analyze the different types of adversarial attacks on the ML models. To give proper visualization of security properties, we have represented the threat model and defense strategies against adversarial attack methods. Moreover, we illustrate the adversarial attacks based on the attackers' knowledge about the model and addressed the point of the model at which possible attacks may be committed. Finally, we also investigate different types of properties of the adversarial attacks
- …