582 research outputs found
High Accuracy Phishing Detection Based on Convolutional Neural Networks
The persistent growth in phishing and the rising volume of phishing websites have led to individuals and organizations worldwide becoming increasingly exposed to various cyber-attacks. Consequently, more effective phishing detection is required for improved cyber defence. Hence, in this paper we present a deep learning-based approach to enable high accuracy detection of phishing sites. The proposed approach utilizes convolutional neural networks (CNN) for high accuracy classification to distinguish genuine sites from phishing sites. We evaluate the models using a dataset obtained from 6,157 genuine and 4,898 phishing websites. Based on the results of extensive experiments, our CNN-based models proved to be highly effective in detecting unknown phishing sites. Furthermore, the CNN-based approach performed better than traditional machine learning classifiers evaluated on the same dataset, reaching a 98.2% phishing detection rate with an F1-score of 0.976. The method presented in this paper compares favourably to the state of the art in deep learning-based phishing website detection.
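For readers unfamiliar with the two figures quoted above, the following is a minimal sketch of how a phishing detection rate (recall on the phishing class) and an F1-score are typically derived from confusion-matrix counts. The function name `detection_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp: phishing sites correctly flagged    fp: genuine sites wrongly flagged
    tn: genuine sites correctly passed      fn: phishing sites missed
    """
    # Detection rate is recall on the phishing (positive) class.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + detection_rate
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * detection_rate / denom if denom else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "detection_rate": detection_rate,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only; NOT results from the paper.
metrics = detection_metrics(tp=90, fp=5, tn=100, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 combines precision and recall, a model can report a high detection rate while still having a lower F1 if it flags many genuine sites.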
Artificial intelligence in the cyber domain: Offense and defense
Artificial intelligence techniques have grown rapidly in recent years, and their applications in practice can be seen in many fields, ranging from facial recognition to image analysis. In the cybersecurity domain, AI-based techniques can provide better cyber defense tools and help adversaries improve methods of attack. However, malicious actors are aware of the new prospects too and will probably attempt to use them for nefarious purposes. This survey paper aims at providing an overview of how artificial intelligence can be used in the context of cybersecurity in both offense and defense. Web of Science 123, art. no. 41
An Evasion Attack against ML-based Phishing URL Detectors
Background: Over the years, Machine Learning Phishing URL classification
(MLPU) systems have gained tremendous popularity to detect phishing URLs
proactively. Despite this vogue, the security vulnerabilities of MLPUs remain
mostly unknown. Aim: To address this concern, we conduct a study to understand
the test time security vulnerabilities of the state-of-the-art MLPU systems,
aiming at providing guidelines for the future development of these systems.
Method: In this paper, we propose an evasion attack framework against MLPU
systems. To achieve this, we first develop an algorithm to generate adversarial
phishing URLs. We then reproduce 41 MLPU systems and record their baseline
performance. Finally, we simulate an evasion attack to evaluate these MLPU
systems against our generated adversarial URLs. Results: In comparison to
previous works, our attack is: (i) effective as it evades all the models with
an average success rate of 66% and 85% for famous (such as Netflix, Google) and
less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively;
(ii) realistic as it requires only 23ms to produce a new adversarial URL
variant that is available for registration with a median cost of only
$11.99/year. We also found that popular online services such as Google
SafeBrowsing and VirusTotal are unable to detect these URLs. (iii) We find that
Adversarial training (successful defence against evasion attack) does not
significantly improve the robustness of these systems as it decreases the
success rate of our attack by only 6% on average for all the models. (iv)
Further, we identify the security vulnerabilities of the considered MLPU
systems. Our findings lead to promising directions for future research.
Conclusion: Our study not only illustrates vulnerabilities in MLPU systems but
also highlights implications for future study towards assessing and improving
these systems.
Comment: Draft for ACM TOP
Phishing Webpage Classification via Deep Learning-Based Algorithms: An Empirical Study
This work was supported/funded by the Ministry of Higher Education under the Fundamental Research Grant Scheme (FRGS/1/2018/ICT04/UTM/01/1). The authors sincerely thank Universiti Teknologi Malaysia (UTM) under Research University Grant Vot-20H04, Malaysia Research University Network (MRUN) Vot 4L876, for the completion of the research. Faculty of Informatics and Management, University of Hradec Kralove, SPEV project Grant Number: 2102/2021.
Phishing detection with high-performance accuracy and low computational complexity
has always been a topic of great interest. New technologies have been developed to improve the
phishing detection rate and reduce computational constraints in recent years. However, one solution
is insufficient to address all problems caused by attackers in cyberspace. Therefore, the primary
objective of this paper is to analyze the performance of various deep learning algorithms in detecting
phishing activities. This analysis will help organizations or individuals select and adopt the proper
solution according to their technological needs and specific applications’ requirements to fight
against phishing attacks. In this regard, an empirical study was conducted using four different deep
learning algorithms, including deep neural network (DNN), convolutional neural network (CNN),
Long Short-Term Memory (LSTM), and gated recurrent unit (GRU). To analyze the behaviors of
these deep learning architectures, extensive experiments were carried out to examine the impact of
parameter tuning on the performance accuracy of the deep learning models. In addition, various
performance metrics were measured to evaluate the effectiveness and feasibility of DL models in
detecting phishing activities. The results obtained from the experiments showed that no single DL
algorithm achieved the best measures across all performance metrics. The empirical findings from
this paper also manifest several issues and suggest future research directions related to deep learning
in the phishing detection domain.
Character and Word Embeddings for Phishing Email Detection
Phishing attacks are among the most common malicious activities on the Internet. During a phishing attack, cybercriminals present themselves as a trusted organization or individual. Their goal is to lure people into entering their private information, such as passwords and bank card numbers, while believing that nothing malicious is happening. The attack often starts with a phishing email, which is an email that is very similar to a legitimate email, but usually contains links to malicious websites or uses some other techniques to mislead victims. To prevent phishing attacks, it is crucial to detect phishing emails and remove them from email inbox folders. In this paper, a neural network based phishing email detection model is proposed. In comparison to some earlier approaches, our model does not use manually engineered input features. It learns character and word embeddings directly from email texts, and uses them to extract local and global features using convolutional and recurrent layers, respectively. Our model is tested on the two commonly used datasets for phishing email detection, the SpamAssassin Public Corpus and Nazario Phishing Corpus, and it achieves an accuracy of 99.81% and an F1-score of 99.74%, which is on par with or better than the current state-of-the-art approaches.
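Models that learn character and word embeddings usually start from integer index sequences rather than raw text. The sketch below shows one plausible way to prepare such inputs; the tokenization rules, the reserved ids `PAD` and `UNK`, the helper names, and the sample emails are all assumptions made for illustration and are not described in the abstract.

```python
import re

PAD, UNK = 0, 1  # assumed reserved ids: padding and unknown token


def word_tokens(text):
    """Lowercase and split into alphanumeric word tokens (an assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def char_tokens(text):
    """Lowercase and split into single characters."""
    return list(text.lower())


def build_vocab(tokens_per_doc):
    """Assign each distinct token an id, starting after the reserved ids."""
    vocab = {}
    for tokens in tokens_per_doc:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical training texts, for illustration only.
emails = ["Verify your account now", "Meeting at noon"]
word_vocab = build_vocab(word_tokens(e) for e in emails)
char_vocab = build_vocab(char_tokens(e) for e in emails)

# "password" was never seen during vocabulary building, so it maps to UNK.
word_ids = encode(word_tokens("Verify your password"), word_vocab, max_len=5)
char_ids = encode(char_tokens("Verify"), char_vocab, max_len=8)
print(word_ids)
print(char_ids)
```

The resulting fixed-length id sequences are the kind of input an embedding layer would consume before any convolutional or recurrent layers are applied.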
Deep learning in phishing mitigation: a uniform resource locator-based predictive model
To mitigate the evolution of phishing websites, various phishing prediction schemes are continually being optimized. However, the optimized methods produce gratuitous performance overhead due to the limited exploration of advanced phishing cues. This work therefore enhances a phishing uniform resource locator-based predictive model using deep learning algorithms to address this deficiency. The model's architecture encompasses pre-processing of an effective feature space made up of 60 mutual uniform resource locator (URL) phishing features, and a dual deep learning-based model combining a convolutional neural network with bi-directional long short-term memory (CNN-BiLSTM). The proposed predictive model is trained and tested on a dataset of 14,000 phishing URLs and 28,074 legitimate URLs. Experimentally, the model achieves a 0.01% false positive rate (FPR) and 99.27% testing accuracy.
AntiPhishStack: LSTM-based Stacked Generalization Model for Optimized Phishing URL Detection
The escalating reliance on revolutionary online web services has introduced
heightened security risks, with persistent challenges posed by phishing despite
extensive security measures. Traditional phishing systems, reliant on machine
learning and manual features, struggle with evolving tactics. Recent advances
in deep learning offer promising avenues for tackling novel phishing challenges
and malicious URLs. This paper introduces a two-phase stack generalized model
named AntiPhishStack, designed to detect phishing sites. The model leverages
the learning of URLs and character-level TF-IDF features symmetrically,
enhancing its ability to combat emerging phishing threats. In Phase I, features
are trained on a base machine learning classifier, employing K-fold
cross-validation for robust mean prediction. Phase II employs a two-layered
stacked-based LSTM network with five adaptive optimizers for dynamic
compilation, ensuring premier prediction on these features. Additionally, the
symmetrical predictions from both phases are optimized and integrated to train
a meta-XGBoost classifier, contributing to a final robust prediction. The
significance of this work lies in advancing phishing detection with
AntiPhishStack, operating without prior phishing-specific feature knowledge.
Experimental validation on two benchmark datasets, comprising benign and
phishing or malicious URLs, demonstrates the model's exceptional performance,
achieving a notable 96.04% accuracy compared to existing studies. This research
adds value to the ongoing discourse on symmetry and asymmetry in information
security and provides a forward-thinking solution for enhancing network
security in the face of evolving cyber threats
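As a rough illustration of what character-level TF-IDF features over URLs look like, the sketch below computes character trigram weights in plain Python. The trigram size, the smoothed IDF form, the function names, and the sample URLs are assumptions for illustration; the abstract does not specify how AntiPhishStack computes its features.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Return one sparse {ngram: weight} dict per document.

    TF is the relative frequency of the n-gram within the document.
    IDF uses an assumed smoothed form: ln((1 + N) / (1 + df)) + 1.
    """
    grams_per_doc = [Counter(char_ngrams(doc, n)) for doc in docs]
    doc_freq = Counter()
    for grams in grams_per_doc:
        doc_freq.update(grams.keys())  # count each n-gram once per document
    num_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        total = sum(grams.values())
        vec = {}
        for gram, count in grams.items():
            tf = count / total
            idf = math.log((1 + num_docs) / (1 + doc_freq[gram])) + 1.0
            vec[gram] = tf * idf
        vectors.append(vec)
    return vectors


# Hypothetical URLs on reserved example domains, for illustration only.
urls = ["example.com/login", "examp1e-login.example.net", "intranet.test/home"]
vectors = tfidf_vectors(urls)

# Trigrams shared across URLs (e.g. "log") receive lower weight than
# trigrams unique to one URL (e.g. "com").
print(round(vectors[0]["com"], 4), round(vectors[0]["log"], 4))
```

Features of this kind need no phishing-specific domain knowledge, which is in line with the abstract's claim of operating without prior phishing-specific feature knowledge.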
HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis
Recently, the development and implementation of phishing attacks have come to require little technical skill and cost. This has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become extremely necessary. In this paper, we propose HTMLPhish, a deep learning based, data-driven, end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish receives the content of the HTML document of a web page and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual contents of the HTML. The CNNs learn appropriate feature representations from the HTML document embeddings without extensive manual feature engineering. Furthermore, our proposed approach of concatenating the word and character embeddings allows our model to manage new features and ensure easy extrapolation to test data. We conduct comprehensive experiments on a dataset of more than 50,000 HTML documents whose distribution of phishing to benign web pages reflects that found in the real world, obtaining over 93% accuracy and true positive rate. Also, HTMLPhish is a completely language-independent and client-side strategy which can, therefore, conduct web page phishing detection regardless of the textual language.
Deep Learning-Based Speech and Vision Synthesis to Improve Phishing Attack Detection through a Multi-layer Adaptive Framework
The ever-evolving ways in which attackers continue to improve their phishing
techniques to bypass existing state-of-the-art phishing detection methods pose
a mountain of challenges to researchers in both industry and academia,
due to the inability of current approaches to detect complex phishing attacks.
Thus, current anti-phishing methods remain vulnerable to complex phishing
because of the increasingly sophisticated tactics adopted by attackers, coupled
with the rate at which new tactics are being developed to evade detection. In
this research, we propose an adaptable framework that combines deep learning
and Random Forest to read images, synthesize speech from deep-fake videos, and
apply natural language processing at various prediction layers to significantly
increase the performance of machine learning models for phishing attack
detection.
- …