2,004 research outputs found
High Accuracy Phishing Detection Based on Convolutional Neural Networks
The persistent growth in phishing and the rising volume of phishing websites has led to individuals and organizations worldwide becoming increasingly exposed to various cyber-attacks. Consequently, more effective phishing detection is required for improved cyber defence. Hence, in this paper we present a deep learning-based approach to enable high accuracy detection of phishing sites. The proposed approach utilizes convolutional neural networks (CNN) for high accuracy classification to distinguish genuine sites from phishing sites. We evaluate the models using a dataset obtained from 6,157 genuine and 4,898 phishing websites. Based on the results of extensive experiments, our CNN based models proved to be highly effective in detecting unknown phishing sites. Furthermore, the CNN based approach performed better than traditional machine learning classifiers evaluated on the same dataset, reaching 98.2% phishing detection rate with an F1-score of 0.976. The method presented in this pa-per compares favourably to the state-of-the art in deep learning based phishing website detection
Artificial intelligence in the cyber domain: Offense and defense
Artificial intelligence techniques have grown rapidly in recent years, and their applications in practice can be seen in many fields, ranging from facial recognition to image analysis. In the cybersecurity domain, AI-based techniques can provide better cyber defense tools and help adversaries improve methods of attack. However, malicious actors are aware of the new prospects too and will probably attempt to use them for nefarious purposes. This survey paper aims at providing an overview of how artificial intelligence can be used in the context of cybersecurity in both offense and defense.Web of Science123art. no. 41
An Evasion Attack against ML-based Phishing URL Detectors
Background: Over the year, Machine Learning Phishing URL classification
(MLPU) systems have gained tremendous popularity to detect phishing URLs
proactively. Despite this vogue, the security vulnerabilities of MLPUs remain
mostly unknown. Aim: To address this concern, we conduct a study to understand
the test time security vulnerabilities of the state-of-the-art MLPU systems,
aiming at providing guidelines for the future development of these systems.
Method: In this paper, we propose an evasion attack framework against MLPU
systems. To achieve this, we first develop an algorithm to generate adversarial
phishing URLs. We then reproduce 41 MLPU systems and record their baseline
performance. Finally, we simulate an evasion attack to evaluate these MLPU
systems against our generated adversarial URLs. Results: In comparison to
previous works, our attack is: (i) effective as it evades all the models with
an average success rate of 66% and 85% for famous (such as Netflix, Google) and
less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively;
(ii) realistic as it requires only 23ms to produce a new adversarial URL
variant that is available for registration with a median cost of only
$11.99/year. We also found that popular online services such as Google
SafeBrowsing and VirusTotal are unable to detect these URLs. (iii) We find that
Adversarial training (successful defence against evasion attack) does not
significantly improve the robustness of these systems as it decreases the
success rate of our attack by only 6% on average for all the models. (iv)
Further, we identify the security vulnerabilities of the considered MLPU
systems. Our findings lead to promising directions for future research.
Conclusion: Our study not only illustrate vulnerabilities in MLPU systems but
also highlights implications for future study towards assessing and improving
these systems.Comment: Draft for ACM TOP
HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis
Recently, the development and implementation of phishing attacks require little technical skills and costs. This uprising has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become extremely necessary. In this paper, we propose HTMLPhish, a deep learning based datadriven end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish receives the content of the HTML document of a web page and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual contents of the HTML. The CNNs learn appropriate feature representations from the HTML document embeddings without extensive manual feature engineering. Furthermore, our proposed approach of the concatenation of the word and character embeddings allows our model to manage new features and ensure easy extrapolation to test data. We conduct comprehensive experiments on a dataset of more than 50,000 HTML documents that provides a distribution of phishing to benign web pages obtainable in the real-world that yields over 93% Accuracy and True Positive Rate. Also, HTMLPhish is a completely language-independent and client-side strategy which can, therefore, conduct web page phishing detection regardless of the textual language
Detecting Phishing Sites Using ChatGPT
The rise of large language models (LLMs) has had a significant impact on
various domains, including natural language processing and artificial
intelligence. While LLMs such as ChatGPT have been extensively researched for
tasks such as code generation and text synthesis, their application in
detecting malicious web content, particularly phishing sites, has been largely
unexplored. To combat the rising tide of automated cyber attacks facilitated by
LLMs, it is imperative to automate the detection of malicious web content,
which requires approaches that leverage the power of LLMs to analyze and
classify phishing sites. In this paper, we propose a novel method that utilizes
ChatGPT to detect phishing sites. Our approach involves leveraging a web
crawler to gather information from websites and generate prompts based on this
collected data. This approach enables us to detect various phishing sites
without the need for fine-tuning machine learning models and identify social
engineering techniques from the context of entire websites and URLs. To
evaluate the performance of our proposed method, we conducted experiments using
a dataset. The experimental results using GPT-4 demonstrated promising
performance, with a precision of 98.3% and a recall of 98.4%. Comparative
analysis between GPT-3.5 and GPT-4 revealed an enhancement in the latter's
capability to reduce false negatives. These findings not only highlight the
potential of LLMs in efficiently identifying phishing sites but also have
significant implications for enhancing cybersecurity measures and protecting
users from the dangers of online fraudulent activities
Phishing Detection Using Natural Language Processing and Machine Learning
Phishing emails are a primary mode of entry for attackers into an organization. A successful phishing attempt leads to unauthorized access to sensitive information and systems. However, automatically identifying phishing emails is often difficult since many phishing emails have composite features such as body text and metadata that are nearly indistinguishable from valid emails. This paper presents a novel machine learning-based framework, the DARTH framework, that characterizes and combines multiple models, with one model for each composite feature, that enables the accurate identification of phishing emails. The framework analyses each composite feature independently utilizing a multi-faceted approach using Natural Language Processing (NLP) and neural network-based techniques and combines the results of these analyses to classify the emails as malicious or legitimate. Utilizing the framework on more than 150,000 emails and training data from multiple sources, including the authors’ emails and phishtank.com, resulted in the precision (correct identification of malicious observations to the total prediction of malicious observations) of 99.97% with an f-score of 99.98% and accurately identifying phishing emails 99.98% of the time. Utilizing multiple machine learning techniques combined in an ensemble approach across a range of composite features yields highly accurate identification of phishing emails
- …