181 research outputs found
High Accuracy Phishing Detection Based on Convolutional Neural Networks
The persistent growth in phishing and the rising volume of phishing websites have left individuals and organizations worldwide increasingly exposed to various cyber-attacks. Consequently, more effective phishing detection is required for improved cyber defence. In this paper we present a deep learning-based approach to enable high accuracy detection of phishing sites. The proposed approach utilizes convolutional neural networks (CNN) for high accuracy classification to distinguish genuine sites from phishing sites. We evaluate the models using a dataset obtained from 6,157 genuine and 4,898 phishing websites. Based on the results of extensive experiments, our CNN-based models proved to be highly effective in detecting unknown phishing sites. Furthermore, the CNN-based approach performed better than traditional machine learning classifiers evaluated on the same dataset, reaching a 98.2% phishing detection rate with an F1-score of 0.976. The method presented in this paper compares favourably to the state of the art in deep learning-based phishing website detection.
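The detection rate and F1-score quoted above both follow from confusion-matrix counts. The sketch below shows the arithmetic only. The function names and the counts are hypothetical and are not taken from the paper.

```python
def detection_rate(tp: int, fn: int) -> float:
    """Recall on the phishing class: share of phishing sites that were flagged."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of flagged sites that really were phishing."""
    return tp / (tp + fp)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and detection rate."""
    p = precision(tp, fp)
    r = detection_rate(tp, fn)
    return 2 * p * r / (p + r)


# Hypothetical counts, chosen only to illustrate the formulas.
tp, fp, fn = 90, 10, 10
print(detection_rate(tp, fn), precision(tp, fp), f1_score(tp, fp, fn))
```

Because F1 combines precision and detection rate, a classifier cannot score well on it simply by flagging every site as phishing.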
An Evasion Attack against ML-based Phishing URL Detectors
Background: Over the years, Machine Learning Phishing URL classification
(MLPU) systems have gained tremendous popularity to detect phishing URLs
proactively. Despite this vogue, the security vulnerabilities of MLPUs remain
mostly unknown. Aim: To address this concern, we conduct a study to understand
the test time security vulnerabilities of the state-of-the-art MLPU systems,
aiming at providing guidelines for the future development of these systems.
Method: In this paper, we propose an evasion attack framework against MLPU
systems. To achieve this, we first develop an algorithm to generate adversarial
phishing URLs. We then reproduce 41 MLPU systems and record their baseline
performance. Finally, we simulate an evasion attack to evaluate these MLPU
systems against our generated adversarial URLs. Results: In comparison to
previous works, our attack is: (i) effective as it evades all the models with
an average success rate of 66% and 85% for famous (such as Netflix, Google) and
less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively;
(ii) realistic as it requires only 23ms to produce a new adversarial URL
variant that is available for registration with a median cost of only
$11.99/year. We also found that popular online services such as Google
SafeBrowsing and VirusTotal are unable to detect these URLs. (iii) We find that
Adversarial training (successful defence against evasion attack) does not
significantly improve the robustness of these systems as it decreases the
success rate of our attack by only 6% on average for all the models. (iv)
Further, we identify the security vulnerabilities of the considered MLPU
systems. Our findings lead to promising directions for future research.
Conclusion: Our study not only illustrates vulnerabilities in MLPU systems but
also highlights implications for future study towards assessing and improving
these systems.
Comment: Draft for ACM TOP
HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis
Developing and launching phishing attacks now requires little technical skill and cost. This has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become extremely necessary. In this paper, we propose HTMLPhish, a deep learning based, data-driven, end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish receives the content of the HTML document of a web page and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual contents of the HTML. The CNNs learn appropriate feature representations from the HTML document embeddings without extensive manual feature engineering. Furthermore, our proposed approach of concatenating the word and character embeddings allows our model to manage new features and ensures easy extrapolation to test data. We conduct comprehensive experiments on a dataset of more than 50,000 HTML documents whose ratio of phishing to benign web pages reflects what is obtainable in the real world, achieving over 93% accuracy and true positive rate. Also, HTMLPhish is a completely language-independent and client-side strategy which can, therefore, conduct web page phishing detection regardless of the textual language.
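To illustrate why pairing word-level and character-level inputs can help with unseen tokens, the sketch below encodes the same HTML at both granularities. It is not the HTMLPhish implementation: the tokenizer, the vocabulary scheme (id 0 for unknown tokens) and the sample HTML strings are all assumptions made for illustration, and the embedding and CNN layers are omitted.

```python
import re


def word_tokens(html):
    """Split HTML into lowercase alphanumeric words and single punctuation marks."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", html.lower())


def char_tokens(html):
    """Split HTML into individual lowercase characters."""
    return list(html.lower())


def build_vocab(tokens):
    """Assign ids starting at 1; id 0 is reserved for tokens not seen in training."""
    return {tok: i + 1 for i, tok in enumerate(sorted(set(tokens)))}


def encode(tokens, vocab):
    """Map tokens to ids, falling back to 0 for unknown tokens."""
    return [vocab.get(tok, 0) for tok in tokens]


# Hypothetical training and test documents.
train_html = '<form action="login.php"><input name="password"></form>'
test_html = '<input name="passcode">'

word_vocab = build_vocab(word_tokens(train_html))
char_vocab = build_vocab(char_tokens(train_html))

word_ids = encode(word_tokens(test_html), word_vocab)
char_ids = encode(char_tokens(test_html), char_vocab)

print(word_ids)  # the unseen word "passcode" maps to the unknown id 0
print(char_ids)  # every character was seen in training, so no id is 0
```

In a full model, each id sequence would be passed through its own embedding layer and the two representations concatenated before the convolutional layers. The character stream still carries information about a word that the word vocabulary has never seen.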
Look Before You Leap: Detecting Phishing Web Pages by Exploiting Raw URL And HTML Characteristics
Cybercriminals resort to phishing as a simple and cost-effective medium to
perpetrate cyber-attacks on today's Internet. Recent studies in phishing
detection are increasingly adopting automated feature selection over
traditional manually engineered features. This transition is due to the
inability of existing traditional methods to extrapolate their learning to new
data. To this end, in this paper, we propose WebPhish, a deep learning
technique using automatic feature selection extracted from the raw URL and HTML
of a web page. This approach is the first of its kind, which uses the
concatenation of URL and HTML embedding feature vectors as input into a
Convolutional Neural Network model to detect phishing attacks on web pages.
Extensive experiments on a real-world dataset yielded an accuracy of 98
percent, outperforming other state-of-the-art techniques. Also, WebPhish is a
client-side strategy that is completely language-independent and can conduct
lightweight phishing detection regardless of the web page's textual language
An Analysis of Malicious URL Detection Using Deep Learning
Considerable progress has been achieved in the digital domain, particularly in the online realm where a multitude of activities are being conducted. Cyberattacks, particularly those using malicious URLs, have emerged as a serious security risk, deceiving users into compromising their systems and resulting in annual losses of billions of dollars. Website security is essential, and it is critical to identify malicious URLs quickly. Blacklists and shallow learning are two techniques that have been investigated in response to the threat posed by malicious URLs and phishing attempts. Historically, blacklists have been used for this purpose, but blacklist-based techniques are limited because they cannot detect newly generated malicious URLs. To overcome these challenges, recent research has focused on applying machine learning and deep learning techniques. By automatically discovering complex patterns and representations from unstructured data, deep learning has become a potent tool for recognizing and reducing these risks. The goal of this paper is to present a thorough analysis and structural comprehension of deep learning based malware detection systems. Literature covering different facets of this subject, such as feature representation and algorithm design, is identified and examined. Moreover, a precise explanation of the role of deep learning in detecting malicious URLs is provided.
CharBot: A Simple and Effective Method for Evading DGA Classifiers
Domain generation algorithms (DGAs) are commonly leveraged by malware to
create lists of domain names which can be used for command and control (C&C)
purposes. Approaches based on machine learning have recently been developed to
automatically detect generated domain names in real-time. In this work, we
present a novel DGA called CharBot which is capable of producing large numbers
of unregistered domain names that are not detected by state-of-the-art
classifiers for real-time detection of DGAs, including the recently published
methods FANCI (a random forest based on human-engineered features) and LSTM.MI
(a deep learning approach). CharBot is very simple, effective and requires no
knowledge of the targeted DGA classifiers. We show that retraining the
classifiers on CharBot samples is not a viable defense strategy. We believe
these findings show that DGA classifiers are inherently vulnerable to
adversarial attacks if they rely only on the domain name string to make a
decision. Designing a robust DGA classifier may, therefore, necessitate the use
of additional information besides the domain name alone. To the best of our
knowledge, CharBot is the simplest and most efficient black-box adversarial
attack against DGA classifiers proposed to date
An adaptive approach for internet phishing detection based on log data
The Internet has become central to many daily social, financial and other activities. The number of customers who use the Internet to conduct their business and purchases is very large, which results in billions of dollars being transferred online every day. Such a large amount of money attracts cybercriminals. Fraud is one of the most dangerous of their methods, especially phishing, where attackers try to steal user credentials using fraudulent emails, fake websites, or both. The system proposed in this paper comprises efficient data extraction from the web file through data collection and preprocessing, a web usage mining procedure to extract features that describe user behavior, and URL analysis to extract features for detecting phishing website addresses. The features from these two parts are then combined, giving sixty-three features in total. Finally, a classification algorithm (Random Forests) is applied to determine whether website addresses are phishing or legitimate. The performance of the proposed algorithms is assessed using a confusion matrix and a number of metrics that show the robustness of the proposed system.
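The URL-analysis step can be pictured as turning each address into a small set of lexical features. The sketch below is an assumption-laden illustration: the abstract does not list its sixty-three features, so the nine shown here, the function name `url_features` and the sample URL are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url):
    """Return a dictionary of simple lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a commonly used signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


# Hypothetical URL using a documentation-reserved IP address.
print(url_features("http://192.0.2.10/login/verify.php"))
```

Feature dictionaries of this kind can be stacked into a table and passed to any tabular classifier, such as the Random Forests mentioned in the abstract.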
Deep Learning Multi-Agent Model for Phishing Cyber-attack Detection
Phishing attacks have become one of the most prominent cyber threats in recent times, posing a significant risk to the security of organizations and individuals. Therefore, detecting such cyber-attacks has become crucial to ensure a secure digital environment. In this regard, deep learning techniques have shown promising results for the detection of phishing attacks due to their ability to learn and extract features from raw data. In this study, we propose a deep learning-based approach to detecting phishing attacks by using a combination of convolutional neural networks (CNN) and long short-term memory (LSTM) networks. Our proposed model extracts features from the URL and email content to detect phishing attempts. We evaluate the proposed approach on a real-world dataset and achieve an accuracy of over 95%. The results indicate that the proposed approach can effectively detect phishing attacks and can be utilized in real-world applications to ensure a secure digital environment.
Classification of URLs Using Deep Neural Networks
This bachelor's thesis deals with the problem of automatic classification of internet addresses. Emphasis is placed on deep neural networks, specifically on models that work with input at the level of individual characters. The thesis summarizes the current state of the art and proposes a model suitable for production deployment in a real enterprise environment. My work explores the field of automatic URL classification with particular attention to character-level deep neural networks. It summarizes recent advancements in the field and proposes a working model which outperforms the enterprise baseline on a real world dataset.
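Character-level models of the kind described here typically take a URL as a fixed-length sequence of integer character ids. The sketch below shows one common way to produce such a sequence. The alphabet, the reserved ids (0 for padding, 1 for unknown characters) and the default length are assumptions, not details from the thesis.

```python
import string

# Assumed alphabet: lowercase letters, digits and ASCII punctuation.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation
PAD_ID, UNK_ID = 0, 1
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_url(url, max_len=64):
    """Lowercase, truncate to max_len, map characters to ids, then right-pad."""
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in url.lower()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


print(encode_url("ab", max_len=4))
print(len(encode_url("http://example.com/some/long/path", max_len=16)))
```

The fixed length lets URLs be batched into a single tensor, and the unknown id keeps the encoder from failing on characters outside the assumed alphabet.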
Deep Learning for Phishing Detection: Taxonomy, Current Challenges and Future Directions
This work was supported in part by the Ministry of Higher Education under the Fundamental Research Grant Scheme under Grant FRGS/1/2018/ICT04/UTM/01/1; and in part by the Faculty of Informatics and Management, University of Hradec Kralove, through SPEV project under Grant 2102/2022.
Phishing has become an increasing concern and captured the attention of end-users as well
as security experts. Existing phishing detection techniques still suffer from the deficiency in performance
accuracy and inability to detect unknown attacks despite decades of development and improvement.
Motivated to solve these problems, many researchers in the cybersecurity domain have shifted their attention
to phishing detection that capitalizes on machine learning techniques. Deep learning has emerged as a branch
of machine learning that has become a promising solution for phishing detection in recent years. As a result,
this study proposes a taxonomy of deep learning algorithms for phishing detection by examining 81 selected
papers using a systematic literature review approach. The paper first introduces the concept of phishing and
deep learning in the context of cybersecurity. Then, taxonomies of phishing detection and deep learning
algorithms are provided to classify the existing literature into various categories. Next, taking the proposed
taxonomy as a baseline, this study comprehensively reviews the state-of-the-art deep learning techniques
and analyzes their advantages as well as disadvantages. Subsequently, the paper discusses various issues
that deep learning faces in phishing detection and proposes future research directions to overcome these
challenges. Finally, an empirical analysis is conducted to evaluate the performance of various deep learning
techniques in a practical context, and to highlight the related issues that motivate researchers in their future
works. The results obtained from the empirical experiment showed that the common issues among most of
the state-of-the-art deep learning algorithms are manual parameter-tuning, long training time, and deficient
detection accuracy.