64 research outputs found

Detecting Cloud-Based Phishing Attacks by Combining Deep Learning Models

Author: Atre Medha
Jha Birendra
Rao Ashwini
Publication date: 05/04/2022

Web-based phishing attacks nowadays exploit popular cloud web hosting services and apps such as Google Sites and Typeform for hosting their attacks. Since these attacks originate from reputable domains and IP addresses of the cloud services, traditional phishing detection methods such as IP reputation monitoring and blacklisting are not very effective. Here we investigate the effectiveness of deep learning models in detecting this class of cloud-based phishing attacks. Specifically, we evaluate deep learning models for three phishing detection methods: an LSTM model for URL analysis, a YOLOv2 model for logo analysis, and a triplet network model for visual similarity analysis. We train the models using well-known datasets and test their performance on phishing attacks in the wild. Our results qualitatively explain why the models succeed or fail. Furthermore, our results highlight how combining results from the individual models can improve the effectiveness of detecting cloud-based phishing attacks.
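
The abstract does not say how the three models' results are combined. As a minimal sketch only, assuming each model emits a phishing score in [0, 1], two common combination rules look like this. The class `DetectorScores`, both function names, and the 0.5 threshold are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class DetectorScores:
    """Hypothetical per-page scores in [0, 1]; higher means more phishing-like."""
    url_score: float     # assumed output of a URL model
    logo_score: float    # assumed output of a logo model
    visual_score: float  # assumed output of a visual-similarity model


def combine_any(scores: DetectorScores, threshold: float = 0.5) -> bool:
    """Flag the page if at least one detector reaches the threshold."""
    best = max(scores.url_score, scores.logo_score, scores.visual_score)
    return best >= threshold


def combine_majority(scores: DetectorScores, threshold: float = 0.5) -> bool:
    """Flag the page if at least two of the three detectors reach the threshold."""
    all_scores = (scores.url_score, scores.logo_score, scores.visual_score)
    votes = sum(s >= threshold for s in all_scores)
    return votes >= 2


page = DetectorScores(url_score=0.2, logo_score=0.9, visual_score=0.4)
print(combine_any(page), combine_majority(page))
```

The "any" rule favors recall (one confident detector is enough), while the majority rule favors precision; which trade-off suits cloud-hosted phishing is an empirical question.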

arXiv.org e-Print Archive

Performance Evaluation of Machine Learning Techniques for Identifying Forged and Phony Uniform Resource Locators (URLs)

Author: Ajayi A. A.
Azeez N. A.
Publication venue: 'African Journals Online (AJOL)'
Publication date: 22/11/2019

Since the invention of Information and Communication Technology (ICT), there has been a great shift from the traditional approach of handling information across the globe to the use of this innovation. Its application cuts across almost all areas of human endeavour. ICT is widely utilized in the education and production sectors as well as in various financial institutions. Many people use it legitimately to carry out their day-to-day activities, while others use it to perform nefarious activities to the detriment of other cyber users. According to several reports discussed in the introductory part of this work, millions of people have become victims of fake Uniform Resource Locators (URLs) sent to their mailboxes by spammers. Financial institutions are not left out of the monumental losses recorded through this illicit act over the years. Despite the several approaches currently in place, none could confidently be confirmed to provide the best and most reliable solution. According to several research findings reported in the literature, researchers have demonstrated how machine learning algorithms can be employed to verify and confirm compromised and fake URLs in cyberspace. Inconsistencies have, however, been noticed in the researchers' findings, and their corresponding results are not dependable based on the values obtained and the conclusions drawn from them. Against this backdrop, the authors carried out a comparative analysis of three learning algorithms (Naïve Bayes, Decision Tree and Logistic Regression) for the verification of compromised, suspicious and fake URLs, to determine which is best based on the evaluation metrics used (F-measure, precision and recall). Based on the confusion matrix, the results show that the Decision Tree (ID3) algorithm achieves the highest values for recall, precision and F-measure.
It unarguably provides an efficient and credible means of maximizing the detection of compromised and malicious URLs. Finally, for future work, the authors are of the opinion that two or more supervised learning algorithms can be hybridized to form a single, more effective and efficient algorithm for fake URL verification.
Keywords: Learning-algorithms, Forged-URL, Phoney-URL, performance-compariso
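
The three evaluation metrics named in this abstract follow directly from confusion-matrix counts. A minimal sketch, assuming "positive" means a malicious URL; the counts in the example are invented for illustration and are not results from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F-measure (F1) from confusion-matrix counts.

    tp: malicious URLs correctly flagged
    fp: legitimate URLs wrongly flagged
    fn: malicious URLs that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only (not taken from the paper).
p, r, f = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```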

AJOL - African Journals Online

Phishing Sites Detection from a Web Developer’s Perspective Using Machine Learning

Author: Verma Rakesh
Zhou Xin
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2020

The Internet has enabled unprecedented communication and new technologies. Concomitantly, it has brought the bane of phishing and exacerbated vulnerabilities. In this paper, we propose a model to detect phishing webpages from a web developer's perspective. From this standpoint, we design 120 novel features based on webpage content, four time-based and two search-based novel features, and we use 34 other content-based and 11 heuristic features to optimize the model. Moreover, we select Random Committee (base learner: Random Tree) for our framework since it has the best performance after comparison with six other algorithms: Hellinger Distance Decision Tree, SVM, Logistic Regression, J48, Naive Bayes, and Random Forest. In real-time experiments, the model achieved 99.4% precision and 98.3% MCC with a 0.1% false positive rate in 5-fold cross-validation using the realistic scenario of an unbalanced dataset.
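
The Matthews correlation coefficient (MCC) reported above is often preferred on unbalanced data because it uses all four confusion-matrix cells. A minimal sketch of the standard formula follows; the counts are invented to show a degenerate classifier and are not from the paper. Returning 0.0 when the denominator is zero is a common convention, assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for an undefined MCC
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all instances classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# A classifier that labels every page "legitimate" on a 1%-phishing dataset:
counts = dict(tp=0, tn=990, fp=0, fn=10)
print(accuracy(**counts))  # high accuracy despite catching no phishing
print(mcc(**counts))       # MCC exposes the failure
```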

Crossref

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)

Performance Assessment of some Phishing predictive models based on Minimal Feature corpus

Author: A.S Sodiya
Abdul Abiodun Orunsolu
B Oladimeji G., Mr.
S.O Kareem
Publication venue: (Print) 1558-7215
Publication date: 05/12/2021

Phishing is currently one of the most severe cybersecurity challenges facing the emerging online community. With damages running into millions of dollars in financial and brand losses, phishing activities continue unabated. This has led to an arms race between con artists and the online security community, which demands constant investigation. In this paper, a new approach to phishing detection is investigated, based on the concept of a minimal feature set applied to selected machine learning algorithms. The goal is to determine the most efficient machine learning methodology without the high computational requirements usually occasioned by a non-minimal feature corpus. Using a frequency analysis approach, a 13-dimensional feature set consisting of 85% URL-based features and 15% non-URL-based features was generated. This is because URL-based features are observed to be more regularly exploited by phishers in most zero-day attacks. The proposed minimal feature set is then used to train a number of classifiers: Random Tree, Decision Tree, Artificial Neural Network, Support Vector Machine and Naïve Bayes. Using 10-fold cross-validation, the approach was evaluated with a dataset consisting of 10000 phishing instances. The results indicate that Random Tree outperforms the other classifiers, with an accuracy of 96.1% and a Receiver Operating Characteristic (ROC) value of 98.7%. Thus, the approach provides performance metrics for various state-of-the-art machine learning approaches popular in phishing detection, which can stimulate further research into evaluating other ML techniques with the minimal feature set approach.
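
The abstract does not list its 13 features. As a rough illustration of what URL-based features can look like, the sketch below extracts a handful of commonly used ones with the Python standard library. The feature names and the choice of features are assumptions, not the paper's feature set.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Extract a few illustrative (hypothetical) URL-based features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip(host),
        "uses_https": parsed.scheme == "https",
    }


print(url_features("http://192.168.0.1/login@secure"))
print(url_features("https://example.com/"))
```

A dictionary per URL like this can be turned into a fixed-length numeric vector and passed to any of the classifiers the abstract names.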

Embry-Riddle Aeronautical University

A Framework of New Hybrid Features for Intelligent Detection of Zero Hour Phishing Websites

Author: AK Jain
BB Gupta
G Xiang
H Zuhair
OK Sahingoz
R Gowtham
RM Mohammad
VS Lakshmi
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/04/2019

Existing machine-learning-based approaches for detecting zero hour phishing websites have moderate accuracy and false alarm rates and rely heavily on limited types of features. Phishers are constantly learning these features and using sophisticated tools to adopt them in phishing websites to evade detection. Therefore, there is a need for continuous discovery of new, robust and more diverse types of prediction features to improve resilience against detection evasion. This paper proposes a framework for predicting zero hour phishing websites by introducing new hybrid features with high prediction performance. The prediction performance of the features was investigated using eight machine learning algorithms, of which the Random Forest algorithm performed best, with accuracy and false negative rates of 98.45% and 0.73% respectively. It was found that domain registration information and webpage reputation features were strong predictors when compared to other feature types. Among individual features, webpage reputation features were highly ranked in terms of feature importance weights. The prediction runtime per webpage, measured at 7.63s, suggests that our approach has potential for real-time applications. Our framework is able to detect phishing websites hosted on either compromised or dedicated phishing domains.

Crossref

Birmingham City University Open Access Repository

BCU Open Access

VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity

Author: Abdelnabi Sahar
Fritz Mario
Krombholz Katharina
Publication date: 05/07/2020

Phishing websites are still a major threat in today's Internet ecosystem. Despite numerous previous efforts, similarity-based detection methods do not offer sufficient protection for the trusted websites, in particular against unseen phishing pages. This paper contributes VisualPhishNet, a new similarity-based phishing detection framework, based on a triplet Convolutional Neural Network (CNN). VisualPhishNet learns profiles for websites in order to detect phishing websites by a similarity metric that can generalize to pages with new visual appearances. We furthermore present VisualPhish, the largest dataset to date that facilitates visual phishing detection in an ecologically valid manner. We show that our method outperforms previous visual similarity phishing detection approaches by a large margin while being robust against a range of evasion attacks.
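
Triplet networks are typically trained with a triplet loss that pulls an anchor embedding toward a same-site example and pushes it away from a different-site example. The sketch below shows the generic hinge form of that loss on plain Python lists. It is not VisualPhishNet's actual training code; the Euclidean distance, the margin value, and the toy 2-D embeddings are all assumptions.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Generic triplet hinge loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]    # toy embedding of a trusted site's screenshot
positive = [0.0, 1.0]  # another screenshot of the same site
far_negative = [3.0, 4.0]   # a different site, already well separated
near_negative = [1.0, 0.0]  # a different site that is still too close

print(triplet_loss(anchor, positive, far_negative))   # no penalty
print(triplet_loss(anchor, positive, near_negative))  # penalized
```

At inference time, a similarity-based detector of this kind would compare a suspicious page's embedding against stored embeddings of trusted sites and flag pages that look like a trusted site but are served from a different domain.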

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

From Chatbots to PhishBots? -- Preventing Phishing scams created using ChatGPT, Google Bard and Claude

Author: Naragam Krishna Vamsi
Nilizadeh Shirin
Roy Sayak Saha
Thota Poojitha
Publication date: 29/10/2023

The advanced capabilities of Large Language Models (LLMs) have made them invaluable across various applications, from conversational agents and content creation to data analysis, research, and innovation. However, their effectiveness and accessibility also render them susceptible to abuse for generating malicious content, including phishing attacks. This study explores the potential of using four popular commercially available LLMs (ChatGPT (GPT 3.5 Turbo), GPT 4, Claude and Bard) to generate functional phishing attacks using a series of malicious prompts. We discover that these LLMs can generate both phishing emails and websites that can convincingly imitate well-known brands, and also deploy a range of evasive tactics for the latter to elude detection mechanisms employed by anti-phishing systems. Notably, these attacks can be generated using unmodified, or "vanilla," versions of these LLMs, without requiring any prior adversarial exploits such as jailbreaking. As a countermeasure, we build a BERT-based automated detection tool that can be used for the early detection of malicious prompts to prevent LLMs from generating phishing content, attaining an accuracy of 97% for phishing website prompts and 94% for phishing email prompts.

arXiv.org e-Print Archive

An adaptive approach for internet phishing detection based on log data

Author: Abdulbaqi Azmi Shawkat
Ibrahim Kareem K.
Nejrs Salwa Mohammed
Obaid Ahmed J.
Publication venue: 'International University of Sarajevo'
Publication date: 12/10/2021

The Internet has become central to daily social, financial and other activities. A very large number of customers use the Internet to conduct their business and purchases, which results in billions of dollars being transferred online every day. Such a large amount of money attracts the attention of cybercriminals. Fraud is one of the most dangerous of their methods, especially phishing, where attackers try to steal user credentials using fraudulent emails, fake websites, or both. The system proposed in this paper includes efficient data extraction from the web file through data collection and preprocessing, a web usage mining procedure to extract features that demonstrate user behavior, and a URL analysis that extracts features to detect phishing website addresses. The features from the two feature-extraction parts are then combined, bringing the number of features to sixty-three. Finally, a classification algorithm (Random Forests) is applied to determine whether website addresses are phishing or legitimate. The performance of the suggested algorithm is determined using a confusion matrix and a number of metrics that show the robustness of the proposed system.

Periodicals of Engineering and Natural Sciences (PEN - International University of Sarajevo)