
953 research outputs found

Random forest explorations for URL classification

Author: Denholm-Price James
Tsaptsinos Dimitris
Weedon Martyn
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/06/2017

Phishing is a major concern on the Internet today, and many users fall victim to criminals' deceitful tactics. Blacklisting is still the most common defence users have against phishing websites, but it is failing to cope with their increasing number. In recent years, researchers have devised ways of detecting such websites using machine learning. One such method is to build machine-learnt models of URL features to classify whether URLs are phishing. However, opinions vary on the best choice of features and algorithms. In this paper, the objective is to evaluate the performance of the Random Forest algorithm using a lexical-only dataset. The performance is benchmarked against other machine learning algorithms and against results reported in the literature. Initial results from experiments indicate that the Random Forest algorithm performs best, yielding 86.9% accuracy.
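
As a rough illustration of what "lexical" URL features can look like, the sketch below derives a few simple measurements from the URL string alone, without fetching the page. The function name `lexical_features` and the particular features are assumptions chosen for illustration; they are not the feature set used in the paper.

```python
import re
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Extract a few string-only features from a URL (illustrative choice)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        # Many dots in the host can indicate stacked subdomains.
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        # An "@" in the URL can hide the real host after a decoy name.
        "has_at_symbol": "@" in url,
        # A dotted-quad IPv4 literal used in place of a domain name.
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        # Number of non-empty path segments.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login/verify"))
```

Feature dictionaries like this could then be turned into numeric vectors and passed to any classifier, Random Forest included.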

Crossref

Kingston University Research Repository

High Accuracy Phishing Detection Based on Convolutional Neural Networks

Author: Alzaylaee Mohammed K.
Yerima Suleiman Y.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2020

The persistent growth in phishing and the rising volume of phishing websites have led to individuals and organizations worldwide becoming increasingly exposed to various cyber-attacks. Consequently, more effective phishing detection is required for improved cyber defence. Hence, in this paper we present a deep learning-based approach to enable high accuracy detection of phishing sites. The proposed approach utilizes convolutional neural networks (CNN) for high accuracy classification to distinguish genuine sites from phishing sites. We evaluate the models using a dataset obtained from 6,157 genuine and 4,898 phishing websites. Based on the results of extensive experiments, our CNN-based models proved to be highly effective in detecting unknown phishing sites. Furthermore, the CNN-based approach performed better than traditional machine learning classifiers evaluated on the same dataset, reaching a 98.2% phishing detection rate with an F1-score of 0.976. The method presented in this paper compares favourably to the state of the art in deep learning-based phishing website detection.

arXiv.org e-Print Archive

Crossref

De Montfort University Open Research Archive

Improving Phishing Website Detection with Machine Learning: Revealing Hidden Patterns for Better Accuracy

Author: Ch Ravi Kumar
Kiran Medikonda Asha
Kiran Saggurthi
Manchala Uma Devi
Narayana Garlapati
Naresh Usikela
Publication venue: Auricle Global Society of Education and Research
Publication date: 27/10/2023

Phishing attacks remain a significant threat to internet users globally, leading to substantial financial losses and compromising personal information. This research study investigates various machine learning models for detecting phishing websites, with a primary focus on achieving high accuracy. After an extensive analysis, the Random Forest Classifier emerged as the most suitable choice for this task. Our methodology leveraged machine learning techniques to uncover subtle patterns and relationships in the data, going beyond traditional URL and content-based restrictions. By incorporating diverse website features, including URL and derived attributes, page source code-based features, HTML and JavaScript-based features, and domain-based features, we achieved impressive results. The proposed approach effectively classified the majority of websites, demonstrating the efficiency of machine learning in addressing the phishing website detection challenge with an accuracy of over 98%, recall exceeding 98%, and a false positive rate of less than 4%. This research offers valuable insights to the field of cyber security, providing internet users with improved protection against phishing attempts.
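
For readers comparing the figures reported across these abstracts (accuracy, recall, false positive rate, F1-score), the sketch below shows how these metrics are computed from the four cells of a binary confusion matrix, with "phishing" as the positive class. The helper names and the counts in the usage example are hypothetical and do not come from any of the listed papers.

```python
def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts.

    tp: phishing sites flagged as phishing
    fp: genuine sites flagged as phishing
    tn: genuine sites passed as genuine
    fn: phishing sites passed as genuine
    """
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "recall": safe_div(tp, tp + fn),  # also called detection rate
        "false_positive_rate": safe_div(fp, fp + tn),
        "precision": safe_div(tp, tp + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Note that a high accuracy can coexist with a non-trivial false positive rate when the classes are imbalanced, which is why the abstracts report several metrics rather than accuracy alone.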

International Journal on Recent and Innovation Trends in Computing and Communication

Develop a Hybrid Classification using an Ensemble Model for Phishing Website Detection

Author: K Subashini
V Narmatha
Publication venue: Auricle Global Society of Education and Research
Publication date: 07/10/2023

Solutions to threats posed by technical and social vulnerabilities must be found to secure the web interface. Social engineering attacks frequently use phishing as one of their vectors. The importance of promptly detecting phishing attacks has increased. The classifier model was constructed using publicly accessible data from trustworthy and phishing websites. A variety of methods were used to extract relevant features to build the model. Machine learning algorithms can reliably identify phishing attacks before a user experiences any harm. To identify phishing attacks on websites, this study presents a novel ensemble model. In this paper, the Artificial Neural Network (ANN) and the Random Forest Classifier (RFC) are used in an ensemble method along with the Support Vector Machine (SVM). Compared to previous studies, this ensemble method detects website phishing attacks more accurately and efficiently. According to experimental findings, the proposed system detects phishing attacks 97.3% of the time.
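
The abstract does not say how the three classifiers are combined. One common option is hard majority voting, sketched below under that assumption: each model emits a label per website and the most frequent label wins. The function names and the example predictions are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(votes: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label among the votes for one sample."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(per_model_predictions: List[Sequence[Hashable]]) -> list:
    """Combine predictions from several models by hard majority voting.

    per_model_predictions[i][j] is model i's label for sample j.
    With three models and two classes, a tie cannot occur.
    """
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


if __name__ == "__main__":
    # Hypothetical outputs of three models on four websites
    # (1 = phishing, 0 = legitimate).
    ann_preds = [1, 0, 1, 0]
    rfc_preds = [1, 1, 0, 0]
    svm_preds = [0, 1, 1, 0]
    print(ensemble_predict([ann_preds, rfc_preds, svm_preds]))
```

Soft voting (averaging predicted probabilities) or stacking are alternative combination schemes; which one the paper uses would need to be checked against the full text.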

International Journal on Recent and Innovation Trends in Computing and Communication

Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Author: Dewan Prateek
Kashyap Anand
Kumaraguru Ponnurangam
Publication date: 14/06/2014

Spear phishing is a complex targeted attack in which an attacker harvests information about the victim prior to the attack. This information is then used to create sophisticated, genuine-looking attack vectors, drawing the victim to compromise confidential information. What makes spear phishing different from, and more powerful than, normal phishing is this contextual information about the victim. Online social media services can be one such source for gathering vital information about an individual. In this paper, we characterize and examine a true positive dataset of spear phishing, spam, and normal phishing emails from Symantec's enterprise email scanning service. We then present a model to detect spear phishing emails sent to employees of 14 international organizations, by using social features extracted from LinkedIn. Our dataset consists of 4,742 targeted attack emails sent to 2,434 victims, and 9,353 non-targeted attack emails sent to 5,912 non-victims, together with publicly available information from their LinkedIn profiles. We applied various machine learning algorithms to this labeled data, and achieved an overall maximum accuracy of 97.76% in identifying spear phishing emails. We used a combination of social features from LinkedIn profiles, and stylometric features extracted from email subjects, bodies, and attachments. However, we achieved a slightly better accuracy of 98.28% without the social features. Our analysis revealed that social features extracted from LinkedIn do not help in identifying spear phishing emails. To the best of our knowledge, this is one of the first attempts to make use of a combination of stylometric features extracted from emails, and social features extracted from an online social network, to detect targeted spear phishing emails.

Comment: Detection of spear phishing using social media feature

arXiv.org e-Print Archive

Crossref

A Framework of New Hybrid Features for Intelligent Detection of Zero Hour Phishing Websites

Author: AK Jain
BB Gupta
G Xiang
H Zuhair
OK Sahingoz
R Gowtham
RM Mohammad
VS Lakshmi
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/04/2019

Existing machine learning-based approaches for detecting zero hour phishing websites have moderate accuracy and false alarm rates and rely heavily on limited types of features. Phishers are constantly learning these features and use sophisticated tools to adopt them in phishing websites to evade detection. Therefore, there is a need for continuous discovery of new, robust and more diverse types of prediction features to improve resilience against detection evasion. This paper proposes a framework for predicting zero hour phishing websites by introducing new hybrid features with high prediction performance. Prediction performance of the features was investigated using eight machine learning algorithms, of which the Random Forest algorithm performed the best, with accuracy and false negative rates of 98.45% and 0.73% respectively. It was found that domain registration information and webpage reputation features were strong predictors when compared to other feature types. Among individual features, webpage reputation features were highly ranked in terms of feature importance weights. The prediction runtime per webpage, measured at 7.63s, suggests that our approach has potential for real-time applications. Our framework is able to detect phishing websites hosted in either compromised or dedicated phishing domains.

Crossref

Birmingham City University Open Access Repository

BCU Open Access