Machine-Learning-Driven Email Phishing Detection

Abstract

Phishing attacks pose significant cybersecurity risks by exploiting user trust through deceptive email content. This paper presents a machine-learning-based framework for detecting phishing emails using a 2024 dataset of over 80,000 labeled samples sourced from PhishTank and Kaggle. Features were engineered from URLs, email content, and metadata. Five models were evaluated: Logistic Regression, Support Vector Machine (SVM), Random Forest, XGBoost, and K-Nearest Neighbors (KNN). Simulated results show that the ensemble models, particularly Random Forest and XGBoost, performed best, achieving near-perfect accuracy and recall. The study highlights the efficacy of combining feature engineering with ensemble learning to enhance real-time phishing detection.
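The abstract does not list the individual features, so the following is only a minimal sketch of what feature engineering from URLs, email content, and metadata can look like. Every feature name, the keyword list, and the helper functions `url_features` and `email_features` are hypothetical illustrations, not the paper's actual feature set. It uses only the Python standard library.

```python
import re
from urllib.parse import urlparse

# Hypothetical urgency/credential keywords; NOT taken from the paper.
SUSPICIOUS_WORDS = ("verify", "account", "suspended", "urgent", "password", "login")

# Host made only of dotted-decimal groups, e.g. "192.168.0.1".
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
# Simple URL matcher: scheme followed by non-whitespace, non-quote characters.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def url_features(url: str) -> dict:
    """Lexical features of a single URL (illustrative choices only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "host_is_ip": int(bool(IPV4_HOST.match(host))),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "num_hyphens_in_host": host.count("-"),
    }


def email_features(subject: str, body: str, sender: str) -> dict:
    """Combine content, URL, and sender-metadata features for one email."""
    text = f"{subject} {body}".lower()
    urls = URL_PATTERN.findall(body)
    per_url = [url_features(u) for u in urls]
    sender_domain = sender.rsplit("@", 1)[-1].lower() if "@" in sender else ""
    return {
        "num_urls": len(urls),
        "max_url_length": max((f["url_length"] for f in per_url), default=0),
        "any_ip_url": int(any(f["host_is_ip"] for f in per_url)),
        # Counts keyword occurrences in subject + body (substring matches).
        "suspicious_word_count": sum(text.count(w) for w in SUSPICIOUS_WORDS),
        # Crude metadata check: does any link host contain the sender's domain?
        "sender_domain_in_links": int(
            bool(sender_domain)
            and any(sender_domain in (urlparse(u).hostname or "") for u in urls)
        ),
    }


if __name__ == "__main__":
    print(email_features(
        "Urgent: verify your account",
        "Click http://192.168.0.1/login to keep your account active.",
        "support@bank.example",
    ))
```

Each email becomes a flat dictionary of numeric features. A vectorizer such as scikit-learn's `DictVectorizer` can then turn these dictionaries into a matrix for classifiers like the five compared in the paper. The keyword-count and substring checks are deliberately simple and would need refinement (e.g. stripping trailing punctuation from URLs) in practice.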


This paper was published in DigitalCommons@Kennesaw State University.
