Search CORE

417 research outputs found

HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis

Author: Chen Yingke
Opara Chidimma
Wei Bo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

Recently, the development and implementation of phishing attacks require little technical skills and costs. This uprising has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become extremely necessary. In this paper, we propose HTMLPhish, a deep learning based datadriven end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish receives the content of the HTML document of a web page and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual contents of the HTML. The CNNs learn appropriate feature representations from the HTML document embeddings without extensive manual feature engineering. Furthermore, our proposed approach of the concatenation of the word and character embeddings allows our model to manage new features and ensure easy extrapolation to test data. We conduct comprehensive experiments on a dataset of more than 50,000 HTML documents that provides a distribution of phishing to benign web pages obtainable in the real-world that yields over 93% Accuracy and True Positive Rate. Also, HTMLPhish is a completely language-independent and client-side strategy which can, therefore, conduct web page phishing detection regardless of the textual language

arXiv.org e-Print Archive

Northumbria Research Link

Crossref

Lancaster E-Prints

Exhaustive study on Detection of phishing practices and tactics

Author: Mehta Rutvik
Publication venue: Assam Don Bosco University
Publication date: 31/07/2021
Field of study

Due to the rapid development in the technologies related to the Internet, users have changed their preferences from conventional shop based shopping to online shopping, from office work to work from home and from personal meetings to web meetings. Along with the rapidly increasing number of users, Internet has also attracted many attackers, such as fraudsters, hackers, spammers and phishers, looking for their victims on the huge cyber space. Phishing is one of the basic cybercrimes, which uses anonymous structure of Internet and social engineering approach, to deceive users with the use of malicious phishing links to gather their private information and credentials. Identifying whether a web link used by the attacker is a legitimate or phishing link is a very challenging problem because of the semantics-based structure of the attack, used by attackers to trick users in to entering their personal information. There are a diverse range of algorithms with different methodologies that can be used to prevent these attacks. The efficiency of such systems may be influenced by a lack of proper choice of classifiers along with the types of feature sets. The purpose of this analysis is to understand the forms of phishing threats and the existing approaches used to deter them

Assam Don Bosco University Journals

DeltaPhish: Detecting Phishing Webpages in Compromised Websites

Author: AY Fu
B Biggio
C Cortes
C Ludl
DH Wolpert
G Chechik
G Xiang
G Xiang
J Hong
KT Chen
L Wenyin
M Khonji
MJ Swain
PF Felzenszwalb
RB Basnet
S Marchal
TC Chen
TC Chen
Publication venue
Publication date: 01/01/2017
Field of study

The large-scale deployment of modern phishing attacks relies on the automatic exploitation of vulnerable websites in the wild, to maximize profit while hindering attack traceability, detection and blacklisting. To the best of our knowledge, this is the first work that specifically leverages this adversarial behavior for detection purposes. We show that phishing webpages can be accurately detected by highlighting HTML code and visual differences with respect to other (legitimate) pages hosted within a compromised website. Our system, named DeltaPhish, can be installed as part of a web application firewall, to detect the presence of anomalous content on a website after compromise, and eventually prevent access to it. DeltaPhish is also robust against adversarial attempts in which the HTML code of the phishing page is carefully manipulated to evade detection. We empirically evaluate it on more than 5,500 webpages collected in the wild from compromised websites, showing that it is capable of detecting more than 99% of phishing webpages, while only misclassifying less than 1% of legitimate pages. We further show that the detection rate remains higher than 70% even under very sophisticated attacks carefully designed to evade our system.Comment: Preprint version of the work accepted at ESORICS 201

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Genova

Website Phishing Detection Using Machine Learning Techniques

Author: Al-Shaikh A.
Alazaidah R.
Alzyoud M.
Khafajah H.
R. AL-Mousa M.
Samara G.
Publication venue: Arab Journals Platform
Publication date: 01/01/2024
Field of study

Phishing is a cybercrime that is constantly increasing in the recent years due to the increased use of the Internet and its applications. It is one of the most common types of social engineering that aims to disclose or steel users sensitive or personal information. In this paper, two main objectives are considered. The first is to identify the best classifier that can detect phishing among twenty-four different classifiers that represent six learning strategies. The second objective aims to identify the best feature selection method for websites phishing datasets. Using two datasets that are related to Phishing with different characteristics and considering eight evaluation metrics, the results revealed the superiority of RandomForest, FilteredClassifier, and J-48 classifiers in detecting phishing websites. Also, InfoGainAttributeEval method showed the best performance among the four considered feature selection methods

Arab Journals Platform

Unbiased phishing detection using domain name based features

Author: Shirazi Hossein
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2018
Field of study

2018 Summer.Includes bibliographical references.Internet users are coming under a barrage of phishing attacks of increasing frequency and sophistication. While these attacks have been remarkably resilient against the vast range of defenses proposed by academia, industry, and research organizations, machine learning approaches appear to be a promising one in distinguishing between phishing and legitimate websites. There are three main concerns with existing machine learning approaches for phishing detection. The first concern is there is neither a framework, preferably open-source, for extracting feature and keeping the dataset updated nor an updated dataset of phishing and legitimate website. The second concern is the large number of features used and the lack of validating arguments for the choice of the features selected to train the machine learning classifier. The last concern relates to the type of datasets used in the literature that seems to be inadvertently biased with respect to the features based on URL or content. In this thesis, we describe the implementation of our open-source and extensible framework to extract features and create up-to-date phishing dataset. With having this framework, named Fresh-Phish, we implemented 29 different features that we used to detect whether a given website is legitimate or phishing. We used 26 features that were reported in related work and added 3 new features and created a dataset of 6,000 websites with these features of which 3,000 were malicious and 3,000 were genuine and tested our approach. Using 6 different classifiers we achieved the accuracy of 93% which is a reasonable high in this field. To address the second and third concerns, we put forward the intuition that the domain name of phishing websites is the tell-tale sign of phishing and holds the key to successful phishing detection. We focus on this aspect of phishing websites and design features that explore the relationship of the domain name to the key elements of the website. Our work differs from existing state-of-the-art as our feature set ensures that there is minimal or no bias with respect to a dataset. Our learning model trains with only seven features and achieves a true positive rate of 98% and a classification accuracy of 97%, on sample dataset. Compared to the state-of-the-art work, our per data instance processing and classification is 4 times faster for legitimate websites and 10 times faster for phishing websites. Importantly, we demonstrate the shortcomings of using features based on URLs as they are likely to be biased towards dataset collection and usage. We show the robustness of our learning algorithm by testing our classifiers on unknown live phishing URLs and achieve a higher detection accuracy of 99.7% compared to the earlier known best result of 95% detection rate

Mountain Scholar (Digital Collections of Colorado and Wyoming)

A Novel Approach for Phishing Emails Real Time Classification Using K-Means Algorithm

Author: Mhaske-Dhamdhere Vidya
Vanjale Sandeep
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 28/12/2017
Field of study

The dangers phishing becomes considerably bigger problem in online networking, for example, Facebook, twitter and Google+.. In this paper we are mainly focus on a novel approach of real time phishing email classification using machine learning algorithm. We use random forest, Decision tree with J48 ,naïve Bayes we use spam base dataset. On spam base dataset random forest algorithm work best which give true positive 97.2% and falsie negative is 0.88% and give correctly classification 94.82% and incorrectly classification 5.17%

Crossref

Institute of Advanced Engineering and Science

Categorization of Phishing Detection Features And Using the Feature Vectors to Classify Phishing Websites

Author
Publication venue
Publication date: 01/01/2017
Field of study

abstract: Phishing is a form of online fraud where a spoofed website tries to gain access to user's sensitive information by tricking the user into believing that it is a benign website. There are several solutions to detect phishing attacks such as educating users, using blacklists or extracting phishing characteristics found to exist in phishing attacks. In this thesis, we analyze approaches that extract features from phishing websites and train classification models with extracted feature set to classify phishing websites. We create an exhaustive list of all features used in these approaches and categorize them into 6 broader categories and 33 finer categories. We extract 59 features from the URL, URL redirects, hosting domain (WHOIS and DNS records) and popularity of the website and analyze their robustness in classifying a phishing website. Our emphasis is on determining the predictive performance of robust features. We evaluate the classification accuracy when using the entire feature set and when URL features or site popularity features are excluded from the feature set and show how our approach can be used to effectively predict specific types of phishing attacks such as shortened URLs and randomized URLs. Using both decision table classifiers and neural network classifiers, our results indicate that robust features seem to have enough predictive power to be used in practice.Dissertation/ThesisMasters Thesis Computer Science 201

ASU Digital Repository