Search CORE

1,318 research outputs found

User-Behavior Based Detection of Infection Onset

Author: Crowell Alexander
Ma Qiang
Xu Kui
Yao Danfeng(Daphne)
Publication venue
Publication date: 01/01/2010
Field of study

A major vector of computer infection is through exploiting software or design flaws in networked applications such as the browser. Malicious code can be fetched and executed on a victim’s machine without the user’s permission, as in drive-by download (DBD) attacks. In this paper, we describe a new tool called DeWare for detecting the onset of infection delivered through vulnerable applications. DeWare explores and enforces causal relationships between computer-related human behaviors and system properties, such as file-system access and process execution. Our tool can be used to provide real time protection of a personal computer, as well as for diagnosing and evaluating untrusted websites for forensic purposes. Besides the concrete DBD detection solution, we also formally define causal relationships between user actions and system events on a host. Identifying and enforcing correct causal relationships have important applications in realizing advanced and secure operating systems. We perform extensive experimental evaluation, including a user study with 21 participants, thousands of legitimate websites (for testing false alarms), as well as 84 malicious websites in the wild. Our results show that DeWare is able to correctly distinguish legitimate download events from unauthorized system events with a low false positive rate (< 1%)

Computer Science Technical Reports @Virginia Tech

Design of Automated Website Phishing Detection using Sequential Mechanism of RCL Algorithm

Author: C. Rajeswary et al.
Publication venue: Auricle Global Society of Education and Research
Publication date: 07/11/2023
Field of study

The phishing outbreaks in internet has become a major problem in web safety in recent years. The phishers will be stealing crucial economic data regarding the web user to perform economic break-in. In order to predict phishing websites, many blacklist-based phishing website recognition methods are used in this study. Traditional methods of detecting phishing websites rely on static features and rule-based schemes, which can be evaded by attackers. Recently, Deep Learning (DL) and Machine Learning (ML) models are employed for automated website phishing detection. With this motivation, this study develops an automated website phishing detection using the sequential mechanism of RCL algorithm. The proposed model employs Long-Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Random Forest (RF) models for the detection of attacks in the URLs and webpages by the similarity measurement of the decoy contents. The proposed model involves three major components namely, RF for URL phishing detection, CNN based phishing webpage detection, and LSTM based website classification (i.e., legitimate and phishing). The experimental result analysis of the RCL technique is tested on the benchmark dataset of Alexa and PhishTank. A comprehensive comparison study highlighted that the RCL algorithm accomplishes enhanced phishing detection performance over other existing techniques in terms of distinct evaluation metrics

International Journal on Recent and Innovation Trends in Computing and Communication

AntiPhishStack: LSTM-based Stacked Generalization Model for Optimized Phishing URL Detection

Author: Aslam Hafsa
Aslam Saba
Hui Chen
Manzoor Arslan
Rasool Abdur
Publication venue
Publication date: 21/01/2024
Field of study

The escalating reliance on revolutionary online web services has introduced heightened security risks, with persistent challenges posed by phishing despite extensive security measures. Traditional phishing systems, reliant on machine learning and manual features, struggle with evolving tactics. Recent advances in deep learning offer promising avenues for tackling novel phishing challenges and malicious URLs. This paper introduces a two-phase stack generalized model named AntiPhishStack, designed to detect phishing sites. The model leverages the learning of URLs and character-level TF-IDF features symmetrically, enhancing its ability to combat emerging phishing threats. In Phase I, features are trained on a base machine learning classifier, employing K-fold cross-validation for robust mean prediction. Phase II employs a two-layered stacked-based LSTM network with five adaptive optimizers for dynamic compilation, ensuring premier prediction on these features. Additionally, the symmetrical predictions from both phases are optimized and integrated to train a meta-XGBoost classifier, contributing to a final robust prediction. The significance of this work lies in advancing phishing detection with AntiPhishStack, operating without prior phishing-specific feature knowledge. Experimental validation on two benchmark datasets, comprising benign and phishing or malicious URLs, demonstrates the model's exceptional performance, achieving a notable 96.04% accuracy compared to existing studies. This research adds value to the ongoing discourse on symmetry and asymmetry in information security and provides a forward-thinking solution for enhancing network security in the face of evolving cyber threats

arXiv.org e-Print Archive

An Evasion Attack against ML-based Phishing URL Detectors

Author: Babar M. Ali
Gaire Raj
Sabir Bushra
Publication venue
Publication date: 18/05/2020
Field of study

Background: Over the year, Machine Learning Phishing URL classification (MLPU) systems have gained tremendous popularity to detect phishing URLs proactively. Despite this vogue, the security vulnerabilities of MLPUs remain mostly unknown. Aim: To address this concern, we conduct a study to understand the test time security vulnerabilities of the state-of-the-art MLPU systems, aiming at providing guidelines for the future development of these systems. Method: In this paper, we propose an evasion attack framework against MLPU systems. To achieve this, we first develop an algorithm to generate adversarial phishing URLs. We then reproduce 41 MLPU systems and record their baseline performance. Finally, we simulate an evasion attack to evaluate these MLPU systems against our generated adversarial URLs. Results: In comparison to previous works, our attack is: (i) effective as it evades all the models with an average success rate of 66% and 85% for famous (such as Netflix, Google) and less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively; (ii) realistic as it requires only 23ms to produce a new adversarial URL variant that is available for registration with a median cost of only $11.99/year. We also found that popular online services such as Google SafeBrowsing and VirusTotal are unable to detect these URLs. (iii) We find that Adversarial training (successful defence against evasion attack) does not significantly improve the robustness of these systems as it decreases the success rate of our attack by only 6% on average for all the models. (iv) Further, we identify the security vulnerabilities of the considered MLPU systems. Our findings lead to promising directions for future research. Conclusion: Our study not only illustrate vulnerabilities in MLPU systems but also highlights implications for future study towards assessing and improving these systems.Comment: Draft for ACM TOP

arXiv.org e-Print Archive

HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis

Author: Chen Yingke
Opara Chidimma
Wei Bo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

Recently, the development and implementation of phishing attacks require little technical skills and costs. This uprising has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become extremely necessary. In this paper, we propose HTMLPhish, a deep learning based datadriven end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish receives the content of the HTML document of a web page and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual contents of the HTML. The CNNs learn appropriate feature representations from the HTML document embeddings without extensive manual feature engineering. Furthermore, our proposed approach of the concatenation of the word and character embeddings allows our model to manage new features and ensure easy extrapolation to test data. We conduct comprehensive experiments on a dataset of more than 50,000 HTML documents that provides a distribution of phishing to benign web pages obtainable in the real-world that yields over 93% Accuracy and True Positive Rate. Also, HTMLPhish is a completely language-independent and client-side strategy which can, therefore, conduct web page phishing detection regardless of the textual language

arXiv.org e-Print Archive

Northumbria University Research Portal

Crossref

Lancaster E-Prints

Unbiased phishing detection using domain name based features

Author: Shirazi Hossein
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2018
Field of study

2018 Summer.Includes bibliographical references.Internet users are coming under a barrage of phishing attacks of increasing frequency and sophistication. While these attacks have been remarkably resilient against the vast range of defenses proposed by academia, industry, and research organizations, machine learning approaches appear to be a promising one in distinguishing between phishing and legitimate websites. There are three main concerns with existing machine learning approaches for phishing detection. The first concern is there is neither a framework, preferably open-source, for extracting feature and keeping the dataset updated nor an updated dataset of phishing and legitimate website. The second concern is the large number of features used and the lack of validating arguments for the choice of the features selected to train the machine learning classifier. The last concern relates to the type of datasets used in the literature that seems to be inadvertently biased with respect to the features based on URL or content. In this thesis, we describe the implementation of our open-source and extensible framework to extract features and create up-to-date phishing dataset. With having this framework, named Fresh-Phish, we implemented 29 different features that we used to detect whether a given website is legitimate or phishing. We used 26 features that were reported in related work and added 3 new features and created a dataset of 6,000 websites with these features of which 3,000 were malicious and 3,000 were genuine and tested our approach. Using 6 different classifiers we achieved the accuracy of 93% which is a reasonable high in this field. To address the second and third concerns, we put forward the intuition that the domain name of phishing websites is the tell-tale sign of phishing and holds the key to successful phishing detection. We focus on this aspect of phishing websites and design features that explore the relationship of the domain name to the key elements of the website. Our work differs from existing state-of-the-art as our feature set ensures that there is minimal or no bias with respect to a dataset. Our learning model trains with only seven features and achieves a true positive rate of 98% and a classification accuracy of 97%, on sample dataset. Compared to the state-of-the-art work, our per data instance processing and classification is 4 times faster for legitimate websites and 10 times faster for phishing websites. Importantly, we demonstrate the shortcomings of using features based on URLs as they are likely to be biased towards dataset collection and usage. We show the robustness of our learning algorithm by testing our classifiers on unknown live phishing URLs and achieve a higher detection accuracy of 99.7% compared to the earlier known best result of 95% detection rate

Mountain Scholar (Digital Collections of Colorado and Wyoming)