12 research outputs found


    Email is an economical means of communication whose importance is increasing despite the availability of alternatives such as electronic messaging, social networks, and phone applications. The business arena depends heavily on email, which makes proper email management necessary in the face of disruptive factors such as spam, phishing emails, and multi-folder categorization. The present study reviews email studies published during 2016-2020, analyzing their problem descriptions in terms of datasets, application areas, classification techniques, and feature sets. In addition, other areas involving email classification were identified and comprehensively reviewed. The results indicate four email application areas, and the open issues and research directions of email classification are outlined for further investigation.

    A novel hybrid approach of SVM combined with NLP and probabilistic neural network for email phishing

    Phishing attacks are among the trending cyber-attacks: expert hackers send socially engineered messages that aim to trick users into revealing their sensitive data, and the most common channel for these messages is users' email. Phishing has become a substantial threat to web users and a significant cause of financial losses, and various solutions have therefore been developed to address the problem. Deceitful emails, also called phishing emails, use a range of influence strategies to persuade people to respond, for example by promising a monetary reward or invoking a sense of urgency. Despite widespread warnings and efforts to teach users to identify phishing emails, they remain a prevalent practice and a lucrative business. The authors believe that persuasion, as a style of human communication intended to influence others, plays a central role in successful digital scams. Cyber criminals have continually advanced their methods of attack. The current strategies for detecting the presence of such malicious programs and preventing them from executing are static, dynamic, and hybrid analysis. In this work we propose a hybrid methodology for phishing detection that incorporates feature extraction and classification of emails using SVM. Finally, using the selected features, a probabilistic neural network (PNN) separates spam emails from genuine emails with greater precision and accuracy.

    NoFish; Total Anti-Phishing Protection System

    Phishing attacks have been identified by researchers as one of the major cyber-attack vectors that the general public faces today. Although software companies launch new anti-phishing products, these products cannot prevent all phishing attacks. The proposed solution, "No Fish", is a total anti-phishing protection system created for end-users as well as for organizations. In this paper, a real-time anti-phishing system, implemented using four main phishing detection mechanisms, is proposed. The system has the following properties that distinguish it from related studies in the literature: language independence, use of a considerable amount of phishing and legitimate data

    Convolution Neural Networks for Phishing Detection

    Phishing is one of the significant threats in cyber security. Phishing is a form of social engineering that uses e-mails with malicious websites to solicit personal information. The number of phishing e-mails is growing at an alarming rate. In this paper we propose a novel machine learning approach to classify phishing websites using Convolution Neural Networks (CNNs) that use URL-based features. CNNs consist of a stack of convolution and pooling layers and a fully connected layer. CNNs accept images as input and perform feature extraction and classification. Many CNN models are available today. To avoid the vanishing gradient problem, recent CNNs use an entropy loss function with Rectified Linear Units (ReLU). To use a CNN, we convert feature vectors into images. To evaluate our approach, we use a dataset consisting of 1,353 real-world URLs that were classified into three categories: legitimate, suspicious, and phishing. The images representing feature vectors are classified using a simple CNN. We developed MATLAB scripts to convert vectors into images and to implement a simple CNN model. The classification accuracy obtained was 86.5 percent.
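
The vector-to-image step mentioned in this abstract could look roughly like the sketch below. The authors used MATLAB and do not describe their mapping here, so the square layout, the zero padding, the assumed value range of [-1, 1], and the function name `vector_to_image` are all assumptions made for illustration.

```python
import math

def vector_to_image(features, lo=-1.0, hi=1.0):
    """Map a flat feature vector onto a square grayscale grid (list of rows).

    Hypothetical helper: the value range [lo, hi] and the square layout
    are assumptions, not the mapping used in the paper.
    """
    side = math.ceil(math.sqrt(len(features)))
    # Clamp each value to [lo, hi], then rescale it to an 8-bit intensity.
    pixels = [round((min(max(v, lo), hi) - lo) / (hi - lo) * 255) for v in features]
    # Zero-pad so the vector fills the whole side x side grid.
    pixels += [0] * (side * side - len(pixels))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]

# Nine made-up feature values become a 3 x 3 image.
image = vector_to_image([1, -1, 0, 1, 1, -1, 0, 1, -1])
```

Each resulting grid can then be fed to any image classifier. Note that in this sketch the padding value 0 coincides with the intensity of the lowest feature value, which a real pipeline might want to avoid.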

    The effectiveness of url features on phishing emails classification using machine learning approach

    Phishing email classification requires well-chosen features to achieve good accuracy. One of the reasons for the lack of development of models for detecting phishing emails is the complexity of feature selection. Feature selection is one of the essential parts of getting a good classification result; commonly used features are the header, the body, and the Uniform Resource Locator (URL). Besides the email body text content, the URL is one of the leading indicators of a successful phishing attack. The URL is commonly placed in the body of the phishing email to get the victim's attention, and it redirects the victim to a fake website that collects the victim's personal information. There is a lack of information about how URL features affect phishing email classification results. Therefore, this work focuses on using URL features to determine whether an email is phishing or legitimate using machine learning approaches. Two public datasets are used in this work: the Online Phishing Corpus and the Enron Corpus. The URL features are extracted using the Beautiful Soup library. Two machine learning classifiers are used: Support Vector Machine (SVM) and Artificial Neural Network (ANN). The experiments were divided into two based on the features used in the classifiers. The first experiment used raw email data with URL features, while the second used only raw email data. The first experiment shows higher accuracy with both classifiers, SVM and ANN. Hence, this research indicates that including URL features improves classification performance.
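
As a rough illustration of extracting URL features from an email body, the sketch below pulls links out of HTML and derives a few counts. The paper used Beautiful Soup; this sketch swaps in Python's standard-library `html.parser` so it runs without dependencies. The feature names and the sample email are hypothetical and are not the feature set from the paper.

```python
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collects href values of <a> tags from an HTML email body."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def _is_ip(host):
    """True if the host is a literal IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def url_features(html_body):
    """Return a small, illustrative (assumed) set of URL features for one email."""
    collector = LinkCollector()
    collector.feed(html_body)
    hosts = [urlparse(link).hostname or "" for link in collector.links]
    return {
        "num_urls": len(collector.links),
        "num_ip_hosts": sum(_is_ip(h) for h in hosts),
        "max_dots_in_host": max((h.count(".") for h in hosts), default=0),
        "num_non_https": sum(urlparse(l).scheme != "https" for l in collector.links),
    }

# Made-up email body for demonstration only.
sample = ('<p>Verify now: <a href="http://192.168.10.5/login">your bank</a> '
          'or <a href="https://example.com/help">help</a></p>')
features = url_features(sample)
```

A dictionary like `features` could be appended to whatever representation of the raw email the classifier already uses, which mirrors the "raw data with URL features" versus "raw data only" comparison in the abstract.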

    Enhanced Classification Method for Phishing Emails Detection

    Email is currently the main communication method worldwide, as it has proven its efficiency. Phishing emails, on the other hand, are one of the major threats, resulting in significant losses estimated at billions of dollars. Phishing is a dynamic problem, a struggle between phishers and defenders in which the phishers have more flexibility in manipulating email features and evading anti-phishing techniques. Many solutions have been proposed to mitigate the impact of phishing emails on the targeted sectors, but none has achieved 100% detection and accuracy. As phishing techniques evolve, the solutions need to evolve and generalize in order to mitigate as many attacks as possible. This article presents a new classification model based on a hybrid feature selection method that combines two common feature selection methods, Information Gain and Genetic Algorithm, and keeps only significant, high-quality features in the final classifier. The proposed hybrid approach achieved a 98.9% accuracy rate on a phishing email dataset comprising 8266 instances, and the results show an enhancement of almost 4%. Furthermore, the presented technique reduces the search space by reducing the number of selected features.
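
For readers unfamiliar with Information Gain as a feature-ranking criterion, here is a minimal sketch of the standard entropy-based definition. It covers only the ranking half of the hybrid method; the Genetic Algorithm stage is not shown. The toy features and labels are invented for illustration and do not come from the paper's dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: four emails, two binary features.
labels = ["phish", "phish", "ham", "ham"]
toy_features = {
    "has_ip_url": [1, 1, 0, 0],    # perfectly separates the classes
    "has_greeting": [1, 0, 1, 0],  # carries no information about the class
}
ranked = sorted(
    toy_features,
    key=lambda name: information_gain(toy_features[name], labels),
    reverse=True,
)
```

In a hybrid scheme like the one described, a ranking of this kind would typically prune low-scoring features before a search procedure such as a genetic algorithm explores subsets of the remainder.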

    Phishing email detection using Natural Language Processing techniques : a literature survey

    Phishing is the most prevalent method of cybercrime that convinces people to provide sensitive information; for instance, account IDs, passwords, and bank details. Emails, instant messages, and phone calls are widely used to launch such cyber-attacks. Despite constant updating of the methods of avoiding such cyber-attacks, the ultimate outcome is currently inadequate. On the other hand, phishing emails have increased exponentially in recent years, which suggests a need for more effective and advanced methods to counter them. Numerous methods have been established to filter phishing emails, but the problem still needs a complete solution. To the best of our knowledge, this is the first survey that focuses on using Natural Language Processing (NLP) and Machine Learning (ML) techniques to detect phishing emails. This study provides an analysis of the numerous state-of-the-art NLP strategies currently in use to identify phishing emails at various stages of the attack, with an emphasis on ML strategies. These approaches are subjected to a comparative assessment and analysis. This gives a sense of the problem, its immediate solution space, and the expected future research directions

    A New English/Arabic Parallel Corpus for Phishing Emails

    Phishing involves malicious activity whereby phishers, in the disguise of legitimate entities, obtain illegitimate access to the victims’ personal and private information, usually through emails. Currently, phishing attacks and threats are being handled effectively through the use of the latest phishing email detection solutions. Most current phishing detection systems assume phishing attacks to be in English, though attacks in other languages are growing. In particular, Arabic is a widely used language and therefore represents a vulnerable target. However, there is a significant shortage of corpora that can be used to develop Arabic phishing detection systems. This paper presents a new English-Arabic parallel phishing email corpus developed from the anti-phishing shared task text (IWSPA-AP 2018). The email content was translated by 10 volunteers who had a university background and were English and Arabic language experts. To evaluate the effectiveness of the new corpus, we develop phishing email detection models using Term Frequency–Inverse Document Frequency (TF-IDF) and a Multilayer Perceptron on 1258 emails in Arabic and English that have equal ratios of legitimate and phishing emails. The experimental findings show that the accuracy reaches 96.82% for the Arabic dataset and 94.63% for the emails in English, providing some assurance of the potential value of the parallel corpus developed.
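
The TF-IDF weighting mentioned in this abstract can be sketched as follows. This is the plain, unsmoothed textbook variant with whitespace tokenization; the abstract does not say which variant or tokenizer the authors used, so both choices are assumptions, and the Multilayer Perceptron that would consume these vectors is not shown. The two sample documents are made up.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Assumed variant: raw term frequency normalised by document length,
    unsmoothed inverse document frequency, whitespace tokenization.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical two-email corpus.
vectors = tfidf([
    "verify your account now",
    "meeting notes for your team",
])
```

Terms shared by every document, such as "your" here, receive a weight of zero, while terms unique to one email are weighted up; this is the property that makes the representation useful as classifier input.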