4 research outputs found

    From feature engineering and topics models to enhanced prediction rates in phishing detection

    Phishing is a type of fraud attempt in which the attacker, usually by e-mail, pretends to be a trusted person or entity in order to obtain sensitive information from a target. Most recent phishing detection research has focused on obtaining highly distinctive features from the metadata and text of these e-mails. The obtained attributes are then fed to classification algorithms in order to determine whether the messages are phishing or legitimate. In this paper, an approach based on machine learning is proposed to detect phishing e-mail attacks. The methods that compose this approach are performed through a feature engineering process based on natural language processing, lemmatization, topic modeling, improved learning techniques for resampling and cross-validation, and hyperparameter configuration. The first proposed method uses all the features obtained from the Document-Term Matrix (DTM) in the classification algorithms. The second one uses Latent Dirichlet Allocation (LDA) as an operation to deal with the problems of the “curse of dimensionality”, the sparsity, and the text context portion included in the obtained representation. The proposed approach reached an F1-measure of 99.95% using the XGBoost algorithm. It outperforms state-of-the-art phishing detection research for an accredited data set, in applications based only on the body of the e-mails, without using other e-mail features such as the header, IP information or number of links in the text.
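
    The Document-Term Matrix and the F1-measure mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the tokenizer, the toy e-mail bodies, and all function names below are assumptions made for illustration, and lemmatization, LDA, and XGBoost are left out. The sketch only shows why a raw DTM tends to be wide and sparse, which is the problem the abstract says LDA is used to address.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def sparsity(matrix):
    """Fraction of zero cells in the matrix."""
    cells = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / cells if cells else 0.0


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical e-mail bodies, invented for this example.
docs = [
    "verify your account now",
    "meeting notes attached",
    "your account is suspended verify now",
]
vocab, dtm = document_term_matrix(docs)
print(len(vocab), "terms; sparsity =", round(sparsity(dtm), 3))
```

    Even with three short messages, about half of the cells are zero. With a realistic vocabulary the matrix grows to thousands of mostly empty columns, which is the motivation for replacing raw term counts with a small number of topic proportions.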

    Design of Automated Website Phishing Detection using Sequential Mechanism of RCL Algorithm

    Phishing outbreaks on the internet have become a major problem in web safety in recent years. Phishers steal crucial economic data about web users in order to commit economic break-ins. In order to predict phishing websites, many blacklist-based phishing website recognition methods are used in this study. Traditional methods of detecting phishing websites rely on static features and rule-based schemes, which can be evaded by attackers. Recently, Deep Learning (DL) and Machine Learning (ML) models have been employed for automated website phishing detection. With this motivation, this study develops an automated website phishing detection approach using the sequential mechanism of the RCL algorithm. The proposed model employs Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Random Forest (RF) models to detect attacks in URLs and webpages by measuring the similarity of the decoy contents. The proposed model involves three major components, namely RF for URL phishing detection, CNN-based phishing webpage detection, and LSTM-based website classification (i.e., legitimate or phishing). The RCL technique is evaluated experimentally on the benchmark dataset of Alexa and PhishTank. A comprehensive comparison study highlighted that the RCL algorithm achieves better phishing detection performance than other existing techniques across distinct evaluation metrics.

    A numeric-based machine learning design for detecting organized retail fraud in digital marketplaces

    Mutemi, A., & Bacao, F. (2023). A numeric-based machine learning design for detecting organized retail fraud in digital marketplaces. Scientific Reports, 13(1), 1-16. [12499]. https://doi.org/10.1038/s41598-023-38304-5
    Organized retail crime (ORC) is a significant issue for retailers, marketplace platforms, and consumers. Its prevalence and influence have increased rapidly in lockstep with the expansion of online commerce, digital devices, and communication platforms. Today, it is a costly affair, wreaking havoc on enterprises’ overall revenues and continually jeopardizing community security. These negative consequences are set to rocket to unprecedented heights as more people and devices connect to the Internet. Detecting and responding to these terrible acts as early as possible is critical for protecting consumers and businesses while also keeping an eye on rising patterns and fraud. The issue of detecting fraud in general has been studied widely, especially in financial services, but studies focusing on organized retail crimes are extremely rare in the literature. To contribute to the knowledge base in this area, we present a scalable machine learning strategy for detecting and isolating ORC listings posted on a prominent marketplace platform by merchants committing organized retail crimes or fraud. We employ a supervised learning approach to classify postings as fraudulent or real based on past data from buyer and seller behaviors and transactions on the platform. The proposed framework combines bespoke data preprocessing procedures, feature selection methods, and state-of-the-art class asymmetry resolution techniques to search for aligned classification algorithms capable of discriminating between fraudulent and legitimate listings in this context. Our best detection model obtains a recall score of 0.97 on the holdout set and 0.94 on the out-of-sample testing data set. We achieve these results based on a select set of 45 features out of 58.
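
    The abstract reports recall and mentions class asymmetry resolution without naming the technique used. As a rough illustration only, the sketch below uses random oversampling of the minority class, a simple stand-in and not necessarily what the authors applied, together with a plain recall computation. The feature rows, labels, and function names are hypothetical.

```python
import random


def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size.

    Random oversampling is an assumed stand-in; the source does not say
    which class-imbalance technique was used.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if not minority_idx:
        return list(rows), list(labels)
    deficit = max(len(majority_idx) - len(minority_idx), 0)
    extra = [rng.choice(minority_idx) for _ in range(deficit)]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model flagged: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical listings: one numeric feature each, label 1 = fraudulent.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

    Resampling of this kind would normally be applied to the training split only, so that holdout and out-of-sample recall are measured on the original class distribution.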

    Phishing email detection using Natural Language Processing techniques : a literature survey

    Phishing is the most prevalent method of cybercrime that convinces people to provide sensitive information, for instance account IDs, passwords, and bank details. Emails, instant messages, and phone calls are widely used to launch such cyber-attacks. Despite constant updating of the methods of avoiding such cyber-attacks, the ultimate outcome is currently inadequate. At the same time, phishing emails have increased exponentially in recent years, which suggests a need for more effective and advanced methods to counter them. Numerous methods have been established to filter phishing emails, but the problem still needs a complete solution. To the best of our knowledge, this is the first survey that focuses on using Natural Language Processing (NLP) and Machine Learning (ML) techniques to detect phishing emails. This study provides an analysis of the numerous state-of-the-art NLP strategies currently in use to identify phishing emails at various stages of the attack, with an emphasis on ML strategies. These approaches are subjected to a comparative assessment and analysis. This gives a sense of the problem, its immediate solution space, and the expected future research directions.