200 research outputs found
ReP-ETD: A Repetitive Preprocessing technique for Embedded Text Detection from images in spam emails
Email service proves to be a convenient and powerful communication tool. As the internet continues to grow, the type of information available to users has shifted from text-only to multimedia-enriched. Embedded text in multimedia content is one of the prevalent means of delivering messages to content viewers. With the increasing importance of email and the incursions of internet marketers, spam has become a major problem and has given rise to large volumes of unwanted mail. Spammers are continuously adopting new techniques to evade detection. Image spam is one such technique, wherein text embedded within images carries the main information of the spam message instead of text-based spam. Image spam is currently estimated to account for roughly 50% of all spam traffic and is still on the rise, making it a serious research issue. Filtering mail is one of the popular approaches used to block spam. This work proposes a new model, ReP-ETD (Repetitive Pre-processing technique for Embedded Text Detection), for efficiently and accurately detecting spam in email images. The performance of the proposed ReP-ETD model has been evaluated across the identified parameters and compared with other existing models. The simulation results demonstrate the effectiveness of the proposed model.
A review of spam email detection: analysis of spammer strategies and the dataset shift problem
Spam emails have traditionally been seen as just annoying, unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. In order to ensure security and integrity for users, organisations and researchers aim to develop robust filters for spam email detection. Recently, most spam filters based on machine learning algorithms published in academic journals report very high performance, but users are still reporting a rising number of frauds and attacks via spam emails. Two main challenges can be found in this field: (a) it is a very dynamic environment prone to the dataset shift problem, and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one is particularly focused on the problems that this constantly changing environment poses. Moreover, we analyse the different spammer strategies used for contaminating the emails, and we review the state-of-the-art techniques to develop filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring the matter of dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching values up to 48.81%.
Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), under Operational Programme 2014ES16RFOP009 FEDER 2014-2020 de Castilla y León, Action: 20007-CL - Apoyo Consorcio BUCL
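The dataset shift problem the review highlights can be illustrated with a small synthetic sketch (not from the paper; all feature distributions and numbers are invented): a filter trained on "old" spam looks accurate on held-out data from the same distribution, yet degrades sharply once the spammer drifts the feature distribution.

```python
# Illustrative sketch of dataset shift in spam filtering.
# Synthetic data only; the distributions below are assumptions, not the paper's.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_emails(n, spam_mean):
    """Ham features centred at 0, spam features centred at spam_mean."""
    ham = rng.normal(0.0, 1.0, size=(n, 5))
    spam = rng.normal(spam_mean, 1.0, size=(n, 5))
    X = np.vstack([ham, spam])
    y = np.array([0] * n + [1] * n)
    return X, y

# Train on "old" spam; the spammer then adapts, shifting the spam distribution.
X_train, y_train = make_emails(500, spam_mean=2.0)
X_iid, y_iid = make_emails(500, spam_mean=2.0)      # same distribution
X_shift, y_shift = make_emails(500, spam_mean=0.7)  # after spammer adaptation

clf = LogisticRegression().fit(X_train, y_train)
acc_iid = clf.score(X_iid, y_iid)
acc_shift = clf.score(X_shift, y_shift)
print(f"accuracy without shift: {acc_iid:.3f}, under shift: {acc_shift:.3f}")
```

The estimated generalisation performance (the i.i.d. score) overstates what the filter achieves once the environment changes, which is exactly the effect the review quantifies.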
Camouflages and Token Manipulations-The Changing Faces of the Nigerian Fraudulent 419 Spammers
The inefficiency of current spam filters against fraudulent (419) mails is not unrelated to spammers' use of good-word attacks, topic drift, parasitic spamming, and wrong categorization and recategorization of electronic mail by e-mail clients, and of course the fuzzy factors of greed and gullibility on the part of recipients who respond to fraudulent spam mail offers. In this paper, we establish that mail token manipulation remains, above any other tactic, the most potent tool used by Nigerian scammers to fool statistical spam filters. While we hope that uncovering this manipulative evidence will prove useful in future antispam research, our findings also alert spam filter developers to the need to build into their antispam architectures robust modules that can deal with the identified camouflages.
Experimental Approach Based on Ensemble and Frequent Itemsets Mining for Image Spam Filtering
Excessive amounts of image spam cause many problems for e-mail users. Since image spam is difficult to detect using conventional text-based spam approaches, various image processing techniques have been proposed. In this paper, we present an ensemble method using frequent itemset mining (FIM) for filtering image spam. Although FIM techniques are well established in data mining, they are not commonly used in ensemble methods. In order to obtain good filtering performance, a SIFT descriptor is used, since it is widely known as an effective image descriptor. K-means clustering is applied to the SIFT keypoints to produce a visual codebook. The bag-of-words (BOW) feature vector for each image is generated using a hard bag-of-features (HBOF) approach. FIM descriptors are obtained from the frequent itemsets of the BOW feature vectors. We combine BOW and FIM with three other feature selection methods, namely Information Gain (IG), Symmetrical Uncertainty (SU) and Chi Square (CS), together with a Spatial Pyramid, in an ensemble method. We have performed experiments on the Dredze and SpamArchive datasets. The results show that our ensemble using frequent itemset mining significantly outperforms the traditional BOW approach and the naive approach that combines all descriptors directly in a single very large input vector.
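The codebook step described above (k-means over SIFT keypoints, then hard assignment into a bag-of-words histogram) can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the descriptors are random stand-ins for SIFT output, and the codebook size is an arbitrary choice.

```python
# Hard bag-of-features (HBOF) sketch: cluster local descriptors into a visual
# codebook, then represent each image as a histogram of nearest codewords.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Stand-in for SIFT output: each image yields a variable number of
# 128-dimensional keypoint descriptors.
images = [rng.random((int(rng.integers(20, 40)), 128)) for _ in range(10)]

k = 16  # codebook size (a tuning parameter in practice)
codebook = KMeans(n_clusters=k, n_init=10, random_state=0)
codebook.fit(np.vstack(images))  # pool all descriptors to learn the codebook

def bow_vector(descriptors):
    """Hard assignment: count how many descriptors fall in each cluster."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()  # normalise so images of any size are comparable

bow = np.array([bow_vector(d) for d in images])
print(bow.shape)  # one k-dimensional BOW feature vector per image
```

The FIM descriptors in the paper would then be mined from frequent itemsets over these BOW vectors; the sketch stops at the BOW stage.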
MapReduce based RDF assisted distributed SVM for high throughput spam filtering
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.
Electronic mail has become deeply embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability, as well as its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be alleged about unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures, are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing their respective strengths and weaknesses.
Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is however a computationally intensive process. In this dissertation, a M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneous aware task to node matching and allocation scheme is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the box Hadoop counterpart in a typical Cloud based infrastructure.
The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based end-user feedback.
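The partition-and-combine idea behind distributed SVM training can be sketched in a map/reduce style. This is an illustrative sketch only, not the MRSMO algorithm (which distributes SMO itself and adds an RDF feedback loop); here each "node" trains a linear SVM on its partition and the reduce step simply averages the resulting weight vectors, a simplification that trades some accuracy for training time, as the abstract notes.

```python
# Map/reduce-style distributed linear SVM sketch (illustrative, not MRSMO).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = rng.normal(size=(1200, 20))
w_true = rng.normal(size=20)
y = (X @ w_true > 0).astype(int)  # synthetic linearly separable labels

def map_train(partition):
    """Map step: fit a linear SVM on one partition of the training data."""
    Xp, yp = partition
    clf = LinearSVC(dual=True, max_iter=5000).fit(Xp, yp)
    return clf.coef_.ravel(), clf.intercept_[0]

def reduce_average(models):
    """Reduce step: combine per-partition models by averaging parameters."""
    ws, bs = zip(*models)
    return np.mean(ws, axis=0), np.mean(bs)

partitions = [(X[i::4], y[i::4]) for i in range(4)]  # data split over 4 "nodes"
w, b = reduce_average([map_train(p) for p in partitions])

acc = float(np.mean(((X @ w + b) > 0).astype(int) == y))
print(f"combined model training accuracy: {acc:.3f}")
```

In a real Hadoop deployment the map calls would run on separate nodes over HDFS splits; the sketch keeps everything in one process to show the data flow.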
Explainable Artificial Intelligence Applications in Cyber Security: State-of-the-Art in Research
This survey presents a comprehensive review of current literature on Explainable Artificial Intelligence (XAI) methods for cyber security applications. Due to the rapid development of Internet-connected systems and Artificial Intelligence in recent years, Artificial Intelligence, including Machine Learning and Deep Learning, has been widely utilized in cyber security fields such as intrusion detection, malware detection, and spam filtering. However, although Artificial Intelligence-based approaches for the detection of and defense against cyber attacks and threats are more advanced and efficient than conventional signature-based and rule-based cyber security strategies, most Machine Learning-based and Deep Learning-based techniques are deployed in a “black-box” manner, meaning that security experts and customers are unable to explain how such procedures reach particular conclusions. The lack of transparency and interpretability of existing Artificial Intelligence techniques decreases human users’ confidence in the models utilized for defense against cyber attacks, especially now that cyber attacks are becoming increasingly diverse and complicated. Therefore, it is essential to apply XAI in the establishment of cyber security models to create more explainable models while maintaining high accuracy, allowing human users to comprehend, trust, and manage the next generation of cyber defense mechanisms. Although there are papers reviewing Artificial Intelligence applications in cyber security and a vast literature on applying XAI in many fields including healthcare, financial services, and criminal justice, there are surprisingly no survey research articles that concentrate on XAI applications in cyber security.
Therefore, the motivation behind this survey is to bridge the research gap by presenting a detailed and up-to-date survey of XAI approaches applicable to issues in the cyber security field. Our work is the first to propose a clear roadmap for navigating the XAI literature in the context of applications in cyber security.
MAXIMUM PHISH BAIT: TOWARDS FEATURE BASED DETECTION OF PHISHING USING MAXIMUM ENTROPY CLASSIFICATION TECHNIQUE
Several antiphishing methods have been employed with the primary task of automatically apprehending and ruling out or preventing phishing e-mail from entering users’ mail streams. Phishing attacks pose a great threat to internet users, and the extent can be enormous if left unchecked. Two major categories of techniques that have been shown to be useful for classifying e-mail messages automatically are the rule-based method, which classifies email using a set of heuristic rules, and the statistical approach, which models e-mails statistically, usually under a machine learning framework. The statistical methods have been found in the literature to outperform the rule-based method.
This study proposes the use of the Maximum Entropy Model, a generative model, and shows how it can be used in antiphishing tasks. The model-based features proposed by Bergholz et al. (2008) will also be adopted; these have been found to outperform the basic features proposed in previous studies. An experimental comparison of our approach with other generative and non-generative classifiers is also proposed. This approach is expected to perform comparably better than other methods, especially in the elimination of false positives.
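A maximum-entropy classifier over binary e-mail features can be prototyped compactly, since maximum-entropy classification coincides with logistic regression. This sketch is illustrative only: the feature names and toy labels are invented, not the Bergholz et al. model-based features.

```python
# Maximum-entropy (logistic regression) phishing classifier sketch.
# Features and labels below are invented toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary features per email:
# [has_suspicious_link, asks_for_password, spoofed_sender, urgent_language]
X = np.array([
    [1, 1, 1, 1],  # phishing
    [1, 1, 0, 1],  # phishing
    [0, 1, 1, 0],  # phishing
    [0, 0, 0, 0],  # legitimate
    [1, 0, 0, 0],  # legitimate
    [0, 0, 0, 1],  # legitimate
])
y = np.array([1, 1, 1, 0, 0, 0])

maxent = LogisticRegression().fit(X, y)
# P(phishing) for an obviously suspicious email vs. a benign one.
probs = maxent.predict_proba([[1, 1, 1, 0], [0, 0, 0, 0]])[:, 1]
print(probs)
```

Tuning the decision threshold on the predicted probability is one way to trade recall for the low false-positive rate the study emphasises.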
Hypersparse Neural Network Analysis of Large-Scale Internet Traffic
The Internet is transforming our society, necessitating a quantitative understanding of Internet traffic. Our team collects and curates the largest publicly available Internet traffic data, containing 50 billion packets. Utilizing a novel hypersparse neural network analysis of "video" streams of this traffic using 10,000 processors in the MIT SuperCloud reveals a new phenomenon: the importance of otherwise unseen leaf nodes and isolated links in Internet traffic. Our neural network approach further shows that a two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide variety of source/destination statistics on moving sample windows ranging from 100,000 to 100,000,000 packets over collections that span years and continents. The inferred model parameters distinguish different network streams, and the model leaf parameter strongly correlates with the fraction of the traffic in different underlying network topologies. The hypersparse neural network pipeline is highly adaptable, and different network statistics and training models can be incorporated with simple changes to the image filter functions.
Comment: 11 pages, 10 figures, 3 tables, 60 citations; to appear in IEEE High Performance Extreme Computing (HPEC) 201
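The two-parameter Zipf-Mandelbrot form the abstract refers to is p(k) ∝ 1/(k + β)^α over ranks k = 1…n. The sketch below evaluates a normalised version; the parameter values are illustrative, whereas the paper infers α and β per network stream.

```python
# Two-parameter Zipf-Mandelbrot rank distribution sketch.
# alpha and beta values here are illustrative, not the paper's inferred values.
import numpy as np

def zipf_mandelbrot(n, alpha, beta):
    """Normalised probabilities p(k) ∝ 1/(k + beta)^alpha for ranks 1..n."""
    k = np.arange(1, n + 1)
    p = 1.0 / (k + beta) ** alpha
    return p / p.sum()

p = zipf_mandelbrot(n=1000, alpha=1.2, beta=2.0)
print(p[:3], p.sum())  # heavy-tailed head, normalised to 1
```

Increasing β flattens the head of the distribution while α controls the tail decay, which is how the fitted parameters can distinguish different source/destination statistics.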