9 research outputs found

    A review of spam email detection: analysis of spammer strategies and the dataset shift problem

    Get PDF
    Spam emails have traditionally been seen as merely annoying, unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. In order to ensure the security and integrity of users, organisations and researchers aim to develop robust filters for spam email detection. Recently, most spam filters based on machine learning algorithms published in academic journals report very high performance, but users are still reporting a rising number of frauds and attacks via spam emails. Two main challenges can be found in this field: (a) it is a very dynamic environment prone to the dataset shift problem and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one focuses particularly on the problems that this constantly changing environment poses. Moreover, we analyse the different spammer strategies used for contaminating the emails, and we review the state-of-the-art techniques to develop filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring the matter of dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching values up to 48.81%. Open access publication funded by the Consortium of University Libraries of Castilla y León (BUCLE), under the Operational Programme 2014ES16RFOP009 FEDER 2014-2020 of Castilla y León, Action: 20007-CL - Apoyo Consorcio BUCLE.
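    The temporal degradation described above can be illustrated in miniature. The sketch below is not the review's experimental setup; the email snippets and the TF-IDF plus Naive Bayes pipeline are arbitrary placeholders. It trains a filter on "old" emails and reports accuracy both on held-out old emails and on later ones, the comparison that exposes dataset shift in practice.

```python
# Illustrative sketch (not the review's setup): train a simple spam filter on
# "old" emails, then evaluate it on held-out old emails and on "new" emails
# collected after the distribution has shifted. All data below are toy
# placeholders standing in for real, time-stamped corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Placeholder (text, label) pairs: label 1 = spam, 0 = ham.
old_train = [("cheap meds buy now", 1), ("meeting at 10am tomorrow", 0),
             ("win a free prize today", 1), ("please review the attached report", 0)]
old_test  = [("free prize waiting for you", 1), ("lunch tomorrow?", 0)]
new_test  = [("your parcel is held, verify your account", 1),  # newer phishing-style spam
             ("quarterly report attached for review", 0)]       # legitimate mail

def split(pairs):
    texts, labels = zip(*pairs)
    return list(texts), list(labels)

X_tr, y_tr = split(old_train)
X_old, y_old = split(old_test)
X_new, y_new = split(new_test)

vec = TfidfVectorizer()
clf = MultinomialNB()
clf.fit(vec.fit_transform(X_tr), y_tr)

# In-distribution estimate vs. performance after the spammer changes tactics.
print("accuracy on old emails:", accuracy_score(y_old, clf.predict(vec.transform(X_old))))
print("accuracy on new emails:", accuracy_score(y_new, clf.predict(vec.transform(X_new))))
```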

    Categorical Change: Exploring the Effects of Concept Drift in Human Perceptual Category Learning

    Get PDF
    Categorization is an essential survival skill that we engage in daily. A multitude of behavioral and neuropsychological evidence supports the existence of multiple learning systems involved in category learning. COmpetition between Verbal and Implicit Systems (COVIS) theory provides a neuropsychological basis for the existence of an explicit and an implicit learning system involved in the learning of category rules. COVIS provides a convincing account of asymptotic performance in human category learning. However, COVIS, like virtually all current theories of category learning, focuses solely on categories and decision environments that remain stationary over time. Our environment is dynamic, and we often need to adapt our decision making to account for environmental or categorical changes. Machine learning addresses this significant challenge through what is termed concept drift. Concept drift occurs any time a data distribution changes over time. This dissertation draws from two key characteristics of concept drift in machine learning known to impact the performance of learning models, and in so doing provides the first systematic exploration of concept drift (i.e., categorical change) in human perceptual category learning. Four experiments, each including one key change parameter (category base-rates, payoffs, or category structure [RB/II]), investigated the effect of rate of change (abrupt, gradual) and awareness of change (foretold or not) on decision criterion adaptation. Critically, Experiments 3 and 4 evaluated differences in categorical adaptation within explicit and implicit category learning tasks to determine whether rate and awareness of change moderated any learning system differences. The results of these experiments inform current category learning theory and provide information for machine learning models of decision support in non-stationary environments.
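    As a rough machine-learning-side illustration of the abrupt-versus-gradual change parameter mentioned above (not the dissertation's experimental paradigm; trial counts and base-rates are invented), the sketch below simulates a category stream whose base-rate drifts either all at once or over a window of trials.

```python
# Hedged illustration: simulate abrupt vs. gradual concept drift in category
# base-rates over a sequence of trials. All numbers are arbitrary placeholders.
import numpy as np

n_trials = 1000
rng = np.random.default_rng(0)

# Abrupt drift: base-rate of category "A" jumps from 0.8 to 0.2 at trial 500.
abrupt_p = np.where(np.arange(n_trials) < 500, 0.8, 0.2)

# Gradual drift: the same change unfolds linearly between trials 400 and 600.
gradual_p = np.clip(0.8 - 0.6 * (np.arange(n_trials) - 400) / 200, 0.2, 0.8)

# Sample which category is shown on each trial (True = "A", False = "B").
abrupt_stream = rng.random(n_trials) < abrupt_p
gradual_stream = rng.random(n_trials) < gradual_p

print("category-A rate, first vs. last 100 trials (abrupt):",
      abrupt_stream[:100].mean(), abrupt_stream[-100:].mean())
print("category-A rate, first vs. last 100 trials (gradual):",
      gradual_stream[:100].mean(), gradual_stream[-100:].mean())
```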

    Evaluating Classifiers During Dataset Shift

    Get PDF
    Deployment of a classifier into a machine learning application likely begins with training different types of algorithms on a subset of the available historical data and then evaluating them on datasets drawn from identical distributions. The goal of this evaluation process is to select the classifier that is believed to be most robust in maintaining good future performance, and then deploy that classifier to end-users who use it to make predictions on new data. Often, however, predictive models are deployed in conditions that differ from those used in training, meaning that dataset shift has occurred. In these situations, there are no guarantees that predictions made by the model in deployment will be as reliable and accurate as they were during training. This study demonstrated a technique that others can use when selecting a classifier for deployment, and it is the first comparative study to evaluate machine learning classifier performance on synthetic datasets with different levels of prior-probability, covariate, and concept dataset shift. The results show the impact of dataset shift on the performance of different classifiers for two real-world datasets, one on teacher retention in Wisconsin and one on detecting fraud in testing. By using the methods from this study as a proactive approach to evaluating classifiers on synthetic dataset shift, different classifiers would have been considered for deployment of both predictive models than those chosen using only evaluation datasets drawn from identical distributions. The results from both real-world datasets also showed that no classifier dealt well with prior-probability shift and that classifiers were affected less by covariate and concept shift than expected. Two supplemental demonstrations showed that the methodology can be extended to additional purposes for evaluating classifiers under dataset shift. Analysis of hyperparameter choices under dataset shift, and of the effects of actual dataset shift on classifier performance, showed that different hyperparameter configurations affect not only a classifier's overall performance but also how robust it is to dataset shift.
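    A minimal sketch of the general idea follows, assuming toy Gaussian data and arbitrary shift parameters rather than the study's actual datasets or protocol: generate a synthetic task, build test sets exhibiting prior-probability, covariate, and concept shift, and compare how two off-the-shelf classifiers hold up on each.

```python
# Hedged sketch, not the study's protocol: synthetic binary task plus three
# shifted test sets. Prior-probability shift changes P(y) with P(x|y) fixed;
# covariate shift changes P(x) with P(y|x) fixed; concept shift changes the
# feature-label relationship itself. All sizes and distributions are arbitrary.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def label(X, w):
    """Labeling 'concept': class 1 where the linear score w.x is positive."""
    return (X @ w > 0).astype(int)

w_train = np.array([1.0, 1.0])

def make_set(n=1000, x_mean=(0.0, 0.0), w=w_train, pos_frac=None):
    X = rng.normal(x_mean, 1.0, size=(n, 2))
    y = label(X, w)
    if pos_frac is not None:                      # prior-probability shift: resample class balance
        pos, neg = X[y == 1], X[y == 0]
        n_pos = int(n * pos_frac)
        X = np.vstack([pos[rng.integers(0, len(pos), n_pos)],
                       neg[rng.integers(0, len(neg), n - n_pos)]])
        y = np.concatenate([np.ones(n_pos, int), np.zeros(n - n_pos, int)])
    return X, y

X_train, y_train = make_set()
test_sets = {
    "no shift":          make_set(),
    "prior-prob shift":  make_set(pos_frac=0.9),            # P(y) changes, P(x|y) fixed
    "covariate shift":   make_set(x_mean=(1.5, 1.5)),       # P(x) changes, P(y|x) fixed
    "concept shift":     make_set(w=np.array([1.0, -1.0])), # P(y|x) itself changes
}

for clf in (LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)):
    clf.fit(X_train, y_train)
    for name, (X_te, y_te) in test_sets.items():
        acc = accuracy_score(y_te, clf.predict(X_te))
        print(f"{type(clf).__name__:24s} {name:18s} accuracy={acc:.3f}")
```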

    Assessing the Impact of Changing Environments on Classifier Performance

    No full text
    The purpose of this paper is to test the hypothesis that simple classifiers are more robust to changing environments than complex ones. We propose a strategy for generating artificial, but realistic domains, which allows us to control the changing environment and test a variety of situations. Our results suggest that evaluating classifiers on such tasks is not straightforward since the changed environment can yield a simpler or more complex domain. We propose a metric capable of taking this issue into consideration and evaluate our classifiers using it. We conclude that in mild cases of population drifts simple classifiers deteriorate more than complex ones and that in more severe cases, as well as in class definition changes, all classifiers deteriorate to about the same extent. This means that in all cases, complex classifiers remain more accurate than simpler ones, thus challenging the hypothesis that simple classifiers are more robust to changing environments than complex ones.
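    One way to picture the evaluation problem raised here, as a hedged stand-in rather than the paper's actual domain generator or proposed metric, is to drift the class-conditional distributions by a controllable amount and normalise each model's post-drift accuracy by what the same learner achieves when retrained on the drifted domain, so that a domain that simply became harder is not mistaken for model fragility.

```python
# Hedged sketch, not the paper's generator or metric: compare a simple and a
# more complex classifier under a controlled "population drift" created by
# moving the positive class mean. Depending on the shift, the drifted domain
# itself can be easier or harder, so post-drift accuracy is also compared to
# a model of the same type retrained on the drifted domain.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

def domain(shift=0.0, n=1000):
    """Two Gaussian classes; `shift` moves the positive class (the drift)."""
    X = np.vstack([rng.normal([1 + shift, 1 + shift], 1.0, size=(n // 2, 2)),
                   rng.normal([-1, -1], 1.0, size=(n // 2, 2))])
    y = np.concatenate([np.ones(n // 2, int), np.zeros(n // 2, int)])
    return X, y

X0, y0 = domain(shift=0.0)                  # original environment
for shift in (-0.5, -1.5):                  # mild vs. severe population drift
    Xd, yd = domain(shift=shift)
    for make in (GaussianNB, lambda: RandomForestClassifier(random_state=0)):
        old = make().fit(X0, y0)            # trained before the drift
        new = make().fit(Xd, yd)            # retrained on the drifted domain
        acc_old = accuracy_score(yd, old.predict(Xd))
        acc_new = accuracy_score(yd, new.predict(Xd))
        print(f"shift={shift:+.1f} {type(old).__name__:22s} "
              f"kept={acc_old:.3f} retrained={acc_new:.3f} relative={acc_old / acc_new:.3f}")
```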