Bot recognition in a Web store: An approach based on unsupervised learning

Francesco Masulli; Grażyna Suchacka; Stefano Rovetta

Publication date: 1 January 2020

Abstract

Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots), which pose a threat to website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, traffic generated by legitimate users must be accurately separated from traffic generated by Web bots. This paper proposes a machine learning solution to the problem of bot and human session classification, with a specific application to e-commerce. The approach explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means) followed by supervised labelling of clusters, a generative learning strategy that decouples modelling the data from labelling them. Its efficiency is evaluated through experiments on real e-commerce data, in realistic conditions, and compared to that of supervised learning classifiers (a multi-layer perceptron neural network and a support vector machine). Results demonstrate that classification based on unsupervised learning is very efficient, achieving a performance level similar to that of fully supervised classification. This is an experimental indication that the bot recognition problem can be successfully dealt with using methods that are less sensitive to mislabelled data or missing labels. A very small fraction of sessions remains misclassified in both cases, so an in-depth analysis of misclassified samples was also performed. This analysis exposed the superiority of the proposed approach, which correctly recognized more bots and identified more camouflaged agents that had been erroneously labelled as humans.
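The two-stage idea in the abstract (cluster sessions without labels, then attach labels to whole clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only plain k-means (not Graded Possibilistic c-Means), and the two session features, the toy data, the farthest-first initialisation, and the majority-vote labelling rule are all assumptions made for illustration.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def init_centroids(points, k):
    """Deterministic farthest-first initialisation (an assumption, chosen for repeatability)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=50):
    """Plain k-means: fits centroids to the data without using any labels."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]  # keep the old centroid if a cluster is empty
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def label_clusters(labelled, centroids):
    """Assign each cluster the majority label of the labelled sessions that fall in it."""
    votes = [Counter() for _ in centroids]
    for p, y in labelled:
        votes[nearest(p, centroids)][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def classify(p, centroids, cluster_labels):
    """Label a new session with the label of its nearest cluster."""
    return cluster_labels[nearest(p, centroids)]

# Hypothetical sessions: (requests per minute, fraction of image requests).
sessions = [
    (2.0, 0.60), (3.0, 0.55), (2.5, 0.65), (4.0, 0.50),      # human-like
    (40.0, 0.05), (55.0, 0.02), (48.0, 0.00), (60.0, 0.04),  # bot-like
]
# Only two sessions carry a label; modelling above never sees them.
labelled = [((2.0, 0.60), "human"), ((55.0, 0.02), "bot")]

centroids = kmeans(sessions, k=2)
cluster_labels = label_clusters(labelled, centroids)
print(classify((45.0, 0.03), centroids, cluster_labels))
print(classify((3.5, 0.58), centroids, cluster_labels))
```

Because the clustering step never reads the labels, a wrong or missing label can only affect the vote inside one cluster. This is one way to read the abstract's claim that such methods are less sensitive to mislabelled data or missing labels.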

Available Versions

Open Access Repository

oai:zenodo.org:37302

Last time updated on 21/07/2023

Archivio istituzionale della ricerca - Università di Genova

oai:iris.unige.it:11567/999600

Last time updated on 17/03/2020