This work proposes a data-driven learning model for the synthesis of
keystroke biometric data. The proposed method is compared with two statistical
approaches based on Universal and User-dependent models. These approaches are
validated on the bot detection task, using the synthetic keystroke data to
improve the training process of keystroke-based bot detection systems. Our
experimental framework considers a dataset with 136 million keystroke events
from 168 thousand subjects. We have analyzed the performance of the three
synthesis approaches through qualitative and quantitative experiments.
Different bot detectors are considered based on several supervised classifiers
(Support Vector Machine, Random Forest, Gaussian Naive Bayes and a Long
Short-Term Memory network) and a learning framework including human and
synthetic samples. The experiments demonstrate the realism of the synthetic
samples. The classification results suggest that in scenarios with large
amounts of labeled data, these synthetic samples can be detected with high
accuracy. However, in few-shot learning scenarios, detecting them remains an
important challenge.
Furthermore, these results show the great potential of the presented models.

Comment: Paper accepted in IEEE Computer Society Workshop on Biometrics
(CVPRw) 202