Web-Based Benchmark for Keystroke Dynamics Biometric Systems: A Statistical Analysis
Most keystroke dynamics studies have been evaluated using a specific kind of dataset in which users type an imposed login and password. Moreover, these studies are optimistic, since most of them use different acquisition protocols, private datasets, controlled environments, etc. To improve the evaluation of keystroke dynamics performance, the contribution of this paper is twofold. First, we provide a new kind of dataset in which users have typed both imposed and chosen pairs of logins and passwords. In addition, the keystroke dynamics samples are collected in a web-based, uncontrolled environment (OS, keyboard, browser, etc.). Such a dataset is important because it yields more realistic estimates of keystroke dynamics performance than the controlled-environment results reported in the literature. Second, we present a statistical analysis of well-known assertions, such as the relationship between performance and password size, the impact of fusion schemes on overall system performance, and others such as the relationship between performance and entropy. We highlight some new results on keystroke dynamics under realistic conditions.
Comment: The Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP 2012), Piraeus, Greece (2012)
Fast computation of the performance evaluation of biometric systems: application to multibiometric
The performance evaluation of biometric systems is a crucial step when designing and evaluating such systems. The evaluation process uses the Equal Error Rate (EER) metric proposed by the International Organization for Standardization (ISO/IEC). The EER is a powerful metric that makes it easy to compare and evaluate biometric systems. However, computing the EER is, most of the time, very intensive. In this paper, we propose a fast method that computes an approximated value of the EER. We illustrate the benefit of the proposed method on two applications: the computation of non-parametric confidence intervals and the use of genetic algorithms to compute the parameters of fusion functions. Experimental results show the superiority of the proposed EER approximation method in terms of computing time, and its value in speeding up the learning of parameters with genetic algorithms. The proposed method opens new perspectives for the development of secure multibiometric systems by reducing their computation time.
Comment: Future Generation Computer Systems (2012)
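As a point of reference, the exact EER can be estimated by a brute-force sweep over every observed score used as a threshold; the sketch below (with illustrative score lists) is this baseline computation, not the paper's fast approximation method, whose point is precisely to avoid such an exhaustive sweep:

```python
def eer(genuine, impostor):
    """Brute-force Equal Error Rate estimate: sweep every observed
    score as a decision threshold and report the rate at the point
    where FAR and FRR are closest. Scores are match scores, with
    higher values meaning "more genuine-like"."""
    best_rate, best_gap = 0.0, float("inf")
    for t in sorted(genuine + impostor):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate
```

Note that this sweep is quadratic in the number of scores, which is why exact EER computation becomes intensive on large datasets.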
Analysis of Cloud Based Keystroke Dynamics for Behavioral Biometrics Using Multiclass Machine Learning
With the rapid proliferation of interconnected devices and the exponential growth of data stored in the cloud, the potential attack surface for cybercriminals expands significantly. Behavioral biometrics provide an additional layer of security by enabling continuous authentication and real-time monitoring: an individual's unique behavioral patterns are analyzed continuously as they work. In this study, we utilized a dataset consisting of 90 users' attempts to type the 11-character string 'Exponential' eight times. Each attempt was recorded in the cloud with timestamps for key press and release events, aligned with the initial key press. The objective was to explore the potential of keystroke dynamics for user authentication. Various features were extracted from the dataset and categorized into tiers. Tier-0 features comprised key-press and key-release times; Tier-1 derived features encompassed durations, latencies, and digraphs; and Tier-2 statistical measures such as maximum, minimum, and mean values were calculated over these. The performance of three popular multiclass machine learning models, namely Decision Tree, Multi-layer Perceptron, and LightGBM, was evaluated using these features. The results indicated that incorporating Tier-1 and Tier-2 features significantly improved the models' performance compared to relying solely on Tier-0 features, as the higher tiers allow the models to capture more nuanced patterns and relationships in the keystroke data. While Decision Trees provide a baseline, Multi-layer Perceptron and LightGBM outperform them by effectively capturing complex relationships. In particular, LightGBM excels in leveraging information from all features, resulting in the highest explanatory power and prediction accuracy. This highlights the importance of capturing both local and higher-level patterns in keystroke data to accurately authenticate users.
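The tiered feature scheme described above can be illustrated with a small sketch. The exact feature definitions (e.g. whether a digraph spans press-to-press or press-to-release time) are assumptions here, not the study's specification:

```python
import statistics

def extract_features(press, release):
    """Derive Tier-1 and Tier-2 keystroke features from Tier-0 raw
    timestamps (one press time and one release time per key, in ms).
    Feature names are illustrative, not the study's exact set."""
    # Tier-1: hold duration of each key, press-to-press latency,
    # and a digraph time taken here as first press to second release
    durations = [r - p for p, r in zip(press, release)]
    latencies = [press[i + 1] - press[i] for i in range(len(press) - 1)]
    digraphs = [release[i + 1] - press[i] for i in range(len(press) - 1)]
    feats = {"durations": durations, "latencies": latencies, "digraphs": digraphs}
    # Tier-2: summary statistics over each Tier-1 sequence
    for name, seq in list(feats.items()):
        feats[f"{name}_mean"] = statistics.mean(seq)
        feats[f"{name}_min"] = min(seq)
        feats[f"{name}_max"] = max(seq)
    return feats
```

A flattened version of this dictionary would form one training row per typing attempt for the multiclass models.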
Improving the performance of free-text keystroke dynamics authentication by fusion
Free-text keystroke dynamics is invariably hampered by the huge amount of data needed to train the system. This problem is addressed in this paper by a system that combines two methods, both of which reduce the training requirement for user authentication using free-text keystrokes. The two methods were fused to achieve error rates lower than those produced by each method separately. Two fusion schemes were applied: decision-level fusion and feature-level fusion. Feature-level fusion was done by concatenating two sets of features before the learning stage: a timing feature set and a non-conventional feature set. Decision-level fusion merged the outputs of two methods using majority voting: one based on Support Vector Machines (SVMs) with Ant Colony Optimization (ACO) feature selection, the other on decision trees (DTs). Even though the classifiers trained on features merged at the feature level produced low error rates, they were outperformed by the decision-level fusion scheme, which achieved the best performance of 0.00% False Accept Rate (FAR) and 0.00% False Reject Rate (FRR).
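The two fusion schemes can be sketched minimally as follows; the function names and the tie-breaking rule are illustrative assumptions, not the paper's implementation:

```python
from collections import Counter

def feature_level_fusion(timing_feats, nonconv_feats):
    """Feature-level fusion: concatenate the timing and
    non-conventional feature vectors before training one classifier."""
    return list(timing_feats) + list(nonconv_feats)

def decision_level_fusion(decisions):
    """Decision-level fusion: majority vote over per-method
    accept/reject decisions (True = accept as genuine).
    Ties are resolved as reject, the conservative choice."""
    votes = Counter(decisions)
    return votes[True] > votes[False]
```

With only two voters, as in the paper's SVM+ACO and DT combination, the vote only changes the outcome when the methods disagree, so the tie-breaking policy matters for the resulting FAR/FRR trade-off.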
Secure entity authentication
According to Wikipedia, authentication is the act of confirming the truth of an attribute of a single piece of data claimed true by an entity. Specifically, entity authentication is the process by which an agent in a distributed system gains confidence in the identity of a communicating partner (Bellare et al.). Legacy password authentication is still the most popular, but it suffers from many limitations, such as hacking through social engineering techniques, dictionary attacks or database leaks. To address the security concerns in legacy password-based authentication, many new authentication factors have been introduced, such as PINs (Personal Identification Numbers) delivered through out-of-band channels, human biometrics and hardware tokens. However, each of these authentication factors has its own inherent weaknesses and security limitations; for example, phishing is still effective even when PINs are delivered over out-of-band channels. In this dissertation, three types of secure entity authentication schemes are developed to alleviate the weaknesses and limitations of existing authentication mechanisms: (1) an end-user authentication scheme based on Network Round-Trip Time (NRTT) to complement location-based authentication mechanisms; (2) an Apache Hadoop authentication mechanism based on Trusted Platform Module (TPM) technology; and (3) a web server authentication mechanism for phishing detection with a new detection factor, NRTT. In the first work, a new authentication factor based on NRTT is presented. Two research challenges (the secure measurement of NRTT and network instabilities) are addressed to show that NRTT can be used to uniquely and securely identify login locations and hence can support location-based web authentication mechanisms. The experiments and analysis show that NRTT has superior usability, deployability, security, and performance properties compared to state-of-the-art web authentication factors. In the second work, departing from the Kerberos-centric approach, an authentication framework for Hadoop that utilizes Trusted Platform Module (TPM) technology is proposed. It is shown that pushing security down to the hardware level, in conjunction with software techniques, provides better protection than software-only solutions. The proposed approach provides significant security guarantees against insider threats, which manipulate the execution environment without the consent of legitimate clients. Extensive experiments are conducted to validate the performance and security properties of the proposed approach, and the correctness and security guarantees are formally proved via Burrows-Abadi-Needham (BAN) logic. In the third work, together with a phishing victim identification algorithm, NRTT is used as a new phishing detection feature to improve the detection accuracy of existing phishing detection approaches. State-of-the-art phishing detection methods fall into two categories: heuristics and blacklists. The experiments show that combining NRTT with existing heuristics can improve overall detection accuracy while maintaining a low false positive rate. In the future, to develop a more robust and efficient phishing detection scheme, it is paramount for phishing detection approaches to carefully select features that strike the right balance between detection accuracy and robustness in the face of potential manipulations. In addition, leveraging Deep Learning (DL) algorithms could be a viable alternative to traditional machine learning algorithms (e.g., SVM, LR) for improving phishing detection performance, especially when handling complex and large-scale datasets.
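As a rough illustration of the NRTT factor, round-trip time can be estimated by timing TCP connection establishment. This is only a naive sketch; it omits the secure-measurement and network-instability handling that the dissertation actually develops:

```python
import socket
import time

def measure_nrtt(host, port=443, samples=5):
    """Naive network round-trip-time estimate: time several TCP
    connection setups and keep the minimum, since the minimum over
    repeated samples filters out transient queueing delay."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        # connection setup completes after one round trip (SYN / SYN-ACK)
        with socket.create_connection((host, port), timeout=3):
            rtts.append(time.perf_counter() - start)
    return min(rtts)
```

In a location-based scheme, such a measurement taken from the server toward the client's network would be compared against the NRTT profile recorded for the user's known login locations.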
Free-text keystroke dynamics authentication with a reduced need for training and language independency
This research aims to overcome the drawback of the large amount of training data required for free-text keystroke dynamics authentication. A new key-pairing method, based on the keyboard's key layout, is proposed to achieve that. The method extracts several timing features from specific key-pairs. The level of similarity between a user's profile data and his or her test data is then used to decide whether the test data was provided by the genuine user. The key-pairing technique was developed to make the best possible use of the smallest amount of training data, which reduces the requirement for typing long text in the training stage. In addition, non-conventional features were defined and extracted from the input stream typed by the user in order to capture more of the user's typing behaviour; these help the system form a better picture of the user's identity from a small amount of training data. Non-conventional features compute the average rate at which users perform certain actions when typing a whole piece of text. Results were obtained from tests conducted on the key-pair timing features and the non-conventional features separately: an FAR of 0.013 and an FRR of 0.384 were produced by the timing features, and an FAR of 0.0104 and an FRR of 0.25 by the non-conventional features. Moreover, the fusion of these two feature sets was used to improve the error rates. Feature-level fusion reduced the error rates to an FAR of 0.00896 and an FRR of 0.215, whilst decision-level fusion achieved zero FAR and FRR. In addition, keystroke dynamics research suffers from the fact that almost all text included in previous studies is typed in English. The key-pairing method, however, has the advantage of being language-independent, which allows it to be applied to text typed in other languages. In this research, the key-pairing method was applied to text in Arabic. The results produced from the tests conducted on Arabic text were similar to those produced from English text, demonstrating the applicability of the key-pairing method to a language other than English, even one with a completely different alphabet and characteristics. Moreover, experimenting with texts in English and Arabic produced results showing a direct relation between users' familiarity with the language and the performance of the authentication system.
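A similarity decision of the kind described, comparing a profile's key-pair timings against a test sample, might be sketched as follows. The key-pair categories, the distance measure and the threshold are illustrative assumptions, not the thesis's actual method:

```python
def similarity(profile, test):
    """Similarity between a user's profile and a test sample, each a
    mapping from a key-pair category (e.g. layout-based groupings
    such as "adjacent" or "same_row") to its mean timing in ms.
    Returns a score in [0, 1]; higher means more alike."""
    shared = set(profile) & set(test)
    if not shared:
        return 0.0
    # mean relative deviation across shared key-pair categories
    dev = sum(abs(profile[k] - test[k]) / profile[k] for k in shared) / len(shared)
    return max(0.0, 1.0 - dev)

def authenticate(profile, test, threshold=0.7):
    """Accept the test sample as genuine if it is similar enough
    to the enrolled profile (threshold is an illustrative value)."""
    return similarity(profile, test) >= threshold
```

Grouping key-pairs by layout rather than by exact letter identity is what makes such a scheme language-independent: the same categories exist whether the text is English or Arabic.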
Biometrics
Biometrics uses methods for the unique recognition of humans based upon one or more intrinsic physical or behavioral traits. In computer science in particular, biometrics is used as a form of identity and access management and access control. It is also used to identify individuals in groups that are under surveillance. The book consists of 13 chapters, each focusing on a certain aspect of the problem, divided into three sections: physical biometrics, behavioral biometrics and medical biometrics. The key objective of the book is to provide a comprehensive reference and text on human authentication and identity verification from physiological, behavioural and other points of view. It aims to publish new insights into current innovations in computer systems and technology for biometrics development and its applications. The book was reviewed by the editor Dr. Jucheng Yang and many of the guest editors, such as Dr. Girija Chetty, Dr. Norman Poh, Dr. Loris Nanni, Dr. Jianjiang Feng, Dr. Dongsun Park and Dr. Sook Yoon, who also made significant contributions to the book.
Identifying users using Keystroke Dynamics and contextual information
Biometric identification systems based on Keystroke Dynamics have been around for almost forty years now. There has always been a lot of interest in identifying individuals using their physiological or behavioral traits. Keystroke Dynamics focuses on the particular way a person types on a keyboard.
The objective of the proposed research is to determine how well the identity of users can be established when using this biometric trait and when contextual information is also taken into account. The proposed research focuses on free text. Users were never told what to type, how or when. This particular field of Keystroke Dynamics has not been as thoroughly studied as the fixed text alternative where a plethora of methods have been tried.
The proposed methods focus on the hypothesis that the position of a particular letter, or combination of letters, within a word is of high importance. Other studies have not taken into account whether these letter combinations occurred at the beginning, the middle, or the end of a word.
A template of the user will be built using the context of the written words and the latency between successive keystrokes. Other features, like word length, minimum number of needed words to consider a session valid, frequency of words, model building parameters, as well as age group and gender have also been studied to determine those that better help ascertain the identity of an individual.
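The positional hypothesis above, that where a letter pair falls within a word matters, can be sketched as a feature extractor over digraph latencies; the position labels are illustrative, not the study's exact encoding:

```python
def positional_digraphs(word, press_times):
    """For each adjacent letter pair (digraph) in a word, emit the
    pair, its position within the word (begin / middle / end), and
    the press-to-press latency between its two keys. press_times
    holds one key-press timestamp (ms) per character of the word."""
    feats = []
    for i in range(len(word) - 1):
        if i == 0:
            pos = "begin"
        elif i == len(word) - 2:
            pos = "end"
        else:
            pos = "middle"
        latency = press_times[i + 1] - press_times[i]
        feats.append((word[i:i + 2], pos, latency))
    return feats
```

A user template could then aggregate these triples so that, say, "th" at the beginning of a word is modeled separately from "th" in the middle, which is exactly the contextual distinction the proposed methods exploit.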
The results of the proposed research should help determine whether Keystroke Dynamics and the proposed methodology are enough to identify users from the content they type with a sufficient level of certainty. If so, it could be used as a method to ensure that a user is not being impersonated in authentication schemes, or even to help determine the authorship of different parts of a document written by more than one user.