53 research outputs found
Analysis of Cloud Based Keystroke Dynamics for Behavioral Biometrics Using Multiclass Machine Learning
With the rapid proliferation of interconnected devices and the exponential growth of data stored in the cloud, the potential attack surface for cybercriminals expands significantly. Behavioral biometrics provide an additional layer of security by enabling continuous authentication and real-time monitoring. Its continuous and dynamic nature offers enhanced security, as it analyzes an individual's unique behavioral patterns in real-time. In this study, we utilized a dataset consisting of 90 users' attempts to type the 11-character string 'Exponential' eight times. Each attempt was recorded in the cloud with timestamps for key press and release events, aligned with the initial key press. The objective was to explore the potential of keystroke dynamics for user authentication. Various features were extracted from the dataset, categorized into tiers. Tier-0 features included key-press time and key-release time, while Tier-1 derived features encompassed durations, latencies, and digraphs. Additionally, Tier-2 statistical measures such as maximum, minimum, and mean values were calculated. The performance of three popular multiclass machine learning models, namely Decision Tree, Multi-layer Perceptron, and LightGBM, was evaluated using these features. The results indicated that incorporating Tier-1 and Tier-2 features significantly improved the models' performance compared to relying solely on Tier-0 features. The inclusion of Tier-1 and Tier-2 features allows the models to capture more nuanced patterns and relationships in the keystroke data. While Decision Trees provide a baseline, Multi-layer Perceptron and LightGBM outperform them by effectively capturing complex relationships. Particularly, LightGBM excels in leveraging information from all features, resulting in the highest level of explanatory power and prediction accuracy. This highlights the importance of capturing both local and higher-level patterns in keystroke data to accurately authenticate users
Combined scaled manhattan distance and mean of hornerâs rules for keystroke dynamic authentication
Account security was determined by how well the security techniques applied by the system were used. There had been many security methods that guaranteed the security of their accounts, one of which was Keystroke Dynamic Authentication. Keystroke Dynamic Authentication was an authentication technique that utilized the typing habits of a person as a security measurement tool for the user account. From several research, the average use in the Keystroke Dynamic Authentication classification is not suitable, because a user's typing speed will change over time, maybe faster or slower depending on certain conditions. So, in this research, we proposed a combination of the Scaled Manhattan Distance method and the Mean of Horner's Rules as a classification method between the user and attacker against the Keystroke Dynamic Authentication. The reason for using Mean of Hornerâs Rules can adapt to changes in values over time and based on the results can improve the accuracy of the previous method
Touch-screen Behavioural Biometrics on Mobile Devices
Robust user verification on mobile devices is one of the top priorities globally from a financial security and privacy viewpoint and has led to biometric verification complementing or replacing PIN and password methods. Research has shown that behavioural biometric methods, with their promise of improved security due to inimitable nature and the lure of unintrusive, implicit, continuous verification, could define the future of privacy and cyber security in an increasingly mobile world. Considering the real-life nature of problems relating to mobility, this study aims to determine the impact of user interaction factors that affect verification performance and usability for behavioural biometric modalities on mobile devices. Building on existing work on biometric performance assessments, it asks: To what extent does the biometric performance remain stable when faced with movements or change of environment, over time and other device related factors influencing usage of mobile devices in real-life applications? Further it seeks to provide answers to: What could further improve the performance for behavioural biometric modalities?
Based on a review of the literature, a series of experiments were executed to collect a dataset consisting of touch dynamics based behavioural data mirroring various real-life usage scenarios of a mobile device. Responses were analysed using various uni-modal and multi-modal frameworks. Analysis demonstrated that existing verification methods using touch modalities of swipes, signatures and keystroke dynamics adapt poorly when faced with a variety of usage scenarios and have challenges related to time persistence. The results indicate that a multi-modal solution does have a positive impact towards improving the verification performance. On this basis, it is recommended to explore alternatives in the form of dynamic, variable thresholds and smarter template selection strategy which hold promise. We believe that the evaluation results presented in this thesis will streamline development of future solutions for improving the security of behavioural-based modalities on mobile biometrics
Machine Learning Models for Educational Platforms
Scaling up education online and onlife is presenting numerous key challenges, such as hardly manageable classes, overwhelming content alternatives, and academic dishonesty while interacting remotely. However, thanks to the wider availability of learning-related data and increasingly higher performance computing, Artificial Intelligence has the potential to turn such challenges into an unparalleled opportunity. One of its sub-fields, namely Machine Learning, is enabling machines to receive data and learn for themselves, without being programmed with rules. Bringing this intelligent support to education at large scale has a number of advantages, such as avoiding manual error-prone tasks and reducing the chance that learners do any misconduct. Planning, collecting, developing, and predicting become essential steps to make it concrete into real-world education.
This thesis deals with the design, implementation, and evaluation of Machine Learning models in the context of online educational platforms deployed at large scale. Constructing and assessing the performance of intelligent models is a crucial step towards increasing reliability and convenience of such an educational medium. The contributions result in large data sets and high-performing models that capitalize on Natural Language Processing, Human Behavior Mining, and Machine Perception. The model decisions aim to support stakeholders over the instructional pipeline, specifically on content categorization, content recommendation, learnersâ identity verification, and learnersâ sentiment analysis. Past research in this field often relied on statistical processes hardly applicable at large scale. Through our studies, we explore opportunities and challenges introduced by Machine Learning for the above goals, a relevant and timely topic in literature.
Supported by extensive experiments, our work reveals a clear opportunity in combining human and machine sensing for researchers interested in online education. Our findings illustrate the feasibility of designing and assessing Machine Learning models for categorization, recommendation, authentication, and sentiment prediction in this research area. Our results provide guidelines on model motivation, data collection, model design, and analysis techniques concerning the above applicative scenarios. Researchers can use our findings to improve data collection on educational platforms, to reduce bias in data and models, to increase model effectiveness, and to increase the reliability of their models, among others. We expect that this thesis can support the adoption of Machine Learning models in educational platforms even more, strengthening the role of data as a precious asset. The thesis outputs are publicly available at https://www.mirkomarras.com
Activity-Based User Authentication Using Smartwatches
Smartwatches, which contain an accelerometer and gyroscope, have recently been used to implement gait and gesture- based biometrics; however, the prior studies have long-established drawbacks. For example, data for both training and evaluation was captured from single sessions (which is not realistic and can lead to overly optimistic performance results), and in cases when the multi-day scenario was considered, the evaluation was often either done improperly or the results are very poor (i.e., greater than 20% of EER). Moreover, limited activities were considered (i.e., gait or gestures), and data captured within a controlled environment which tends to be far less realistic for real world applications. Therefore, this study remedies these past problems by training and evaluating the smartwatch-based biometric system on data from different days, using large dataset that involved the participation of 60 users, and considering different activities (i.e., normal walking (NW), fast walking (FW), typing on a PC keyboard (TypePC), playing mobile game (GameM), and texting on mobile (TypeM)). Unlike the prior art that focussed on simply laboratory controlled data, a more realistic dataset, which was captured within un-constrained environment, is used to evaluate the performance of the proposed system.
Two principal experiments were carried out focusing upon constrained and un-constrained environments. The first experiment included a comprehensive analysis of the aforementioned activities and tested under two different scenarios (i.e., same and cross day). By using all the extracted features (i.e., 88 features) and the same day evaluation, EERs of the acceleration readings were 0.15%, 0.31%, 1.43%, 1.52%, and 1.33% for the NW, FW, TypeM, TypePC, and GameM respectively. The EERs were increased to 0.93%, 3.90%, 5.69%, 6.02%, and 5.61% when the cross-day data was utilized. For comparison, a more selective set of features was used and significantly maximize the system performance under the cross day scenario, at best EERs of 0.29%, 1.31%, 2.66%, 3.83%, and 2.3% for the aforementioned activities respectively.
A realistic methodology was used in the second experiment by using data collected within unconstrained environment. A light activity detection approach was developed to divide the raw signals into gait (i.e., NW and FW) and stationary activities. Competitive results were reported with EERs of 0.60%, 0% and 3.37% for the NW, FW, and stationary activities respectively. The findings suggest that the nature of the signals captured are sufficiently discriminative to be useful in performing transparent and continuous user authentication.University of Kuf
Recommended from our members
Online semi-supervised learning in non-stationary environments
Existing Data Stream Mining (DSM) algorithms assume the availability of labelled and
balanced data, immediately or after some delay, to extract worthwhile knowledge from the
continuous and rapid data streams. However, in many real-world applications such as
Robotics, Weather Monitoring, Fraud Detection Systems, Cyber Security, and Computer
Network Traffic Flow, an enormous amount of high-speed data is generated by Internet of
Things sensors and real-time data on the Internet. Manual labelling of these data streams
is not practical due to time consumption and the need for domain expertise. Another
challenge is learning under Non-Stationary Environments (NSEs), which occurs due to
changes in the data distributions in a set of input variables and/or class labels. The problem
of Extreme Verification Latency (EVL) under NSEs is referred to as Initially Labelled Non-Stationary Environment (ILNSE). This is a challenging task because the learning algorithms
have no access to the true class labels directly when the concept evolves. Several approaches
exist that deal with NSE and EVL in isolation. However, few algorithms address both issues
simultaneously. This research directly responds to ILNSEâs challenge in proposing two
novel algorithms âPredictor for Streaming Data with Scarce Labelsâ (PSDSL) and
Heterogeneous Dynamic Weighted Majority (HDWM) classifier. PSDSL is an Online Semi-Supervised Learning (OSSL) method for real-time DSM and is closely related to label
scarcity issues in online machine learning.
The key capabilities of PSDSL include learning from a small amount of labelled data in an
incremental or online manner and being available to predict at any time. To achieve this,
PSDSL utilises both labelled and unlabelled data to train the prediction models, meaning it
continuously learns from incoming data and updates the model as new labelled or
unlabelled data becomes available over time. Furthermore, it can predict under NSE
conditions under the scarcity of class labels. PSDSL is built on top of the HDWM classifier,
which preserves the diversity of the classifiers. PSDSL and HDWM can intelligently switch
and adapt to the conditions. The PSDSL adapts to learning states between self-learning,
micro-clustering and CGC, whichever approach is beneficial, based on the characteristics of
the data stream. HDWM makes use of âseedâ learners of different types in an ensemble to
maintain its diversity. The ensembles are simply the combination of predictive models
grouped to improve the predictive performance of a single classifier.
PSDSL is empirically evaluated against COMPOSE, LEVELIW, SCARGC and MClassification
on benchmarks, NSE datasets as well as Massive Online Analysis (MOA) data streams and real-world datasets. The results showed that PSDSL performed significantly better than
existing approaches on most real-time data streams including randomised data instances.
PSDSL performed significantly better than âStaticâ i.e. the classifier is not updated after it is
trained with the first examples in the data streams. When applied to MOA-generated data
streams, PSDSL ranked highest (1.5) and thus performed significantly better than SCARGC,
while SCARGC performed the same as the Static. PSDSL achieved better average prediction
accuracies in a short time than SCARGC.
The HDWM algorithm is evaluated on artificial and real-world data streams against existing
well-known approaches such as the heterogeneous WMA and the homogeneous Dynamic
DWM algorithm. The results showed that HDWM performed significantly better than WMA
and DWM. Also, when recurring concept drifts were present, the predictive performance of
HDWM showed an improvement over DWM. In both drift and real-world streams,
significance tests and post hoc comparisons found significant differences between
algorithms, HDWM performed significantly better than DWM and WMA when applied to
MOA data streams and 4 real-world datasets Electric, Spam, Sensor and Forest cover. The
seeding mechanism and dynamic inclusion of new base learners in the HDWM algorithms
benefit from the use of both forgetting and retaining the models. The algorithm also
provides the independence of selecting the optimal base classifier in its ensemble depending
on the problem.
A new approach, Envelope-Clustering is introduced to resolve the cluster overlap conflicts
during the cluster labelling process. In this process, PSDSL transforms the centroidsâ
information of micro-clusters into micro-instances and generates new clusters called
Envelopes. The nearest envelope clusters assist the conflicted micro-clusters and
successfully guide the cluster labelling process after the concept drifts in the absence of true
class labels. PSDSL has been evaluated on real-world problem âkeystroke dynamicsâ, and
the results show that PSDSL achieved higher prediction accuracy (85.3%) and SCARGC
(81.6%), while the Static (49.0%) significantly degrades the performance due to changes in
the users typing pattern. Furthermore, the predictive accuracies of SCARGC are found
highly fluctuated between (41.1% to 81.6%) based on different values of parameter âkâ
(number of clusters), while PSDSL automatically determine the best values for this
parameter
- âŠ