283 research outputs found

    DenseBAM-GI: Attention Augmented DenseNet with momentum aided GRU for HMER

    Handwritten Mathematical Expression Recognition (HMER) is a crucial task in the fields of digital education and scholarly research. However, it is difficult to accurately determine the length of, and the complex spatial relationships among, symbols in handwritten mathematical expressions. In this study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER, in which the encoder uses a Bottleneck Attention Module (BAM) to improve feature representation and the decoder uses a Gated Input-GRU (GI-GRU) unit with an extra gate to ease the decoding of long and complex expressions. The proposed model is an efficient, lightweight architecture whose performance is equivalent to state-of-the-art models in terms of Expression Recognition Rate (exprate). It also performs better in terms of top-1, top-2, and top-3 error rates across the CROHME 2014, 2016, and 2019 datasets. DenseBAM-GI achieves the best exprate among all models on the CROHME 2019 dataset. Importantly, these results are achieved with lower computational complexity and reduced GPU memory requirements.
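The abstract does not give the GI-GRU equations, so the following is a minimal toy sketch of the idea: a standard GRU step extended with one extra gate that scales the input before it reaches the candidate state. The scalar weights and the exact placement of the extra gate are illustrative assumptions, not the paper's formulation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gi_gru_step(h, x, w):
    # Standard GRU gates (scalar toy version; real models use weight matrices).
    z = sigmoid(w["wz"] * x + w["uz"] * h)   # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h)   # reset gate
    # Hypothetical extra input gate: scales the incoming input before it
    # enters the candidate state (the paper's exact formulation may differ).
    g = sigmoid(w["wg"] * x + w["ug"] * h)
    h_cand = math.tanh(w["wh"] * (g * x) + w["uh"] * (r * h))
    return (1 - z) * h + z * h_cand

weights = {k: 0.5 for k in ("wz", "uz", "wr", "ur", "wg", "ug", "wh", "uh")}
h = 0.0
for x in (0.2, -0.1, 0.4):   # toy input sequence
    h = gi_gru_step(h, x, weights)
```

The extra gate gives the decoder one more degree of freedom over how much of each input symbol's context flows into the state update, which is one plausible way to ease decoding of long expressions.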

    Automatic interpretation of clock drawings for computerised assessment of dementia

    The clock drawing test (CDT) is a standard neurological test for the detection of cognitive impairment. A computerised version of the test has the potential to improve test accessibility and accuracy. CDT sketch interpretation is one of the first stages in the analysis of the computerised test. It produces a set of recognised digits and symbols together with their positions on the clock face, which are subsequently used in test scoring. This is a challenging problem because the average CDT taker has a high likelihood of cognitive impairment, and writing is one of the first functional activities to be affected. Current interpretation systems perform less well on this kind of data due to its unintelligibility. In this thesis, a novel automatic interpretation system for CDT sketches is proposed and developed. The proposed interpretation system and all the related algorithms developed in this thesis are evaluated using a CDT dataset collected for this study. The data consist of two sets: 65 drawings made by healthy people, and 100 drawings reproduced from drawings of dementia patients. This thesis makes four main contributions. The first is a conceptual model of the proposed CDT sketch interpretation system based on integrating prior knowledge of the expected CDT sketch structure and human reasoning into the drawing interpretation system. The second is a novel CDT sketch segmentation algorithm based on supervised machine learning and a new set of temporal and spatial features automatically extracted from the CDT data. Evaluation shows that the proposed method outperforms the current state-of-the-art method for CDT drawing segmentation. The third contribution is a new handwritten digit recognition algorithm based on a set of static and dynamic features extracted from handwritten data. The algorithm combines two classifiers, a fuzzy k-nearest-neighbour classifier and a Convolutional Neural Network (CNN), to take advantage of both static and dynamic data representations. The proposed digit recognition algorithm is shown to outperform each classifier individually in terms of recognition accuracy. The final contribution of this study is the Situational Bayesian Network (SBN), a new hierarchical probabilistic model for fusing diverse data sources, such as CDT sketches created by healthy volunteers and by dementia patients, in a probabilistic Bayesian network. Evaluation of the proposed SBN-based CDT sketch interpretation system on CDT data shows highly promising results, with 100% recognition accuracy for healthy CDT drawings and 97.15% for dementia data. To conclude, the proposed automatic CDT sketch interpretation system shows high accuracy in recognising different sketch objects and thus paves the way for further research in dementia and in clinical computer-assisted diagnosis of dementia.
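The thesis's exact rule for combining the fuzzy k-nearest-neighbour and CNN classifiers is not spelled out in this abstract, so the following is a minimal sketch of one common approach, probability-level fusion by weighted averaging. The class scores and the fusion weight are hypothetical.

```python
def fuse_predictions(p_knn, p_cnn, w_knn=0.5):
    # Weighted average of per-class probabilities from the two classifiers;
    # the weight and the toy scores below are illustrative assumptions.
    fused = [w_knn * a + (1 - w_knn) * b for a, b in zip(p_knn, p_cnn)]
    total = sum(fused)
    return [p / total for p in fused]  # renormalise to a distribution

# Toy per-digit probabilities (10 classes, digits 0-9) from each classifier.
p_knn = [0.05] * 10
p_knn[3] = 0.55                    # fuzzy kNN leans towards digit 3
p_cnn = [0.02] * 10
p_cnn[3], p_cnn[8] = 0.50, 0.34    # CNN agrees on 3 but is tempted by 8
fused = fuse_predictions(p_knn, p_cnn)
predicted_digit = max(range(10), key=lambda d: fused[d])
```

Fusing at the probability level lets the static (image-based CNN) and dynamic (stroke-based kNN) views vote on each digit, which is one way two complementary classifiers can beat either one alone.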

    Transparent Authentication Utilising Gait Recognition

    Securing smartphones has become inevitable due to their massive popularity and their storage of, and access to, sensitive information. The gatekeeper of securing the device is authenticating the user. Amongst the many solutions proposed, gait recognition has been suggested to provide a reliable yet non-intrusive authentication approach, enabling both security and usability. While several studies exploring mobile-based gait recognition have taken place, they have been mainly preliminary, with methodological restrictions that limited the number of participants, samples, and types of features; in addition, prior studies have depended on limited datasets collected in controlled experimental environments covering a narrow range of activities. The absence of real-world datasets has led to individuals being verified incorrectly. This thesis seeks to overcome these weaknesses and provide a comprehensive evaluation, including an analysis of smartphone-based motion sensors (accelerometer and gyroscope) and of the variability of feature vectors during differing activities, across a multi-day collection involving 60 participants. This was framed as two experiments involving five types of activity: standard, fast, with a bag, downstairs, and upstairs walking. The first experiment explores classification performance in order to understand whether a single classifier or a multi-algorithmic approach provides a better level of performance. The second experiment investigates the feature vector (comprising up to 304 unique features) to understand how its composition affects performance; for comparison, a more compact minimal feature set is also evaluated. Performance on the controlled dataset exceeded prior work using same-day and cross-day methodologies (e.g., for the regular walk activity, best EERs of 0.70% and 6.30% for the same-day and cross-day scenarios respectively).
Moreover, the multi-algorithmic approach achieved a significant improvement over the single-classifier approach and is thus a more practical way of managing feature-vector variability. An activity recognition model was applied to a real-life gait dataset containing a more significant number of gait samples, collected from 44 users (7-10 days for each user). A human physical-motion activity identification model was built to classify a given individual's activity signal into the predefined class it belongs to. As such, the thesis implements a novel real-world gait recognition system that recognises the subject using a smartphone-based real-world dataset. It also investigates whether these authentication technologies can recognise the genuine user and reject an impostor. Experimental results on the real dataset offer a promising level of security, particularly when majority-voting techniques are applied. The proposed multi-algorithmic approach also appears more reliable and tends to perform relatively well in practice on real live user data, yielding an improved model employing multiple activities with respect to the security and transparency of the system within a smartphone. Overall, the experiments showed an EER of 7.45% for a single classifier (all-activities dataset). The multi-algorithmic approach achieved EERs of 5.31%, 6.43% and 5.87% for normal, fast, and combined normal-and-fast walking respectively, using both accelerometer- and gyroscope-based features, a significant improvement over the single-classifier approach. Ultimately, evaluation of the smartphone-based gait authentication system over a long period under realistic scenarios revealed that it could provide secure and appropriate activity identification and user authentication.
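The EER figures quoted in this abstract can be illustrated with a sketch of how an equal error rate is estimated from genuine and impostor matcher scores: sweep a decision threshold and find where the false reject rate meets the false accept rate. The scores below are hypothetical, not from the thesis.

```python
def equal_error_rate(genuine, impostor):
    # Sweep thresholds over all observed scores; the EER is taken where the
    # false reject rate (genuine scores below threshold) is closest to the
    # false accept rate (impostor scores at or above threshold).
    best = None
    for t in sorted(set(genuine + impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2)
    return best[1]

genuine = [0.9, 0.8, 0.85, 0.7, 0.95]    # toy matcher scores for the owner
impostor = [0.3, 0.75, 0.2, 0.5, 0.6]    # toy matcher scores for others
eer = equal_error_rate(genuine, impostor)
```

A lower EER means the genuine and impostor score distributions overlap less, which is why the multi-algorithmic EERs of 5-6% represent a real improvement over the 7.45% single-classifier figure.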

    Semantic Communications for Speech Transmission

    Wireless communication systems have undergone vigorous advancement from the first generation (1G) to the fifth generation (5G) over the past few decades, developing numerous coding algorithms and channel models to recover sources accurately at the bit level. In recent years, however, the flourishing of artificial intelligence (AI) has revolutionised various industries and incubated multifarious intelligent tasks, which increases the volume of transmitted data to the zettabyte level and requires massive machine connectivity with low transmission latency and energy consumption. In this context, conventional communication systems face severe challenges imposed by ubiquitous AI tasks, making a new communication paradigm inevitable. Semantic communications have been proposed to address these challenges by extracting the semantic information inherent in source data while omitting irrelevant, redundant information, thereby reducing the transmitted data, lowering communication resource usage, and facilitating transmission with high semantic fidelity. Nevertheless, the exploration of semantic communications went through decades of stagnation after the concept was first identified, owing to the inadequacy of mathematical models for semantic information. Inspired by the thriving of AI, deep learning (DL)-enabled semantic communications have emerged as promising solutions to the bottlenecks of conventional communications, leveraging the learning and fitting capabilities of neural networks to bypass mathematical models for semantic extraction and representation. To this end, this thesis explores DL-enabled semantic communications for speech transmission to tackle technical problems in conventional speech communication networks, including semantic-agnostic coding algorithms, unreliable speech transmission in complicated channel environments, system output limited to the speech modality, and speech quality susceptible to external interference.
Specifically, a general semantic communication system for speech transmission over single-input single-output (SISO) channels, named DeepSC-S, is first developed to reconstruct speech information by transmitting global semantic features. In addition, the system output is extended to multimodal data across different languages by introducing a task-oriented semantic communication framework for speech transmission, named DeepSC-ST, to perform various downstream intelligent tasks, including speech recognition, speech synthesis, speech-to-text translation (S2TT), and speech-to-speech translation (S2ST). Moreover, semantic communications for speech transmission over multiple-input multiple-output (MIMO) channels are investigated to contend with practical communication scenarios, and a semantic-aware network is devised to improve the performance of intelligent tasks. Furthermore, realistic scenarios involving speech input corrupted by external interference are considered by establishing a semantic impairment suppression mechanism to compensate for impaired semantics in the corrupted speech and to facilitate robust end-to-end (E2E) semantic communications for speech-to-text translation. The proposed DeepSC-S and its variants investigated in this thesis demonstrate high proficiency in semantic communications for speech transmission by substantially reducing transmitted data, performing diverse semantic tasks, providing superior system performance, and tolerating dynamic channel effects.
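DeepSC-S's learned speech encoders are beyond a short sketch, but the channel model at the core of any transmit-receive loop can be illustrated: a toy additive white Gaussian noise (AWGN) channel at a given SNR, with stand-in "semantic features" in place of the learned encoder output. All names and values below are illustrative assumptions.

```python
import math
import random

def awgn(symbols, snr_db, rng):
    # Add white Gaussian noise scaled so that the signal-to-noise power
    # ratio matches the requested SNR in decibels.
    power = sum(s * s for s in symbols) / len(symbols)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in symbols]

rng = random.Random(0)
# Stand-in "semantic features": in DeepSC-S these would come from a learned
# speech encoder; here they are just toy values.
features = [0.5, -1.2, 0.8, 0.1]
received = awgn(features, snr_db=20, rng=rng)
mse = sum((a - b) ** 2 for a, b in zip(features, received)) / len(features)
```

Training the encoder and decoder end-to-end through such a channel is what lets a semantic system learn representations that survive noise, rather than protecting every bit equally.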

    Neuromorphic nanophotonic systems for artificial intelligence

    Over the last decade, we have witnessed an astonishing pace of development in the field of artificial intelligence (AI), followed by proliferation of AI algorithms into virtually every domain of our society. While modern AI models boast impressive performance, they also require massive amounts of energy and resources for operation. This is further fuelling the research into AI-specific, optimised computing hardware. At the same time, the remarkable energy efficiency of the brain brings an interesting question: Can we further borrow from the working principles of biological intelligence to realise a more efficient artificial intelligence? This can be considered as the main research question in the field of neuromorphic engineering. Thanks to the developments in AI and recent advancements in the field of photonics and photonic integration, research into light-powered implementations of neuromorphic hardware has recently experienced a significant uptick of interest. In such hardware, the aim is to seize some of the highly desirable properties of photonics not just for communication, but also to perform computation. Neurons in the brain frequently process information (compute) and communicate using action potentials, which are brief voltage spikes that encode information in the temporal domain. Similar dynamical behaviour can be elicited in some photonic devices, at speeds multiple orders of magnitude higher. Such devices with the capability of neuron-like spiking are of significant research interest for the field of neuromorphic photonics. Two distinct types of such excitable, spiking systems operating with optical signals are studied and investigated in this thesis. First, a vertical cavity surface emitting laser (VCSEL) can be operated under a specific set of conditions to realise a high-speed, all-optical excitable photonic neuron that operates at standard telecom wavelengths. 
The photonic VCSEL-neuron was dynamically characterised and various information encoding mechanisms were studied in this device. In particular, a spiking rate-coding regime of operation was experimentally demonstrated, and its viability for performing spiking domain conversion of digital images was explored. Furthermore, for the first time, a joint architecture utilising a VCSEL-neuron coupled to a photonic integrated circuit (PIC) silicon microring weight bank was experimentally demonstrated in two different functional layouts. Second, an optoelectronic (O/E/O) circuit based upon a resonant tunnelling diode (RTD) was introduced. Two different types of RTD devices were studied experimentally: a higher-output-power, µ-scale RTD that was RF coupled to an active photodetector and a VCSEL (this layout is referred to as a PRL node); and a simplified, photosensitive RTD with a nanoscale injector that was RF coupled to a VCSEL (referred to as a nano-pRL node). Hallmark excitable behaviours were studied in both devices, including excitability thresholding and refractory periods. Furthermore, a more exotic resonate-and-fire dynamical behaviour was also reported in the nano-pRL device. Finally, a modular numerical model of the RTD was introduced, and various information processing methods were demonstrated using both a single RTD spiking node and a perceptron-type spiking neural network with physical models of optoelectronic RTD nodes serving as artificial spiking neurons.
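The excitable thresholding and refractory periods reported for both the VCSEL-neuron and the RTD nodes can be illustrated with a generic leaky integrate-and-fire model. This is a textbook stand-in, not the thesis's device model; all constants are assumptions.

```python
def lif_spikes(inputs, threshold=1.0, leak=0.1, refractory=3):
    # Leaky integrate-and-fire: the state leaks toward zero, integrates the
    # input, and emits a spike when it crosses the threshold, after which the
    # node stays silent for a fixed refractory period.
    v, cooldown, spikes = 0.0, 0, []
    for t, x in enumerate(inputs):
        if cooldown > 0:
            cooldown -= 1
            v = 0.0
            continue
        v = (1 - leak) * v + x
        if v >= threshold:
            spikes.append(t)
            v = 0.0
            cooldown = refractory
    return spikes

weak = lif_spikes([0.05] * 20)   # sub-threshold drive settles below threshold
strong = lif_spikes([0.6] * 20)  # supra-threshold drive spikes periodically
```

The same qualitative picture, an all-or-nothing response above a threshold followed by a silent refractory window, is what makes such photonic devices usable as artificial spiking neurons, only at much higher speeds.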

    Dynamic Mathematics for Automated Machine Learning Techniques

    Machine Learning and Neural Networks have been gaining popularity and are widely considered the driving force of the Fourth Industrial Revolution. Yet modern machine learning techniques such as backpropagation training were firmly established in 1986, and computer vision was revolutionised in 2012 with the introduction of AlexNet. Given all these accomplishments, why are neural networks still not an integral part of our society? "Because they are difficult to implement in practice." "I'd like to use machine learning, but I can't invest much time." The concept of Automated Machine Learning (AutoML) was first proposed by Professor Frank Hutter of the University of Freiburg. Machine learning is not simple; it requires a practitioner to have a thorough understanding of the attributes of their data and of the components their model entails. AutoML is the effort to automate all the tedious aspects of machine learning to form a clean data analysis pipeline. This thesis is our effort to develop and understand ways to automate machine learning. Specifically, we focused on Recurrent Neural Networks (RNNs), Meta-Learning, and Continual Learning. We studied continual learning to enable a network to sequentially acquire skills in a dynamic environment; we studied meta-learning to understand how a network can be configured efficiently; and we studied RNNs to understand the consequences of consecutive actions. Our RNN study focused on mathematical interpretability: we described a large variety of RNNs as one mathematical class in order to understand their core network mechanism. This enabled us to extend meta-learning beyond network configuration to network pruning and continual learning. It also provided insights into how a single network should be consecutively configured, and led us to the creation of a simple generic patch that is compatible with several existing continual learning archetypes. This patch enhanced the robustness of continual learning techniques and allowed them to generalise better across data. By and large, this thesis presents a series of extensions that make AutoML simple, efficient, and robust. More importantly, all of our methods are motivated by mathematical understanding through the lens of dynamical systems; thus, we also increase the interpretability of AutoML concepts.
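The idea of describing many RNNs as one mathematical class can be illustrated by the shared recurrence h_t = φ(w·h_{t-1} + u·x_t + b): many variants reduce to choices of the nonlinearity φ and of how the weights gate the state and input. The scalar sketch below is a toy instance of that shared form, not the thesis's general formulation.

```python
import math

def rnn_unroll(xs, w=0.5, u=1.0, b=0.0, phi=math.tanh, h0=0.0):
    # Generic recurrent update h_t = phi(w*h_{t-1} + u*x_t + b); the weights
    # and nonlinearity here are arbitrary illustrative choices.
    h, states = h0, []
    for x in xs:
        h = phi(w * h + u * x + b)
        states.append(h)
    return states

# An impulse input: the state's response shows the network's decaying memory.
states = rnn_unroll([1.0, 0.0, 0.0, 0.0])
```

Viewing the recurrence as a dynamical system is what makes questions such as how fast the state forgets an input, and hence how the network should be configured, mathematically tractable.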