    Human activity recognition with self-attention

    In this paper, a self-attention based neural network architecture for human activity recognition is proposed. The dataset used was collected with a smartphone. The contribution of this paper is the use of a multi-layer, multi-head self-attention architecture for human activity recognition, compared against two strong baseline architectures: a convolutional neural network (CNN) and a long short-term memory (LSTM) network. The dropout rate, positional encoding, and scaling factor are also investigated to find the best model. The results show that the proposed model achieves a test accuracy of 91.75%, which is comparable to both baseline models.
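
    As a rough illustration of this kind of model (not the authors' code), the sketch below builds a multi-layer, multi-head self-attention classifier in PyTorch for windowed smartphone inertial data; the window length, channel count, class count, and the omission of positional encoding are all assumptions made for brevity.

# A minimal sketch, assuming 128-sample windows, 6 inertial channels
# (accelerometer + gyroscope) and 6 activity classes; positional encoding
# and the scaling-factor study from the paper are omitted.
import torch
import torch.nn as nn

class SelfAttentionHAR(nn.Module):
    def __init__(self, n_channels=6, n_classes=6, d_model=64,
                 n_heads=4, n_layers=2, dropout=0.1):
        super().__init__()
        self.embed = nn.Linear(n_channels, d_model)   # per-timestep embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x):                      # x: (batch, time, channels)
        h = self.encoder(self.embed(x))        # (batch, time, d_model)
        return self.classifier(h.mean(dim=1))  # average over time, then classify

model = SelfAttentionHAR()
logits = model(torch.randn(8, 128, 6))         # 8 windows of 128 samples x 6 channels
print(logits.shape)                            # torch.Size([8, 6])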

    Dense & Attention Convolutional Neural Networks for Toe Walking Recognition

    Idiopathic toe walking (ITW) is a gait disorder in which children's initial contacts show limited or no heel touch during the gait cycle. Toe walking can lead to poor balance, an increased risk of falling or tripping, leg pain, and stunted growth in children. Early detection and identification can facilitate targeted interventions for children diagnosed with ITW. This study proposes a new one-dimensional (1D) Dense & Attention convolutional network architecture, termed the DANet, to detect idiopathic toe walking. The dense block is integrated into the network to maximize information transfer and avoid missed features, and the attention modules are incorporated to highlight useful features while suppressing unwanted noise. In addition, the focal loss function is enhanced to alleviate the sample-imbalance issue. The proposed approach outperforms other methods, achieving a test recall of 88.91% for recognizing idiopathic toe walking on a local dataset collected in real-world experimental scenarios. To assess the scalability and generalizability of the approach, the algorithm is further validated on publicly available datasets, where it achieves an average precision, recall, and F1-score of 89.34%, 91.50%, and 92.04%, respectively. The experimental results show competitive performance and demonstrate the validity and feasibility of the proposed approach.
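
    The sketch below (an assumption-laden illustration, not the published DANet) shows the three ingredients the abstract names in PyTorch: a 1D dense block, a squeeze-and-excitation style channel attention module, and a focal loss that down-weights easy samples; the layer sizes and channel counts are hypothetical.

# A minimal sketch of the named components; channel counts are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseBlock1D(nn.Module):
    """Each layer sees the concatenation of all previous feature maps."""
    def __init__(self, in_ch, growth=16, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv1d(in_ch + i * growth, growth, kernel_size=3, padding=1)
            for i in range(n_layers))

    def forward(self, x):
        for conv in self.layers:
            x = torch.cat([x, F.relu(conv(x))], dim=1)
        return x

class ChannelAttention1D(nn.Module):
    """Re-weights channels so informative features are highlighted."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):                      # x: (batch, ch, time)
        w = self.fc(x.mean(dim=2))             # squeeze over time
        return x * w.unsqueeze(-1)             # excite per channel

def focal_loss(logits, target, gamma=2.0):
    """Down-weights easy examples to counter class imbalance."""
    ce = F.cross_entropy(logits, target, reduction="none")
    pt = torch.exp(-ce)
    return ((1 - pt) ** gamma * ce).mean()

x = torch.randn(4, 6, 200)                     # 4 windows, 6 sensor channels, 200 samples
feats = ChannelAttention1D(54)(DenseBlock1D(6)(x))
print(feats.shape)                             # torch.Size([4, 54, 200])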

    Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities

    The vast proliferation of sensor devices and the Internet of Things enables applications of sensor-based activity recognition. However, substantial challenges can affect the performance of recognition systems in practical scenarios. Recently, as deep learning has demonstrated its effectiveness in many areas, many deep methods have been investigated to address the challenges in activity recognition. In this study, we present a survey of state-of-the-art deep learning methods for sensor-based human activity recognition. We first introduce the multi-modality of the sensory data and provide information on public datasets that can be used for evaluation in different challenge tasks. We then propose a new taxonomy that structures deep methods by the challenges they address. Challenges and challenge-related deep methods are summarized and analyzed to form an overview of the current research progress. At the end of this work, we discuss open issues and provide insights for future directions.

    Tiny Deep Learning Architectures Enabling Sensor-Near Acoustic Data Processing and Defect Localization

    The timely diagnosis of defects at their incipient stage of formation is crucial to extending the life-cycle of technical appliances. This is the case for mechanical stress, due either to long aging degradation processes (e.g., corrosion) or to in-operation forces (e.g., impact events), which may provoke detrimental damage such as cracks, disbonding, or delamination, most commonly followed by the release of acoustic energy. The localization of these sources can be successfully carried out with acoustic emission (AE)-based inspection techniques through the computation of the time of arrival (ToA), namely the time at which the mechanical wave released by the acoustic event arrives at the acquisition unit. However, accurate estimation of the ToA may be hampered by poor signal-to-noise ratios (SNRs), under which standard statistical methods typically fail. In this work, two deep learning methods are proposed for ToA retrieval from AE signals: a dilated convolutional neural network (DilCNN) and a capsule neural network for ToA (CapsToA). These methods have the additional benefit of being portable to resource-constrained microprocessors. Their performance has been extensively studied on both synthetic and experimental data, focusing on ToA identification for the case of a metallic plate. Results show that the two methods achieve localization results up to 70% more precise than those yielded by conventional strategies, even when the SNR is severely compromised (i.e., down to 2 dB). Moreover, DilCNN and CapsToA have been implemented in a tiny machine learning environment and deployed on microcontroller units, showing a negligible loss of performance with respect to the offline realizations.
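
    A minimal sketch of the DilCNN idea is given below (assumed shapes and layer sizes, not the paper's exact network): a stack of dilated 1D convolutions whose growing dilation widens the receptive field, followed by a regression head that predicts the ToA as a fraction of the input window.

# A minimal sketch, assuming 2048-sample single-channel AE windows and a
# ToA target normalised to [0, 1] over the window.
import torch
import torch.nn as nn

class DilCNNToA(nn.Module):
    def __init__(self, channels=16, dilations=(1, 2, 4, 8, 16)):
        super().__init__()
        layers, in_ch = [], 1
        for d in dilations:                    # growing dilation widens the receptive field
            layers += [nn.Conv1d(in_ch, channels, kernel_size=3, padding=d, dilation=d),
                       nn.ReLU()]
            in_ch = channels
        self.features = nn.Sequential(*layers)
        self.head = nn.Linear(channels, 1)     # one scalar: normalised ToA

    def forward(self, x):                      # x: (batch, 1, samples)
        h = self.features(x).mean(dim=2)       # global average pooling over time
        return torch.sigmoid(self.head(h)).squeeze(-1)

model = DilCNNToA()
toa = model(torch.randn(4, 1, 2048))           # predicted ToA as a fraction of the window
print(toa.shape)                               # torch.Size([4])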

    Improving Deep Learning for HAR with shallow LSTMs

    Recent studies in Human Activity Recognition (HAR) have shown that deep learning methods are able to outperform classical machine learning algorithms. One popular deep learning architecture in HAR is DeepConvLSTM. In this paper we propose to alter the DeepConvLSTM architecture to employ a one-layer instead of a two-layer LSTM. We validate the architecture change on 5 publicly available HAR datasets by comparing predictive performance with and without the change, employing varying numbers of hidden units within the LSTM layer(s). Results show that across all datasets our architecture consistently improves on the original: recognition performance increases by up to 11.7% in F1-score, and the number of learnable parameters decreases significantly, reducing training time by as much as 48%. Our results stand in contrast to the belief that at least a two-layer LSTM is needed when dealing with sequential data; based on our results, we argue that this claim may not apply to sensor-based HAR. Comment: 6 pages, 2 figures, accepted at ISWC '21: International Symposium on Wearable Computers, Sept. 2021
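
    The sketch below illustrates the proposed change (assumed filter counts and window size, not the authors' implementation): a DeepConvLSTM-style stack of four 1D convolutions followed by a single LSTM layer instead of two, then a linear classifier on the last time step.

# A minimal sketch, assuming 9 sensor channels, 24-sample windows and 6 classes.
import torch
import torch.nn as nn

class ShallowDeepConvLSTM(nn.Module):
    def __init__(self, n_channels=9, n_classes=6, conv_filters=64, hidden=128):
        super().__init__()
        convs, in_ch = [], n_channels
        for _ in range(4):                     # convolutional stack over the time axis
            convs += [nn.Conv1d(in_ch, conv_filters, kernel_size=5, padding=2), nn.ReLU()]
            in_ch = conv_filters
        self.convs = nn.Sequential(*convs)
        self.lstm = nn.LSTM(conv_filters, hidden, num_layers=1, batch_first=True)  # 1 layer, not 2
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, channels)
        h = self.convs(x.transpose(1, 2)).transpose(1, 2)  # back to (batch, time, filters)
        out, _ = self.lstm(h)
        return self.classifier(out[:, -1])     # classify from the last time step

model = ShallowDeepConvLSTM()
print(model(torch.randn(8, 24, 9)).shape)      # torch.Size([8, 6])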

    Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition

    Recently, wearable emotion recognition based on peripheral physiological signals has drawn massive attention due to its less invasive nature and its applicability in real-life scenarios. However, how to effectively fuse multimodal data remains a challenging problem, and traditional fully-supervised approaches suffer from overfitting given limited labeled data. To address these issues, we propose a novel self-supervised learning (SSL) framework for wearable emotion recognition, where efficient multimodal fusion is realized with temporal convolution-based modality-specific encoders and a transformer-based shared encoder, capturing both intra-modal and inter-modal correlations. Extensive unlabeled data is automatically assigned labels by five signal transforms, and the proposed SSL model is pre-trained with signal transformation recognition as a pretext task, allowing the extraction of generalized multimodal representations for emotion-related downstream tasks. For evaluation, the proposed SSL model was first pre-trained on a large-scale self-collected physiological dataset, and the resulting encoder was subsequently frozen or fine-tuned on three public supervised emotion recognition datasets. Our SSL-based method achieved state-of-the-art results in various emotion classification tasks, and the proposed model proved more accurate and robust than fully-supervised methods in low-data regimes. Comment: Accepted to IEEE Transactions on Affective Computing
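
    The sketch below (hypothetical sizes and modality set, not the released model) captures the fusion idea: temporal-convolution encoders per physiological modality, a shared transformer encoder over the concatenated token sequences, and a pretext head that classifies which of the five signal transformations was applied.

# A minimal sketch with three assumed modalities (e.g. EDA, BVP, ACC) and
# five pretext classes, one per signal transformation.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    def __init__(self, in_ch, d_model=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, d_model, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=7, stride=2, padding=3), nn.ReLU())

    def forward(self, x):                      # x: (batch, channels, time)
        return self.net(x).transpose(1, 2)     # tokens: (batch, time / 4, d_model)

class SSLFusionModel(nn.Module):
    def __init__(self, modality_channels=(1, 1, 3), d_model=64, n_transforms=5):
        super().__init__()
        self.encoders = nn.ModuleList(ModalityEncoder(c, d_model) for c in modality_channels)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.shared = nn.TransformerEncoder(layer, num_layers=2)
        self.pretext_head = nn.Linear(d_model, n_transforms)  # which transform was applied?

    def forward(self, signals):                # list of (batch, channels, time) tensors
        tokens = torch.cat([enc(s) for enc, s in zip(self.encoders, signals)], dim=1)
        h = self.shared(tokens).mean(dim=1)    # pooled multimodal representation
        return self.pretext_head(h)

model = SSLFusionModel()
x = [torch.randn(2, 1, 256), torch.randn(2, 1, 256), torch.randn(2, 3, 256)]
print(model(x).shape)                          # torch.Size([2, 5])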

    Deep Multi Temporal Scale Networks for Human Motion Analysis

    Human movement appears to respond to a complex motor system that contains signals at different hierarchical levels. For example, an action such as "grasping a glass on a table" represents a high-level action, but to perform this task the body needs several motor inputs that include the activation of different joints of the body (shoulder, arm, hand, fingers, etc.). Each of these joints/muscles has a different size, responsiveness, and precision, with a complex, non-linearly stratified temporal dimension in which every muscle has its own temporal scale: parts such as the fingers respond much faster to brain input than more voluminous body parts such as the shoulder. The cooperation among these parts when we perform an action produces smooth, effective, and expressive movement in a complex, multiple-temporal-scale cognitive task. Following this layered structure, the human body can be described as a kinematic tree consisting of connected joints. Although it is nowadays well known that human movement and its perception are characterised by multiple temporal scales, very few works in the literature focus on studying this particular property. In this thesis, we focus on the analysis of human movement using data-driven techniques, in particular on the non-verbal aspects of human movement, with an emphasis on full-body movements. Data-driven methods can interpret the information in the data by searching for rules, associations, or patterns that represent the relationships between input (e.g. the human action acquired with sensors) and output (e.g. the type of action performed). Furthermore, these models may represent a new research frontier, as they can analyse large masses of data and focus on aspects that even an expert user might miss. The literature on data-driven models proposes two families of methods that can process time series and human movement. The first family, called shallow models, extracts features from the time series that can help the learning algorithm find associations in the data; these features are identified and designed by domain experts who can select the best ones for the problem at hand. The second family avoids this extraction phase, since the models themselves can identify the best set of features to optimise learning. In this thesis, we provide a method that applies the multiple-temporal-scale property of the human motion domain to deep learning models, the only data-driven models that can be extended to handle this property. We ask ourselves two questions: what happens if we apply knowledge about how human movements are performed to deep learning models, and can this knowledge improve current automatic recognition standards? To validate our study, we collected data and tested our hypotheses in specially designed experiments. Results support both the proposal and the need for deep multi-scale models as a tool to better understand human movement and its multiple-time-scale nature.
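
    One way to make the multiple-temporal-scale idea concrete is sketched below (a generic illustration under assumed joint and class counts, not the thesis' models): parallel 1D branches that see the same motion window at different temporal resolutions, whose pooled features are fused before classification.

# A minimal sketch, assuming 63 joint coordinates (21 joints x 3) per frame,
# 120-frame windows and 10 action classes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBranch(nn.Module):
    def __init__(self, in_ch, out_ch, scale):
        super().__init__()
        self.scale = scale                     # temporal downsampling factor
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2)

    def forward(self, x):                      # x: (batch, channels, time)
        if self.scale > 1:
            x = F.avg_pool1d(x, self.scale)    # coarser temporal resolution
        return F.relu(self.conv(x)).mean(dim=2)  # one feature vector per window

class MultiTemporalScaleNet(nn.Module):
    def __init__(self, in_ch=63, n_classes=10, scales=(1, 2, 4), width=64):
        super().__init__()
        self.branches = nn.ModuleList(MultiScaleBranch(in_ch, width, s) for s in scales)
        self.classifier = nn.Linear(width * len(scales), n_classes)

    def forward(self, x):                      # fuse fast, medium and slow views
        return self.classifier(torch.cat([b(x) for b in self.branches], dim=1))

model = MultiTemporalScaleNet()
print(model(torch.randn(4, 63, 120)).shape)    # torch.Size([4, 10])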

    Multi-Modal retinal image analysis via deep learning for the diagnosis of intermediate dry age-related macular degeneration: a feasibility study

    Background and Objective. To determine whether a multi-input deep learning approach to the image analysis of optical coherence tomography (OCT), OCT angiography (OCT-A), and colour fundus photographs increases the accuracy of a CNN in diagnosing intermediate dry age-related macular degeneration (AMD). Patients and Methods. Seventy-five participants were recruited and divided into three cohorts: young healthy (YH), old healthy (OH), and patients with intermediate dry AMD. Colour fundus photography, OCT, and OCT-A scans were performed, and the convolutional neural network (CNN) was trained on multiple image modalities at the same time. Results. The CNN trained on OCT alone showed a diagnostic accuracy of 94%, whilst the OCT-A-trained CNN reached 91%. When the modalities were combined, the CNN accuracy increased to 96% in the AMD cohort. Conclusions. We demonstrate that superior diagnostic accuracy can be achieved when deep learning is combined with multimodal image analysis.
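
    The sketch below (a hypothetical backbone, image size, and three-way cohort classification, not the study's network) illustrates the multi-input idea: one small CNN branch per imaging modality, with the pooled features concatenated before a single diagnostic classifier.

# A minimal sketch, assuming 224x224 RGB-like inputs for each modality and
# three assumed output classes (YH, OH, intermediate dry AMD).
import torch
import torch.nn as nn

def branch(out_dim=64):
    """A small per-modality feature extractor (a stand-in for a real backbone)."""
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, out_dim))

class MultiModalRetinaCNN(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.oct, self.octa, self.fundus = branch(), branch(), branch()
        self.classifier = nn.Linear(64 * 3, n_classes)

    def forward(self, oct_img, octa_img, fundus_img):
        feats = torch.cat([self.oct(oct_img), self.octa(octa_img),
                           self.fundus(fundus_img)], dim=1)
        return self.classifier(feats)          # fused multimodal prediction

model = MultiModalRetinaCNN()
x = torch.randn(2, 3, 224, 224)
print(model(x, x, x).shape)                    # torch.Size([2, 3])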