
    Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition

    Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, and outperforms some previously reported results by up to 9%. Our results show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise the influence of key architectural hyperparameters on performance to provide insights about their optimisation.
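
    The abstract above includes no code; as a rough illustration of the described combination of convolutional and LSTM recurrent units, the PyTorch sketch below stacks 1D convolutions over raw multi-channel sensor windows and feeds the resulting feature sequence to an LSTM before a linear classifier. The channel count, window length, and layer sizes are assumptions for illustration, not the authors' configuration.

```python
# Hedged sketch (not the authors' code) of a Conv+LSTM HAR classifier:
# 1D convolutions extract features from raw sensor windows, an LSTM models
# their temporal dynamics, and a linear layer classifies the window.
import torch
import torch.nn as nn

class ConvLSTMHar(nn.Module):
    def __init__(self, n_channels=113, n_classes=18, hidden=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                       # x: (batch, time, channels)
        z = self.features(x.transpose(1, 2))    # -> (batch, 64, time)
        out, _ = self.lstm(z.transpose(1, 2))   # -> (batch, time, hidden)
        return self.head(out[:, -1])            # classify from the last step

logits = ConvLSTMHar()(torch.randn(8, 24, 113))  # 8 windows of 24 samples
```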

    Multi-modal perception and sensor fusion for human-robot collaboration

    In an era where human-robot collaboration is becoming increasingly common, sensor-based approaches are emerging as critical factors in enabling robots to work harmoniously alongside humans. In some scenarios, the utilization of multiple sensors becomes imperative due to the complexity of certain tasks. This multi-modal approach enhances cognitive capabilities and ensures higher levels of safety and reliability. The integration of diverse sensor modalities not only improves the accuracy of perception but also enables robots to adapt to surrounding environments and analyze human intentions and commands more effectively. This paper explores sensor fusion and multi-modal approaches within an industrial context. The proposed fusion framework adopts a holistic approach encompassing multiple modalities, integrating object detection, human gesture recognition, and speech recognition techniques. These components are pivotal for human-robot interaction, enabling robots to comprehend their environment and interpret human inputs efficiently. Each modality is individually tested and evaluated within the sensor fusion and multi-modal framework to ensure efficient functionality. Successful industrial experimentation underscores the practicality and relevance of sensor fusion and multi-modal approaches. This work highlights the potential of sensor fusion while emphasizing the ongoing need for exploration and improvement in this field.
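
    As a hedged illustration of how outputs from the three modalities named above might be combined at the decision level, the short Python sketch below fuses per-modality (command, confidence) pairs with a weighted vote. The modality names, weights, and confidences are invented for the example; the paper's actual fusion rule may differ.

```python
# Hypothetical decision-level fusion: each modality reports (command, confidence),
# and commands are scored by a weighted vote across modalities.
from collections import defaultdict

def fuse_commands(modality_outputs, weights):
    scores = defaultdict(float)
    for modality, (command, confidence) in modality_outputs.items():
        scores[command] += weights.get(modality, 1.0) * confidence
    return max(scores.items(), key=lambda kv: kv[1])

fused = fuse_commands(
    {"gesture": ("pick_up", 0.82), "speech": ("pick_up", 0.91), "vision": ("idle", 0.40)},
    weights={"gesture": 1.0, "speech": 1.2, "vision": 0.8},
)
print(fused)  # ('pick_up', <fused score>)
```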

    A review of current neuromorphic approaches for vision, auditory, and olfactory sensors

    Conventional vision, auditory, and olfactory sensors generate large volumes of redundant data and, as a result, tend to consume excessive power. To address these shortcomings, neuromorphic sensors have been developed. These sensors mimic the neuro-biological architecture of sensory organs using aVLSI (analog Very Large Scale Integration) and generate asynchronous spiking output that represents sensing information in ways similar to neural signals. This allows for much lower power consumption, owing to the ability to extract useful sensory information from sparse captured data. The foundation for research in neuromorphic sensors was laid more than two decades ago, but recent developments in the understanding of biological sensing, together with advances in electronics, have stimulated research on sophisticated neuromorphic sensors that provide numerous advantages over conventional sensors. In this paper, we review the current state of the art in neuromorphic implementations of vision, auditory, and olfactory sensors and identify key contributions across these fields. Bringing these key contributions together, we suggest a future research direction for the further development of the neuromorphic sensing field.
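
    A minimal sketch of the event-driven principle the review describes, assuming a simple delta-modulation scheme: a spike is emitted only when the input changes by more than a threshold, so static or redundant samples produce no output. The threshold and signal values are purely illustrative.

```python
# Illustrative event encoding (not from the review): emit an ON/OFF event only
# when the signal deviates from the last stored reference by >= threshold.
def delta_events(samples, threshold=0.1):
    events, reference = [], samples[0]
    for t, value in enumerate(samples[1:], start=1):
        if abs(value - reference) >= threshold:
            events.append((t, +1 if value > reference else -1))  # ON / OFF event
            reference = value
    return events

print(delta_events([0.0, 0.02, 0.05, 0.3, 0.31, 0.05]))  # sparse output: [(3, 1), (5, -1)]
```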

    Investigating multisensory integration in emotion recognition through bio-inspired computational models

    Emotion understanding represents a core aspect of human communication. Our social behaviours are closely linked to expressing our emotions and to understanding others' emotional and mental states through social signals. The majority of existing work proceeds by extracting meaningful features from each modality and applying fusion techniques either at the feature level or the decision level. However, these techniques are incapable of translating the constant talk and feedback between different modalities. Such constant talk is particularly important in continuous emotion recognition, where one modality can predict, enhance and complement the other. This paper proposes three multisensory integration models, based on different pathways of multisensory integration in the brain: integration by convergence, early cross-modal enhancement, and integration through neural synchrony. The proposed models are designed and implemented using third-generation neural networks, namely Spiking Neural Networks (SNNs). The models are evaluated using widely adopted, third-party datasets and compared to state-of-the-art multimodal fusion techniques, such as early, late and deep learning fusion. Evaluation results show that the three proposed models achieve results comparable to state-of-the-art supervised learning techniques. More importantly, this paper demonstrates plausible ways to translate constant talk between modalities during the training phase, which also brings advantages in generalisation and robustness to noise.
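
    To make the "integration by convergence" pathway concrete, the sketch below feeds spike trains from two modalities into a single leaky integrate-and-fire neuron, so coincident cross-modal input reaches threshold sooner than either stream alone. All constants (weights, leak, threshold) are assumptions for illustration and are not taken from the paper.

```python
# Toy leaky integrate-and-fire (LIF) neuron receiving converging spike trains
# from two modalities; it fires mainly when the inputs coincide in time.
def lif_convergence(spikes_a, spikes_b, w_a=0.6, w_b=0.6, leak=0.9, threshold=1.0):
    potential, output = 0.0, []
    for s_a, s_b in zip(spikes_a, spikes_b):
        potential = leak * potential + w_a * s_a + w_b * s_b
        if potential >= threshold:
            output.append(1)
            potential = 0.0          # reset after firing
        else:
            output.append(0)
    return output

audio  = [1, 0, 1, 0, 0, 1, 0, 0]
visual = [0, 0, 1, 0, 0, 1, 0, 0]
print(lif_convergence(audio, visual))  # fires where the modalities coincide
```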

    Deep Learning for Processing Electromyographic Signals: a Taxonomy-based Survey

    Deep Learning (DL) has recently been employed to build smart systems that perform incredibly well in a wide range of tasks, such as image recognition, machine translation, and self-driving cars. In several fields, considerable improvements in computing hardware and the increasing need for big data analytics have boosted DL work. In recent years, physiological signal processing has strongly benefited from deep learning, and there is an exponential increase in the number of studies concerning the processing of electromyographic (EMG) signals using DL methods. This phenomenon is mostly explained by the current limitations of myoelectrically controlled prostheses as well as the recent release of large EMG recording datasets, e.g. Ninapro. Such a growing trend has inspired us to seek out and review recent papers focusing on processing EMG signals using DL methods. Referring to the Scopus database, a systematic literature search of papers published between January 2014 and March 2019 was carried out, and sixty-five papers were chosen for review after a full-text analysis. The bibliometric research revealed that the reviewed papers can be grouped into four main categories according to the final application of the EMG signal analysis: Hand Gesture Classification, Speech and Emotion Classification, Sleep Stage Classification, and Other Applications. The review process also confirmed the increasing trend in terms of published papers: the number of papers published in 2018 is indeed four times that published the year before. As expected, most of the analyzed papers (≈60%) concern the identification of hand gestures, thus supporting our hypothesis. Finally, it is worth reporting that the convolutional neural network (CNN) is the most widely used topology among the DL architectures involved; approximately sixty percent of the reviewed articles employ a CNN.
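
    As an illustration of the kind of CNN the survey finds most common for hand-gesture classification, the sketch below applies a small convolutional stack to a windowed multi-channel EMG signal. The electrode count, window length, and number of gesture classes are assumptions loosely inspired by Ninapro-style setups, not taken from any single reviewed paper.

```python
# Hedged sketch of a compact 1D CNN over a multi-channel EMG window.
import torch
import torch.nn as nn

emg_cnn = nn.Sequential(
    nn.Conv1d(in_channels=12, out_channels=32, kernel_size=7, padding=3), nn.ReLU(),
    nn.MaxPool1d(2),
    nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(64, 50),                          # e.g. 50 gesture classes (assumed)
)
scores = emg_cnn(torch.randn(4, 12, 200))       # 4 windows, 12 electrodes, 200 samples
```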

    Light-weight federated learning with augmented knowledge distillation for human activity recognition

    The field of deep learning has experienced significant growth in recent years in various domains where data can be collected and processed. However, as data plays a central role in the deep learning revolution, there are risks associated with moving data from where it is produced to central servers and data centers for processing. To address this issue, Federated Learning (FL) was introduced as a framework for collaboratively training a global model on distributed data. However, deploying FL comes with several unique challenges, including communication overhead and system and statistical heterogeneity. While FL is inherently private in the sense that clients do not share local data, privacy is still a concern in the FL context, since sensitive data can be leaked from the exchanged gradients. To address these challenges, this thesis proposes the incorporation of techniques such as Knowledge Distillation (KD) and Differential Privacy (DP) into FL. Specifically, a model-agnostic FL algorithm based on KD is proposed, called the Federated Learning algorithm based on Knowledge Distillation (FedAKD). FedAKD utilizes a shared dataset as a proxy dataset to calculate and transfer knowledge in the form of soft labels, which are sent to the server for aggregation and broadcast back to the clients, which then train on them in addition to performing local training. Additionally, we elaborate on applying Local Differential Privacy (LDP), where clients apply gradient clipping and noise injection according to Differentially Private Stochastic Gradient Descent (DP-SGD). The FedAKD algorithm is evaluated on Human Activity Recognition (HAR) datasets in terms of accuracy and communication efficiency.
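
    The NumPy sketch below illustrates, under stated assumptions, the two mechanisms described above: clients share soft labels computed on a shared proxy dataset and the server averages them into global distillation targets, while DP-SGD-style local differential privacy clips each per-sample gradient and adds Gaussian noise. This is a simplified stand-in, not the FedAKD implementation; all sizes and noise parameters are invented.

```python
# Minimal sketch of soft-label aggregation and a DP-SGD-style private update.
import numpy as np

def aggregate_soft_labels(client_soft_labels):
    """client_soft_labels: list of (n_proxy, n_classes) probability arrays."""
    return np.mean(np.stack(client_soft_labels), axis=0)

def dp_sgd_step(per_sample_grads, clip_norm=1.0, sigma=0.5, rng=None):
    """Clip each per-sample gradient to clip_norm, add Gaussian noise, average."""
    rng = rng or np.random.default_rng(0)
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, sigma * clip_norm, clipped.shape[1])
    return noisy_sum / len(per_sample_grads)

# 5 clients, 32 proxy samples, 6 classes (all sizes are illustrative)
uploads = [np.random.dirichlet(np.ones(6), size=32) for _ in range(5)]
global_targets = aggregate_soft_labels(uploads)       # broadcast back for distillation
noisy_update = dp_sgd_step(np.random.randn(16, 10))   # one private local update
```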

    Short-term motion prediction of autonomous vehicles in complex environments: A Deep Learning approach

    Autonomous vehicles (AVs) operate in highly complex environments, and it is of critical importance that their embedded safety systems are able to accurately anticipate the short-term future motion of agents in close proximity. This problem can be understood as generating a sequence of coordinates describing the plausible future motion of the tracked agent. A number of recently proposed techniques with satisfactory performance exploit the learning capabilities of novel deep learning (DL) architectures to tackle this task. Nonetheless, a vast number of challenging issues must still be resolved to further advance the capabilities of motion prediction models. This thesis explores novel deep learning techniques for short-term motion prediction of on-road participants, specifically of other vehicles from the point of view of an autonomous vehicle. First and foremost, various approaches in the literature demonstrate significant benefits of using a rasterised top-down image of the road to encode the context of the tracked vehicle's surroundings, which generally encapsulates a large, global portion of the environment. This work, on the other hand, explores the use of local regions of the rasterised map to focus more explicitly on encoding the tracked vehicle's state. The proposed technique demonstrates plausible results against several baseline models and, in addition, outperforms the same model using global maps. Next, the typical method for extracting features from rasterised maps involves employing one of the popular vision models (e.g. ResNet-50) that has been pre-trained on a distinct task such as image classification. Recently, however, it has been demonstrated that this approach can be sub-optimal for tasks that strongly rely on precise localisation of features, and that it can be more advantageous to train the model from scratch directly on the task at hand. The subsequent part of this thesis therefore investigates an alternative method for processing and encoding spatial data based on capsule networks, in order to eradicate several issues that standard vision models exhibit. Through several experiments it is established that the novel capsule-based motion predictor, trained from scratch, achieves competitive results against numerous popular vision models. Finally, the proposed model is further extended with a generative framework to account for the fact that the space of possible movements of the tracked vehicle is not limited to a single trajectory. More specifically, to account for the multi-modality of the problem, a conditional variational auto-encoder (CVAE) is employed, which enables sampling an arbitrary number of diverse trajectories. The final model is evaluated against methods from the literature on a publicly available dataset and, as presented, significantly outperforms other models whilst drastically reducing the number of trainable parameters.
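
    As a hedged sketch of the multi-modality argument in the last part of the abstract, the PyTorch snippet below shows a CVAE-style decoder that maps a context encoding plus different latent samples z to different plausible future trajectories. The dimensions, horizon, and context encoder are placeholders, not the architecture used in the thesis.

```python
# Illustrative CVAE decoder: different latent samples yield diverse futures.
import torch
import torch.nn as nn

class TrajectoryCVAEDecoder(nn.Module):
    def __init__(self, context_dim=128, latent_dim=16, horizon=12):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Sequential(
            nn.Linear(context_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, horizon * 2),              # (x, y) offset per future step
        )

    def forward(self, context, z):
        out = self.net(torch.cat([context, z], dim=-1))
        return out.view(-1, self.horizon, 2)

decoder = TrajectoryCVAEDecoder()
context = torch.randn(1, 128)                         # e.g. encoded local raster map + track
samples = [decoder(context, torch.randn(1, 16)) for _ in range(6)]  # 6 sampled futures
```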

    A multimodal interface for interactive collaboration with quadruped robots

    A variety of approaches for hand gesture recognition have been proposed, with most recent interest directed towards different deep learning methods. The modalities on which these approaches are based most commonly range from different imaging sensors to inertial measurement units (IMUs) and electromyography (EMG) sensors. EMG and IMUs allow detection of gestures without being affected by line of sight or lighting conditions. The detection algorithms are fairly well established, but their application to real-world use cases is limited, apart from prostheses and exoskeletons. In this thesis, a multimodal interface for human-robot interaction (HRI) is developed for quadruped robots. The interface is based on a combination of two detection algorithms: one for detecting gestures based on surface electromyography (sEMG) and IMU signals, and the other for detecting the operator using visible-light and depth cameras. Multiple architectures for gesture detection are compared, where the best regression performance with offline multi-user data was achieved by a hybrid of a convolutional neural network (CNN) and a long short-term memory (LSTM) network, with a mean squared error (MSE) of 4.7 · 10⁻³ on the normalised gestures. A person-following behaviour is implemented for a quadruped robot, which is controlled using the predefined gestures. The complete interface is evaluated online by one expert user two days after recording the last samples of the training data. The gesture detection system achieved an F-score of 0.95 for the gestures alone, and 0.90 when unrecognised attempts due to other technological aspects, such as disturbances in Bluetooth data transmission, are included. The system reached online performance levels comparable to those reported for offline sessions and online sessions with real-time visual feedback. While the current interface was successfully deployed to the robot, further work should aim at improving inter-subject performance and the reliability of wireless communication between the devices.
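
    For illustration only (not the thesis implementation), the sketch below shows a hybrid CNN-LSTM regressor over concatenated sEMG and IMU windows, trained with the MSE criterion against normalised gesture targets as described above. All channel counts, window lengths, and output dimensions are assumptions.

```python
# Hedged CNN-LSTM regression sketch for sEMG + IMU gesture signals.
import torch
import torch.nn as nn

class GestureRegressor(nn.Module):
    def __init__(self, n_channels=14, n_outputs=5):   # e.g. 8 sEMG + 6 IMU channels (assumed)
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.out = nn.Linear(64, n_outputs)            # one value per gesture dimension

    def forward(self, x):                              # x: (batch, time, channels)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        seq, _ = self.lstm(h)
        return self.out(seq[:, -1])

model = GestureRegressor()
x, target = torch.randn(16, 100, 14), torch.rand(16, 5)
loss = nn.MSELoss()(model(x), target)                  # the reported metric is MSE
```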

    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community, given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper provides an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining the two key principles of modality heterogeneity and interconnection that have driven subsequent innovations, and propose a taxonomy of six core technical challenges covering historical and recent trends: representation, alignment, reasoning, generation, transference, and quantification. Recent technical achievements are presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.

    Radar based discrete and continuous activity recognition for assisted living

    In an era of digital transformation, there is an appetite for automating the monitoring of the motions and actions of individuals in a society that is, on average, getting increasingly older. "Activity recognition" is the task in which sensors capture motion information from participants who are either wearing a wearable sensor or are in the field of view of a remote sensor; coupled with machine learning algorithms, this information can be used to automatically identify the movement or action the person is undertaking. Radar is a nascent sensor for this application, having been proposed in the literature as an effective, privacy-compliant sensor that can track movements of the body. The methods of recording movements are separated into two types: 'discrete' movements provide an overview of a single activity within a fixed interval of time, while 'continuous' activities present sequences of activities performed in series with variable duration and uncertain transitions, making the latter a challenging yet much more realistic classification problem. In this thesis, an overview of the technology of continuous wave (CW) and frequency modulated continuous wave (FMCW) radars and of the relevant machine learning algorithms and classification concepts is first provided. Following this, the state of the art for activity recognition with radar is presented and the key papers and significant works are discussed. The remaining chapters of the thesis discuss the research topics where contributions were made. This commences with an analysis of the effect of the physiology of the subject under test, showing that age can have an effect on the radar readings of the target. This is followed by porting existing radar recognition technologies to a new domain, presenting a novel use of radar-based gait recognition to detect lameness in animals. Reverting to the human-centric application, improvements to activity recognition on humans and its accuracy were demonstrated by utilising features from different domains with feature selection and by using different sensing technologies cooperatively. Finally, using a bidirectional long short-term memory (Bi-LSTM) network, improved recognition of continuous activities and activity transitions without human-dependent feature extraction was demonstrated. An accuracy of 97% was achieved through sensor fusion and feature selection for discrete activities, and for continuous activities the Bi-LSTM achieved 92% accuracy with a sole radar sensor.
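
    As a minimal sketch of the final contribution, under assumed dimensions, the snippet below shows a bidirectional LSTM that labels every frame of a continuous radar feature sequence (e.g. frames of a Doppler spectrogram), so activity transitions can be predicted without hand-crafted features. This is not the exact network used in the thesis.

```python
# Hedged Bi-LSTM sketch for per-frame labelling of continuous radar sequences.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, n_features=128, n_activities=6, hidden=64):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_activities)

    def forward(self, x):                  # x: (batch, frames, features)
        seq, _ = self.bilstm(x)
        return self.head(seq)              # per-frame activity logits

frames = torch.randn(2, 300, 128)          # two 300-frame sequences (assumed sizes)
per_frame_logits = BiLSTMTagger()(frames)  # shape (2, 300, 6)
```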