458 research outputs found

    Human robot interaction in a crowded environment

    No full text
    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered laborious, unsafe, or repetitive. Vision-based human robot interaction is a major component of HRI, in which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting navigation commands. To this end, it is necessary to associate the gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate whether people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb them; if an individual is receptive to the robot's interaction, it may approach that person; and if the user is moving in the environment, it can analyse further to understand whether any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine their potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7]
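    As an illustration of the kind of cue fusion the abstract describes, the following is a minimal sketch (not the thesis's actual model) of combining independent visual-cue likelihoods in a naive-Bayes fashion to score which detected person most probably initiated a gesture. The cue names and probability values are hypothetical placeholders.

```python
import numpy as np

# Hypothetical per-person cue likelihoods P(cue | person is addressing the robot).
# In a real system these would come from face, hand and body-pose detectors.
cues = {
    "face_towards_robot": np.array([0.9, 0.2, 0.4]),   # one entry per detected person
    "hand_raised":        np.array([0.8, 0.1, 0.3]),
    "body_motion_low":    np.array([0.7, 0.6, 0.2]),
}
prior = np.array([1 / 3, 1 / 3, 1 / 3])   # uniform prior over three detected people

# Naive-Bayes fusion: multiply the prior by each cue likelihood, then normalise.
posterior = prior.copy()
for likelihood in cues.values():
    posterior *= likelihood
posterior /= posterior.sum()

print("P(person initiated the gesture):", np.round(posterior, 3))
print("Most probable commanding person:", int(np.argmax(posterior)))
```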

    On the Recognition of Emotion from Physiological Data

    Get PDF
    This work encompasses several objectives, but is primarily concerned with an experiment where 33 participants were shown 32 slides in order to create 'weakly induced emotions'. Recordings of the participants' physiological state were taken as well as a self-report of their emotional state. We then used an assortment of classifiers to predict emotional state from the recorded physiological signals, a process known as Physiological Pattern Recognition (PPR). We investigated techniques for recording, processing and extracting features from six different physiological signals: Electrocardiogram (ECG), Blood Volume Pulse (BVP), Galvanic Skin Response (GSR), Electromyography (EMG) for the corrugator muscle, skin temperature for the finger, and respiratory rate. Improvements to the state of PPR emotion detection were made by allowing for 9 different weakly induced emotional states to be detected at nearly 65% accuracy, an improvement in the number of states readily detectable. The work presents many investigations into numerical feature extraction from physiological signals and has a chapter dedicated to collating and trialling facial electromyography techniques. We also created a hardware device to collect participants' self-reported emotional states, which led to several improvements in the experimental procedure
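    As a rough sketch of the PPR pipeline described above (windowed physiological signals, simple statistical features, then a classifier), the following uses synthetic stand-in data and a generic scikit-learn classifier; the feature set and signal names are illustrative, not the thesis's actual features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def simple_features(window):
    """Basic statistical features from one physiological-signal window."""
    return [window.mean(), window.std(), window.min(), window.max(),
            np.diff(window).mean()]

# Synthetic stand-ins for per-trial GSR and heart-rate windows (one row per trial).
rng = np.random.default_rng(0)
n_trials = 120
gsr = rng.normal(size=(n_trials, 256))
hr = rng.normal(size=(n_trials, 256))
labels = rng.integers(0, 9, size=n_trials)   # 9 weakly induced emotion classes

X = np.array([simple_features(g) + simple_features(h) for g, h in zip(gsr, hr)])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```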

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation of speech signals and the methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition, such as speaker identification and tracking and prosody modeling in emotion-detection systems, as well as applications able to operate in real-world environments, like mobile communication services and smart homes
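    As a concrete example of the speech feature extraction such chapters typically cover, this minimal sketch computes MFCC and delta features with librosa; the file path and frame parameters are placeholders, and the snippet is illustrative rather than code from the book.

```python
import numpy as np
import librosa

# Load an example utterance (path is a placeholder).
y, sr = librosa.load("speech.wav", sr=16000)

# 13 Mel-frequency cepstral coefficients per 25 ms frame with a 10 ms hop.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))

# Delta (first-derivative) features are commonly appended for acoustic modeling.
features = np.vstack([mfcc, librosa.feature.delta(mfcc)])
print(features.shape)   # (26, n_frames)
```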

    Deep Active Learning Explored Across Diverse Label Spaces

    Get PDF
    Deep learning architectures have been widely explored in computer vision and have demonstrated commendable performance in a variety of applications. A fundamental challenge in training deep networks is the requirement of large amounts of labeled training data. While gathering large quantities of unlabeled data is cheap and easy, annotating the data is an expensive process in terms of time, labor and human expertise. Thus, developing algorithms that minimize the human effort in training deep models is of immense practical importance. Active learning algorithms automatically identify salient and exemplar samples from large amounts of unlabeled data and can augment maximal information to supervised learning models, thereby reducing the human annotation effort in training machine learning models. The goal of this dissertation is to fuse ideas from deep learning and active learning and design novel deep active learning algorithms. The proposed learning methodologies explore diverse label spaces to solve different computer vision applications. Three major contributions have emerged from this work: (i) a deep active framework for multi-class image classification, (ii) a deep active model with and without label correlation for multi-label image classification and (iii) a deep active paradigm for regression. Extensive empirical studies on a variety of multi-class, multi-label and regression vision datasets corroborate the potential of the proposed methods for real-world applications. Additional contributions include: (i) a multimodal emotion database consisting of recordings of facial expressions, body gestures, vocal expressions and physiological signals of actors enacting various emotions, (ii) four multimodal deep belief network models and (iii) an in-depth analysis of the effect of transfer of multimodal emotion features between source and target networks on classification accuracy and training time. These related contributions help comprehend the challenges involved in training deep learning models and motivate the main goal of this dissertation
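    The core active-learning loop described above can be sketched as follows, using entropy-based uncertainty sampling with a scikit-learn model standing in for a deep network; the dataset, query budget and model are hypothetical, and the dissertation's own selection criteria may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(5000, 20))
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)   # hidden labels (the "oracle")

labelled = list(rng.choice(len(X_pool), size=20, replace=False))
unlabelled = [i for i in range(len(X_pool)) if i not in labelled]

for round_ in range(5):                          # query 50 samples per round
    model = LogisticRegression().fit(X_pool[labelled], y_pool[labelled])
    probs = model.predict_proba(X_pool[unlabelled])
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    queries = np.argsort(entropy)[-50:]          # most uncertain samples
    picked = [unlabelled[i] for i in queries]
    labelled += picked                           # "annotate" the selected samples
    unlabelled = [i for i in unlabelled if i not in picked]
    print(f"round {round_}: {len(labelled)} labels, "
          f"accuracy {model.score(X_pool, y_pool):.3f}")
```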

    Single-Trial EEG Classification with EEGNet and Neural Structured Learning for Improving BCI Performance

    Get PDF
    Research and development of new machine learning techniques to augment the performance of Brain-Computer Interfaces (BCI) have always been an open area of interest among researchers. The need to develop robust and generalised classifiers has been one of the vital requirements in BCI for real-world application. EEGNet is a compact CNN model that has been reported to generalise across different BCI paradigms. In this paper, we have aimed at further improving the EEGNet architecture by employing Neural Structured Learning (NSL), which taps into the relational information within the data to regularise the training of the neural network. This allows the EEGNet to make better predictions while maintaining the structural similarity of the input. In addition to better performance, the combination of EEGNet and NSL is more robust, works well with smaller training samples and requires no separate feature engineering, thus saving computational cost. The proposed approach has been tested on two standard motor imagery datasets: the first being a two-class motor imagery dataset from Graz University and the second the 4-class Dataset 2a from BCI Competition 2008. The accuracies obtained show that our combined EEGNet and NSL approach is superior to the sole EEGNet model
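    The following is a conceptual sketch of the kind of structure-based regularisation NSL adds: the usual classification loss plus a penalty on the distance between embeddings of neighbouring inputs. It is not the paper's implementation and does not use the Neural Structured Learning library API; the toy network is a stand-in for EEGNet and the "neighbours" are simply perturbed copies of each trial.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEEGNet(nn.Module):
    """Toy stand-in for EEGNet: temporal convolution -> pooling -> linear classifier."""
    def __init__(self, n_channels=22, n_classes=4):
        super().__init__()
        self.conv = nn.Conv1d(n_channels, 8, kernel_size=32, padding=16)
        self.pool = nn.AdaptiveAvgPool1d(8)
        self.fc = nn.Linear(8 * 8, n_classes)

    def embed(self, x):                 # x: (batch, channels, samples)
        return self.pool(F.elu(self.conv(x))).flatten(1)

    def forward(self, x):
        return self.fc(self.embed(x))

model = TinyEEGNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
multiplier = 0.1                        # weight of the structural (graph) term

x = torch.randn(16, 22, 256)            # one batch of EEG trials (synthetic)
x_nbr = x + 0.01 * torch.randn_like(x)  # "neighbours": perturbed copies of the trials
y = torch.randint(0, 4, (16,))

logits = model(x)
sup_loss = F.cross_entropy(logits, y)
# Structural regulariser: keep embeddings of neighbouring inputs close together.
graph_loss = F.mse_loss(model.embed(x), model.embed(x_nbr))
loss = sup_loss + multiplier * graph_loss
loss.backward()
opt.step()
print(float(sup_loss), float(graph_loss))
```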

    A brain-computer interface for navigation in virtual reality

    Full text link
    A Brain-Computer Interface (BCI) decodes the electrical brain signals recorded by electroencephalography that represent a desire to do something, and transforms those signals into a command for controlling a device or a piece of software. However, only a limited number of mental tasks have previously been detected and classified. Performing a real or imaginary movement, for example of the foot, can similarly change the brainwaves over the motor cortex. We used an ERS-BCI to see whether we could classify between movements in the forward and backward direction, offline and then online, using different methods. Ten healthy people participated in BCI experiments comprising two sessions (48 min each) in a virtual-environment tunnel. Each session consisted of 320 trials in which subjects were asked to imagine themselves moving in the tunnel in a forward or backward motion after a randomly presented (forward versus backward) command on the screen. Three EEG electrodes were mounted bilaterally on the scalp over the motor cortex. Trials were conducted with feedback. In session 1, the band-power method, time-frequency representation, autoregressive models and the asymmetry ratio were used in the β rhythm range with a Linear Discriminant Analysis classifier and a Support Vector Machine classifier to discriminate between the two mental tasks. Thresholds for both tasks were computed offline and then used to form control signals that were used online in session 2 to trigger the virtual tunnel to move in the direction requested by the user's brain signals. After 96 min of training, the online band-power biofeedback training achieved an average classification precision of 76%, whereas the offline classification with the asymmetry ratio and band power achieved an average classification precision of about 80%
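    A minimal sketch of the band-power-plus-LDA part of the pipeline described above is given below, using scipy and scikit-learn on synthetic trials; the sampling rate, band edges and data are illustrative assumptions rather than the study's recordings.

```python
import numpy as np
from scipy.signal import welch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

fs = 250                                            # sampling rate in Hz (illustrative)
rng = np.random.default_rng(0)
trials = rng.normal(size=(320, 3, 2 * fs))          # 320 trials, 3 electrodes, 2 s each
labels = rng.integers(0, 2, size=320)               # forward vs backward imagery

def beta_band_power(trial, lo=13.0, hi=30.0):
    """Mean beta-band (13-30 Hz) power per electrode, via Welch's method."""
    freqs, psd = welch(trial, fs=fs, nperseg=fs)
    band = (freqs >= lo) & (freqs <= hi)
    return psd[:, band].mean(axis=1)

X = np.array([beta_band_power(t) for t in trials])  # (320, 3) feature matrix
lda = LinearDiscriminantAnalysis().fit(X[:240], labels[:240])
print("held-out accuracy:", lda.score(X[240:], labels[240:]))
```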

    Transparent Authentication Utilising Gait Recognition

    Get PDF
    Securing smartphones has increasingly become inevitable due to their massive popularity and the significant amount of sensitive information they store and access. The gatekeeper of securing the device is authenticating the user. Amongst the many solutions proposed, gait recognition has been suggested to provide a reliable yet non-intrusive authentication approach, enabling both security and usability. While several studies exploring mobile-based gait recognition have taken place, they have been mainly preliminary, with various methodological restrictions that have limited the number of participants, samples and types of features; in addition, prior studies have depended on limited datasets, controlled experimental environments and a limited set of activities. They suffered from the absence of real-world datasets, which led to individuals being verified incorrectly. This thesis has sought to overcome these weaknesses and provide a comprehensive evaluation, including an analysis of smartphone-based motion sensors (accelerometer and gyroscope) and of the variability of feature vectors during differing activities across a multi-day collection involving 60 participants. This was framed as two experiments involving five types of activities: standard, fast, with a bag, downstairs, and upstairs walking. The first experiment explores classification performance in order to understand whether a single classifier or a multi-algorithmic approach would provide a better level of performance. The second experiment investigates the feature vector (comprising a possible 304 unique features) to understand how its composition affects performance, and for comparison a smaller, minimal set of features is also evaluated. Performance on the controlled dataset exceeded prior work using same-day and cross-day methodologies (e.g., for the regular walk activity, best results of 0.70% EER and 6.30% EER for the same-day and cross-day scenarios respectively). Moreover, the multi-algorithmic approach achieved a significant improvement over the single-classifier approach and thus offers a more practical way of managing the problem of feature-vector variability. An activity recognition model was then applied to a real-life gait dataset containing a more significant number of gait samples from 44 users (7-10 days for each user). A human physical motion activity identification model was built to classify a given individual's activity signal into one of the predefined classes. As such, the thesis implemented a novel real-world gait recognition system that recognises the subject using a smartphone-based real-world dataset. It also investigates whether these authentication technologies can recognise the genuine user and reject an impostor. The real-dataset experimental results offered a promising level of security, particularly when majority-voting techniques were applied. In addition, the proposed multi-algorithmic approach seems to be more reliable and tends to perform relatively well in practice on real live user data, providing an improved model employing multiple activities with respect to the security and transparency of the system within a smartphone. Overall, results from the experimentation have shown an EER of 7.45% for a single classifier (all-activities dataset). The multi-algorithmic approach achieved EERs of 5.31%, 6.43% and 5.87% for normal, fast, and normal-and-fast walking respectively using both accelerometer- and gyroscope-based features, showing a significant improvement over the single-classifier approach. Ultimately, the evaluation of the smartphone-based gait authentication system over a long period of time under realistic scenarios has revealed that it could provide secure and appropriate activity identification and user authentication
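    A minimal sketch of the general pipeline outlined above (windowed accelerometer and gyroscope features, a per-window classifier, and majority voting across the windows of a session) is shown below; the features, window sizes and synthetic data are illustrative assumptions, not the thesis's 304-feature vector or its datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def window_features(w):
    """Simple time-domain features from one accelerometer+gyroscope window."""
    return np.concatenate([w.mean(axis=1), w.std(axis=1),
                           np.abs(np.diff(w, axis=1)).mean(axis=1)])

# Synthetic stand-in: 6 sensor axes (3 accelerometer + 3 gyroscope), 200 samples/window.
X = np.array([window_features(rng.normal(size=(6, 200))) for _ in range(600)])
y = rng.integers(0, 2, size=600)                 # 1 = genuine user, 0 = impostor

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:500], y[:500])

# Majority voting over the windows of one authentication session.
session = X[500:520]
votes = clf.predict(session)
decision = int(votes.mean() >= 0.5)
print("window votes:", votes, "-> accept" if decision else "-> reject")
```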

    Emotion and Stress Recognition Related Sensors and Machine Learning Technologies

    Get PDF
    This book includes impactful chapters which present scientific concepts, frameworks, architectures and ideas on sensing technologies and machine learning techniques. These are relevant in tackling the following challenges: (i) the field readiness and use of intrusive sensor systems and devices for capturing biosignals, including EEG sensor systems, ECG sensor systems and electrodermal activity sensor systems; (ii) the quality assessment and management of sensor data; (iii) data preprocessing, noise filtering and calibration concepts for biosignals; (iv) the field readiness and use of nonintrusive sensor technologies, including visual sensors, acoustic sensors, vibration sensors and piezoelectric sensors; (v) emotion recognition using mobile phones and smartwatches; (vi) body area sensor networks for emotion and stress studies; (vii) the use of experimental datasets in emotion recognition, including dataset generation principles and concepts, quality assurance and emotion elicitation material and concepts; (viii) machine learning techniques for robust emotion recognition, including graphical models, neural network methods, deep learning methods, statistical learning and multivariate empirical mode decomposition; (ix) subject-independent emotion and stress recognition concepts and systems, including facial expression-based systems, speech-based systems, EEG-based systems, ECG-based systems, electrodermal activity-based systems, multimodal recognition systems and sensor fusion concepts and (x) emotion and stress estimation and forecasting from a nonlinear dynamical system perspective

    Automatic Speech Emotion Recognition- Feature Space Dimensionality and Classification Challenges

    Get PDF
    In the last decade, research in Speech Emotion Recognition (SER) has become a major endeavour in Human Computer Interaction (HCI) and speech processing. Accurate SER is essential for many applications, like assessing customer satisfaction with quality of services and detecting/assessing the emotional state of children in care. The large number of studies published on SER reflects the demand for its use. The main concern of this thesis is the investigation of SER from pattern recognition and machine learning points of view. In particular, we aim to identify appropriate mathematical models of SER and examine the process of designing automatic emotion recognition schemes. There are major challenges to automatic SER, including ambiguity about the list/definition of emotions, the lack of agreement on a manageable set of uncorrelated speech-based emotion-relevant features, and the difficulty of collecting emotion-related datasets under natural circumstances. We initiate our work by dealing with the identification of appropriate sets of emotion-related features/attributes extractible from speech signals, considered from psychological and computational points of view. We investigate the use of pattern-recognition approaches to remove redundancies and achieve a compact digital representation of the extracted data with minimal loss of information. The thesis includes the design of new SER schemes, or complements to existing ones, and conducts large sets of experiments to empirically test their performance on different databases, identifying advantages and shortcomings of using speech alone for emotion recognition. Existing SER studies seem to deal with the ambiguity/disagreement on a “limited” number of emotion-related features by expanding the list from the same speech signal source/sites and applying various feature selection procedures as a means of reducing redundancies. Attempts are made to discover features more relevant to emotion from speech. One of our investigations focuses on proposing a new set of features for SER, extracted from Linear Predictive (LP)-residual speech. We demonstrate the usefulness of the proposed relatively small set of features by testing the performance of an SER scheme that fuses our set of features with the existing set of thousands of features, using the common machine learning schemes of Support Vector Machine (SVM) and Artificial Neural Network (ANN). The challenge of the growing dimensionality of the SER feature space and its impact on increased model complexity is another major focus of our research project. By studying the pros and cons of the commonly used feature selection approaches, we argue in favour of meta-feature selection and develop various methods in this direction, not only to reduce dimension, but also to adapt and de-correlate emotional feature spaces for improved SER model recognition accuracy. We used Principal Component Analysis (PCA) and proposed Data Independent PCA (DIPCA), trained on independent emotional and non-emotional datasets. The DIPCA projections, especially when extracted from speech data coloured with different emotions or from neutral speech data, had comparable capability to PCA in terms of SER performance. Another approach adopted in this thesis for dimension reduction is Random Projection (RP) matrices, which are independent of the training data. We have shown that some versions of RP with an SVM classifier can offer an adaptation space for speaker-independent SER that avoids over-fitting and hence improves recognition accuracy. Using PCA trained on one set of data while testing on emotional data features has significant implications for machine learning in general. The thesis's other major contribution focuses on the classification aspects of SER. We investigate the drawbacks of the well-known SVM classifier when applied to data preprocessed by PCA and RP. We demonstrate the advantages of using the Linear Discriminant Classifier (LDC) instead, especially for PCA de-correlated metafeatures. We initiated a variety of LDC-based ensemble classification schemes, testing performance using a new form of bagging over different metafeature subsets extracted by PCA, with encouraging results. The experiments conducted were applied on two benchmark datasets (Emo-Berlin and FAU-Aibo) and an in-house dataset in the Kurdish language. The recognition accuracies achieved are significantly higher than state-of-the-art results on all datasets. The results, however, revealed a difficult challenge in the form of a persisting wide gap in accuracy over different datasets, which cannot be explained entirely by the differences between the natures of the datasets. We conducted various pilot studies, based on various visualisations of the confusion matrices for the “difficult” databases, to build multi-level SER schemes. These studies provide initial evidence of the presence of more than one “emotion” in the same portion of speech. A possible solution may be to present recognition accuracy in a score-based measurement such as a spider chart. Such an approach may also reveal the presence of the Doddington zoo phenomenon in SER
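    The de-correlate-then-classify idea described above can be sketched as a PCA projection onto metafeatures followed by a linear discriminant classifier; the data below are synthetic placeholders rather than Emo-Berlin, FAU-Aibo or Kurdish-language features, and the dimensions are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1582))      # high-dimensional utterance-level feature vectors (placeholder)
y = rng.integers(0, 7, size=500)      # 7 emotion classes (placeholder labels)

# PCA de-correlates and compresses the feature space into "metafeatures",
# then a linear discriminant classifier operates on the projections.
ser = make_pipeline(PCA(n_components=60), LinearDiscriminantAnalysis())
print("cross-validated accuracy:", cross_val_score(ser, X, y, cv=5).mean())
```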
