18 research outputs found

    A new spiking convolutional recurrent neural network (SCRNN) with applications to event-based hand gesture recognition

    The combination of neuromorphic visual sensors and spiking neural networks offers a highly efficient, bio-inspired solution for real-world applications. However, processing event-based sequences remains challenging because of their asynchronous and sparse nature. In this paper, a novel spiking convolutional recurrent neural network (SCRNN) architecture is presented that takes advantage of both convolution operations and recurrent connectivity to preserve the spatial and temporal relations in event-based sequence data. The recurrent architecture gives the network a sampling window of arbitrary length, allowing it to exploit temporal correlations between event collections. Rather than relying on standard ANN-to-SNN conversion techniques, the network uses the supervised Spike Layer Error Reassignment (SLAYER) training mechanism, which allows it to learn from neuromorphic (event-based) data directly. The architecture is validated on the DVS gesture dataset, achieving a 10-class gesture recognition accuracy of 96.59% and an 11-class accuracy of 90.28%.
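
    To make the core idea concrete, the following is a minimal NumPy sketch of a spiking convolutional recurrent cell: a leaky integrate-and-fire layer driven by a convolution over the current event frame plus a recurrent convolution over the previous spike map. The binning of events into frames, the kernel sizes, the leak, and the threshold are illustrative assumptions, not the SLAYER-trained configuration reported in the paper.

        # A toy spiking convolutional recurrent cell (illustrative only).
        # Events are assumed to be pre-binned into binary frames of shape (H, W);
        # kernel sizes, thresholds and decay constants are arbitrary choices.
        import numpy as np
        from scipy.signal import convolve2d

        class SpikingConvRecurrentCell:
            def __init__(self, kernel_size=3, tau=0.8, threshold=1.0, seed=0):
                rng = np.random.default_rng(seed)
                self.w_in = rng.normal(0, 0.3, (kernel_size, kernel_size))   # feed-forward kernel
                self.w_rec = rng.normal(0, 0.1, (kernel_size, kernel_size))  # recurrent kernel
                self.tau = tau              # membrane leak per time step
                self.threshold = threshold  # firing threshold

            def step(self, frame, membrane, prev_spikes):
                """One time step: integrate input and recurrent current, fire, reset."""
                current = (convolve2d(frame, self.w_in, mode="same")
                           + convolve2d(prev_spikes, self.w_rec, mode="same"))
                membrane = self.tau * membrane + current
                spikes = (membrane >= self.threshold).astype(float)
                membrane = membrane * (1.0 - spikes)  # hard reset where a spike occurred
                return spikes, membrane

        # Run the cell over a short sequence of sparse synthetic event frames.
        H, W, T = 32, 32, 10
        rng = np.random.default_rng(42)
        cell = SpikingConvRecurrentCell()
        membrane = np.zeros((H, W))
        spikes = np.zeros((H, W))
        for t in range(T):
            frame = (rng.random((H, W)) < 0.05).astype(float)
            spikes, membrane = cell.step(frame, membrane, spikes)
            print(f"step {t}: {int(spikes.sum())} spikes")

    Because the recurrent kernel feeds the previous spike map back into the membrane update, the cell can be unrolled over sequences of any length, which is the property the abstract refers to as an arbitrary sampling window.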

    Human-robot interaction and computer-vision-based services for autonomous robots

    Imitation Learning (IL), or robot Programming by Demonstration (PbD), covers methods by which a robot learns new skills through human guidance and imitation. PbD takes its inspiration from the way humans learn new skills by imitation, with the goal of developing methods by which new tasks can be transferred to robots. This thesis is motivated by the generic question of "what to imitate?", which concerns the problem of how to extract the essential features of a task. To this end, we adopt an Action Recognition (AR) perspective, allowing the robot to decide what has to be imitated or inferred when interacting with a human. The proposed approach is based on a well-known method from natural language processing, namely the Bag of Words (BoW). This method is applied to large databases in order to obtain a trained model. Although BoW is a machine learning technique used in various fields of research, in action classification for robot learning it is far from accurate; moreover, it has mostly been applied to the classification of objects and gestures rather than actions. In this thesis we therefore show that the method is suitable, in action classification scenarios, for merging information from different sources or from different trials. The thesis makes three contributions: (1) it proposes a general method for action recognition, thereby contributing to imitation learning; (2) the methodology can be applied to large databases that include different modes of action capture; and (3) the method is applied in a real international innovation project called Vinbot.
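
    As a concrete illustration of the pipeline described above, the following is a minimal scikit-learn sketch of bag-of-words action classification: local descriptors from each sequence are quantized against a learned codebook, each sequence becomes a codeword histogram, and a standard classifier is trained on the histograms. The descriptor dimensionality, codebook size, and classifier choice are assumptions made for illustration, not the settings used in the thesis or in the Vinbot project.

        # Bag-of-words action classification sketch (illustrative parameters).
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(0)

        def fake_descriptors(n, dim=16):
            """Stand-in for local spatio-temporal descriptors of one action sequence."""
            return rng.normal(size=(n, dim))

        # Each training sequence yields a variable number of local descriptors plus a label.
        sequences = [fake_descriptors(rng.integers(50, 100)) for _ in range(40)]
        labels = rng.integers(0, 4, size=len(sequences))          # 4 toy action classes

        # 1) Build the visual vocabulary by clustering all descriptors.
        codebook = KMeans(n_clusters=32, n_init=10, random_state=0)
        codebook.fit(np.vstack(sequences))

        # 2) Represent each sequence as a normalized histogram of codeword assignments.
        def bow_histogram(descriptors):
            words = codebook.predict(descriptors)
            hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
            return hist / hist.sum()

        X = np.array([bow_histogram(s) for s in sequences])

        # 3) Train a classifier on the histograms; fusing different sources or trials
        #    amounts to combining their histograms before this step.
        clf = LinearSVC().fit(X, labels)
        print("training accuracy:", clf.score(X, labels))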

    Deep spiking neural networks with applications to human gesture recognition

    Spiking neural networks (SNNs), the third generation of artificial neural networks (ANNs), are a class of event-driven neuromorphic algorithms with a potentially wide range of application domains, and they map onto a variety of extremely low-power neuromorphic hardware. The work presented in this thesis addresses the challenges of human gesture recognition using novel SNN algorithms. It covers the design of these algorithms for human gesture recognition in both the visual and auditory domains, as well as event-based pre-processing toolkits for audio signals. On the visual side, a novel SNN-based event-driven hand gesture recognition system is proposed. The system is shown to be effective in a hand gesture recognition experiment with its spiking convolutional recurrent neural network (SCRNN) design, which combines convolution operations and recurrent connectivity to maintain spatial and temporal relations in address-event-representation (AER) data. The proposed SCRNN architecture can achieve arbitrary temporal resolution, which means it can exploit temporal correlations between event collections. The design uses a backpropagation-based training algorithm and does not suffer from gradient vanishing or explosion problems. On the audio side, a novel end-to-end spiking speech emotion recognition (SER) system is proposed. It employs MFCCs as the main speech feature extractor, together with a purpose-built latency coding algorithm that efficiently converts the raw signal into AER input suitable for an SNN. A two-layer spiking recurrent architecture is proposed to capture temporal correlations between spike trains. The robustness of the system is demonstrated on several open public datasets, with state-of-the-art recognition accuracy and a significant reduction in network size, computational cost, and training time. In addition to directly contributing to neuromorphic SER, the thesis proposes a novel speech-coding algorithm based on the working mechanism of the human auditory system. The algorithm mimics the functionality of the cochlea and provides an alternative method of event-data acquisition for audio. It is then further simplified and extended into a speech-enhancement application that is used jointly with the proposed SER system. This speech-enhancement method uses a lateral inhibition mechanism as a frequency coincidence detector to remove uncorrelated noise in the time-frequency spectrum, and experiments show it to be effective for up to six types of noise.
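
    As an illustration of the audio front end described above, the following is a minimal sketch of latency (time-to-first-spike) coding: each frame of features (for example, normalized MFCC coefficients) is mapped to address-event output in which stronger values spike earlier within the frame's time window. The window length, the linear intensity-to-latency mapping, and the synthetic input are assumptions for illustration, not the thesis's own coding algorithm.

        # Latency (time-to-first-spike) coding sketch: larger feature values spike earlier.
        import numpy as np

        def latency_encode(frames, t_window=10.0, t_min=0.0):
            """Map each feature value in each frame to at most one spike time.

            frames: array of shape (n_frames, n_channels), values assumed in [0, 1].
            Returns a list of (time, channel) events, ordered by time.
            """
            events = []
            for i, frame in enumerate(frames):
                offset = i * t_window                      # start of this frame's window
                # Strong values fire near the start of the window, weak ones near the end.
                latencies = t_min + (1.0 - frame) * (t_window - t_min)
                for ch, lat in enumerate(latencies):
                    if frame[ch] > 0.05:                   # near-silent channels emit no spike
                        events.append((offset + lat, ch))
            events.sort(key=lambda e: e[0])
            return events

        # Synthetic stand-in for normalized MFCC frames (e.g. 20 frames x 13 coefficients).
        rng = np.random.default_rng(1)
        mfcc_like = rng.random((20, 13))
        spike_events = latency_encode(mfcc_like)
        print(f"{len(spike_events)} events, first: t={spike_events[0][0]:.2f}, ch={spike_events[0][1]}")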

    A Socially Assistive Robot for Stroke Patients: Acceptance, Needs, and Concerns of Patients and Informal Caregivers

    Stroke patients often contend with long-term physical challenges that require treatment and support from both formal and informal caregivers. Socially Assistive Robots (SARs) can assist patients in their physical rehabilitation process and relieve some of the burden on informal caregivers, such as spouses and family members. We collected and analyzed information from 23 participants (11 stroke patients and 12 informal caregivers) who took part in a total of six focus-group discussions. The participants responded to questions regarding the use of a SAR to promote physical exercises during the rehabilitation process: (a) the advantages and disadvantages of doing so; (b) specific needs that they wish a SAR would address; (c) patient-specific adaptations they would propose to include; and (d) concerns they had regarding the use of such technology in stroke rehabilitation. We found that the majority of participants in both groups were interested in experiencing the use of a SAR for rehabilitation, both in the clinic and at home. Both groups noted the advantage of having the constant presence of a motivating entity with whom they can practice their rehabilitative exercises. The patients noted how such a device can assist formal caregivers in managing their workload, while the informal caregivers indicated that such a system could ease their own workload and sense of burden. The main disadvantages that participants noted related to the robot not possessing human abilities, such as the ability to hold a conversation, to physically guide the patient's movements, and to express or understand emotions. We anticipate that the data collected in this study, namely input from the patients and their family members, including the similarities and differences between their points of view, will aid in improving the development of SARs for rehabilitation, so that they can better suit people who have had a stroke and meet their individual needs.

    Bridging the gap between emotion and joint action

    Our daily life is filled with a myriad of joint-action moments, be it children playing, adults working together (e.g., in team sports), or strangers navigating through a crowd. Joint action brings individuals (and the embodiment of their emotions) together, in space and in time. Yet little is known about how individual emotions propagate through embodied presence in a group, and how joint action changes individual emotion. In fact, the multi-agent component is largely missing from neuroscience-based approaches to emotion, and, conversely, joint-action research has not yet found a way to include emotion as one of the key parameters in modeling socio-motor interaction. In this review, we first identify this gap and then gather evidence showing the strong entanglement between emotion and acting together from various branches of science. We propose an integrative approach to bridge the gap, highlight five research avenues for doing so in behavioral neuroscience and digital sciences, and address some of the key challenges in this area faced by modern societies.

    Design and Development of the eBear: A Socially Assistive Robot for Elderly People with Depression

    There has been tremendous progress in robotics in the past decade, especially in developing humanoid robots with social abilities that can assist humans at a socio-emotional level. The objective of this thesis is to develop and study a perceptive and expressive animal-like robot, equipped with artificial intelligence, for assisting elderly people with depression. We investigated how social robots can become companions of elderly individuals with depression, improve their mood, and increase their happiness and well-being. The robotic platform built in this thesis is a bear-like robot called the eBear. The eBear can show facial expressions and head gestures, can recognize the user's emotion from audio-visual sensory input using machine learning, can speak with relatively accurate visual speech, and can hold a dialogue with users. It can respond to their questions by querying the Internet and can encourage them to be more physically active, including performing simple physical exercises. Beyond building the robot, a pilot study was run in which seven elderly people with mild to severe depression interacted with the eBear for about 45 minutes, three times a week, over one month. The results show that interacting with the eBear can improve the happiness and mood of these users, as measured by the Face Scale and the Geriatric Depression Scale (GDS). In addition, using the Almere Model, it was concluded that acceptance of the social agent increased over the study period. Videos of the users' interactions with the eBear were analyzed, and eye gaze and facial expressions were manually annotated to better understand how the users' behavior changed with the eBear. The results of these analyses, together with the exit surveys completed by the users at the end of the study, demonstrate that a social robot such as the eBear can be an effective companion for elderly people and can offer a new approach to depression treatment.

    Design of an integrated selective attention mechanism in a behavior-based architecture for autonomous robots

    The ageing of the population around the world leads us to seriously consider integrating service robots into our daily lives in order to ease the demand for caregiving. However, no service robot currently exists that is advanced enough to act as a true assistant to people losing their autonomy. One of the problems holding back the development of such robots is software integration: it is difficult to integrate the many perception and action capabilities needed to interact naturally and appropriately with a person in a real environment, since the limits of the computing resources available on a robotic platform are quickly reached. Even though the human brain has capabilities beyond those of a computer, it too has limits on how much information it can process. To cope with these limits, humans manage their cognitive capacities with the help of selective attention, which, for example, lets them ignore certain stimuli so as to concentrate resources on those useful to the task at hand. Since robots could greatly benefit from such a mechanism, the objective of this thesis is to develop a control architecture that integrates a selective attention mechanism in order to reduce the computational load demanded by the robot's various processing modules. The control architecture used is based on the behavior-based approach and is named HBBA, for Hybrid Behavior-Based Architecture. To meet this objective, the humanoid robot IRL-1 was designed to allow the integration of multiple perception and action capabilities on a single platform, serving as an experimental platform that can benefit from selective attention mechanisms. The implemented capabilities make it possible to interact with IRL-1 through several modalities: IRL-1 can be guided physically, perceiving external forces through the elastic actuators used in the steering of its omnidirectional base; vision, motion, and audition were integrated into an augmented telepresence interface; and the influence of reaction delays to sounds in the environment could be examined. This implementation validated the use of HBBA as a working basis for the robot's decision-making and explored the limits of the processing capabilities of the modules on the robot. A selective attention mechanism was then developed within HBBA. This mechanism integrates the activation of processing modules with perceptual filtering, that is, the ability to modulate the quantity of stimuli used by the processing modules so as to adapt processing to the available computing resources. The results obtained demonstrate the benefits such a mechanism brings, allowing the robot to optimize the use of its computing resources in order to satisfy its goals. This work provides a foundation on which it is now possible to continue integrating even more advanced capabilities and thus make effective progress toward the design of domestic robots that can assist us in our daily lives.
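
    As an illustration of the perceptual-filtering idea described above, the following is a minimal sketch in which each perception module is given a filter rate, i.e. the fraction of incoming stimuli it actually processes, and an attention mechanism lowers those rates, favoring high-priority modules, whenever the estimated load exceeds a CPU budget. The module names, cost model, and proportional allocation rule are illustrative assumptions, not the actual HBBA implementation.

        # Selective-attention sketch: throttle per-module perceptual filters to fit a CPU budget.

        class PerceptionModule:
            def __init__(self, name, cost_per_stimulus, priority):
                self.name = name
                self.cost_per_stimulus = cost_per_stimulus  # estimated CPU cost of one stimulus
                self.priority = priority                    # how important the module is right now
                self.filter_rate = 1.0                      # fraction of stimuli it processes

            def load(self, stimulus_rate):
                return self.filter_rate * stimulus_rate * self.cost_per_stimulus

        def allocate_attention(modules, stimulus_rate, cpu_budget):
            """Scale filter rates, favoring high-priority modules, until the load fits the budget."""
            total = sum(m.load(stimulus_rate) for m in modules)
            if total <= cpu_budget:
                return
            # Distribute the budget proportionally to priority, capped at full processing.
            weight_sum = sum(m.priority for m in modules)
            for m in modules:
                share = cpu_budget * m.priority / weight_sum
                full_cost = stimulus_rate * m.cost_per_stimulus
                m.filter_rate = min(1.0, share / full_cost)

        modules = [
            PerceptionModule("sound_localisation", cost_per_stimulus=0.002, priority=3),
            PerceptionModule("face_detection",     cost_per_stimulus=0.010, priority=2),
            PerceptionModule("object_recognition", cost_per_stimulus=0.020, priority=1),
        ]
        allocate_attention(modules, stimulus_rate=30.0, cpu_budget=0.5)
        for m in modules:
            print(f"{m.name}: process {m.filter_rate:.0%} of stimuli")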

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model addresses face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model makes it possible to quantitatively evaluate the quality of the interaction using statistical tools that measure how effective the recognition phase is. In this paper we recast this theory for the case in which one of the interactants is a robot; here, the recognition phases performed by the robot and by the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal to be considered.
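
    As an illustration of the kind of quantitative evaluation the model supports, the following is a minimal sketch using synthetic gaze data: a ground-truth state (whether the human is gazing at the robot), a noisy externalized cue, and the robot's attribution are compared through simple correlations: ecological validity (state vs. cue), cue utilization (cue vs. attribution), and functional achievement (state vs. attribution). The synthetic data and the use of Pearson correlation are assumptions for illustration, not the statistics used with the Berrick platform.

        # Lens-model style evaluation sketch with synthetic gaze data.
        import numpy as np

        rng = np.random.default_rng(2)
        n = 200

        # Distal state: is the human actually gazing at the robot? (ground truth)
        state = rng.integers(0, 2, size=n).astype(float)
        # Externalized cue: measured head/eye orientation, a noisy function of the state.
        cue = state + rng.normal(0, 0.5, size=n)
        # Robot's attribution: its recognition of the state, based only on the cue.
        attribution = (cue > 0.5).astype(float)

        def corr(a, b):
            return float(np.corrcoef(a, b)[0, 1])

        print("ecological validity (state ~ cue):           ", round(corr(state, cue), 2))
        print("cue utilization (cue ~ attribution):         ", round(corr(cue, attribution), 2))
        print("functional achievement (state ~ attribution):", round(corr(state, attribution), 2))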