9,088 research outputs found

    Skeleton-based human action and gesture recognition for human-robot collaboration

    Get PDF
    openThe continuous development of robotic and sensing technologies has led in recent years to an increased interest in human-robot collaborative systems, in which humans and robots perform tasks in shared spaces and interact with close and direct contacts. In these scenarios, it is fundamental for the robot to be aware of the behaviour that a person in its proximity has, to ensure their safety and anticipate their actions in performing a shared and collaborative task. To this end, human activity recognition (HAR) techniques have been often applied in human-robot collaboration (HRC) settings. The works in this field usually focus on case-specific applications. Instead, in this thesis we propose a general framework for human action and gesture recognition in a HRC scenario. In particular, a transfer learning enabled skeleton-based approach that employs as backbone the Shift-GCN architecture is used to classify general actions related to HRC scenarios. Pose-based body and hands features are exploited to recognise actions in a way that is independent from the environment in which these are performed and from the tools and objects involved in their execution. The fusion of small network modules, each dedicated to the recognition of either the body or hands movements, is then explored. This allows to better understand the importance of different body parts in the recognition of the actions as well as to improve the classification outcomes. For our experiments, we used the large-scale NTU RGB+D dataset to pre-train the networks. Moreover, a new HAR dataset, named IAS-Lab Collaborative HAR dataset, was collected, containing general actions and gestures related to HRC contexts. On this dataset, our approach reaches a 76.54% accuracy

    Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding

    Get PDF
    Gait recognition and understanding systems have shown a wide-ranging application prospect. However, their use of unstructured data from image and video has affected their performance, e.g., they are easily influenced by multi-views, occlusion, clothes, and object carrying conditions. This paper addresses these problems using a realistic 3-dimensional (3D) human structural data and sequential pattern learning framework with top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameters estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded using a sparse 2D matrix to construct the structural gait semantic image. In order to achieve time-based gait recognition, an HTM Network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions including multi-views by refining the SL-GSDRs, according to prior knowledge. The proposed gait learning model not only aids gait recognition tasks to overcome the difficulties in real application scenarios but also provides the structured gait semantic images for visual cognition. Experimental analyses on CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness

    Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots

    Full text link
    We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate humanoid simulation: addressing challenges in modeling complex deformable bodies and diversity in appearance and motion, all while ensuring high simulation speed. (2) Human-in-the-loop infrastructure: enabling real human interaction with simulated robots via mouse/keyboard or a VR interface, facilitating evaluation of robot policies with human input. (3) Collaborative tasks: studying two collaborative tasks, Social Navigation and Social Rearrangement. Social Navigation investigates a robot's ability to locate and follow humanoid avatars in unseen environments, whereas Social Rearrangement addresses collaboration between a humanoid and robot while rearranging a scene. These contributions allow us to study end-to-end learned and heuristic baselines for human-robot collaboration in-depth, as well as evaluate them with humans in the loop. Our experiments demonstrate that learned robot policies lead to efficient task completion when collaborating with unseen humanoid agents and human partners that might exhibit behaviors that the robot has not seen before. Additionally, we observe emergent behaviors during collaborative task execution, such as the robot yielding space when obstructing a humanoid agent, thereby allowing the effective completion of the task by the humanoid agent. Furthermore, our experiments using the human-in-the-loop tool demonstrate that our automated evaluation with humanoids can provide an indication of the relative ordering of different policies when evaluated with real human collaborators. Habitat 3.0 unlocks interesting new features in simulators for Embodied AI, and we hope it paves the way for a new frontier of embodied human-AI interaction capabilities.Comment: Project page: http://aihabitat.org/habitat

    Vision- and tactile-based continuous multimodal intention and attention recognition for safer physical human-robot interaction

    Full text link
    Employing skin-like tactile sensors on robots enhances both the safety and usability of collaborative robots by adding the capability to detect human contact. Unfortunately, simple binary tactile sensors alone cannot determine the context of the human contact -- whether it is a deliberate interaction or an unintended collision that requires safety manoeuvres. Many published methods classify discrete interactions using more advanced tactile sensors or by analysing joint torques. Instead, we propose to augment the intention recognition capabilities of simple binary tactile sensors by adding a robot-mounted camera for human posture analysis. Different interaction characteristics, including touch location, human pose, and gaze direction, are used to train a supervised machine learning algorithm to classify whether a touch is intentional or not with an F1-score of 86%. We demonstrate that multimodal intention recognition is significantly more accurate than monomodal analyses with the collaborative robot Baxter. Furthermore, our method can also continuously monitor interactions that fluidly change between intentional or unintentional by gauging the user's attention through gaze. If a user stops paying attention mid-task, the proposed intention and attention recognition algorithm can activate safety features to prevent unsafe interactions. We also employ a feature reduction technique that reduces the number of inputs to five to achieve a more generalized low-dimensional classifier. This simplification both reduces the amount of training data required and improves real-world classification accuracy. It also renders the method potentially agnostic to the robot and touch sensor architectures while achieving a high degree of task adaptability.Comment: 11 pages, 8 figures, preprint under revie

    An original framework for understanding human actions and body language by using deep neural networks

    Get PDF
    The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition, and video surveillance. In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided. The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements. All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods

    Collaborative robot control with hand gestures

    Get PDF
    Mestrado de dupla diplomação com a Université Libre de TunisThis thesis focuses on hand gesture recognition by proposing an architecture to control a collaborative robot in real-time vision based on hand detection, tracking, and gesture recognition for interaction with an application via hand gestures. The first stage of our system allows detecting and tracking a bar e hand in a cluttered background using skin detection and contour comparison. The second stage allows recognizing hand gestures using a Machine learning method algorithm. Finally an interface has been developed to control the robot over. Our hand gesture recognition system consists of two parts, in the first part for every frame captured from a camera we extract the keypoints for every training image using a machine learning algorithm, and we appoint the keypoints from every image into a keypoint map. This map is treated as an input for our processing algorithm which uses several methods to recognize the fingers in each hand. In the second part, we use a 3D camera with Infrared capabilities to get a 3D model of the hand to implement it in our system, after that we track the fingers in each hand and recognize them which made it possible to count the extended fingers and to distinguish each finger pattern. An interface to control the robot has been made that utilizes the previous steps that gives a real-time process and a dynamic 3D representation.Esta dissertação trata do reconhecimento de gestos realizados com a mão humana, propondo uma arquitetura para interagir com um robô colaborativo, baseado em visão computacional, rastreamento e reconhecimento de gestos. O primeiro estágio do sistema desenvolvido permite detectar e rastrear a presença de uma mão em um fundo desordenado usando detecção de pele e comparação de contornos. A segunda fase permite reconhecer os gestos das mãos usando um algoritmo do método de aprendizado de máquina. Finalmente, uma interface foi desenvolvida para interagir com robô. O sistema de reconhecimento de gestos manuais está dividido em duas partes. Na primeira parte, para cada quadro capturado de uma câmera, foi extraído os pontos-chave de cada imagem de treinamento usando um algoritmo de aprendizado de máquina e nomeamos os pontos-chave de cada imagem em um mapa de pontos-chave. Este mapa é tratado como uma entrada para o algoritmo de processamento que usa vários métodos para reconhecer os dedos em cada mão. Na segunda parte, foi utilizado uma câmera 3D com recursos de infravermelho para obter um modelo 3D da mão para implementá-lo em no sistema desenvolvido, e então, foi realizado os rastreio dos dedos de cada mão seguido pelo reconhecimento que possibilitou contabilizar os dedos estendidos e para distinguir cada padrão de dedo. Foi elaborado uma interface para interagir com o robô manipulador que utiliza as etapas anteriores que fornece um processo em tempo real e uma representação 3D dinâmica

    To Draw or Not to Draw: Recognizing Stroke-Hover Intent in Gesture-Free Bare-Hand Mid-Air Drawing Tasks

    Get PDF
    Over the past several decades, technological advancements have introduced new modes of communication with the computers, introducing a shift from traditional mouse and keyboard interfaces. While touch based interactions are abundantly being used today, latest developments in computer vision, body tracking stereo cameras, and augmented and virtual reality have now enabled communicating with the computers using spatial input in the physical 3D space. These techniques are now being integrated into several design critical tasks like sketching, modeling, etc. through sophisticated methodologies and use of specialized instrumented devices. One of the prime challenges in design research is to make this spatial interaction with the computer as intuitive as possible for the users. Drawing curves in mid-air with fingers, is a fundamental task with applications to 3D sketching, geometric modeling, handwriting recognition, and authentication. Sketching in general, is a crucial mode for effective idea communication between designers. Mid-air curve input is typically accomplished through instrumented controllers, specific hand postures, or pre-defined hand gestures, in presence of depth and motion sensing cameras. The user may use any of these modalities to express the intention to start or stop sketching. However, apart from suffering with issues like lack of robustness, the use of such gestures, specific postures, or the necessity of instrumented controllers for design specific tasks further result in an additional cognitive load on the user. To address the problems associated with different mid-air curve input modalities, the presented research discusses the design, development, and evaluation of data driven models for intent recognition in non-instrumented, gesture-free, bare-hand mid-air drawing tasks. The research is motivated by a behavioral study that demonstrates the need for such an approach due to the lack of robustness and intuitiveness while using hand postures and instrumented devices. The main objective is to study how users move during mid-air sketching, develop qualitative insights regarding such movements, and consequently implement a computational approach to determine when the user intends to draw in mid-air without the use of an explicit mechanism (such as an instrumented controller or a specified hand-posture). By recording the user’s hand trajectory, the idea is to simply classify this point as either hover or stroke. The resulting model allows for the classification of points on the user’s spatial trajectory. Drawing inspiration from the way users sketch in mid-air, this research first specifies the necessity for an alternate approach for processing bare hand mid-air curves in a continuous fashion. Further, this research presents a novel drawing intent recognition work flow for every recorded drawing point, using three different approaches. We begin with recording mid-air drawing data and developing a classification model based on the extracted geometric properties of the recorded data. The main goal behind developing this model is to identify drawing intent from critical geometric and temporal features. In the second approach, we explore the variations in prediction quality of the model by improving the dimensionality of data used as mid-air curve input. Finally, in the third approach, we seek to understand the drawing intention from mid-air curves using sophisticated dimensionality reduction neural networks such as autoencoders. Finally, the broad level implications of this research are discussed, with potential development areas in the design and research of mid-air interactions
    • …
    corecore