32 research outputs found

    Multimodal human hand motion sensing and analysis - a review

    Get PDF

    Design and Development of a Human Gesture Recognition System in Tridimensional Interactive Virtual Environment

    Get PDF
    This thesis describes the design and development of a recognition system for human gestures. The main goal of this work is to demonstrate that enough information, both semantic and quantitative, can be extracted from human actions to perform complex tasks in a virtual environment. To manage the complexity and variability, adaptive systems are exploited both to build a codebook (by unsupervised neural networks) and to recognize the sequence of symbols describing a gesture (by hidden Markov models).
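
    The codebook-plus-HMM pipeline above can be illustrated with a minimal sketch. This is not the thesis's implementation: k-means stands in for the unsupervised neural network that builds the codebook, and a plain forward-algorithm scorer stands in for the trained hidden Markov models; all names and parameters are illustrative.

        import numpy as np
        from sklearn.cluster import KMeans

        def build_codebook(frames, n_symbols=16):
            """Quantize raw motion frames into discrete symbols.
            (A stand-in for the thesis's unsupervised neural network.)"""
            return KMeans(n_clusters=n_symbols, n_init=10).fit(frames)

        def log_likelihood(symbols, start_p, trans_p, emit_p):
            """Scaled forward algorithm: log P(symbols | one gesture's HMM)."""
            alpha = start_p * emit_p[:, symbols[0]]
            log_l = np.log(alpha.sum())
            alpha = alpha / alpha.sum()
            for s in symbols[1:]:
                alpha = (alpha @ trans_p) * emit_p[:, s]
                log_l += np.log(alpha.sum())
                alpha = alpha / alpha.sum()
            return log_l

        def classify(symbols, models):
            """models: {gesture_name: (start_p, trans_p, emit_p)}; pick the
            gesture whose HMM gives the observed symbols the highest score."""
            return max(models, key=lambda g: log_likelihood(symbols, *models[g]))

        # Toy 2-state, 2-symbol model (real models would be trained, e.g. by Baum-Welch).
        start_p = np.array([0.6, 0.4])
        trans_p = np.array([[0.7, 0.3], [0.4, 0.6]])
        emit_p = np.array([[0.9, 0.1], [0.2, 0.8]])
        print(log_likelihood([0, 1, 1, 0], start_p, trans_p, emit_p))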

    Contrastive Video Question Answering via Video Graph Transformer

    Full text link
    We propose to perform video question answering (VideoQA) in a contrastive manner via a Video Graph Transformer model (CoVGT). CoVGT's uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations, and their dynamics, for complex spatio-temporal reasoning. 2) It designs separate video and text transformers for contrastive learning between the video and text to perform QA, instead of a multi-modal transformer for answer classification. Fine-grained video-text communication is done by additional cross-modal interaction modules. 3) It is optimized by joint fully- and self-supervised contrastive objectives between the correct and incorrect answers and between the relevant and irrelevant questions, respectively. With its superior video encoding and QA solution, we show that CoVGT achieves much better performance than previous arts on video reasoning tasks. Its performance even surpasses that of models pretrained with millions of external data. We further show that CoVGT can also benefit from cross-modal pretraining, yet with orders of magnitude less data. The results demonstrate the effectiveness and superiority of CoVGT, and additionally reveal its potential for more data-efficient pretraining. We hope our success can advance VideoQA beyond coarse recognition/description towards fine-grained relation reasoning of video contents. Our code is available at https://github.com/doc-doc/CoVGT.
    Comment: Accepted by IEEE T-PAMI'2
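
    As a rough illustration of the contrastive objective sketched in point 3 (this is not CoVGT's code; the embedding sizes, temperature, and function name are assumptions), contrasting the correct answer against incorrect candidates reduces to a cross-entropy over scaled similarity scores:

        import torch
        import torch.nn.functional as F

        def contrastive_qa_loss(video_emb, answer_embs, correct_idx, tau=0.07):
            """video_emb: (d,) question-aware video embedding;
            answer_embs: (n_answers, d) candidate-answer embeddings;
            correct_idx: index of the correct answer among the candidates."""
            v = F.normalize(video_emb, dim=-1)
            a = F.normalize(answer_embs, dim=-1)
            logits = a @ v / tau  # cosine similarities, temperature-scaled
            target = torch.tensor([correct_idx])
            return F.cross_entropy(logits.unsqueeze(0), target)

        # Toy usage: 5 candidate answers, 256-d embeddings.
        loss = contrastive_qa_loss(torch.randn(256), torch.randn(5, 256), correct_idx=2)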

    Myoelectric forearm prostheses: State of the art from a user-centered perspective

    Get PDF
    User acceptance of myoelectric forearm prostheses is currently low. Awkward control, lack of feedback, and difficult training are cited as primary reasons. Recently, researchers have focused on exploiting the new possibilities offered by advancements in prosthetic technology. Alternatively, researchers could focus on prosthesis acceptance by developing functional requirements based on activities users are likely to perform. In this article, we describe the process of determining such requirements and then apply them to evaluate the state of the art in myoelectric forearm prosthesis research. As part of a needs assessment, a workshop was organized involving clinicians (representing end users), academics, and engineers. The resulting needs included an increased number of functions, lower reaction and execution times, and intuitiveness of both control and feedback systems. Reviewing the state of the art of research in the main prosthetic subsystems (electromyographic [EMG] sensing, control, and feedback) showed that modern research prototypes only partly fulfill the requirements. We found that focus should be on validating EMG-sensing results with patients, improving simultaneous control of wrist movements and grasps, deriving optimal parameters for force and position feedback, and taking into account the psychophysical aspects of feedback, such as intensity perception and spatial acuity.
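
    To make the control requirements concrete, a common baseline from this literature (not taken from the article; the sampling rate, window length, and calibration levels are illustrative assumptions) maps the moving RMS envelope of the EMG signal to a proportional actuator command:

        import numpy as np

        def rms_envelope(emg, fs=1000, window_ms=150):
            """Moving-window RMS of a raw 1-D EMG signal."""
            win = int(fs * window_ms / 1000)
            padded = np.pad(emg ** 2, (win - 1, 0), mode="edge")
            return np.sqrt(np.convolve(padded, np.ones(win) / win, mode="valid"))

        def proportional_command(envelope, rest_level, max_level):
            """Map the envelope to a 0..1 actuator command, with the rest
            level acting as a dead zone against unintended activation."""
            cmd = (envelope - rest_level) / (max_level - rest_level)
            return np.clip(cmd, 0.0, 1.0)

        # Toy usage: quiet baseline followed by a burst of muscle activity.
        emg = np.random.randn(2000) * np.r_[np.full(1000, 0.05), np.full(1000, 0.5)]
        cmd = proportional_command(rms_envelope(emg), rest_level=0.05, max_level=0.5)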

    Hand Gesture Recognition: A Literature Review

    Get PDF

    Towards Open-set Gesture Recognition via Feature Activation Enhancement and Orthogonal Prototype Learning

    Full text link
    Gesture recognition is a foundational task in human-machine interaction (HMI). While there has been significant progress in gesture recognition based on surface electromyography (sEMG), accurately recognizing only predefined gestures within a closed set is still inadequate in practice. A robust system must effectively discern and reject unknown gestures of disinterest. Numerous methods based on prototype learning (PL) have been proposed to tackle this open set recognition (OSR) problem. However, they do not fully explore the inherent distinctions between known and unknown classes. In this paper, we propose a more effective PL method leveraging two novel and inherent distinctions, feature activation level and projection inconsistency. Specifically, the Feature Activation Enhancement Mechanism (FAEM) widens the gap in feature activation values between known and unknown classes. Furthermore, we introduce Orthogonal Prototype Learning (OPL) to construct multiple perspectives. OPL projects a sample from orthogonal directions to maximize the distinction between its two projections: unknown samples are projected near the clusters of different known classes, while known samples still maintain intra-class similarity. Our proposed method simultaneously achieves accurate closed-set classification for predefined gestures and effective rejection of unknown gestures. Extensive experiments demonstrate its efficacy and superiority in open-set gesture recognition based on sEMG.
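
    The projection-inconsistency idea can be sketched as follows. This is a schematic reading of the abstract, not the authors' code: the two orthogonal projections, the per-class prototypes, and the rejection rule are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        d, k = 64, 16

        # Two projection matrices with mutually orthogonal rows (via QR),
        # standing in for OPL's orthogonal projection directions.
        Q, _ = np.linalg.qr(rng.standard_normal((d, 2 * k)))
        P1, P2 = Q[:, :k].T, Q[:, k:].T  # (k, d) each; P1 @ P2.T is ~0

        def nearest_prototype(z, prototypes):
            """Return (class index, distance) of the closest prototype."""
            dists = np.linalg.norm(prototypes - z, axis=1)
            i = int(np.argmin(dists))
            return i, dists[i]

        def predict(x, protos1, protos2, threshold):
            """Accept a sample only if both projections agree on a nearby
            prototype; otherwise reject it as an unknown gesture."""
            c1, d1 = nearest_prototype(P1 @ x, protos1)
            c2, d2 = nearest_prototype(P2 @ x, protos2)
            return c1 if (c1 == c2 and max(d1, d2) < threshold) else -1

        # Toy usage: 3 classes; real prototypes would be learned, not random.
        protos1 = rng.standard_normal((3, k))
        protos2 = rng.standard_normal((3, k))
        print(predict(rng.standard_normal(d), protos1, protos2, threshold=5.0))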

    Research on a Flexible Mapping Interaction Algorithm for Mapping Multiple Gestures to the Same Semantics

    Get PDF
    To address two basic problems of gesture-based intelligent interfaces, namely erroneous interface changes caused by misrecognized gestures and gestures that are not recognized at all, this paper designs and implements an intelligent teaching interface based on gesture interaction; the system interacts with the teacher by capturing the teacher's gesture information. The main innovation is a flexible mapping interaction algorithm in which multiple gestures correspond to the same semantics. We selected 14 natural interaction gestures and analyzed the common features among the different gestures that correspond to the same semantics. Experimental results show that the algorithm effectively reduces user load. The algorithm has been applied in the interface of a gesture-based intelligent teaching system. Funding: National Key R&D Program of China (No. 2018YFB1004901), National Natural Science Foundation of China (No. 61472163, No. 61603151), Key R&D Program of Shandong Province (No. 2017GGX10146).
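
    A minimal sketch of the many-to-one, flexible gesture-to-semantics mapping (the gesture labels, command names, and confidence threshold are illustrative assumptions, not the paper's 14-gesture vocabulary):

        # Several recognized gesture labels map to one semantic command, so a
        # confusion within the same semantic group cannot change the interface wrongly.
        GESTURE_TO_SEMANTIC = {
            "point_index": "select",
            "point_two_fingers": "select",
            "palm_push": "next_page",
            "swipe_right": "next_page",
            "palm_pull": "previous_page",
            "swipe_left": "previous_page",
        }

        def interpret(gesture_label, confidence, threshold=0.6):
            """Map a recognized gesture to a semantic command; low-confidence or
            unmapped gestures are ignored instead of triggering a wrong change."""
            if confidence < threshold:
                return None
            return GESTURE_TO_SEMANTIC.get(gesture_label)

        print(interpret("swipe_right", 0.82))  # -> "next_page"
        print(interpret("swipe_right", 0.40))  # -> None (rejected)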

    Hand Gesture Recognition for Sign Language Transcription

    Get PDF
    Sign Language is a language which allows mute people to communicate with other mute or non-mute people. The benefits of this language disappear, however, when one member of a group does not know Sign Language and a conversation starts in it. In this document, I present a system that uses convolutional neural networks to recognize hand letter and number gestures from American Sign Language, based on depth images captured by the Kinect camera. In addition, as a byproduct of these research efforts, I collected a new dataset of depth images of American Sign Language letters and numbers, and I compared the presented recognition method against a similar dataset for Vietnamese Sign Language. Finally, I describe how this work supports my ideas for future work on a complete Sign Language transcription system.
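
    A minimal sketch of a depth-image CNN classifier of the kind described (the architecture, input size, and class count are assumptions for illustration, not the thesis's network):

        import torch
        import torch.nn as nn

        class DepthGestureCNN(nn.Module):
            """Toy CNN for single-channel (depth) hand crops, e.g. 64x64,
            classifying 24 static ASL letter gestures (assumed class count)."""
            def __init__(self, n_classes=24):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
                )
                self.classifier = nn.Linear(64 * 8 * 8, n_classes)

            def forward(self, x):
                return self.classifier(self.features(x).flatten(1))

        # Toy usage: a batch of 4 depth images.
        logits = DepthGestureCNN()(torch.randn(4, 1, 64, 64))  # -> (4, 24)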