Design and Development of a Human Gesture Recognition System in Tridimensional Interactive Virtual Environment
This thesis describes the design and development of a recognition
system for human gestures. The main goal of this work is to demonstrate
the possibility of extracting enough information, both semantic and quantitative,
from human actions to perform complex tasks in a virtual environment.
To manage this complexity and variability, adaptive systems are
exploited, both in building a codebook (by unsupervised neural networks)
and in recognizing the sequence of symbols describing a gesture (by hidden
Markov models).
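The two-stage pipeline above, vector-quantizing motion features into a codebook and then scoring the resulting symbol sequence with a hidden Markov model, can be sketched as follows. This is a minimal illustration with made-up data and uniform HMM parameters; k-means stands in for the thesis's unsupervised neural network, and all sizes are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy motion features standing in for tracked hand trajectories (assumption).
features = rng.normal(size=(200, 3))

# Stage 1: build a codebook by unsupervised clustering (k-means here, in
# place of the thesis's unsupervised neural network).
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(features)
symbols = codebook.predict(features)  # each frame -> one discrete symbol

# Stage 2: score a symbol sequence under a discrete HMM with the forward
# algorithm (uniform toy parameters; a real system trains one HMM per
# gesture and picks the highest-scoring model).
n_states, n_symbols = 4, 8
A = np.full((n_states, n_states), 1.0 / n_states)    # state transitions
B = np.full((n_states, n_symbols), 1.0 / n_symbols)  # symbol emissions
pi = np.full(n_states, 1.0 / n_states)               # initial distribution

def log_likelihood(obs, pi, A, B):
    """Forward algorithm with per-step normalization for stability."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        ll += np.log(s)
        alpha /= s
    return ll

gesture_ll = log_likelihood(symbols[:20], pi, A, B)
```

In a full recognizer, each gesture class gets its own trained HMM and the class with the highest sequence likelihood wins.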
Contrastive Video Question Answering via Video Graph Transformer
We propose to perform video question answering (VideoQA) in a contrastive
manner via a Video Graph Transformer model (CoVGT). CoVGT's uniqueness and
superiority are three-fold: 1) It proposes a dynamic graph transformer module
that encodes video by explicitly capturing the visual objects, their relations,
and their dynamics for complex spatio-temporal reasoning. 2) It designs separate
video and text transformers for contrastive learning between the video and text
to perform QA, instead of a multi-modal transformer for answer classification.
Fine-grained video-text communication is done by additional cross-modal
interaction modules. 3) It is optimized by joint fully- and self-supervised
contrastive objectives between the correct and incorrect answers, as well as
between the relevant and irrelevant questions, respectively. With its superior
video encoding and QA solution, we show that CoVGT achieves much better
performance than previous methods on video reasoning tasks. Its performance even
surpasses models that are pretrained with millions of external data samples. We
further show that CoVGT can also benefit from cross-modal pretraining, yet with
orders of magnitude less data. The results demonstrate the effectiveness and
superiority of CoVGT, and additionally reveal its potential for more
data-efficient pretraining. We hope our success can advance VideoQA beyond
coarse recognition/description towards fine-grained relation reasoning of video
contents. Our code is available at https://github.com/doc-doc/CoVGT.
Comment: Accepted by IEEE T-PAMI'2
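The contrastive video-text objective described in point 2 can be illustrated with a symmetric InfoNCE-style loss over a toy batch: matched video/text pairs are pulled together, mismatched pairs pushed apart. The embeddings, batch size, and temperature below are illustrative assumptions, not CoVGT's actual encoders or training setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def info_nce(video_emb, text_emb, tau=0.07):
    """Contrastive loss: matched video/text pairs (same row index) are
    positives; every other pairing in the batch is a negative."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / tau          # pairwise cosine similarities / temperature
    labels = np.arange(len(v))

    def xent(lg):
        # stable cross-entropy picking the diagonal (matched) entries
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # symmetric: video-to-text and text-to-video directions
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy batch: 4 clips and their paired texts (near-duplicates = well matched).
video = rng.normal(size=(4, 16))
text_matched = video + 0.01 * rng.normal(size=(4, 16))
text_random = rng.normal(size=(4, 16))

loss_matched = info_nce(video, text_matched)
loss_random = info_nce(video, text_random)
```

Well-aligned pairs yield a much smaller loss than random pairings, which is the gradient signal that drives the video and text encoders toward a shared space.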
Myoelectric forearm prostheses: State of the art from a user-centered perspective
User acceptance of myoelectric forearm prostheses is currently low. Awkward control, lack of feedback, and difficult training are cited as primary reasons. Recently, researchers have focused on exploiting the new possibilities offered by advancements in prosthetic technology. Alternatively, researchers could focus on prosthesis acceptance by developing functional requirements based on activities users are likely to perform. In this article, we describe the process of determining such requirements and then the application of these requirements to evaluating the state of the art in myoelectric forearm prosthesis research. As part of a needs assessment, a workshop was organized involving clinicians (representing end users), academics, and engineers. The resulting needs included an increased number of functions, lower reaction and execution times, and intuitiveness of both control and feedback systems. Reviewing the state of the art of research in the main prosthetic subsystems (electromyographic [EMG] sensing, control, and feedback) showed that modern research prototypes only partly fulfill the requirements. We found that focus should be on validating EMG-sensing results with patients, improving simultaneous control of wrist movements and grasps, deriving optimal parameters for force and position feedback, and taking into account the psychophysical aspects of feedback, such as intensity perception and spatial acuity.
Towards Open-set Gesture Recognition via Feature Activation Enhancement and Orthogonal Prototype Learning
Gesture recognition is a foundational task in human-machine interaction
(HMI). While there has been significant progress in gesture recognition based
on surface electromyography (sEMG), accurate recognition of predefined gestures
only within a closed set is still inadequate in practice. A robust system must
effectively discern and reject unknown gestures of disinterest. Numerous
methods based on prototype learning (PL) have been proposed to tackle this
open-set recognition (OSR) problem. However, they do not fully explore the
inherent distinctions between known and unknown classes. In this paper, we
propose a more effective PL method leveraging two novel and inherent
distinctions: feature activation level and projection inconsistency.
Specifically, the Feature Activation Enhancement Mechanism (FAEM) widens the
gap in feature activation values between known and unknown classes.
Furthermore, we introduce Orthogonal Prototype Learning (OPL) to construct
multiple perspectives: OPL projects a sample along orthogonal directions
to maximize the distinction between its two projections, so that unknown
samples are projected near the clusters of different known classes while known
samples maintain intra-class similarity. Our proposed method
simultaneously achieves accurate closed-set classification of predefined
gestures and effective rejection of unknown gestures. Extensive experiments
demonstrate its efficacy and superiority in open-set gesture recognition based
on sEMG.
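In the spirit of the projection-inconsistency idea above, here is a minimal sketch of prototype-based open-set rejection: a sample is accepted only if two mutually orthogonal projections agree on the nearest class prototype and that prototype is close enough. The random projections, toy features, and distance threshold are assumptions for illustration, not the paper's trained FAEM/OPL.

```python
import numpy as np

rng = np.random.default_rng(2)
n_classes, dim = 3, 8

# Toy known-class features clustered around well-separated prototypes.
prototypes = rng.normal(size=(n_classes, dim)) * 5.0
train = np.concatenate(
    [prototypes[c] + 0.2 * rng.normal(size=(50, dim)) for c in range(n_classes)]
)
train_labels = np.repeat(np.arange(n_classes), 50)

# Two mutually orthogonal projection bases from a QR decomposition; the
# paper *learns* such views, here they are random (assumption).
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
P1, P2 = q[:, : dim // 2], q[:, dim // 2 :]

def class_means(x):
    """Per-class prototype (mean) in a projected space."""
    return np.stack([x[train_labels == c].mean(axis=0) for c in range(n_classes)])

mu1, mu2 = class_means(train @ P1), class_means(train @ P2)

def predict(x, thresh=3.0):
    """Return a class index, or -1 (reject as unknown) when the two
    orthogonal views disagree or the sample is far from every prototype."""
    d1 = np.linalg.norm(mu1 - x @ P1, axis=1)
    d2 = np.linalg.norm(mu2 - x @ P2, axis=1)
    c1, c2 = int(d1.argmin()), int(d2.argmin())
    if c1 != c2 or min(d1[c1], d2[c2]) > thresh:
        return -1
    return c1

known_sample = prototypes[0] + 0.2 * rng.normal(size=dim)
unknown_sample = rng.normal(size=dim) * 10.0
```

A known-class sample projects consistently near the same prototype in both views, while an out-of-distribution sample tends to land inconsistently or far from all prototypes and is rejected.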
Research on a Flexible Mapping Interaction Algorithm for Multiple Gestures Corresponding to the Same Semantics
To address two basic problems in intelligent interactive interfaces, namely interface changes triggered by gesture recognition errors and gestures failing to be recognized, this paper designs and implements an intelligent teaching interface based on gesture interaction; the system interacts with the teacher by capturing the teacher's gesture information. The main innovation is a flexible mapping interaction algorithm in which multiple gestures correspond to the same semantics. We selected 14 natural interaction gestures and analyzed the common features among the different gestures corresponding to the same semantics. Experimental results show that the algorithm effectively reduces user load. The algorithm has been applied in the interface of an intelligent teaching system based on gesture interaction. Supported by the National Key R&D Program of China (No.2018YFB1004901), the National Natural Science Foundation of China (No.61472163, No.61603151), and the Key R&D Program of Shandong Province (No.2017GGX10146).
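The core many-gestures-to-one-semantic mapping can be sketched as a flexible lookup, so that recognizing any of several equivalent gestures yields the same interface command and a near-miss recognition still triggers the intended action. The gesture names and commands below are hypothetical, not the paper's 14 gestures.

```python
# Hypothetical many-to-one mapping: several gestures trigger the same
# teaching-interface command, so a user may perform whichever variant
# comes naturally and still get the intended effect.
GESTURE_TO_SEMANTIC = {
    "swipe_left": "previous_slide",
    "point_left": "previous_slide",
    "swipe_right": "next_slide",
    "point_right": "next_slide",
    "palm_open": "pause",
    "fist": "pause",
}

def interpret(gesture: str) -> str:
    """Map a recognized gesture to its semantic command; unrecognized
    gestures fall back to a no-op instead of changing the interface."""
    return GESTURE_TO_SEMANTIC.get(gesture, "no_op")
```

The no-op fallback addresses the second problem named in the abstract: an unrecognized gesture should leave the interface unchanged rather than cause an erroneous transition.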
Hand Gesture Recognition for Sign Language Transcription
Sign Language is a language that allows mute people to communicate with other mute or non-mute people. The benefits provided by this language, however, disappear when a member of a group does not know Sign Language and a conversation starts in that language. In this document, I present a system that takes advantage of convolutional neural networks to recognize hand letter and number gestures of American Sign Language from depth images captured by the Kinect camera. In addition, as a byproduct of these research efforts, I collected a new dataset of depth images of American Sign Language letters and numbers, and I compared the presented image recognition method against results on a similar dataset for Vietnamese Sign Language. Finally, I present how this work supports my ideas for future work on a complete Sign Language transcription system.
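A single convolution-ReLU-pooling stage, the building block such a CNN stacks to classify depth images, can be sketched in plain NumPy. The toy "depth image" and hand-written edge filter are illustrative assumptions; a real system learns its filters from Kinect data.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i : i + kh, j : j + kw] * kernel).sum()
    return out

def relu(x):
    """Nonlinearity applied between layers."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Downsample by taking the max over non-overlapping size x size tiles."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy 8x8 "depth image" with a vertical step edge (e.g. hand vs background);
# a vertical-edge filter responds only along that boundary.
depth = np.zeros((8, 8))
depth[:, 4:] = 1.0
edge_filter = np.array([[-1.0, 1.0]] * 3)

feature_map = max_pool(relu(conv2d(depth, edge_filter)))
```

Stacking several such stages and ending in a fully connected classifier gives the letter/number recognizer the abstract describes.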