
    Learning Temporal Alignment Uncertainty for Efficient Event Detection

    In this paper we tackle the problem of efficient video event detection. We argue that linear detection functions should be preferred in this regard due to their scalability and efficiency during estimation and evaluation. A popular approach is to represent a sequence using a bag-of-words (BOW) representation due to: (i) its fixed dimensionality irrespective of the sequence length, and (ii) its ability to compactly model the statistics of the sequence. A drawback of the BOW representation, however, is the intrinsic destruction of temporal ordering information. In this paper we propose a new representation that leverages the uncertainty in relative temporal alignments between pairs of sequences while not destroying temporal ordering. Our representation, like BOW, is of a fixed dimensionality, making it easy to integrate with a linear detection function. Extensive experiments on the CK+, 6DMG, and UvA-NEMO databases show significant performance improvements across both isolated and continuous event detection tasks. Comment: Appeared in DICTA 2015, 8 pages.
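
    The fixed-dimensionality property of BOW that the abstract builds on is easy to illustrate. Below is a minimal Python sketch, assuming a pre-learned codebook of quantized frame features; the codebook size and feature dimensionality are illustrative, not taken from the paper:

```python
import numpy as np

def bow_histogram(frame_features, codebook):
    """Quantize per-frame features against a codebook and return a
    normalized histogram with one bin per codeword, so the output
    length is fixed regardless of how many frames the sequence has."""
    # Assign each frame to its nearest codeword (Euclidean distance).
    dists = np.linalg.norm(
        frame_features[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)  # L1-normalize

# Sequences of different lengths map to vectors of the same size, but
# the temporal ordering of the frames is discarded, which is exactly
# the drawback the paper's representation is designed to avoid.
codebook = np.random.randn(64, 128)   # 64 codewords, 128-D features
short_seq = np.random.randn(30, 128)  # 30 frames
long_seq = np.random.randn(500, 128)  # 500 frames
assert bow_histogram(short_seq, codebook).shape == \
       bow_histogram(long_seq, codebook).shape  # both (64,)
```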

    Freeform User Interfaces for Graphical Computing

    Report number: Kō 15222; Date of degree conferral: 2000-03-29; Degree category: Doctorate by coursework; Degree: Doctor of Engineering; Degree registration number: Engineering Doctorate No. 4717; Graduate school and department: Graduate School of Engineering, Department of Information Engineering

    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, producing meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio. Comment: 14 pages, 7 figures.
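
    The core of the ModDrop strategy, dropping whole modality channels at random during fusion training, can be sketched in a few lines. This is a minimal illustration, assuming per-modality feature vectors produced by already-initialized branch networks; tensor sizes and the drop probability are placeholders, not the paper's settings:

```python
import torch

def moddrop_mask(batch_size, n_modalities, p_drop=0.5):
    """Sample a per-example, per-modality keep/drop mask, making sure
    at least one modality survives in every example."""
    mask = (torch.rand(batch_size, n_modalities) > p_drop).float()
    dead = (mask.sum(dim=1) == 0).nonzero(as_tuple=True)[0]
    if len(dead):
        # Revive one random modality in examples where all were dropped.
        revive = torch.randint(n_modalities, (len(dead),))
        mask[dead, revive] = 1.0
    return mask

# During training, each modality's features are zeroed independently
# before fusion, so the fused layers learn cross-modality correlations
# while staying robust to missing channels at test time.
feats = [torch.randn(8, 32) for _ in range(3)]  # 3 modalities, batch of 8
mask = moddrop_mask(8, 3)
fused = torch.cat([f * mask[:, i:i + 1] for i, f in enumerate(feats)], dim=1)
```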

    Unobtrusive hand gesture recognition using ultra-wide band radar and deep learning

    Hand function after a stroke injury is not regained rapidly and requires physical rehabilitation for at least six months. Due to the heavy burden on the healthcare system, assisted rehabilitation is prescribed only for a limited time, after which so-called home rehabilitation is offered. It is therefore essential to develop robust solutions that facilitate monitoring while preserving the privacy of patients in a home-based setting. To meet these expectations, an unobtrusive solution based on radar sensing and deep learning is proposed. The multi-input multi-output convolutional eXtra trees (MIMO-CxT) is a new deep hybrid model used for hand gesture recognition (HGR) with impulse-radio ultra-wide band (IR-UWB) radars. It consists of a lightweight architecture based on a multi-input convolutional neural network (CNN) used in a hybrid configuration with extremely randomized trees (ETs). The model takes data from multiple sensors as input and processes them separately. The outputs of the CNN branches are concatenated before the prediction is made by the ETs. Moreover, the model uses depthwise separable convolution layers, which reduce computational cost and learning time while maintaining high performance. The model is evaluated on a publicly available dataset of gestures collected by three IR-UWB radars and achieves an average accuracy of 98.86%.
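
    Two of the ingredients named above, depthwise separable convolutions and per-sensor CNN branches whose outputs are concatenated before a tree-based classifier, can be sketched as follows. This is a rough PyTorch illustration under assumed tensor shapes, not the authors' architecture; the extremely randomized trees stage would be, e.g., scikit-learn's ExtraTreesClassifier fit on the concatenated features:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise then pointwise convolution: roughly k*C + C*C'
    multiplies per position instead of k*C*C' for a standard
    convolution, which is where the cost savings come from."""
    def __init__(self, in_ch, out_ch, k=5):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch)     # one filter per channel
        self.pointwise = nn.Conv1d(in_ch, out_ch, 1)  # mix channels

    def forward(self, x):
        return torch.relu(self.pointwise(self.depthwise(x)))

# One lightweight branch per radar; branch outputs are concatenated
# and would then be handed to the extra-trees classifier.
branches = nn.ModuleList([
    nn.Sequential(DepthwiseSeparableConv1d(1, 16),
                  nn.AdaptiveAvgPool1d(1), nn.Flatten())
    for _ in range(3)])
signals = [torch.randn(4, 1, 256) for _ in range(3)]  # batch of 4 per radar
features = torch.cat([b(s) for b, s in zip(branches, signals)], dim=1)
print(features.shape)  # torch.Size([4, 48])
```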

    A Robot Calligraphy System: From Simple to Complex Writing by Human Gestures

    Robotic writing is a very challenging task that involves complicated kinematic control algorithms and image processing. This paper, alternatively, proposes a robot calligraphy system that first applies human arm gestures to establish a font database of Chinese character elementary strokes and English letters, then uses the created database and human gestures to write Chinese characters and English words. A three-dimensional motion-sensing input device is deployed to capture the human arm trajectories, which are used to build the font database and to train a classifier ensemble. Twenty-six gesture types are used for writing English letters, and five gesture types are used to generate the five elementary strokes for writing Chinese characters. Using the font database, the robot calligraphy system acquires a basic ability to write simple strokes and letters. The robot can then progress to writing complex Chinese characters and English words by following human body movements. The classifier ensemble, which identifies each gesture, is implemented using feature selection techniques and the harmony search algorithm (see the sketch below), thereby achieving better classification performance. Experimental evaluations are carried out to demonstrate the feasibility and performance of the proposed method. By following the motion trajectories of the human right arm, the robot's end-effector can successfully write the English words or Chinese characters that correspond to those trajectories.
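
    As a toy illustration of how harmony search can drive feature selection over binary masks, here is a sketch; the parameter names, defaults, and fitness function are generic assumptions, not the paper's configuration:

```python
import numpy as np

def harmony_search_features(fitness, n_features, memory_size=20,
                            iterations=200, hmcr=0.9, par=0.3, rng=None):
    """Toy harmony search over binary feature-selection masks.
    `fitness(mask) -> float` scores a mask, e.g. the cross-validated
    accuracy of the classifier ensemble; higher is better."""
    rng = rng or np.random.default_rng(0)
    memory = rng.integers(0, 2, size=(memory_size, n_features))
    scores = np.array([fitness(m) for m in memory])
    for _ in range(iterations):
        new = np.empty(n_features, dtype=int)
        for j in range(n_features):
            if rng.random() < hmcr:        # memory consideration
                new[j] = memory[rng.integers(memory_size), j]
                if rng.random() < par:     # "pitch adjustment": flip the bit
                    new[j] = 1 - new[j]
            else:                          # random consideration
                new[j] = rng.integers(0, 2)
        score = fitness(new)
        worst = scores.argmin()
        if score > scores[worst]:          # replace the worst harmony
            memory[worst], scores[worst] = new, score
    return memory[scores.argmax()]
```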

    A Sketch-Based Interface for Annotation of 3D Brain Vascular Reconstructions

    Within the medical imaging community, 3D models of anatomical structures are now widely used to establish more accurate diagnoses than those based on 2D images. Much research focuses on automatically building such 3D models. However, automatic reconstruction introduces many artifacts when the anatomical structure exhibits tortuous and thin parts (such as vascular networks), and correcting these artifacts requires 3D-modeling skills and time that radiologists do not have. This article presents a semi-automatic approach to building a correct topology of vascular networks from 3D medical images. The user interface is based on sketching; a user stroke both defines a command and the part of the geometry the command is applied to. Moreover, the speed of the user's gesture is taken into account to adjust the command: a slow and precise gesture corrects a local part of the topology, while a fast gesture corrects a larger part. Our system relies on an automatic segmentation that provides an initial guess, which the user can interactively modify using the proposed set of commands. This makes it possible to correct, in a few strokes, the anatomical aberrations or ambiguities that appear on the segmented model.
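
    The speed-dependent command scope described above can be made concrete with a small sketch: estimate the mean drawing speed of a stroke and map it to the radius of the region the command affects. The function below is a hypothetical illustration; the thresholds and the linear mapping are assumptions, since the paper only states that slow gestures edit locally and fast gestures edit a larger part:

```python
import numpy as np

def correction_radius(stroke_points, timestamps,
                      r_min=1.0, r_max=10.0, v_ref=50.0):
    """Map a stroke's mean drawing speed to the size of the region the
    editing command affects: slow, precise strokes edit locally, fast
    strokes edit a larger neighbourhood. Assumes timestamps are
    strictly increasing and points are in model-space units."""
    seg_lengths = np.linalg.norm(np.diff(stroke_points, axis=0), axis=1)
    dt = np.diff(timestamps)
    mean_speed = (seg_lengths / dt).mean()
    t = min(mean_speed / v_ref, 1.0)       # clamp to [0, 1]
    return r_min + t * (r_max - r_min)
```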

    Learning to recognize touch gestures: recurrent vs. convolutional features and dynamic sampling

    We propose a fully automatic method for learning gestures on big touch devices in a potentially multi-user context. The goal is to learn general models capable of adapting to different gestures, user styles and hardware variations (e.g. device sizes, sampling frequencies and regularities). Based on deep neural networks, our method features a novel dynamic sampling and temporal normalization component, transforming variable-length gestures into fixed-length representations while preserving finger/surface contact transitions, that is, the topology of the signal. This sequential representation is then processed with a convolutional model capable, unlike recurrent networks, of learning hierarchical representations with different levels of abstraction. To demonstrate the value of the proposed method, we introduce a new touch gesture dataset with 6591 gestures performed by 27 people, which is, to the best of our knowledge, the first of its kind: a publicly available multi-touch gesture dataset for interaction. We also tested our method on a standard dataset of symbolic touch gesture recognition, the MMG dataset, outperforming the state of the art and reporting close-to-perfect performance. Comment: 9 pages, 4 figures, accepted at the 13th IEEE Conference on Automatic Face and Gesture Recognition (FG2018). Dataset available at http://itekube7.itekube.co
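
    The dynamic sampling idea, collapsing a variable-length gesture to a fixed length without losing finger/surface contact transitions, can be sketched as follows. This is a simplified illustration under assumed inputs (per-frame features plus a contact-state value per frame, e.g. the number of touching fingers); it is not the authors' exact component, and sequences shorter than the target length would additionally need padding:

```python
import numpy as np

def resample_preserving_transitions(frames, contact, T=64):
    """Resample a touch sequence to (at most) T frames while
    guaranteeing that every frame where the contact state changes
    survives, preserving the topology of the signal."""
    n = len(frames)
    # Frames where the contact state flips must be kept.
    keep = {0, n - 1} | {i for i in range(1, n)
                         if contact[i] != contact[i - 1]}
    if len(keep) > T:
        raise ValueError("more contact transitions than output frames")
    # Spend the remaining budget uniformly over the other frames.
    others = [i for i in range(n) if i not in keep]
    budget = T - len(keep)
    if budget and others:
        picks = np.linspace(0, len(others) - 1,
                            num=min(budget, len(others)))
        keep |= {others[int(p)] for p in picks}
    return np.asarray(frames)[sorted(keep)]
```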

    Evaluation of different chrominance models in the detection and reconstruction of faces and hands using the growing neural gas network

    Physical traits such as the shape of the hand and face can be used for human recognition and identification in video surveillance systems, in biometric authentication smart-card systems, and in personal health care. However, the accuracy of such systems suffers from illumination changes, unpredictability, and variability in appearance (e.g. occluded faces or hands, cluttered backgrounds, etc.). This work evaluates different statistical and chrominance models in environments with increasingly cluttered backgrounds, where lighting changes are common and no occlusions are applied, in order to obtain a reliable neural network reconstruction of faces and hands without taking into account the structural and temporal kinematics of the hands. First, a statistical model is used for skin-colour segmentation to roughly locate hands and faces. Then, a neural network is used to reconstruct the hands and faces in 3D. For the filtering and the reconstruction we use the growing neural gas algorithm, which can preserve the topology of an object without restarting the learning process. Experiments were conducted on our own database, on four benchmark databases (Stirling's, Alicante, Essex, and Stegmann's), and on normal 2D videos of deaf individuals that are freely available in the BSL SignBank dataset. The results demonstrate the validity of our system for face and hand segmentation and reconstruction under different environmental conditions.
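
    As a concrete example of a chrominance model of the kind evaluated above, the sketch below performs rough skin segmentation in YCrCb space with OpenCV. The Cr/Cb thresholds are common literature values, not the ones compared in the paper, and the growing-neural-gas stage that would consume the mask's foreground pixels is omitted:

```python
import cv2
import numpy as np

def skin_mask_ycrcb(bgr_image):
    """Rough skin segmentation in the YCrCb chrominance plane.
    Separating chrominance (Cr, Cb) from luma (Y) is what makes this
    family of models comparatively robust to illumination changes."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    lower = np.array([0, 133, 77], dtype=np.uint8)    # any Y, skin-like Cr/Cb
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Morphological opening removes small speckle before the foreground
    # pixels are handed to the topology-learning (GNG) stage.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```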