
    Recognizing Gestures by Learning Local Motion Signatures of HOG Descriptors

    We introduce a new gesture recognition framework based on learning local motion signatures (LMSs) of HOG descriptors. We propose a probabilistic learning-classification scheme based on reliable tracking of local features. After generating these LMSs for one individual by tracking Histograms of Oriented Gradients (HOG) descriptors, we learn a codebook of video-words (i.e., clusters of LMSs) by running the k-means algorithm on a training gesture video database. The video-words are then compacted into a codebook of code-words by the Maximization of Mutual Information (MMI) algorithm. In the final step, we compare the LMSs generated for a new gesture against the learned codebook via the k-nearest neighbors (k-NN) algorithm and a novel voting strategy. Our main contribution is the handling of the N-to-N mapping between code-words and gesture labels within the proposed voting strategy. Experiments have been carried out on two public gesture databases, KTH and IXMAS. Results show that the proposed method outperforms recent state-of-the-art methods.
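
    To make the pipeline concrete, the following is a minimal Python sketch of the codebook-and-voting stages described in the abstract: LMS feature vectors are clustered into video-words with k-means, each word accumulates the gesture labels of its member signatures (the N-to-N mapping), and a new gesture is classified by k-NN voting. The synthetic data, dimensions, and cluster counts are illustrative assumptions, not the authors' implementation; the MMI compaction step is omitted here and sketched after the next abstract.

```python
# Minimal sketch of the LMS codebook pipeline, assuming each local motion
# signature (LMS) is a fixed-length feature vector. Synthetic data stands in
# for tracked HOG trajectories; dimensions and cluster counts are arbitrary.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

def make_gesture(label, n_signatures=40, dim=72):
    """Hypothetical stand-in for the LMSs extracted from one gesture video."""
    return rng.normal(loc=label, scale=1.0, size=(n_signatures, dim))

train = [(make_gesture(c), c) for c in (0, 1, 2) for _ in range(10)]

# 1) Learn a codebook of video-words from all training LMSs with k-means.
all_lms = np.vstack([lms for lms, _ in train])
codebook = KMeans(n_clusters=20, n_init=10, random_state=0).fit(all_lms)

# 2) Let each video-word collect the gesture labels of its member LMSs;
#    one word may vote for several gestures (the N-to-N mapping).
word_votes = {w: [] for w in range(codebook.n_clusters)}
for lms, label in train:
    for w in codebook.predict(lms):
        word_votes[w].append(label)

# 3) Classify a new gesture: match each of its LMSs to the nearest video-word
#    (k-NN on the cluster centers) and accumulate that word's label votes.
knn = NearestNeighbors(n_neighbors=1).fit(codebook.cluster_centers_)

def classify(lms, n_labels=3):
    votes = np.zeros(n_labels)
    _, idx = knn.kneighbors(lms)
    for w in idx.ravel():
        for label in word_votes[w]:
            votes[label] += 1
    return int(votes.argmax())

print(classify(make_gesture(1)))  # expected: 1
```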

    Gesture Recognition by Learning Local Motion Signatures

    This paper gives an overview of a new gesture recognition framework based on learning local motion signatures (LMSs), introduced in [5]. After generating these LMSs for one individual by tracking Histograms of Oriented Gradients (HOG) [2] descriptors, we learn a codebook of video-words (i.e., clusters of LMSs) by running the k-means algorithm on a training gesture video database. The video-words are then compacted into a codebook of code-words by the Maximization of Mutual Information (MMI) algorithm. In the final step, we compare the LMSs generated for a new gesture against the learned codebook via the k-nearest neighbors (k-NN) algorithm and a novel voting strategy. Our main contribution is the handling of the N-to-N mapping between code-words and gesture labels with the proposed voting strategy. Experiments have been carried out on two public gesture databases, KTH [16] and IXMAS [19]. Results show that the proposed method outperforms recent state-of-the-art methods.
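
    The MMI compaction step shared by this paper and the previous one can be sketched as a greedy agglomerative merge: start with one code-word per video-word, then repeatedly merge the pair of words whose union best preserves the mutual information between code-words and gesture labels. The greedy scheme and the toy joint-count matrix below are assumptions for illustration, not the authors' exact algorithm.

```python
# Greedy MMI-style compaction sketch: merge video-words (rows of a joint
# count matrix over gesture labels) while keeping I(W;Y) as high as possible.
import numpy as np

def mutual_info(joint):
    """I(W;Y) from a joint count matrix (rows: words, cols: labels)."""
    p = joint / joint.sum()
    pw = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p / (pw * py)), 0.0)
    return terms.sum()

def compact(joint, n_codewords):
    """Merge rows (video-words) until only n_codewords code-words remain."""
    rows = [joint[i].astype(float) for i in range(joint.shape[0])]
    while len(rows) > n_codewords:
        best = None
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                merged = rows[:i] + rows[i+1:j] + rows[j+1:] + [rows[i] + rows[j]]
                mi = mutual_info(np.vstack(merged))
                if best is None or mi > best[0]:
                    best = (mi, i, j)
        _, i, j = best
        rows = rows[:i] + rows[i+1:j] + rows[j+1:] + [rows[i] + rows[j]]
    return np.vstack(rows)

# Toy joint counts: 6 video-words x 3 gesture labels -> 3 code-words.
counts = np.array([[9, 1, 0], [8, 2, 0], [0, 9, 1],
                   [1, 8, 1], [0, 1, 9], [1, 0, 9]])
print(compact(counts, 3))
```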

    Tracking HOG Descriptors for Gesture Recognition

    We introduce a new HOG (Histograms of Oriented Gradients) tracker for gesture recognition. Our main contribution is to build HOG trajectory descriptors (representing local motion), which are then used for gesture recognition. First, for each individual in the scene, we select a set of corner points to determine textured regions in which to compute 2D HOG descriptors. Second, we track these 2D HOG descriptors over time to build temporal HOG descriptors, replacing lost descriptors with newly detected ones. Finally, we extract the local motion descriptors and use them to learn a set of given gestures offline; a new video can then be classified according to the gesture occurring in it. Results show that the tracker performs well compared to the KLT tracker [1]. The generated local motion descriptors are validated through gesture learning and classification on the KTH action database [2].
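
    As a concrete illustration of building temporal HOG descriptors along point tracks, here is a short Python sketch. For brevity it propagates corner points with OpenCV's pyramidal KLT (the baseline the paper compares against, not the paper's own tracker) and recomputes a HOG patch at every frame; the patch size, HOG parameters, and input video path are assumptions, and the re-detection of lost points is omitted.

```python
# Sketch: corner detection -> per-frame 2D HOG patches -> temporal HOG tracks.
import cv2
import numpy as np
from skimage.feature import hog

PATCH = 16  # square patch side in pixels (illustrative choice)

def hog_at(gray, x, y):
    """2D HOG descriptor of a small patch centred on (x, y), or None at borders."""
    h, w = gray.shape
    x0, y0 = int(x) - PATCH // 2, int(y) - PATCH // 2
    if x0 < 0 or y0 < 0 or x0 + PATCH > w or y0 + PATCH > h:
        return None
    patch = gray[y0:y0 + PATCH, x0:x0 + PATCH]
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(1, 1))

cap = cv2.VideoCapture("gesture.avi")  # hypothetical input video
ok, frame = cap.read()
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Corner points seed the textured regions where 2D HOG descriptors live.
pts = cv2.goodFeaturesToTrack(prev, maxCorners=50, qualityLevel=0.01,
                              minDistance=8)
tracks = [[] for _ in range(len(pts))]
for i, (x, y) in enumerate(pts.reshape(-1, 2)):
    d = hog_at(prev, x, y)
    if d is not None:
        tracks[i].append(d)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
    for i, ((x, y), alive) in enumerate(zip(nxt.reshape(-1, 2), status.ravel())):
        if alive:
            d = hog_at(gray, x, y)
            if d is not None:
                tracks[i].append(d)  # the growing temporal HOG descriptor
    prev, pts = gray, nxt

# Each entry of `tracks` is now a sequence of per-frame HOG vectors: one
# candidate local motion descriptor for the offline learning stage.
```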

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework that highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and, thus, the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit renders the framework particularly useful for selecting a method given an application. Another advantage of the proposed organization is that it allows the newest approaches to be categorized seamlessly alongside traditional ones, while providing an insightful perspective on the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area.
    Comment: Preprint submitted to CVIU; survey paper; 46 pages, 2 figures, 4 tables.

    Efficient Human Activity Recognition in Large Image and Video Databases

    Vision-based human action recognition has attracted considerable interest in recent research for its applications to video surveillance, content-based search, healthcare, and interactive games. Most existing research deals with building informative feature descriptors, designing efficient and robust algorithms, proposing versatile and challenging datasets, and fusing multiple modalities. Often, these approaches build on certain conventions, such as the use of motion cues to determine video descriptors, the application of off-the-shelf classifiers, and single-factor classification of videos. In this thesis, we deal with important but overlooked issues such as the efficiency, simplicity, and scalability of human activity recognition in different application scenarios: controlled video environments (e.g., indoor surveillance), unconstrained videos (e.g., YouTube), depth or skeletal data (e.g., captured by Kinect), and person images (e.g., Flickr). In particular, we are interested in answering questions such as: (a) is it possible to efficiently recognize human actions in controlled videos without temporal cues? (b) given that large-scale unconstrained video data are often of a high-dimension, low-sample-size (HDLSS) nature, how can human actions be recognized efficiently in such data? (c) considering the rich 3D motion information available from depth or motion capture sensors, is it possible to recognize both the actions and the actors using only the motion dynamics of the underlying activities? and (d) can motion information from monocular videos be used to automatically determine saliency regions for recognizing actions in still images?

    Recognition of Activities of Daily Living with Egocentric Vision: A Review.

    Video-based recognition of activities of daily living (ADLs) is used in ambient assisted living systems to support the independent living of older people. However, current systems based on cameras located in the environment present a number of problems, such as occlusions and a limited field of view. Recently, wearable cameras have begun to be exploited. This paper presents a review of the state of the art in egocentric vision systems for the recognition of ADLs, following a hierarchical structure of motion, action, and activity levels, where each level provides higher semantic information and involves a longer time frame. The current egocentric vision literature suggests that ADL recognition is mainly driven by the objects present in the scene, especially those associated with specific tasks. However, although object-based approaches have proven popular, object recognition remains a challenge due to the intra-class variations found in unconstrained scenarios. As a consequence, the performance of current systems is far from satisfactory.
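
    The motion/action/activity hierarchy described above can be illustrated with a small sketch: short-window motion features are classified into actions, and histograms of actions over a longer window are classified into activities. The window lengths, features, and classifiers below are assumptions for demonstration, not any specific reviewed system.

```python
# Illustrative motion -> action -> activity hierarchy: each level aggregates
# the previous one over a longer time frame.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def motion_features(n_windows, dim=8, activity=0):
    """Hypothetical short-window motion features (e.g., flow statistics)."""
    return rng.normal(loc=activity, size=(n_windows, dim))

# Action level: classify each short window into an action.
X = np.vstack([motion_features(50, activity=a) for a in (0, 1)])
y = np.repeat([0, 1], 50)
action_clf = LogisticRegression(max_iter=1000).fit(X, y)

# Activity level: pool per-window action predictions over a long window
# into a histogram and classify the histogram.
def action_histogram(windows):
    return np.bincount(action_clf.predict(windows), minlength=2) / len(windows)

H = np.vstack([action_histogram(motion_features(50, activity=a))
               for a in (0, 1) for _ in range(10)])
ya = np.repeat([0, 1], 10)
activity_clf = LogisticRegression(max_iter=1000).fit(H, ya)

print(activity_clf.predict([action_histogram(motion_features(50, activity=1))]))
```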

    Detection of major ASL sign types in continuous signing for ASL recognition

    In American Sign Language (ASL), as in other signed languages, different classes of signs (e.g., lexical signs, fingerspelled signs, and classifier constructions) have different internal structural properties. Continuous sign recognition accuracy can be improved through the use of distinct recognition strategies, as well as different training datasets, for each class of signs. For these strategies to be applied, continuous signing video needs to be segmented into parts corresponding to particular classes of signs. In this paper, we present a segmentation system based on multiple instance learning that accurately labels 91.27% of the video frames of 500 continuous utterances (from 7 different subjects) in the publicly accessible NCSLGR corpus (Neidle and Vogler, 2012). The system uses novel feature descriptors derived from both motion and shape statistics of the regions of high local motion, and it does not require a hand tracker.
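
    The multiple instance setting can be sketched as follows: each video segment is a bag of per-frame feature vectors carrying only a bag-level sign-type label. The sketch below uses the simplest MIL baseline (propagate the bag label to every frame, train a frame-level classifier, then smooth with a majority vote); the synthetic features, classifier, and smoothing window are assumptions, not the paper's system.

```python
# Simple MIL baseline for frame labeling: bags = video segments, instances =
# per-frame feature vectors, labels known only at the bag level.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_bag(label, n_frames=30, dim=10):
    """Synthetic bag: per-frame features for one segment with one sign type."""
    return rng.normal(loc=label, scale=1.5, size=(n_frames, dim)), label

bags = [make_bag(c) for c in (0, 1) for _ in range(20)]

# Baseline: every frame inherits its bag's label, then train a frame classifier.
X = np.vstack([frames for frames, _ in bags])
y = np.concatenate([[label] * len(frames) for frames, label in bags])
clf = LogisticRegression(max_iter=1000).fit(X, y)

def label_frames(frames, window=5):
    """Per-frame prediction followed by a sliding majority vote for smoothness."""
    raw = clf.predict(frames)
    half = window // 2
    return np.array([np.bincount(raw[max(0, i - half):i + half + 1]).argmax()
                     for i in range(len(raw))])

test_frames, true_label = make_bag(1)
pred = label_frames(test_frames)
print("frame accuracy:", (pred == true_label).mean())
```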