    In-Place Gestures Classification via Long-term Memory Augmented Network

    In-place gesture-based virtual locomotion techniques enable users to control their viewpoint and move intuitively through a 3D virtual environment. A key research problem is to recognize in-place gestures accurately and quickly, since they trigger specific movements of the virtual viewpoint and thus shape the user experience. However, to preserve a real-time experience, only short-term sensor sequences (up to about 300 ms, 6 to 10 frames) can be taken as input, which limits classification performance due to the restricted spatio-temporal information. In this paper, we propose a novel long-term memory augmented network for in-place gesture classification. During training, it takes as input both short-term gesture sequence samples and their corresponding long-term sequence samples, which provide extra relevant spatio-temporal information. We store long-term sequence features in an external memory queue. In addition, we design a memory augmented loss that clusters features of the same class and pushes apart features from different classes, enabling the memory queue to memorize more relevant long-term sequence features. In the inference phase, we input only short-term sequence samples, recall the stored features accordingly, and fuse them to predict the gesture class. We create a large-scale in-place gesture dataset from 25 participants performing 11 gestures. Our method achieves a promising accuracy of 95.1% with a latency of 192 ms, and an accuracy of 97.3% with a latency of 312 ms, and is shown to be superior to recent in-place gesture classification techniques. A user study also validates our approach. Our source code and dataset will be made available to the community.
    Comment: This paper is accepted to IEEE ISMAR202
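    The external memory queue and memory-augmented loss lend themselves to a compact sketch. The following is a minimal illustration of the idea, not the authors' released code: all shapes, names, and the contrastive-style formulation are assumptions.

```python
import torch
import torch.nn.functional as F

class MemoryQueue:
    """Per-class FIFO bank of long-term sequence features (hypothetical sizes)."""
    def __init__(self, num_classes, dim, slots_per_class=64):
        self.bank = torch.zeros(num_classes, slots_per_class, dim)
        self.ptr = torch.zeros(num_classes, dtype=torch.long)
        self.slots = slots_per_class

    @torch.no_grad()
    def enqueue(self, feats, labels):
        # Overwrite the oldest slot of each sample's class.
        for f, y in zip(feats, labels):
            i = self.ptr[y].item()
            self.bank[y, i] = f
            self.ptr[y] = (i + 1) % self.slots

def memory_augmented_loss(short_feats, labels, queue, temperature=0.1):
    """Pull short-term features toward same-class memory entries, push others away."""
    q = F.normalize(short_feats, dim=1)                  # (B, D)
    mem = F.normalize(queue.bank.flatten(0, 1), dim=1)   # (C*S, D)
    logits = q @ mem.T / temperature                     # (B, C*S)
    mem_labels = torch.arange(queue.bank.size(0)).repeat_interleave(queue.slots)
    pos = (mem_labels[None, :] == labels[:, None]).float()
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()
```

    At inference, a short-term feature would similarly be matched against the queue by cosine similarity and the recalled entries fused (e.g. averaged) with it before the classifier head.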

    Learning Human Motion Models for Long-term Predictions

    We propose a new architecture for learning predictive spatio-temporal motion models from data alone. Our approach, dubbed the Dropout Autoencoder LSTM, is capable of synthesizing natural-looking motion sequences over long time horizons without catastrophic drift or motion degradation. The model consists of two components: a 3-layer recurrent neural network that models temporal aspects, and a novel autoencoder that is trained to implicitly recover the spatial structure of the human skeleton by randomly removing information about joints during training. This Dropout Autoencoder (D-AE) is then used to filter each pose predicted by the LSTM, reducing the accumulation of error, and hence drift, over time. Furthermore, we propose new evaluation protocols to assess the quality of synthetic motion sequences even when no ground-truth data exists. The proposed protocols can be used to assess generated sequences of arbitrary length. Finally, we evaluate our method on two of the largest motion-capture datasets available to date and show that our model outperforms the state of the art on a variety of actions, including cyclic and acyclic motion, and that it can produce natural-looking sequences over longer time horizons than previous methods.
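    A minimal sketch of the two-stage idea follows; the layer sizes, joint count, and helper names are placeholders rather than the paper's configuration. The D-AE learns skeletal structure by reconstructing poses with whole joints dropped, and at synthesis time it filters every pose the LSTM predicts.

```python
import torch
import torch.nn as nn

NUM_JOINTS, JOINT_DIM = 21, 3
POSE_DIM = NUM_JOINTS * JOINT_DIM

class DropoutAutoencoder(nn.Module):
    def __init__(self, hidden=256, p_drop=0.3):
        super().__init__()
        self.p_drop = p_drop
        self.enc = nn.Sequential(nn.Linear(POSE_DIM, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, POSE_DIM)

    def forward(self, pose):                       # pose: (B, POSE_DIM)
        if self.training:
            # Drop whole joints so the decoder must learn skeletal structure.
            keep = (torch.rand(pose.size(0), NUM_JOINTS, 1) > self.p_drop).float()
            pose = (pose.view(-1, NUM_JOINTS, JOINT_DIM) * keep).view(-1, POSE_DIM)
        return self.dec(self.enc(pose))

lstm = nn.LSTM(POSE_DIM, 512, num_layers=3, batch_first=True)
head = nn.Linear(512, POSE_DIM)
dae = DropoutAutoencoder().eval()

def rollout(seed, steps):
    """Autoregressive synthesis: filter each predicted pose through the D-AE."""
    poses, h = [seed[:, -1]], None
    out, h = lstm(seed, h)
    for _ in range(steps):
        pred = head(out[:, -1])
        pred = dae(pred)                           # project back toward valid poses
        poses.append(pred)
        out, h = lstm(pred.unsqueeze(1), h)
    return torch.stack(poses, dim=1)

poses = rollout(torch.zeros(1, 10, POSE_DIM), steps=100)  # (1, 101, POSE_DIM)
```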

    Muscle synergies in neuroscience and robotics: from input-space to task-space perspectives

    In this paper we review the work on muscle synergies that has been carried out in neuroscience and control engineering. In particular, we refer to the hypothesis that the central nervous system (CNS) generates desired muscle contractions by combining a small number of predefined modules, called muscle synergies. We provide an overview of the methods that have been employed to test the validity of this scheme, and we show how the concept of muscle synergy has been generalized for the control of artificial agents. The comparison between these two lines of research, in particular their different goals and approaches, is instrumental in explaining the computational implications of the hypothesized modular organization. Moreover, it clarifies the importance of assessing the functional role of muscle synergies: although these basic modules are defined at the level of muscle activations (input-space), they should result in the effective accomplishment of the desired task. This requirement is not always explicitly considered in experimental neuroscience, as muscle synergies are often estimated solely by analyzing recorded muscle activities. We suggest that synergy extraction methods should explicitly take task execution variables into account, thus moving from a perspective based purely on input-space to one grounded in task-space as well.
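    As a concrete example of input-space extraction, synergies are most commonly estimated by non-negative matrix factorization (NMF) of recorded muscle activity. The sketch below uses synthetic EMG envelopes in place of real recordings, and the synergy count K is an assumption.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
T, M, K = 1000, 8, 3             # time samples, muscles, assumed synergy count
W_true = rng.random((M, K))      # muscle weights per synergy
C_true = rng.random((K, T))      # synergy activation coefficients over time
emg = W_true @ C_true            # (M, T) non-negative "EMG envelopes"

model = NMF(n_components=K, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(emg)     # extracted synergy weights, (M, K)
C = model.components_            # extracted activations, (K, T)

# Variance accounted for (VAF), the usual input-space goodness-of-fit criterion.
vaf = 1 - np.sum((emg - W @ C) ** 2) / np.sum(emg ** 2)
print(f"VAF with {K} synergies: {vaf:.3f}")
```

    The task-space critique above would amount to checking not only the VAF in muscle space but also how well the reconstructed activations reproduce task variables such as end-point forces.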

    DART: Distribution Aware Retinal Transform for Event-based Cameras

    We introduce a generic visual descriptor, termed the distribution-aware retinal transform (DART), that encodes structural context using log-polar grids for event cameras. The DART descriptor is applied to four different problems, namely object classification, tracking, detection, and feature matching: (1) The DART features are directly employed as local descriptors in a bag-of-features classification framework, with testing carried out on four standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS, NCaltech-101). (2) Extending the classification system, tracking is demonstrated using two key novelties: (i) to overcome the low-sample problem in one-shot learning of a binary classifier, statistical bootstrapping is leveraged with online learning; (ii) to achieve tracker robustness, the scale and rotation equivariance of the DART descriptors is exploited for the one-shot learning. (3) To solve the long-term object tracking problem, an object detector is designed using the principle of cluster majority voting. The detection scheme is then combined with the tracker, yielding a high intersection-over-union score against augmented ground-truth annotations on the publicly available event camera dataset. (4) Finally, the event context encoded by DART greatly simplifies the feature correspondence problem, especially for spatio-temporal slices far apart in time, a setting that had not been explicitly tackled in the event-based vision domain.
    Comment: 12 pages, revision submitted to TPAMI in Nov 201
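    The log-polar encoding at the heart of such a descriptor can be sketched in a few lines; the ring/wedge counts and radii below are illustrative choices, not the paper's exact grid.

```python
import numpy as np

def log_polar_descriptor(events_xy, center, r_min=1.0, r_max=32.0,
                         n_rings=8, n_wedges=16):
    """Histogram of event offsets from a keypoint over (log-radius, angle) bins."""
    d = events_xy - center                         # (N, 2) offsets
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])           # (-pi, pi]
    valid = (r >= r_min) & (r < r_max)
    # Logarithmic radial bins give fine resolution near the center and coarse
    # resolution in the periphery (retina-like sampling).
    ring = np.floor(np.log(r[valid] / r_min) /
                    np.log(r_max / r_min) * n_rings).astype(int)
    wedge = np.floor((theta[valid] + np.pi) / (2 * np.pi) * n_wedges).astype(int)
    wedge = np.clip(wedge, 0, n_wedges - 1)
    hist = np.zeros((n_rings, n_wedges))
    np.add.at(hist, (ring, wedge), 1.0)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist             # L2-normalized descriptor

desc = log_polar_descriptor(np.random.rand(500, 2) * 64, center=np.array([32.0, 32.0]))
```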

    Physical Behavior in Older Persons during Daily Life: Insights from Instrumented Shoes.

    Activity level and gait parameters during daily life are important indicators for clinicians because they can provide critical insights into changes in mobility and function over time. Wearable activity monitoring has been gaining momentum in daily-life health assessment. Consequently, this study seeks to validate an algorithm for the classification of daily-life activities and to provide a detailed gait analysis in older adults. A system consisting of an inertial sensor combined with a pressure-sensing insole has been developed. Using an algorithm that we previously validated during a semi-structured protocol, activities in 10 healthy elderly participants were recorded and compared to a wearable reference system over a 4-hour recording period at home. Detailed gait parameters were calculated from the inertial sensors. The dynamics of physical behavior were characterized using barcodes that express a measure of behavioral complexity. Activity classification based on the algorithm led to 93% accuracy in classifying basic activities of daily life, i.e., sitting, standing, and walking. The gait analysis emphasizes the importance of metrics such as foot clearance in daily-life assessment. The results also underline that measures of physical behavior and gait performance are complementary, especially since gait parameters were not correlated with complexity. Participants gave positive feedback regarding the use of the instrumented shoes. These results extend previous observations by showing the concurrent validity of the instrumented shoes compared with a body-worn reference system for daily-life physical behavior monitoring in older adults.
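    The study's validated classifier is not reproduced here, but the fusion of the two sensing modalities can be illustrated with a deliberately simple decision rule over fixed windows; all thresholds and signal conventions below are made-up values.

```python
import numpy as np

def classify_window(accel_norm, insole_pressure,
                    load_thresh=0.3, motion_thresh=0.5):
    """accel_norm: acceleration magnitude with gravity removed (m/s^2);
    insole_pressure: normalized plantar load in [0, 1]; both 1-D windows."""
    loaded = insole_pressure.mean() > load_thresh   # weight on the feet?
    moving = accel_norm.std() > motion_thresh       # periodic trunk motion?
    if not loaded:
        return "sitting"
    return "walking" if moving else "standing"

fs = 100
t = np.arange(fs * 5) / fs                          # a 5-second window
label = classify_window(1.5 * np.sin(2 * np.pi * 2.0 * t),  # ~2 Hz gait-like motion
                        np.full(t.shape, 0.8))              # feet fully loaded
print(label)  # -> "walking"
```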

    Locomotion Traces Data Mining for Supporting Frail People with Cognitive Impairment

    The rapid increase in the senior population is posing serious challenges to national healthcare systems. Hence, innovative tools are needed to detect health issues early, including cognitive decline. Several clinical studies show that it is possible to identify cognitive impairment from the locomotion patterns of older people. This thesis therefore first provides a systematic literature review of locomotion data mining systems for supporting the diagnosis of Neuro-Degenerative Diseases (NDD), covering locomotion anomaly indicators, movement patterns used to discover low-level locomotion indicators, sensor data acquisition and processing methods, and NDD detection algorithms, together with their pros and cons. We then investigate the use of sensor data and Deep Learning (DL) to recognize abnormal movement patterns in instrumented smart-homes. To remove the noise introduced by indoor constraints and activity execution, we introduce novel visual feature extraction methods for locomotion data. Our solutions rely on locomotion trace segmentation, image-based extraction of salient features from locomotion segments, and vision-based DL. Furthermore, we propose a data augmentation strategy to increase the volume of collected data and to generalize the solution to different smart-homes with different layouts. We carried out extensive experiments with a large real-world dataset acquired in a smart-home test-bed from older people, including people with cognitive diseases. Experimental comparisons show that our system outperforms state-of-the-art methods.
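    The image-based feature extraction step can be sketched as follows; the grid resolution, the dwell-time encoding, and the flip/rotation augmentation are plausible stand-ins for the thesis' actual design, not a reproduction of it.

```python
import numpy as np

def trace_to_image(xy, bounds, resolution=64):
    """Rasterize a segmented locomotion trace (N, 2 positions) into a heatmap
    image that a standard vision CNN can consume."""
    xmin, ymin, xmax, ymax = bounds
    col = ((xy[:, 0] - xmin) / (xmax - xmin) * (resolution - 1)).astype(int)
    row = ((xy[:, 1] - ymin) / (ymax - ymin) * (resolution - 1)).astype(int)
    col = np.clip(col, 0, resolution - 1)
    row = np.clip(row, 0, resolution - 1)
    img = np.zeros((resolution, resolution), dtype=np.float32)
    np.add.at(img, (row, col), 1.0)          # dwell time accumulates per cell
    return img / img.max() if img.max() > 0 else img

def augment(img, rng):
    """Layout-level augmentation: random rotations and flips of the floor map
    help generalize across homes with different layouts."""
    img = np.rot90(img, k=int(rng.integers(4)))
    return img[:, ::-1].copy() if rng.random() < 0.5 else img
```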

    Learning object behaviour models

    The human visual system is capable of interpreting a remarkable variety of often subtle, learnt, characteristic behaviours. For instance, we can determine the gender of a distant walking figure from their gait, interpret a facial expression as one of surprise, or identify suspicious behaviour in the movements of an individual within a car park. Machine vision systems wishing to exploit such behavioural knowledge have been limited by the inaccuracies inherent in hand-crafted models and the absence of a unified framework for the perception of powerful behaviour models. The research described in this thesis attempts to address these limitations, using a statistical modelling approach to provide a framework in which detailed behavioural knowledge is acquired from the observation of long image sequences. The core of the behaviour modelling framework is an optimised sample-set representation of the probability density in a behaviour space defined by a novel temporal pattern formation strategy. This representation of behaviour is both concise and accurate, and it facilitates the recognition of actions or events and the assessment of behaviour typicality. Generative capabilities are added via a learnt stochastic process model, facilitating the generation of predictions and realistic sample behaviours. Experimental results demonstrate the acquisition of behaviour models and suggest a variety of possible applications, including automated visual surveillance, object tracking, gesture recognition, and the generation of realistic object behaviours within animations, virtual worlds, and computer-generated film sequences. The utility of the behaviour modelling framework is further extended through the modelling of object interaction. Two separate approaches are presented, and a technique is developed which, using learnt models of joint behaviour together with a stochastic tracking algorithm, can equip a virtual object with the ability to interact in a natural way. Experimental results demonstrate the simulation of a plausible virtual partner during interaction between a user and the machine.
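    The sample-set density idea can be loosely approximated with an off-the-shelf kernel density estimate; the feature dimensionality and bandwidth below are placeholders, and the thesis' optimised sample-set representation is only imitated, not reproduced.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
train_feats = rng.normal(size=(500, 16))        # features of observed behaviours

# Fit a density over the behaviour space from observed sequences.
kde = KernelDensity(kernel="gaussian", bandwidth=0.8).fit(train_feats)

def typicality(feat):
    """Higher log-density = more typical behaviour; low values flag anomalies
    (e.g. suspicious movement in surveillance footage)."""
    return kde.score_samples(feat[None, :])[0]

print(typicality(rng.normal(size=16)))          # near the training mass
print(typicality(np.full(16, 8.0)))             # far away: very low density
```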