9 research outputs found

    Probability-Based Dynamic Time Warping for Gesture Recognition on RGB-D Data

    Dynamic Time Warping (DTW) is commonly used in gesture recognition tasks to handle the variability in the temporal length of gestures. In the DTW framework, a set of gesture patterns is compared one by one to a possibly infinite test sequence, and a query gesture category is recognized if a warping cost below a certain threshold is found within the test sequence. Nevertheless, taking either a single sample per gesture category or a set of isolated samples may not encode the variability of that gesture category. In this paper, a probability-based DTW for gesture recognition is proposed. Different samples of the same gesture pattern obtained from RGB-Depth data are used to build a Gaussian-based probabilistic model of the gesture. Finally, the cost of DTW is adapted to the new model. The proposed approach is tested in a challenging scenario, showing better performance of the probability-based DTW in comparison to state-of-the-art approaches for gesture recognition on RGB-D data.
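
The threshold-based detection described in this abstract can be illustrated with a minimal sketch of standard subsequence DTW over a stream. This is not the paper's probability-based variant: the function name `detect_gesture`, the 1-D samples, and the absolute-difference local cost are assumptions made for illustration, where the paper uses a Gaussian-based model of RGB-D features.

```python
import math


def detect_gesture(pattern, stream, threshold):
    """Return the stream indices at which a warped match of `pattern` ends.

    Standard subsequence DTW, processed one stream sample at a time so the
    stream may be arbitrarily long. The local cost is |a - b| on 1-D samples
    (an illustrative assumption, not the paper's probabilistic cost).
    """
    n = len(pattern)
    # prev[i] = best cost of aligning pattern[:i] with a stream segment that
    # ends at the previous sample. prev[0] = 0 lets a match start anywhere.
    prev = [0.0] + [math.inf] * n
    hits = []
    for t, sample in enumerate(stream):
        cur = [0.0] * (n + 1)  # cur[0] = 0: a new match may start here
        for i in range(1, n + 1):
            local = abs(pattern[i - 1] - sample)
            cur[i] = local + min(prev[i], cur[i - 1], prev[i - 1])
        if cur[n] < threshold:
            hits.append(t)  # the whole pattern matched, ending at sample t
        prev = cur
    return hits


if __name__ == "__main__":
    pattern = [1, 2, 3]
    # The same gesture performed more slowly inside a longer stream.
    stream = [0, 1, 1, 2, 2, 3, 0]
    print(detect_gesture(pattern, stream, threshold=0.5))
```

Keeping only the previous column of the cost matrix is what makes the sketch usable on an unbounded stream: memory grows with the pattern length, not with the stream length.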

    GESTURE RECOGNITION FOR PENCAK SILAT TAPAK SUCI REAL-TIME ANIMATION

    The main goal of this research is to design a real-time virtual martial arts training system that serves as a tool for learning martial arts independently, using genetic algorithm methods and dynamic time warping. This paper reports the initial stage, which focuses on collecting data sets from martial arts warriors using 3D animation and Kinect sensor cameras: 2 warriors x 8 moves x 596 cases/gesture = 9,536 cases. Gesture recognition studies usually distinguish body gestures, hand and arm gestures, and head and face gestures; all three can be studied simultaneously in the martial art pencak silat, using stance detection with scoring methods. Silat movement data is recorded as ONI files using the OpenNI™ (OFW) framework and as BVH (Biovision Hierarchy) files, as well as with plug-in support software on Mocap devices. Responsiveness is a measure of the time taken to respond to interruptions, and is critical because the system must be able to meet the demand.

    Non-Verbal Communication Analysis in Victim-Offender Mediations

    In this paper we present a non-invasive ambient intelligence framework for the semi-automatic analysis of non-verbal communication applied to the restorative justice field. In particular, we propose the use of computer vision and social signal processing technologies in real scenarios of Victim-Offender Mediations, applying feature extraction techniques to multi-modal audio-RGB-depth data. We compute a set of behavioral indicators that define communicative cues from the fields of psychology and observational methodology. We test our methodology on data captured in real world Victim-Offender Mediation sessions in Catalonia in collaboration with the regional government. We define the ground truth based on expert opinions when annotating the observed social responses. Using different state-of-the-art binary classification approaches, our system achieves recognition accuracies of 86% when predicting satisfaction, and 79% when predicting both agreement and receptivity. Applying a regression strategy, we obtain a mean deviation for the predictions between 0.5 and 0.7 in the range [1-5] for the computed social signals. Comment: Please find the supplementary video material at: http://sunai.uoc.edu/~vponcel/video/VOMSessionSample.mp

    How to Rank Answers in Text Mining

    In this thesis, we mainly focus on case studies about answers. We present the methodology CEW-DTW and assess its ranking quality. We then improve this methodology by combining it with Kullback-Leibler divergence, which measures the difference between the probability distributions of two sequences. However, CEW-DTW and KL-CEW-DTW do not account for the effect of noise and keywords from the viewpoint of probability distributions. Therefore, we develop a new methodology, the General Entropy, to see how the probabilities of noise and keywords affect answer quality. We first analyze some properties of the General Entropy, such as its value range. In particular, we try to find an objective goal that can serve as a standard for assessing answers. To this end, we introduce the maximum general entropy. We use the general entropy methodology to find, from a mathematical viewpoint, an imaginary answer with the maximum entropy (though this answer may not exist). This answer can be regarded as an “ideal” answer. Comparing maximum entropy probabilities with the global probabilities of noise and keywords, respectively, we find that the maximum entropy probability of noise is smaller than the global probability of noise, and that the maximum entropy probabilities of chosen keywords are larger than the global probabilities of keywords under some conditions. This allows us to determine the maximum number of keywords to select. We also use an Amazon dataset and a small survey group to assess the general entropy. Although these methodologies can analyze answer quality, they do not incorporate the inner connections among keywords and noise. Based on the Markov transition matrix, we therefore develop the Jump Probability Entropy. We again use the Amazon dataset to compare maximum jump entropy probabilities with the global jump probabilities of noise and keywords, respectively.
Finally, we describe the steps for obtaining answers from the Amazon dataset, including extracting the original answers and removing stop words and collinearity. We compare our methodologies to see whether they are consistent. We also introduce the Wald–Wolfowitz runs test and compare it with the developed methodologies to verify their relationships. Based on the results of these comparisons, we draw conclusions about the consistency of these methodologies and outline future plans.
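
As a rough illustration of how Kullback-Leibler divergence compares the probability distributions of two sequences, the sketch below computes it for the word distributions of two short answers. It does not reproduce CEW-DTW or KL-CEW-DTW: the helper names, the whitespace tokenization, and the add-one smoothing (used so that no probability is zero) are all assumptions.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary, alpha=1.0):
    """Smoothed word probabilities over a shared vocabulary.

    `alpha` is an additive smoothing constant (an assumption of this sketch);
    it keeps every probability strictly positive so the divergence is finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}


def kl_divergence(p, q):
    """D_KL(p || q) = sum over w of p(w) * log(p(w) / q(w)), in nats."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


if __name__ == "__main__":
    # Two hypothetical answers, tokenized by whitespace.
    answer_a = "the battery life is great".split()
    answer_b = "the battery died fast".split()
    vocabulary = sorted(set(answer_a) | set(answer_b))
    p = word_distribution(answer_a, vocabulary)
    q = word_distribution(answer_b, vocabulary)
    print(kl_divergence(p, q), kl_divergence(q, p))
```

The divergence is zero only when the two distributions coincide, and it is not symmetric, so the order of the two answers matters when it is used as a ranking signal.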

    Extending Procrustes analysis: building multi-view 2-D models from 3-D human shape samples

    This dissertation formalizes the construction of multi-view 2D shape models from 3D data. We propose several extensions of the well-known Procrustes Analysis (PA) algorithm that allow modeling rigid and non-rigid transformations in an efficient manner. The proposed strategies are successfully tested on face and human body datasets. In human perception applications one can set physical restrictions, such as defining faces and human skeletons as sets of anatomical landmarks or articulated bodies. However, the high variation of facial expressions and human postures from different viewpoints makes problems like face tracking or human pose estimation extremely challenging. The common approach to handle large viewpoint variations is training the models with several labeled images from different viewpoints. However, this approach has several important drawbacks: (1) it is not clear how much it is necessary to enhance the dataset with images from different viewpoints in order to build unbiased 2D models; (2) extending the training set without this evaluation would unnecessarily increase memory and computation requirements to train the models; (3) obtaining new labeled images from different viewpoints can be a difficult task because of the expensive labeling cost; and finally, (4) a non-uniform coverage of the different viewpoints of a person leads to biased 2D models. In this dissertation we propose successive extensions of PA to address these issues. First of all, we introduce Projected Procrustes Analysis (PPA) as a formalization for building multi-view 2D rigid models from 3D datasets. PPA rotates and projects every 3D training shape and builds a multi-view 2D model from this enhanced training set. We also introduce common parameterizations of rotations, as well as mechanisms to uniformly sample the rotation space.
We show that uniformly distributed rotations generate unbiased 2D models, while non-uniform rotations lead to models representing some viewpoints better than others. Although PPA has been successful in building multi-view 2D models, it requires an enhanced dataset that increases the computational requirements in space and time. In order to address these PA and PPA drawbacks, we propose Continuous Procrustes Analysis (CPA). CPA extends PPA within a functional analysis framework and constructs multi-view 2D rigid models in an efficient way through integrating all possible rotations in a given domain. We show that CPA models are inherently unbiased because of their integral formulation. However, CPA is not able to capture non-rigid deformations from the dataset. Next, in order to efficiently compute multi-view 2D deformable models from 3D data, we introduce Subspace Procrustes Analysis (SPA). By adding a subspace in the PA formulation, SPA is able to model non-rigid deformations, as well as rigid 3D transformations of the training set. We developed a discrete (DSPA) and a continuous (CSPA) formulation to provide a better understanding of the problem, where DSPA samples and CSPA integrates the 3D rotation space. Finally, we illustrate the benefits of our multi-view 2D deformable models in the task of human pose estimation. We first reformulate the problem as feature selection by subspace matching, and propose an efficient approach for this task. Our method is much more efficient than the state-of-the-art feature selection by subspace matching approaches, and it is able to handle a larger number of outliers. Next, we show that our multi-view 2D deformable models, combined with the subspace matching method, outperform state-of-the-art methods of human pose estimation. Our approach is more accurate in the joint positions and limb lengths because we use unbiased 2D models trained on 3D Motion Capture datasets.
Our models are not biased to any particular point of view and they can successfully reconstruct different non-rigid deformations and viewpoints. Moreover, they are efficient in both learning and test times.
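
The rotate, project, and align idea behind PPA can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the dissertation's formulation: rotations are restricted to uniformly spaced angles about the vertical axis instead of the full 3D rotation space, the projection is orthographic, and all function names are hypothetical. 2D landmarks are stored as complex numbers so that a rotation plus scale is a single complex multiplication.

```python
import math


def rotate_y_and_project(shape3d, angle):
    """Rotate (x, y, z) landmarks about the y axis, then drop the depth."""
    c, s = math.cos(angle), math.sin(angle)
    return [complex(c * x + s * z, y) for x, y, z in shape3d]


def center(shape2d):
    """Remove translation by subtracting the centroid."""
    mean = sum(shape2d) / len(shape2d)
    return [p - mean for p in shape2d]


def align(shape2d, reference):
    """Least-squares rotation and scale of a centered shape onto a reference.

    Minimizes sum |a * p - r|^2 over complex a, whose closed-form solution
    is a = sum(conj(p) * r) / sum(|p|^2).
    """
    num = sum(p.conjugate() * r for p, r in zip(shape2d, reference))
    den = sum(abs(p) ** 2 for p in shape2d)
    a = num / den
    return [a * p for p in shape2d]


def multi_view_mean_shape(shapes3d, n_views=8, iterations=10):
    """Mean 2-D shape over uniformly spaced views of every 3-D shape."""
    views = [
        center(rotate_y_and_project(shape, 2 * math.pi * k / n_views))
        for shape in shapes3d
        for k in range(n_views)
    ]
    reference = views[0]
    for _ in range(iterations):
        aligned = [align(view, reference) for view in views]
        reference = [sum(points) / len(aligned) for points in zip(*aligned)]
        # Fix the scale of the mean so that it cannot shrink between passes.
        norm = math.sqrt(sum(abs(p) ** 2 for p in reference))
        reference = [p / norm for p in reference]
    return reference


if __name__ == "__main__":
    # A hypothetical four-landmark 3-D shape.
    shape = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    print(multi_view_mean_shape([shape], n_views=4))
```

The sketch enumerates every view explicitly, which is the enlarged training set that the abstract identifies as the cost of PPA; the continuous formulation replaces this enumeration with an integral over rotations.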