10 research outputs found

    3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training

    Full text link
    Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress made in recent years. Generally, the performance of existing methods drops when the target person is too small or too large, or when the motion is too fast or too slow relative to the scale and speed of the training data. Moreover, to our knowledge, many of these methods are not explicitly designed or trained under severe occlusion, which compromises their ability to handle occlusion. To address these problems, we introduce a spatio-temporal network for robust 3D human pose estimation. Since humans in videos may appear at different scales and move at various speeds, we apply multi-scale spatial features for 2D joint or keypoint prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints. Furthermore, we design a spatio-temporal discriminator based on body structures as well as limb motions to assess whether the predicted pose forms a valid pose and a valid movement. During training, we explicitly mask out some keypoints to simulate various occlusion cases, from minor to severe, so that our network learns to be robust to various degrees of occlusion. As 3D ground-truth data are limited, we further utilize 2D video data to inject a semi-supervised learning capability into our network. Experiments on public datasets validate the effectiveness of our method, and our ablation studies show the strengths of our network's individual submodules. Comment: 8 pages, AAAI 202
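    The occlusion training described above, randomly masking out 2D keypoints so the network learns to cope with missing joints, can be sketched generically. The snippet below is only an illustrative reconstruction, not the authors' code; the keypoint layout, masking bounds, and tensor shapes are assumptions.

```python
import numpy as np

def mask_keypoints(keypoints, confidences, max_masked=5, rng=None):
    """Simulate occlusion by zeroing out a random subset of 2D keypoints.

    keypoints:   (T, J, 2) array of 2D joint coordinates over T frames (assumed layout).
    confidences: (T, J) array of detection confidences.
    max_masked:  upper bound on the number of joints hidden per frame.
    Returns masked copies; masked joints get zero coordinates and zero confidence.
    """
    rng = rng or np.random.default_rng()
    kp = keypoints.copy()
    conf = confidences.copy()
    T, J, _ = kp.shape
    for t in range(T):
        n_mask = rng.integers(0, max_masked + 1)      # 0 means no occlusion in this frame
        joints = rng.choice(J, size=n_mask, replace=False)
        kp[t, joints] = 0.0
        conf[t, joints] = 0.0
    return kp, conf

# Example: a 243-frame clip with 17 COCO-style joints (shapes are assumptions).
kp = np.random.rand(243, 17, 2)
conf = np.ones((243, 17))
kp_occ, conf_occ = mask_keypoints(kp, conf, max_masked=6)
```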

    Physics perception in sloshing scenes with guaranteed thermodynamic consistency

    Get PDF
    Physics perception very often faces the problem that only limited data or partial measurements of the scene are available. In this work, we propose a strategy to learn the full state of sloshing liquids from measurements of the free surface. Our approach is based on recurrent neural networks (RNNs) that project the limited information available onto a reduced-order manifold, so as not only to reconstruct the unknown information, but also to perform fluid reasoning about future scenarios in real time. To obtain physically consistent predictions, we train deep neural networks on the reduced-order manifold that, through the use of inductive biases, ensure the fulfillment of the principles of thermodynamics. The RNNs learn from the measurement history the hidden information required to correlate the limited observations with the latent space where the simulation occurs. Finally, a decoder returns the data to the high-dimensional manifold, so as to provide the user with insightful information in the form of augmented reality. This algorithm is connected to a computer vision system to test the performance of the proposed methodology on real data, resulting in a system capable of understanding and predicting future states of the observed fluid in real time. Comment: 20 pages, 11 figures
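    As a rough illustration of the pipeline described above, a recurrent encoder can map a history of free-surface measurements to a low-dimensional latent state, and a decoder can lift that state back to the full field. The PyTorch sketch below is a minimal, assumption-laden reconstruction: all dimensions and module choices are placeholders, and the thermodynamics-consistent integrator acting on the latent space is omitted.

```python
import torch
import torch.nn as nn

class SurfaceToStateNet(nn.Module):
    """Sketch: free-surface measurements -> reduced-order latent -> reconstructed full state."""
    def __init__(self, surface_dim=64, latent_dim=8, state_dim=1024, hidden=128):
        super().__init__()
        # Recurrent encoder accumulates the history needed to recover hidden information.
        self.encoder = nn.GRU(input_size=surface_dim, hidden_size=hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent_dim)
        # Decoder lifts the reduced-order latent back to the high-dimensional field.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, state_dim)
        )

    def forward(self, surface_seq):
        # surface_seq: (batch, time, surface_dim) of free-surface measurements.
        _, h = self.encoder(surface_seq)       # h: (1, batch, hidden), last hidden state
        z = self.to_latent(h.squeeze(0))       # reduced-order latent variables
        full_state = self.decoder(z)           # reconstructed full fluid state
        return z, full_state

# Example with assumed shapes: batch of 4 sequences, 50 time steps each.
net = SurfaceToStateNet()
z, state = net(torch.randn(4, 50, 64))
```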

    Automatic kinematic analysis using OpenPose and Dynamic Time Warping with applications to rowing

    Get PDF
    Undergraduate final project (Trabalho de Conclusão de Curso), Universidade de Brasília, Faculdade UnB Gama, Electronic Engineering, 2019. This work proposes a low-cost system to automatically analyze kinematic parameters in rowing from video capture and processing, using a single RGB camera and without the need for markers on the individual’s body. The coordinates of the joints are estimated in each frame using the OpenPose API together with an offline filter to overcome frame loss and oscillations in the trajectories. The joint angles are obtained from the pixel coordinates of the estimated joints. Their trajectories are then evaluated using a computational technique named Dynamic Time Warping, which compares two time series, one called the reference and the other the target. The reference series consists of a rowing-stroke pattern to be followed and is used as the basis for evaluating the target series. The system test compares each stroke in a five-minute workout by a novice rower with a reference stroke executed by a professional rower. In addition, a five-minute workout by the same professional rower is evaluated to check the consistency of his own stroke. Finally, all extracted kinematic metrics are displayed in an interface to monitor the rower’s movement and provide feedback. The proposed approach allows automatic analysis of training sessions recorded with a simple camera, and could be useful in helping improve the movement of rowers, especially inexperienced ones.
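    The two core computations in the pipeline above, deriving joint angles from OpenPose pixel coordinates and comparing a target stroke against a reference stroke with Dynamic Time Warping, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the author's code; the joints chosen and the example series are hypothetical.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by points a-b-c, given in pixel coordinates."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def dtw_distance(ref, target):
    """Classic O(n*m) Dynamic Time Warping distance between two 1-D angle series."""
    n, m = len(ref), len(target)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - target[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Example: per-frame elbow angle from shoulder-elbow-wrist pixel coordinates (toy values).
ref_stroke = np.array([joint_angle((0, 0), (1, 0), (1, 1))] * 30)       # reference rower
target_stroke = np.array([joint_angle((0, 0), (1, 0), (2, 1))] * 35)    # novice rower
print(dtw_distance(ref_stroke, target_stroke))
```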

    Learning Generalizable Visual Patterns Without Human Supervision

    Get PDF
    Owing to the existence of large labeled datasets, Deep Convolutional Neural Networks have ushered in a renaissance in computer vision. However, almost all of the visual data we generate daily - several human lifetimes' worth of it - remains unlabeled and thus out of reach of today’s dominant supervised learning paradigm. This thesis focuses on techniques that steer deep models towards learning generalizable visual patterns without human supervision. Our primary tool in this endeavor is the design of Self-Supervised Learning tasks, i.e., pretext tasks whose labels do not involve human labor. Besides enabling learning from large amounts of unlabeled data, we demonstrate how self-supervision can capture relevant patterns that supervised learning largely misses. For example, we design learning tasks that learn deep representations capturing shape from images, motion from video, and 3D pose features from multi-view data. Notably, the design of these tasks follows a common principle: the recognition of data transformations. The strong performance of the learned representations on downstream vision tasks such as classification, segmentation, action recognition, or pose estimation validates this pretext-task design. This thesis also explores the use of Generative Adversarial Networks (GANs) for unsupervised representation learning. Besides leveraging generative adversarial learning to define image transformations for self-supervised learning tasks, we also address training instabilities of GANs through the use of noise. While unsupervised techniques can significantly reduce the burden of supervision, in the end we still rely on some annotated examples to fine-tune learned representations towards a target task. To improve learning from scarce or noisy labels, we describe a supervised learning algorithm with improved generalization in these challenging settings.
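    The common pretext-task principle mentioned above, learning representations by recognizing data transformations, can be illustrated with a minimal rotation-prediction task in the spirit of that idea. The sketch below is generic and not the thesis's actual tasks; the backbone, data, and shapes are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotate_batch(images):
    """Apply a random multiple-of-90-degree rotation to each image; return rotated images and rotation labels."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2)) for img, k in zip(images, labels)])
    return rotated, labels

class RotationPretext(nn.Module):
    """Backbone plus linear head that predicts which rotation was applied (4-way classification)."""
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone           # any feature extractor, e.g. a small CNN
        self.head = nn.Linear(feat_dim, 4)

    def forward(self, x):
        return self.head(self.backbone(x))

# Example training step with a toy backbone (architecture and sizes are assumptions).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
model = RotationPretext(backbone, feat_dim=128)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(8, 3, 32, 32)         # stand-in for an unlabeled image batch
rotated, labels = rotate_batch(images)
loss = F.cross_entropy(model(rotated), labels)
loss.backward()
opt.step()
```

    No human annotation is needed: the "labels" come from the transformations themselves, so the backbone can be pre-trained on unlabeled data and later fine-tuned on a downstream task.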

    3D Human Pose Machines with Self-supervised Learning

    No full text