3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training
Estimating 3D poses from a monocular video is still a challenging task,
despite the significant progress that has been made in recent years. Generally,
the performance of existing methods drops when the target person is too
small/large, or the motion is too fast/slow relative to the scale and speed of
the training data. Moreover, to our knowledge, many of these methods are not
explicitly designed or trained for severe occlusion, which compromises their
ability to handle it. To address these problems, we introduce a
spatio-temporal network for robust 3D human pose estimation. As humans in
videos may appear in different scales and have various motion speeds, we apply
multi-scale spatial features for 2D joint or keypoint prediction in each
individual frame, and multi-stride temporal convolutional networks (TCNs) to
estimate 3D joints or keypoints. Furthermore, we design a spatio-temporal
discriminator based on body structures as well as limb motions to assess
whether the predicted pose forms a valid pose and a valid movement. During
training, we explicitly mask out some keypoints to simulate various occlusion
cases, from minor to severe occlusion, so that our network learns to be
robust to various degrees of occlusion. As 3D ground-truth data are limited,
we further utilize 2D video data to add a semi-supervised learning capability
to our network. Experiments on public datasets validate the effectiveness of
our method, and our ablation studies show the strengths of our network's
individual submodules.
Comment: 8 pages, AAAI 202
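The keypoint masking described in this abstract can be illustrated with a minimal sketch. The abstract does not say how keypoints are selected or what value a masked keypoint takes, so the per-joint Bernoulli sampling, the `(0.0, 0.0)` placeholder, and the names `mask_keypoints` and `occlusion_prob` are all assumptions made for illustration, not the authors' implementation.

```python
import random


def mask_keypoints(frames, occlusion_prob, rng):
    """Hide a random subset of 2D keypoints to simulate occlusion.

    frames: list of frames, each a list of (x, y) tuples, one per joint.
    occlusion_prob: chance that any single joint is hidden in a frame
        (assumed independent per joint; the paper may sample differently).
    Returns (masked_frames, visibility). Hidden joints become (0.0, 0.0),
    and visibility holds one boolean per joint per frame.
    """
    masked_frames, visibility = [], []
    for joints in frames:
        # rng.random() lies in [0, 1), so prob 0 keeps all joints
        # and prob 1 hides all of them.
        visible = [rng.random() >= occlusion_prob for _ in joints]
        masked = [xy if keep else (0.0, 0.0)
                  for xy, keep in zip(joints, visible)]
        masked_frames.append(masked)
        visibility.append(visible)
    return masked_frames, visibility


if __name__ == "__main__":
    rng = random.Random(0)
    # A hypothetical clip: 4 frames, 17 joints, coordinates in [0, 1).
    clip = [[(rng.random(), rng.random()) for _ in range(17)]
            for _ in range(4)]
    # Sweep from minor to severe occlusion.
    for prob in (0.1, 0.5, 0.9):
        _, vis = mask_keypoints(clip, prob, rng)
        kept = sum(sum(row) for row in vis)
        print(f"occlusion_prob={prob}: {kept} of {4 * 17} joints visible")
```

Returning the visibility flags alongside the masked coordinates lets a downstream model distinguish a hidden joint from a joint that really sits at the origin.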
Physics perception in sloshing scenes with guaranteed thermodynamic consistency
Physics perception very often faces the problem that only limited data or
partial measurements of the scene are available. In this work, we propose a
strategy to learn the full state of sloshing liquids from measurements of the
free surface. Our approach is based on recurrent neural networks (RNN) that
project the limited information available to a reduced-order manifold so as to
not only reconstruct the unknown information, but also to be capable of
performing fluid reasoning about future scenarios in real time. To obtain
physically consistent predictions, we train deep neural networks on the
reduced-order manifold that, through the use of inductive biases, ensure the
fulfillment of the principles of thermodynamics. RNNs learn from history the
required hidden information to correlate the limited information with the
latent space where the simulation occurs. Finally, a decoder returns data back
to the high-dimensional manifold, so as to provide the user with insightful
information in the form of augmented reality. This algorithm is connected to a
computer vision system to test the performance of the proposed methodology with
real information, resulting in a system capable of understanding and predicting
future states of the observed fluid in real time.
Comment: 20 pages, 11 figure
Análise cinemática automática usando OpenPose e Dynamic Time Warping com aplicações no remo (Automatic kinematic analysis using OpenPose and Dynamic Time Warping with applications in rowing)
Undergraduate final project (Trabalho de Conclusão de Curso), Universidade de Brasília, Faculdade UnB Gama, Engenharia Eletrônica, 2019.
This work proposes a low-cost system to automatically analyze kinematic parameters in
rowing, using video capture and processing, with a single RGB camera and without the
need for markers on the individual’s body. The coordinates of the joints are estimated
in each frame using the OpenPose API together with an offline filter to overcome frame
loss and oscillations in the trajectories. The joint angles are obtained from the
pixel coordinates of the estimated joints. Their trajectories are then evaluated using a
computational technique named Dynamic Time Warping, which compares two time series,
one called the reference and the other the target. The reference
series consists of a rowing pattern to be followed and is used as the basis for evaluating the
target series. The system test compares each stroke in a five-minute workout by a novice
rower with a reference stroke executed by a professional rower. In addition, a five-minute
workout by the same professional rower is evaluated to check the consistency of his own stroke.
Finally, all extracted kinematic metrics are displayed in an interface to monitor the rower's
movement and provide feedback. The proposed approach allows automatic analysis of
training sessions recorded with a simple camera, and could be useful in improving the
movement of rowers, especially inexperienced ones.
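The two computational steps named in this abstract, joint angles from pixel coordinates and a Dynamic Time Warping comparison between a reference and a target series, can be sketched as follows. The abstract does not give the angle definition, the local cost, or any warping constraints, so the three-point angle, the absolute-difference cost, the unconstrained warping path, and the function names are assumptions for illustration only.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the pixel points a-b-c.

    For example, a = hip, b = knee, c = ankle gives a knee angle.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        raise ValueError("joint coincides with a neighbouring point")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def dtw_distance(reference, target):
    """Classic O(n*m) DTW distance between two 1D series.

    Local cost is the absolute difference (an assumption); no window
    constraint is applied.
    """
    n, m = len(reference), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - target[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical knee-angle trajectories (degrees) for two strokes.
    reference_stroke = [170.0, 150.0, 110.0, 70.0, 110.0, 150.0, 170.0]
    target_stroke = [170.0, 160.0, 140.0, 100.0, 80.0, 120.0, 165.0]
    print(joint_angle((0, 0), (0, 100), (80, 100)))
    print(dtw_distance(reference_stroke, target_stroke))
```

Because DTW aligns samples non-linearly in time, a stroke performed with the same shape but at a different tempo scores close to the reference, which is presumably why it suits comparing a novice's strokes with a professional's.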
Learning Generalizable Visual Patterns Without Human Supervision
Owing to the existence of large labeled datasets, Deep Convolutional Neural Networks have ushered in a renaissance in computer vision. However, almost all of the visual data we generate daily, several human lifetimes' worth of it, remains unlabeled and thus out of reach of today’s dominant supervised learning paradigm. This thesis focuses on techniques that steer deep models towards learning generalizable visual patterns without human supervision. Our primary tool in this endeavor is the design of Self-Supervised Learning tasks, i.e., pretext tasks for which labels do not involve human labor. Besides enabling learning from large amounts of unlabeled data, we demonstrate how self-supervision can capture relevant patterns that supervised learning largely misses. For example, we design learning tasks that learn deep representations capturing shape from images, motion from video, and 3D pose features from multi-view data. Notably, these tasks’ design follows a common principle: the recognition of data transformations. The strong performance of the learned representations on downstream vision tasks such as classification, segmentation, action recognition, or pose estimation validates this pretext-task design.
This thesis also explores the use of Generative Adversarial Networks (GANs) for unsupervised representation learning. Besides leveraging generative adversarial learning to define image transformations for self-supervised learning tasks, we also address training instabilities of GANs through the use of noise.
While unsupervised techniques can significantly reduce the burden of supervision, in the end, we still rely on some annotated examples to fine-tune learned representations towards a target task. To improve learning from scarce or noisy labels, we describe a supervised learning algorithm with improved generalization in these challenging settings.
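The principle of recognizing data transformations, stated in this abstract, can be made concrete with a small sketch of how a pretext task yields labels without human effort. The abstract does not say which transformations the thesis uses; quarter-turn rotation is chosen here purely for illustration, and the function names are hypothetical.

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def make_rotation_examples(image):
    """Build (transformed_image, label) pairs from one unlabeled image.

    The label k in 0..3 is the number of clockwise quarter turns applied,
    so it comes for free from the transformation itself.
    """
    examples = []
    current = [list(row) for row in image]
    for k in range(4):
        examples.append((current, k))
        current = rotate90(current)
    return examples


if __name__ == "__main__":
    # A tiny stand-in for an unlabeled image.
    unlabeled = [[1, 2],
                 [3, 4]]
    for rotated, label in make_rotation_examples(unlabeled):
        print(label, rotated)
```

A classifier trained to predict the label from the transformed image has to pick up on visual structure such as object orientation, which is the kind of representation that is later fine-tuned on a downstream task.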