14,086 research outputs found
Markerless Reconstruction of Human Motion
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2017. Advisor: Jehee Lee.
Markerless human pose recognition using a single depth camera plays an important role in interactive graphics applications and user interface design. Recent pose recognition algorithms have adopted machine learning techniques that utilize large collections of motion capture data, and their effectiveness is greatly influenced by the diversity and variability of the training data. Many applications built on these pose recognition systems use the human body as a controller, and in many cases handheld props make such control more immersive. Nevertheless, systems that recognize human poses together with props are not yet sufficiently robust. Moreover, body parts that are invisible to a single depth camera lower the quality of pose estimation, because no observed data is available for them.
In this thesis, we present techniques for manipulating human motion data to enable human pose estimation from a single depth camera. First, we develop a method that resamples a collection of human motion data to improve pose variability and to achieve an arbitrary size and level of density in the space of human poses. This space is high-dimensional, so brute-force uniform sampling is intractable; we exploit dimensionality reduction and locally stratified sampling to generate either uniform or application-specifically biased distributions of human poses. The resulting recognizer learns challenging poses such as sitting, kneeling, stretching, and yoga from a remarkably small amount of training data, and it can be steered to maximize performance for a specific domain of human poses. We demonstrate that our algorithm performs much better than the Kinect SDK for recognizing challenging acrobatic poses, while performing comparably for easy upright standing poses. Second, we recognize environmental objects that interact with humans. We propose a new prop recognition system that can be layered on an existing human pose estimation algorithm and that estimates props robustly together with human poses. Our work is widely applicable to controller systems that must handle a human pose and additional items simultaneously. Finally, we enhance the pose estimation result. Not every body part can always be estimated from a single depth image: some body parts are occluded by other body parts, and the estimation system sometimes fails. To address this problem, we construct an autoencoder neural network trained on a large corpus of natural pose data, which reconstructs the missing parameters of human pose joints as new, corrected joints.
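The resampling idea in the first contribution — project poses into a low-dimensional space, partition that space into strata, and cap how many samples each stratum keeps — can be sketched as follows. The fixed random projection here is only a stand-in for the dimensionality reduction used in the thesis, and all names and parameters are illustrative assumptions, not the thesis's implementation.

```python
import math
import random
from collections import defaultdict

def stratified_resample(poses, proj, cell=0.5, cap=3, seed=0):
    """Keep at most `cap` poses per stratum (grid cell) of a
    low-dimensional projection of the pose space."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in poses:
        # Project to 2-D and snap to a grid cell; the fixed random
        # projection stands in for a learned dimensionality reduction.
        key = tuple(math.floor(sum(a * b for a, b in zip(p, axis)) / cell)
                    for axis in proj)
        strata[key].append(p)
    kept = []
    for members in strata.values():
        rng.shuffle(members)        # unbiased choice within a stratum
        kept.extend(members[:cap])  # cap the density inside each stratum
    return kept

# Toy demo: 200 noisy 6-D "poses" clustered around two prototype poses.
rng = random.Random(1)
protos = [[0.0] * 6, [2.0] * 6]
poses = [[c + rng.gauss(0, 0.2) for c in rng.choice(protos)]
         for _ in range(200)]
proj = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(2)]
subset = stratified_resample(poses, proj)
print(len(poses), "->", len(subset))  # dense regions are thinned out
```

Because the cap applies per cell, over-represented regions of the pose space are thinned while sparse regions keep everything they have, which is the mechanism behind the uniform (or deliberately biased) pose distributions described above.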
It can be applied to many different human pose estimation systems to improve their performance.

1 Introduction
2 Background
2.1 Research on Motion Data
2.2 Human Pose Estimation
2.3 Machine Learning on Human Pose Estimation
2.4 Dimension Reduction and Uniform Sampling
2.5 Neural Networks on Motion Data
3 Markerless Human Pose Recognition System
3.1 System Overview
3.2 Data Preprocessing
3.3 Randomized Decision Tree
3.4 Joint Estimation Process
4 Controllable Sampling Data in the Space of Human Poses
4.1 Overview
4.2 Locally Stratified Sampling
4.3 Experimental Results
4.4 Discussion
5 Human Pose Estimation with Interacting Prop from Single Depth Image
5.1 Introduction
5.2 Prop Estimation
5.3 Experimental Results
5.4 Discussion
6 Enhancing the Estimation of Human Pose from Incomplete Joints
6.1 Overview
6.2 Method
6.3 Experimental Result
6.4 Discussion
7 Conclusion
Bibliography
Abstract (in Korean)
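The thesis's final contribution trains an autoencoder on a large pose corpus and uses it to fill in occluded joints. A minimal stand-in — a linear two-parameter pose model in place of the neural network, with every name and number here an illustrative assumption — fits the model's latent coefficients to the visible joint parameters by least squares and then decodes the full pose, missing entries included:

```python
import random

def complete_pose(obs, idx, basis):
    """Fill in missing joint values, assuming poses lie in span(basis).

    obs   : observed values at positions `idx`
    idx   : indices of the visible joint parameters
    basis : two basis pose vectors (the "decoder" of a linear model)
    """
    u = [basis[0][i] for i in idx]
    v = [basis[1][i] for i in idx]
    # Normal equations for least squares on the visible entries only.
    uu = sum(x * x for x in u); uv = sum(x * y for x, y in zip(u, v))
    vv = sum(y * y for y in v)
    bu = sum(x * o for x, o in zip(u, obs))
    bv = sum(y * o for y, o in zip(v, obs))
    det = uu * vv - uv * uv
    a = (vv * bu - uv * bv) / det   # latent code, like an encoder output
    b = (uu * bv - uv * bu) / det
    # Decode the full pose, including the occluded entries.
    return [a * x + b * y for x, y in zip(basis[0], basis[1])]

rng = random.Random(0)
basis = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2)]
true = [0.7 * x - 1.2 * y for x, y in zip(basis[0], basis[1])]
visible = [0, 1, 3, 4, 6]                # entries 2, 5, 7 are occluded
rec = complete_pose([true[i] for i in visible], visible, basis)
err = max(abs(r - t) for r, t in zip(rec, true))
print("max reconstruction error:", err)  # ~0 for noise-free input
```

An autoencoder plays the same role nonlinearly: the encoder maps the visible evidence to a latent code, and the decoder emits a complete, plausible pose.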
HumanMAC: Masked Motion Completion for Human Motion Prediction
Human motion prediction is a classical problem in computer vision and computer graphics with a wide range of practical applications. Previous efforts achieve strong empirical performance based on an encoding-decoding style: methods of this style first encode previous motions into latent representations and then decode those latent representations into predicted motions. However, in practice they remain unsatisfactory due to several issues, including complicated loss constraints, cumbersome training processes, and a limited ability to switch between different categories of motion during prediction. In this paper, to address these issues, we step outside the foregoing style and propose a novel framework from a new perspective. Specifically, our framework works in a masked completion fashion. In the training stage, we learn a motion diffusion model that generates motions from random noise. In the inference stage, using a denoising procedure, we condition motion prediction on observed motions to output more continuous and controllable predictions. The proposed framework enjoys promising algorithmic properties: it needs only one loss in optimization and is trained end-to-end. Additionally, it accomplishes the switch between different categories of motion effectively, which is significant in realistic tasks such as animation. Comprehensive experiments on benchmarks confirm the superiority of the proposed framework. The project page is available at https://lhchen.top/Human-MAC
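The inference-stage idea — run a denoising procedure while repeatedly clamping the observed frames so the completion stays conditioned on them — can be illustrated with a toy one-dimensional "motion" and a smoothing stand-in for the learned diffusion denoiser. Everything named here is an illustrative assumption, not HumanMAC's actual model:

```python
import random

def masked_completion(observed, horizon, steps=50, seed=0):
    """Toy masked motion completion: denoise random noise into a
    trajectory while clamping the observed frames at every step."""
    rng = random.Random(seed)
    n = len(observed) + horizon
    x = [rng.gauss(0, 1) for _ in range(n)]          # start from noise
    for _ in range(steps):
        # Stand-in "denoiser": pull each frame toward its neighbours,
        # playing the role of one step of the learned diffusion model.
        x = [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3
             for i in range(n)]
        # Conditioning: overwrite the observed part at every step, so
        # the free frames are denoised into something consistent with it.
        for i, v in enumerate(observed):
            x[i] = v
    return x

past = [0.0, 0.1, 0.2, 0.3, 0.4]   # observed motion (e.g. a joint angle)
full = masked_completion(past, horizon=5)
print(full)
```

The repeated overwrite is what makes this completion rather than unconditional generation: only the future frames are free variables, and they are pulled toward values that join smoothly onto the observed prefix.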
A data augmentation methodology for training machine/deep learning gait recognition algorithms
There are several confounding factors that can reduce the accuracy of gait recognition systems. These factors can reduce the distinctiveness of, or alter, the features used to characterise gait; they include variations in clothing, lighting, pose, and environment, such as the walking surface. Full invariance to all confounding factors is challenging in the absence of high-quality labelled training data. We introduce a simulation-based methodology and a subject-specific dataset which can be used for generating synthetic video frames and sequences for data augmentation. With this methodology, we generated a multi-modal dataset. In addition, we supply simulation files that provide the ability to sample from several confounding variables simultaneously. The basis of the data is real motion capture data of subjects walking and running on a treadmill at different speeds. Results from gait recognition experiments suggest that information about the identity of subjects is retained within synthetically generated examples. The dataset and methodology allow studies into fully-invariant identity recognition spanning a far greater number of observation conditions than would otherwise be possible.
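The augmentation recipe — start from real motion capture of a subject and jointly sample several confounding variables to synthesize new training sequences — can be sketched like this. The particular confounds chosen (playback speed, camera tilt, sensor noise) and all function names are illustrative assumptions, not the paper's actual simulation files:

```python
import math
import random

def augment(seq, rng):
    """Synthesize one training sequence from a real gait sequence by
    jointly sampling several confounding variables.

    seq : list of frames, each a list of (x, y) joint positions
    """
    speed = rng.uniform(0.8, 1.25)   # playback-speed confound
    tilt = rng.uniform(-0.1, 0.1)    # camera-tilt confound (radians)
    sigma = rng.uniform(0.0, 0.01)   # sensor-noise confound
    c, s = math.cos(tilt), math.sin(tilt)
    out = []
    t = 0.0
    while t < len(seq) - 1:
        i = int(t)
        f = t - i                    # linear interpolation in time
        frame = []
        for (x0, y0), (x1, y1) in zip(seq[i], seq[i + 1]):
            x = x0 + f * (x1 - x0) + rng.gauss(0, sigma)
            y = y0 + f * (y1 - y0) + rng.gauss(0, sigma)
            frame.append((c * x - s * y, s * x + c * y))  # apply tilt
        out.append(frame)
        t += speed
    return out

rng = random.Random(42)
walk = [[(math.sin(0.1 * t), math.cos(0.1 * t)), (0.5, t * 0.01)]
        for t in range(100)]         # toy 2-joint "gait" sequence
aug = augment(walk, rng)
print(len(walk), "->", len(aug), "frames")
```

Each call draws a fresh combination of confounds, so repeated calls on the same real sequence yield many distinct synthetic examples that still carry the subject's underlying gait signal.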
Semi-supervised FusedGAN for Conditional Image Generation
We present FusedGAN, a deep network for conditional image synthesis with controllable sampling of diverse images. Fidelity, diversity, and controllable sampling are the main quality measures of a good image generation model, and most existing models are insufficient in all three aspects. FusedGAN can perform controllable sampling of diverse images with very high fidelity. We argue that controllability can be achieved by disentangling the generation process into stages. In contrast to stacked GANs, where multiple stages of GANs are trained separately with full supervision of labeled intermediate images, FusedGAN has a single-stage pipeline with a built-in stacking of GANs. Unlike existing methods, which require full supervision with paired conditions and images, FusedGAN can effectively leverage more abundant images without corresponding conditions during training to produce more diverse samples with high fidelity. We achieve this by fusing two generators: one for unconditional image generation and the other for conditional image generation, where the two partly share a common latent space, thereby disentangling the generation. We demonstrate the efficacy of FusedGAN on fine-grained image generation tasks such as text-to-image and attribute-to-face generation.
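The fusion idea — one stage maps noise into a shared structure latent, and an unconditional branch and a conditional branch both decode from that same latent — can be shown as a structural sketch with stand-in linear "generators". All shapes and names are illustrative assumptions, not the paper's architecture:

```python
import random

def make_linear(rng, n_in, n_out):
    """Stand-in for a learned generator stage: a fixed random linear map."""
    w = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

rng = random.Random(0)
Z, M, X, C = 8, 16, 32, 4           # noise, shared latent, image, condition
g_shared = make_linear(rng, Z, M)   # z -> m: shared structure latent
g_uncond = make_linear(rng, M, X)   # m -> unconditional image
g_cond = make_linear(rng, M + C, X) # (m, c) -> conditional image

z = [rng.gauss(0, 1) for _ in range(Z)]
m = g_shared(z)                     # one latent feeds both branches
img_u = g_uncond(m)                 # sample without any condition
cond = [1.0, 0.0, 0.0, 0.0]         # e.g. one attribute turned on
img_c = g_cond(m + cond)            # condition fused at the shared latent
print(len(img_u), len(img_c))
```

Because both branches consume the same latent m, unlabeled images can train the unconditional path while paired data trains only the conditional head, and varying cond with m fixed changes the conditioned output without disturbing the underlying structure — the source of controllable sampling.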