
    Learning Monocular 3D Human Pose Estimation from Multi-view Images

    Accurate 3D human pose estimation from single images is possible with sophisticated deep-net architectures that have been trained on very large datasets. However, this still leaves open the problem of capturing motions for which no such database exists. Manual annotation is tedious, slow, and error-prone. In this paper, we propose to replace most of the annotations by the use of multiple views, at training time only. Specifically, we train the system to predict the same pose in all views. Such a consistency constraint is necessary but not sufficient to predict accurate poses. We therefore complement it with a supervised loss aiming to predict the correct pose in a small set of labeled images, and with a regularization term that penalizes drift from initial predictions. Furthermore, we propose a method to estimate camera pose jointly with human pose, which lets us utilize multi-view footage where calibration is difficult, e.g., for pan-tilt or moving handheld cameras. We demonstrate the effectiveness of our approach on established benchmarks, as well as on a new Ski dataset with rotating cameras and expert ski motion, for which annotations are truly hard to obtain.
    Comment: CVPR 2018, Ski-Pose PTZ-Camera Dataset availabl
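
The three training terms named in this abstract (multi-view consistency, supervision on a few labeled images, and a penalty on drift from initial predictions) can be sketched as one combined objective. This is a minimal illustration, not the authors' implementation: the function names, the use of mean squared error, the weights `w_sup` and `w_reg`, and the assumption that camera-to-world rotations are already known are all hypothetical choices made for the example.

```python
def sq_dist(p, q):
    """Mean over joints of the squared distance between two poses.

    A pose is a list of (x, y, z) joint positions.
    """
    return sum((a - b) ** 2 for jp, jq in zip(p, q) for a, b in zip(jp, jq)) / len(p)


def rotate(pose, R):
    """Apply a 3x3 rotation matrix (list of rows) to every joint of a pose."""
    return [tuple(sum(R[i][k] * j[k] for k in range(3)) for i in range(3)) for j in pose]


def consistency_loss(poses_world):
    """Penalize disagreement between per-view predictions in a shared frame."""
    n = len(poses_world)
    n_joints = len(poses_world[0])
    mean = [tuple(sum(p[j][d] for p in poses_world) / n for d in range(3))
            for j in range(n_joints)]
    return sum(sq_dist(p, mean) for p in poses_world) / n


def combined_loss(view_preds, rotations, labeled, initial, w_sup=1.0, w_reg=0.1):
    """Consistency + supervised + drift-regularization terms (weights are assumed).

    view_preds: per-view predicted poses in each camera's frame
    rotations:  per-view camera-to-world rotations (assumed known here)
    labeled:    list of (predicted pose, ground-truth pose) pairs
    initial:    per-view predictions of the initial network
    """
    world = [rotate(p, R) for p, R in zip(view_preds, rotations)]
    l_cons = consistency_loss(world)
    l_sup = sum(sq_dist(p, gt) for p, gt in labeled) / max(len(labeled), 1)
    l_reg = sum(sq_dist(p, p0) for p, p0 in zip(view_preds, initial)) / len(view_preds)
    return l_cons + w_sup * l_sup + w_reg * l_reg
```

The consistency term alone is minimized by any prediction that is identical across views, including a wrong one, which is why the abstract calls it necessary but not sufficient and pairs it with the other two terms.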

    MONITORING SHALAT (PRAYER) MOVEMENTS THROUGH A CAMERA USING THE POSE PREDICT METHOD

    Worship is a spiritual activity carried out routinely as a form of respect to God. It takes various forms, one of which is prayer, an act of both honoring God and praying. Prayer has several requirements and procedures that must be followed, such as purification through ablution, followed by stating the intention and reciting prayers during the prayer itself. Efforts have been made to assist people in worship, such as reminding them of prayer times and informing them about worship activities. New problems still arise, however, one of which is memory: when the brain is overworked, people forget or fail to remember things. This tool uses a webcam as its input and a portable computer or laptop to process the images and display the output text. To detect poses, the tool uses the mediapipe library to help with object detection.

    Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency

    In this paper, we introduce a novel unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan or image. Our key idea is to utilize the fact that predictions from different views of the same or similar objects should be consistent with each other. Such view consistency can provide effective regularization for keypoint prediction on unlabeled instances. In addition, we introduce a geometric alignment term to regularize predictions in the target domain. The resulting loss function can be effectively optimized via alternating minimization. We demonstrate the effectiveness of our approach on real datasets and present experimental results showing that our approach is superior to state-of-the-art general-purpose domain adaptation techniques.
    Comment: ECCV 201

    In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations

    Convolutional Neural Network based approaches for monocular 3D human pose estimation usually require a large amount of training images with 3D pose annotations. While it is feasible to provide 2D joint annotations for large corpora of in-the-wild images with humans, providing accurate 3D annotations to such in-the-wild corpora is hardly feasible in practice. Most existing 3D labelled data sets are either synthetically created or feature in-studio images. 3D pose estimation algorithms trained on such data often have limited ability to generalize to real world scene diversity. We therefore propose a new deep learning based method for monocular 3D human pose estimation that shows high accuracy and generalizes better to in-the-wild scenes. It has a network architecture that comprises a new disentangled hidden space encoding of explicit 2D and 3D features, and uses supervision by a new learned projection model from predicted 3D pose. Our algorithm can be jointly trained on image data with 3D labels and image data with only 2D labels. It achieves state-of-the-art accuracy on challenging in-the-wild data.
    Comment: Accepted to CVPR 201
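
The idea of training jointly on 3D-labeled and 2D-only images can be illustrated by projecting the predicted 3D pose to 2D whenever only a 2D label exists. The sketch below is not the paper's method: it substitutes a fixed weak-perspective projection (hypothetical parameters `scale`, `tx`, `ty`) for the learned projection model the abstract describes, and the function names and the mean-squared-error choice are assumptions made for the example.

```python
def project(pose3d, scale=1.0, tx=0.0, ty=0.0):
    """Weak-perspective projection: drop depth, then scale and translate.

    Stand-in for a learned projection model (an assumption of this sketch).
    """
    return [(scale * x + tx, scale * y + ty) for x, y, _ in pose3d]


def mse(a, b):
    """Mean over joints of the squared distance between two joint lists."""
    return sum((u - v) ** 2 for pa, pb in zip(a, b) for u, v in zip(pa, pb)) / len(a)


def sample_loss(pred3d, label3d=None, label2d=None, proj=None):
    """Use the 3D label when present; otherwise supervise the 2D projection."""
    if label3d is not None:
        return mse(pred3d, label3d)
    if label2d is not None:
        return mse(project(pred3d, **(proj or {})), label2d)
    raise ValueError("sample needs a 3D or a 2D label")
```

With a 2D-only label the loss does not depend on the predicted depth, so under this simplified projection the depth has to be learned from the 3D-labeled images.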
