Recovering 3D human meshes in the wild is highly challenging because in-the-wild
(ITW) datasets provide only 2D pose ground truths (GTs). Recently, 3D
pseudo-GTs have been widely used to train 3D human mesh estimation networks as
the 3D pseudo-GTs enable 3D mesh supervision when training the networks on ITW
datasets. However, despite the great potential of 3D pseudo-GTs, there has
been no extensive analysis of which factors matter for producing more
beneficial 3D pseudo-GTs. In this paper, we provide three recipes for
obtaining highly beneficial 3D pseudo-GTs for ITW datasets. The main challenge is
that only 2D-based weak supervision is available when obtaining the 3D
pseudo-GTs. Each of our three recipes addresses one aspect of this challenge:
depth ambiguity, sub-optimality of weak supervision, and implausible
articulation. Experimental results show that simply re-training
state-of-the-art networks with our new 3D pseudo-GTs elevates their performance
to the next level without bells and whistles. The 3D pseudo-GTs are publicly
available at https://github.com/mks0601/NeuralAnnot_RELEASE.

Comment: Published at CVPRW 202