Humanoid robots are well suited for human habitats due to their morphological
similarity to humans, but developing controllers for them is a challenging task
that involves multiple sub-problems, such as control, planning, and perception. In
this paper, we introduce a method to simplify controller design by enabling
users to train and fine-tune robot control policies using natural language
commands. We first learn a neural network policy that generates behaviors given
a natural language command, such as "walk forward", by combining Large Language
Models (LLMs), motion retargeting, and motion imitation. Based on the
synthesized motion, we iteratively fine-tune the policy by updating the text
prompt and querying LLMs to find the best checkpoint associated with the
closest motion in the training history. We validate our approach using a
simulated Digit humanoid robot and
demonstrate learning of diverse motions, such as walking, hopping, and kicking,
without the burden of complex reward engineering. In addition, we show that our
iterative refinement enables us to learn 3x faster than a naive formulation
that learns from scratch.
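
To make the pipeline concrete, below is a minimal Python sketch of the language-guided iterative refinement loop the abstract describes. Every helper here (`text_to_motion`, `retarget_to_robot`, `train_imitation_policy`, `llm_select_closest`, `llm_refine_prompt`) is a hypothetical stub standing in for the components named above, not the paper's actual API; the sketch only illustrates the control flow of generating a reference motion from a command, retargeting it to the robot, training an imitation policy, and querying an LLM to pick the closest past checkpoint and refine the prompt.

```python
import random
from typing import List, Optional, Tuple

# ---- Hypothetical stand-ins for the components the abstract names. ----
# A real system would plug in an LLM, a text-to-motion model, a motion
# retargeting module, and an RL-based motion imitation trainer.

def text_to_motion(prompt: str) -> str:
    return f"human_motion<{prompt}>"                # dummy reference motion

def retarget_to_robot(human_motion: str) -> str:
    return human_motion.replace("human", "robot")   # dummy retargeted clip

def train_imitation_policy(motion: str,
                           init: Optional[str]) -> Tuple[str, str]:
    return f"ckpt({motion})", f"rollout({motion})"  # dummy checkpoint/rollout

def llm_select_closest(command: str, rollouts: List[str]) -> int:
    return random.randrange(len(rollouts))          # dummy LLM judgment

def llm_refine_prompt(command: str, best_rollout: str) -> str:
    return command + " (refined)"                   # dummy prompt update


def refine_policy(command: str, num_iters: int = 3) -> str:
    """Iteratively refine a control policy for a natural language command."""
    history: List[Tuple[str, str]] = []  # (checkpoint, rollout) per iteration
    prompt = command                     # e.g. "walk forward"
    init: Optional[str] = None           # checkpoint to fine-tune from

    for _ in range(num_iters):
        # 1. Generate a human reference motion from the current prompt.
        human_motion = text_to_motion(prompt)
        # 2. Retarget it onto the robot's morphology (e.g. Digit).
        robot_motion = retarget_to_robot(human_motion)
        # 3. Train (or fine-tune) a policy to imitate the retargeted motion.
        checkpoint, rollout = train_imitation_policy(robot_motion, init)
        history.append((checkpoint, rollout))
        # 4. Ask the LLM which motion in history is closest to the command;
        #    fine-tune from that checkpoint next and refine the prompt.
        best = llm_select_closest(command, [r for _, r in history])
        init = history[best][0]
        prompt = llm_refine_prompt(command, history[best][1])

    return init  # best checkpoint found, e.g. refine_policy("walk forward")
```

Fine-tuning from the closest checkpoint in history, rather than restarting training after each prompt update, is the design choice behind the reported 3x speedup over learning each behavior from scratch.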