Muscle-actuated organisms are capable of learning an unparalleled diversity
of dexterous movements despite their vast number of muscles. Reinforcement
learning (RL) on large musculoskeletal models, however, has so far failed to
achieve similar performance. We conjecture that ineffective exploration in large
overactuated action spaces is a key problem. This is supported by the finding
that common exploration noise strategies are inadequate in synthetic examples
of overactuated systems. We identify differential extrinsic plasticity (DEP), a
method from the domain of self-organization, as able to induce
state-space-covering exploration within seconds of interaction. By integrating
DEP into RL, we achieve fast learning of reaching and locomotion in
musculoskeletal systems, outperforming current approaches in sample
efficiency and robustness across all considered tasks.
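
To make the exploration mechanism concrete, the following is a minimal Python sketch of a DEP-style controller. It captures only the general form of differential extrinsic plasticity, a correlation rule on time-shifted sensor derivatives feeding a normalized linear controller; the class name `DEPController` and the parameters `tau`, `kappa`, and `delay` are illustrative assumptions, not the implementation evaluated in this work.

```python
import numpy as np

class DEPController:
    """Minimal DEP-style exploration controller (illustrative sketch).

    Assumes one sensor per actuator (e.g. a length sensor per muscle),
    so the controller matrix C is square. Actions are a normalized
    linear readout of the sensors squashed by tanh, while C is
    continuously adapted from correlations between time-shifted
    sensor derivatives.
    """

    def __init__(self, n, tau=40.0, kappa=10.0, delay=1, seed=0):
        rng = np.random.default_rng(seed)
        self.C = 0.01 * rng.standard_normal((n, n))
        self.tau = tau      # time constant of the plasticity rule
        self.kappa = kappa  # gain of the normalized controller
        self.delay = delay  # time shift between the correlated derivatives
        self.hist = []      # short rolling buffer of sensor readings

    def step(self, x):
        """Adapt C from sensor derivatives, then return an action in [-1, 1]."""
        x = np.asarray(x, dtype=float)
        self.hist.append(x)
        if len(self.hist) > self.delay + 2:
            self.hist.pop(0)
        if len(self.hist) == self.delay + 2:
            # finite-difference derivatives: one current, one delayed
            xdot_now = self.hist[-1] - self.hist[-2]
            xdot_past = self.hist[1] - self.hist[0]
            # Hebbian-like update on time-shifted derivatives with leaky
            # decay; an identity inverse model is assumed for simplicity
            self.C += (np.outer(xdot_now, xdot_past) - self.C) / self.tau
        # normalize C so the effective feedback gain is set by kappa alone
        C_norm = self.C / (np.linalg.norm(self.C) + 1e-8)
        return np.tanh(self.kappa * C_norm @ x)
```

The contrast with the common noise strategies mentioned above is the point of the sketch: because actions are driven by the system's own sensor velocities, the resulting exploration is temporally and spatially correlated rather than diffusive.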