We study the problem of generating a task-dependent visual classifier from
text descriptions alone, without training and without any visual samples. This
\textit{Text-to-Model} (T2M) problem is closely related to zero-shot learning,
but unlike previous work, T2M infers a model tailored to an entire task,
taking all of the task's classes into account. We analyze the symmetries of T2M and
characterize the equivariance and invariance properties of corresponding
models. In light of these properties, we design an architecture based on
hypernetworks that, given a set of new class descriptions, predicts the weights
of an object recognition model that classifies images from those zero-shot
classes. We demonstrate the benefits of our approach compared to zero-shot
learning from text descriptions in image and point-cloud classification using
various types of class descriptions, from single words to rich text
descriptions.
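
To make the hypernetwork idea concrete, the following is a minimal PyTorch-style sketch, not the paper's exact T2M architecture: all dimensions, layer sizes, and the mean-pooled task context are illustrative assumptions. It shows a hypernetwork that consumes the text embeddings of every class in a task and emits the rows of a linear image classifier over those classes, so each predicted weight vector depends on the whole class set and is permutation-equivariant in the class order.

\begin{verbatim}
# Illustrative sketch only: a toy hypernetwork mapping a set of class text
# embeddings to the weights of a linear image classifier. Dimensions and the
# mean-pooling scheme are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class TextToClassifierHypernet(nn.Module):
    def __init__(self, text_dim=512, image_dim=512, hidden=256):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        # Per-class head: produces one row of the classifier weight matrix
        # from the class's own embedding plus a task-level mean, so the
        # predicted weights depend on all classes in the task.
        self.head = nn.Linear(2 * hidden, image_dim)

    def forward(self, class_text_emb):
        # class_text_emb: (num_classes, text_dim) for a single task
        h = self.encode(class_text_emb)                   # (C, hidden)
        task_ctx = h.mean(dim=0, keepdim=True).expand_as(h)
        w = self.head(torch.cat([h, task_ctx], dim=-1))   # (C, image_dim)
        return w                                          # classifier weights


if __name__ == "__main__":
    hyper = TextToClassifierHypernet()
    class_emb = torch.randn(5, 512)    # text embeddings of 5 unseen classes
    weights = hyper(class_emb)         # (5, 512) predicted linear classifier
    images = torch.randn(8, 512)       # image features for 8 test inputs
    logits = images @ weights.t()      # (8, 5) zero-shot class scores
    print(logits.argmax(dim=-1))
\end{verbatim}

In this toy version, permuting the class descriptions permutes the predicted weight rows in the same way, which is the kind of equivariance property discussed above.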