In this paper, we introduce Recipe1M, a new large-scale, structured corpus of
over one million cooking recipes and 13 million food images. As the largest
publicly available collection of recipe data, Recipe1M affords the ability to
train high-capacity models on aligned, multi-modal data. Using these data, we
train a neural network to learn a joint embedding of recipes and images that
yields impressive results on an image-recipe retrieval task. Moreover, we
demonstrate that regularization via the addition of a high-level classification
objective both improves retrieval performance to rival that of humans and
enables semantic vector arithmetic. We postulate that these embeddings will
provide a basis for further exploration of the Recipe1M dataset, and of food and
cooking in general. Code, data, and models are publicly available.
Comment: Submitted to Transactions on Pattern Analysis and Machine Intelligence
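The abstract names the ingredients of the approach (a joint image-recipe embedding, a retrieval objective, a high-level classification regularizer, and vector arithmetic in the shared space) without giving the formulas. Below is a minimal Python sketch of how these pieces can fit together, under stated assumptions. Embeddings are plain lists of floats. Alignment is scored with cosine similarity. The regularizer is a softmax cross-entropy computed through one shared linear classifier applied to both embeddings. `margin`, `lam`, and every function name are hypothetical choices for illustration. This is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def pair_loss(img: Vector, rec: Vector, matching: bool, margin: float = 0.1) -> float:
    """Cosine embedding loss (assumed form): pull matching image-recipe pairs
    together; push non-matching pairs below a hypothetical similarity margin."""
    sim = cosine(img, rec)
    return 1.0 - sim if matching else max(0.0, sim - margin)


def linear(weights: Sequence[Vector], x: Vector) -> List[float]:
    """Shared linear classifier: one weight row per high-level class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cross_entropy(logits: Vector, label: int) -> float:
    """Softmax cross-entropy for one example, stabilised by subtracting the max."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def regularized_loss(img: Vector, rec: Vector, matching: bool,
                     img_label: int, rec_label: int,
                     weights: Sequence[Vector], lam: float = 0.02) -> float:
    """Retrieval loss plus a classification term on both embeddings, computed
    through the same classifier so both modalities share class structure.
    `lam` is a hypothetical regularization weight."""
    retrieval = pair_loss(img, rec, matching)
    reg = (cross_entropy(linear(weights, img), img_label)
           + cross_entropy(linear(weights, rec), rec_label))
    return retrieval + lam * reg


def retrieve(query: Vector, candidates: Sequence[Vector], k: int = 1) -> List[int]:
    """Indices of the k candidates most similar (by cosine) to the query."""
    order = sorted(range(len(candidates)),
                   key=lambda i: cosine(query, candidates[i]), reverse=True)
    return order[:k]


def arithmetic(a: Vector, b: Vector, c: Vector) -> List[float]:
    """Semantic vector arithmetic in the shared space: a - b + c."""
    return [x - y + z for x, y, z in zip(a, b, c)]


if __name__ == "__main__":
    image = [1.0, 0.0, 0.2]
    recipes = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.2], [-1.0, 0.0, 0.0]]
    print(retrieve(image, recipes, k=2))  # → [1, 0]
```

In this sketch, image-to-recipe retrieval is a nearest-neighbour search over recipe embeddings, and the result of `arithmetic` would be passed to `retrieve` in the same way. Sharing one classifier across both modalities is the design choice that lets the classification term act as a regularizer on the joint space rather than on either encoder alone.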