Where am I eating? Image-based food menu recognition
Bachelor's thesis in Computer Engineering, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2018, Advisor: Marc Bolaños Solà. Food has become an important part of our social activities. Since social networks and websites like Yelp appeared, their users have been uploading photos of their meals to the Internet. This trend has driven the development of food analysis and food recognition models.
We propose a model that recognizes the meal appearing in a picture from a list of menu items (candidate dishes), which could be used to identify the selected meal in a restaurant. In a real-world scenario, the system presented in this thesis does not need to train a new model for every new restaurant: it learns to identify the components of an image and their relationship to the name of the meal.
The system introduced in this work computes the similarity between an image and a text sequence representing the name of the dish. Images are encoded with a combination of Convolutional Neural Networks, which reduces the input image to a compact vector, while the text is converted to a single vector using a Long Short-Term Memory (LSTM) network. These two vectors are compared and optimized using a similarity function.
The similarity-based output is then used to rank the items in a menu list and find the most probable dish. According to the Ranking Loss metric, the results obtained by the model improve over the baseline by 15%.
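The pipeline described above — embed the image, embed each candidate dish name, compare with a similarity function, and rank — can be sketched with a toy cosine-similarity ranker. The embeddings below are made-up placeholders standing in for the CNN and LSTM outputs, not values produced by the thesis model:

```python
import numpy as np

def rank_menu_items(image_emb, menu_embs):
    """Rank candidate dish embeddings by cosine similarity to the image embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    menu_embs = menu_embs / np.linalg.norm(menu_embs, axis=1, keepdims=True)
    sims = menu_embs @ image_emb   # cosine similarity per candidate dish
    return np.argsort(-sims)       # candidate indices, most similar first

# Toy example: 3 candidate dishes in a 4-d embedding space (hypothetical values).
image = np.array([1.0, 0.0, 1.0, 0.0])
menu = np.array([
    [0.0, 1.0, 0.0, 1.0],   # dissimilar to the image
    [1.0, 0.1, 0.9, 0.0],   # very similar
    [0.5, 0.5, 0.5, 0.5],   # partially similar
])
ranking = rank_menu_items(image, menu)  # → most probable dish first
```

Because both modalities are projected into the same space, any menu list can be ranked against any image without retraining, which is what lets the system generalize to new restaurants.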
CuisineNet: Food Attributes Classification using Multi-scale Convolution Network
Diversity of food and its attributes reflects the culinary habits of people from different countries. This paper therefore addresses the problem of identifying the food culture of people around the world, and its flavor, by classifying two main food attributes: cuisine and flavor. A deep learning model based on multi-scale convolutional networks is proposed for extracting more accurate features from input images. The aggregation of multi-scale convolution layers with different kernel sizes is also used to weight the features obtained at different scales. In addition, a joint loss function based on Negative Log-Likelihood (NLL) is used to fit the model probabilities to multi-labeled classes for the multi-modal classification task. Furthermore, this work provides a new dataset for food attributes, called Yummly48K, extracted from the popular food website Yummly. Our model is assessed on the constructed Yummly48K dataset. The experimental results show that our proposed method yields 65% and 62% average F1 score on the validation and test sets, outperforming the state-of-the-art models.
Comment: 8 pages, Submitted in CCIA 201
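A joint NLL loss over two attribute heads, as the abstract describes for cuisine and flavor, can be illustrated with a minimal numeric sketch. The equal weighting of the two terms and the toy logits below are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-d logit vector."""
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]

def joint_loss(cuisine_log_probs, flavor_log_probs, cuisine_target, flavor_target):
    # Sum of the two per-attribute NLL terms (equal weighting assumed here).
    return nll(cuisine_log_probs, cuisine_target) + nll(flavor_log_probs, flavor_target)

# Hypothetical logits from two classifier heads sharing one backbone.
cuisine_logits = np.array([2.0, 0.5, 0.1])   # e.g. 3 cuisine classes
flavor_logits = np.array([1.0, 1.0])         # e.g. 2 flavor classes
loss = joint_loss(log_softmax(cuisine_logits), log_softmax(flavor_logits),
                  cuisine_target=0, flavor_target=1)
```

Summing the per-attribute terms lets one backward pass update the shared feature extractor with signal from both labels at once.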
Inverse Cooking: Recipe Generation from Food Images
People enjoy food photography because they appreciate food. Behind each meal
there is a story described in a complex recipe and, unfortunately, by simply
looking at a food image we do not have access to its preparation process.
Therefore, in this paper we introduce an inverse cooking system that recreates
cooking recipes given food images. Our system predicts ingredients as sets by
means of a novel architecture, modeling their dependencies without imposing any
order, and then generates cooking instructions by attending to both image and
its inferred ingredients simultaneously. We extensively evaluate the whole
system on the large-scale Recipe1M dataset and show that (1) we improve
performance w.r.t. previous baselines for ingredient prediction; (2) we are
able to obtain high quality recipes by leveraging both image and ingredients;
(3) our system is able to produce more compelling recipes than retrieval-based
approaches according to human judgment. We make code and models publicly available.
Comment: CVPR 201
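Predicting ingredients "as sets ... without imposing any order" can be approximated, at its simplest, by thresholding independent per-ingredient probabilities into an unordered set. The vocabulary, logits, and threshold below are hypothetical, and the paper's actual set-prediction architecture is more sophisticated than this sketch:

```python
import numpy as np

def predict_ingredient_set(logits, vocab, threshold=0.5):
    """Turn per-ingredient logits into an unordered ingredient set."""
    probs = 1.0 / (1.0 + np.exp(-logits))   # independent sigmoid per ingredient
    return {ing for ing, p in zip(vocab, probs) if p > threshold}

# Hypothetical ingredient vocabulary and image-conditioned logits.
vocab = ["flour", "egg", "sugar", "basil"]
logits = np.array([2.3, 1.1, -0.4, -3.0])
ingredients = predict_ingredient_set(logits, vocab)
```

Returning a Python `set` makes the lack of ordering explicit; a downstream instruction generator can then attend jointly to the image and this inferred set, as the abstract describes.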
Recipe1M: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
In this paper, we introduce Recipe1M, a new large-scale, structured corpus of
over one million cooking recipes and 13 million food images. As the largest
publicly available collection of recipe data, Recipe1M affords the ability to
train high-capacity models on aligned, multi-modal data. Using these data, we
train a neural network to learn a joint embedding of recipes and images that
yields impressive results on an image-recipe retrieval task. Moreover, we
demonstrate that regularization via the addition of a high-level classification
objective both improves retrieval performance to rival that of humans and
enables semantic vector arithmetic. We postulate that these embeddings will
provide a basis for further exploration of the Recipe1M dataset and food and
cooking in general. Code, data and models are publicly available.
Comment: Submitted to Transactions on Pattern Analysis and Machine Intelligence
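The "semantic vector arithmetic" that a joint recipe-image embedding enables can be illustrated with a toy nearest-neighbor query. All vectors and names below are invented for illustration and do not come from the Recipe1M embeddings:

```python
import numpy as np

def nearest(query, embs, names):
    """Return the name whose embedding has the highest cosine similarity to query."""
    embs_n = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return names[int(np.argmax(embs_n @ q))]

# Hypothetical 4-d embeddings: dims encode chicken / beef / pizza / salad "concepts".
names = ["chicken pizza", "chicken salad", "beef pizza", "beef salad"]
embs = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
])
# "chicken pizza" - pizza direction + salad direction ≈ "chicken salad"
query = embs[0] - np.array([0.0, 0.0, 1.0, 0.0]) + np.array([0.0, 0.0, 0.0, 1.0])
result = nearest(query, embs, names)
```

In a well-regularized joint space, such offset arithmetic moves the query along a semantic direction (here, pizza → salad) while preserving the rest of its content, which is the behavior the abstract refers to.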