20 research outputs found

    Where am I eating? Image-based food menu recognition

    Get PDF
    Bachelor's thesis (Treball Final de Grau) in Computer Engineering, Faculty of Mathematics, Universitat de Barcelona, Year: 2018, Advisor: Marc Bolaños Solà. Food has become a very important part of our social activities. Since social networks and websites like Yelp appeared, their users have been uploading photos of their meals to the Internet, which has driven the development of food analysis and food recognition models. We propose a model that recognizes the meal appearing in a picture from a list of menu items (candidate dishes), which could serve to identify the selected meal in a restaurant. In a real-world scenario, the system presented in this thesis does not need to train a new model for every new restaurant: it learns to identify the components of an image and their relationship to the name of the meal. The system computes the similarity between an image and a text sequence representing the name of the dish. Pictures are encoded with a stack of Convolutional Neural Networks that reduces the input image to a compact vector, while the text is converted to a single vector by a Long Short-Term Memory network. The two vectors are compared and optimized with a similarity function, and the resulting similarity scores are used to rank the items of a menu and find the most probable one. According to the Ranking Loss metric, the model improves on the baseline by 15%.
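
    As a concrete illustration of the image-text similarity approach described above, here is a minimal sketch in PyTorch. It assumes a ResNet-50 image encoder, an LSTM dish-name encoder, and cosine similarity as the matching function; the module names, dimensions, and encoder choices are illustrative assumptions, not the thesis's exact configuration.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class MenuMatcher(nn.Module):
        # Encodes a photo and a dish name into a shared space and scores them.
        def __init__(self, vocab_size, embed_dim=300, joint_dim=512):
            super().__init__()
            cnn = models.resnet50(weights=None)               # CNN image encoder
            cnn.fc = nn.Linear(cnn.fc.in_features, joint_dim) # project to joint space
            self.image_encoder = cnn
            self.word_embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, joint_dim, batch_first=True)

        def forward(self, image, dish_tokens):
            img_vec = self.image_encoder(image)               # (B, joint_dim)
            _, (h_n, _) = self.lstm(self.word_embed(dish_tokens))
            txt_vec = h_n[-1]                                 # last hidden state
            # cosine similarity drives both the training loss and the ranking
            return nn.functional.cosine_similarity(img_vec, txt_vec)

    At inference time, one photo would be scored against every candidate dish on the menu and the candidates sorted by similarity, which is the ranking step the abstract describes.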

    CuisineNet: Food Attributes Classification using Multi-scale Convolution Network

    Full text link
    Diversity of food and its attributes reflects the culinary habits of people from different countries. This paper addresses the problem of identifying the food culture of people around the world, and its flavor, by classifying two main food attributes: cuisine and flavor. A deep learning model based on multi-scale convolutional networks is proposed to extract more accurate features from input images, aggregating convolution layers with different kernel sizes to weight the features produced at different scales. In addition, a joint loss function based on Negative Log Likelihood (NLL) fits the model probabilities to the multi-labeled classes of this multi-modal classification task. Furthermore, this work provides a new food-attributes dataset, called Yummly48K, extracted from the popular food website Yummly, on which our model is assessed. The experimental results show that the proposed method yields 65% and 62% average F1 score on the validation and test sets respectively, outperforming the state-of-the-art models. Comment: 8 pages, submitted to CCIA 2018.
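
    A hedged sketch of the two ingredients named in the abstract: a multi-scale convolution block and a joint NLL loss over the two attribute heads (cuisine and flavor). The kernel sizes, channel counts, and the 1x1 mixing layer below are assumptions for illustration; the paper's exact architecture may differ.

    import torch
    import torch.nn as nn

    class MultiScaleBlock(nn.Module):
        # Parallel convolutions with different kernel sizes (receptive fields),
        # concatenated and re-weighted by a 1x1 convolution.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.branches = nn.ModuleList(
                [nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (1, 3, 5)]
            )
            self.mix = nn.Conv2d(3 * out_ch, out_ch, kernel_size=1)

        def forward(self, x):
            return self.mix(torch.cat([b(x) for b in self.branches], dim=1))

    def joint_nll(cuisine_logits, flavor_logits, cuisine_y, flavor_y):
        # Joint loss: one NLL term per attribute head, summed.
        log_sm = nn.LogSoftmax(dim=1)
        nll = nn.NLLLoss()
        return (nll(log_sm(cuisine_logits), cuisine_y)
                + nll(log_sm(flavor_logits), flavor_y))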

    Inverse Cooking: Recipe Generation from Food Images

    Get PDF
    People enjoy food photography because they appreciate food. Behind each meal there is a story described in a complex recipe and, unfortunately, by simply looking at a food image we do not have access to its preparation process. In this paper we therefore introduce an inverse cooking system that recreates cooking recipes given food images. Our system predicts ingredients as sets by means of a novel architecture, modeling their dependencies without imposing any order, and then generates cooking instructions by attending to both the image and its inferred ingredients simultaneously. We extensively evaluate the whole system on the large-scale Recipe1M dataset and show that (1) we improve performance w.r.t. previous baselines for ingredient prediction; (2) we obtain high-quality recipes by leveraging both image and ingredients; (3) our system produces more compelling recipes than retrieval-based approaches according to human judgment. We make code and models publicly available. Comment: CVPR 2019.
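
    The abstract describes two stages: set-style ingredient prediction and instruction decoding that attends jointly to image and ingredient features. The sketch below shows one plausible shape for each stage; the sigmoid threshold, the single cross-attention layer, and all names are simplifying assumptions, not the paper's exact decoder.

    import torch
    import torch.nn as nn

    class IngredientSetHead(nn.Module):
        # Set prediction: an independent sigmoid per ingredient, so no
        # ordering is imposed on the predicted ingredient list.
        def __init__(self, feat_dim, num_ingredients):
            super().__init__()
            self.fc = nn.Linear(feat_dim, num_ingredients)

        def forward(self, image_feats, threshold=0.5):
            probs = torch.sigmoid(self.fc(image_feats))
            return probs > threshold              # boolean set membership

    class JointAttentionStep(nn.Module):
        # One decoding step: instruction tokens attend to image features
        # and inferred-ingredient embeddings at the same time.
        def __init__(self, dim, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, token_states, img_feats, ingr_feats):
            context = torch.cat([img_feats, ingr_feats], dim=1)  # (B, N, dim)
            out, _ = self.attn(token_states, context, context)
            return out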

    Recipe1M: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images

    Get PDF
    In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over one million cooking recipes and 13 million food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to learn a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Moreover, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and of food and cooking in general. Code, data and models are publicly available. Comment: Submitted to Transactions on Pattern Analysis and Machine Intelligence.
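
    To make the training objective concrete, here is a hedged sketch of a joint-embedding loss: a cosine term that aligns matching recipe-image pairs for retrieval, plus the high-level classification objective the abstract credits as regularization. The margin, loss weighting, and shared class head are assumptions, not the paper's published hyperparameters.

    import torch
    import torch.nn as nn

    cos_loss = nn.CosineEmbeddingLoss(margin=0.1)  # retrieval alignment term
    cls_loss = nn.CrossEntropyLoss()               # semantic regularizer

    def joint_loss(img_emb, rec_emb, match, img_logits, rec_logits, cls_y,
                   weight=0.02):
        # match holds +1 for true image-recipe pairs and -1 for mismatches.
        retrieval = cos_loss(img_emb, rec_emb, match)
        semantic = cls_loss(img_logits, cls_y) + cls_loss(rec_logits, cls_y)
        return retrieval + weight * semantic

    Because both modalities land in one shared space, simple arithmetic on the embedding vectors (the "semantic vector arithmetic" the abstract mentions) becomes possible.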