Diversity of food and its attributes represents the culinary habits of
peoples from different countries. Thus, this paper addresses the problem of
identifying food culture of people around the world and its flavor by
classifying two main food attributes, cuisine and flavor. A deep learning model
based on multi-scale convotuional networks is proposed for extracting more
accurate features from input images. The aggregation of multi-scale convolution
layers with different kernel size is also used for weighting the features
results from different scales. In addition, a joint loss function based on
Negative Log Likelihood (NLL) is used to fit the model probability to multi
labeled classes for multi-modal classification task. Furthermore, this work
provides a new dataset for food attributes, so-called Yummly48K, extracted from
the popular food website, Yummly. Our model is assessed on the constructed
Yummly48K dataset. The experimental results show that our proposed method
yields 65% and 62% average F1 score on validation and test set which
outperforming the state-of-the-art models.Comment: 8 pages, Submitted in CCIA 201