Given a food image, can a fine-grained object recognition engine tell "which
restaurant which dish" the food belongs to? Such ultra-fine grained image
recognition is the key for many applications like search by images, but it is
very challenging because it needs to discern subtle difference between classes
while dealing with the scarcity of training data. Fortunately, the ultra-fine
granularity naturally brings rich relationships among object classes. This
paper proposes a novel approach to exploit the rich relationships through
bipartite-graph labels (BGL). We show how to model BGL in an overall
convolutional neural networks and the resulting system can be optimized through
back-propagation. We also show that it is computationally efficient in
inference thanks to the bipartite structure. To facilitate the study, we
construct a new food benchmark dataset, which consists of 37,885 food images
collected from 6 restaurants and totally 975 menus. Experimental results on
this new food and three other datasets demonstrates BGL advances previous works
in fine-grained object recognition. An online demo is available at
http://www.f-zhou.com/fg_demo/