Dietary assessment is essential to maintaining a healthy lifestyle. Automatic
image-based dietary assessment is a growing field of research due to the
increasing prevalence of image capturing devices (e.g. mobile phones). In this
work, we estimate food energy from a single monocular image, a difficult task
because energy information is limited and hard to extract from an
image. To do so, we employ an improved encoder-decoder framework for energy
estimation; the encoder transforms the image into a representation that embeds
food energy information in an easier-to-extract form, from which the decoder
then extracts the energy value. To implement our method, we compile
a high-quality food image dataset, verified by registered dietitians, containing
eating scene images, food-item segmentation masks, and ground truth calorie
values. Our method improves upon previous caloric estimation methods by over
10\% and 30 kCal in terms of mean absolute percentage error (MAPE) and mean
absolute error (MAE), respectively.

Comment: Accepted for Madima'23 in ACM Multimedi
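
For reference, the two evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions of MAE and MAPE, not the authors' evaluation code; the function names and the calorie values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE, in the same units as the inputs (kcal in this illustration)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE, in percent. Assumes every ground-truth value is non-zero."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a ground-truth value is zero")
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical ground-truth and predicted energy values (kcal) for three images.
true_kcal = [500.0, 250.0, 800.0]
pred_kcal = [450.0, 300.0, 720.0]

mae = mean_absolute_error(true_kcal, pred_kcal)
mape = mean_absolute_percentage_error(true_kcal, pred_kcal)
print(f"MAE = {mae:.1f} kcal, MAPE = {mape:.2f}%")
```

MAE reports the average error in absolute calories, while MAPE normalizes each error by the true value, so a 50 kcal miss weighs more heavily on a small snack than on a large meal. Reporting both, as the abstract does, covers both views.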