Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Class-Incremental Learning (CIL) trains a model to continually recognize new
classes from non-stationary data while retaining learned knowledge. A major
challenge arises when CIL is applied to real-world data characterized by
non-uniform distribution, which introduces a dual imbalance problem involving
(i) disparities between stored exemplars of old tasks and new class data
(inter-phase imbalance), and (ii) severe class imbalances within each
individual task (intra-phase imbalance). We show that this dual imbalance issue
causes skewed gradient updates with biased weights in FC layers, thus inducing
over/under-fitting and catastrophic forgetting in CIL. Our method addresses this
by reweighting the gradients towards balanced optimization and unbiased
classifier learning. Additionally, we observe imbalanced forgetting where
paradoxically the instance-rich classes suffer higher performance degradation
during CIL due to a larger amount of training data becoming unavailable in
subsequent learning phases. To tackle this, we further introduce a
distribution-aware knowledge distillation loss to mitigate forgetting by
aligning output logits proportionally with the distribution of lost training
data. We validate our method on CIFAR-100, ImageNetSubset, and Food101 across
various evaluation protocols and demonstrate consistent improvements compared
to existing works, showing great potential to apply CIL in real-world scenarios
with enhanced robustness and effectiveness.
Comment: Accepted to CVPR 202
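The reweighting idea can be sketched as a per-class weighted cross-entropy on the FC layer, where weights counteract each class's sample count. Inverse-frequency weighting is only one plausible choice here, not necessarily the paper's exact scheme:

```python
import torch
import torch.nn.functional as F

def reweighted_ce(logits, targets, class_counts):
    """Cross-entropy whose per-class weights counteract imbalance.

    Classes with few stored samples (e.g. old-task exemplars) receive
    larger weights so their gradient contribution to the classifier is
    not drowned out by instance-rich new classes. The paper's exact
    reweighting may differ; this is a hypothetical sketch.
    """
    counts = class_counts.float()
    weights = counts.sum() / (len(counts) * counts)  # inverse frequency
    return F.cross_entropy(logits, targets, weight=weights)
```

With this loss, a class seen 1 time gets a weight 100x that of a class seen 100 times, balancing the gradient magnitudes reaching the FC weights.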
An Improved Encoder-Decoder Framework for Food Energy Estimation
Dietary assessment is essential to maintaining a healthy lifestyle. Automatic
image-based dietary assessment is a growing field of research due to the
increasing prevalence of image capturing devices (e.g. mobile phones). In this
work, we estimate food energy from a single monocular image, a difficult task
due to the limited, hard-to-extract energy information present in an
image. To do so, we employ an improved encoder-decoder framework for energy
estimation; the encoder transforms the image into a representation embedded
with food energy information in an easier-to-extract format, from which the
decoder then extracts the energy information. To implement our method, we compile
a high-quality food image dataset verified by registered dietitians containing
eating scene images, food-item segmentation masks, and ground truth calorie
values. Our method improves upon previous caloric estimation methods by over
10% and 30 kCal in terms of MAPE and MAE, respectively.
Comment: Accepted for Madima'23 in ACM Multimedi
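The two reported metrics are standard regression errors over ground-truth calorie values and predictions, computed as:

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def mae(y_true, y_pred):
    # Mean absolute error, here in kCal.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))
```

MAPE normalizes each error by the true calorie count, so it is comparable across light and heavy meals; MAE reports the raw kCal deviation.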
Online Class-Incremental Learning For Real-World Food Image Classification
Food image classification is essential for monitoring health and tracking
dietary intake in image-based dietary assessment methods. However, conventional
systems often rely on static datasets with fixed classes and uniform
distribution. In contrast, real-world food consumption patterns, shaped by
cultural, economic, and personal influences, involve dynamic and evolving
data, requiring the classification system to cope with continuous change.
Online Class Incremental Learning (OCIL) addresses the challenge of
learning continuously from a single-pass data stream while adapting to new
knowledge and reducing catastrophic forgetting. Experience Replay (ER) based
OCIL methods store a small portion of previous data and have shown encouraging
performance. However, most existing OCIL works assume that the distribution of
encountered data is perfectly balanced, which rarely happens in real-world
scenarios. In this work, we explore OCIL for real-world food image
classification by first introducing a probabilistic framework to simulate
realistic food consumption scenarios. Subsequently, we present an attachable
Dynamic Model Update (DMU) module designed for existing ER methods, which
enables the selection of relevant images for model training, addressing
challenges arising from data repetition and imbalanced sample occurrences
inherent in realistic food consumption patterns within the OCIL framework. Our
performance evaluation demonstrates significant enhancements compared to
established ER methods, showing great potential for lifelong learning in
real-world food image classification scenarios. The code of our method is
publicly accessible at
https://gitlab.com/viper-purdue/OCIL-real-world-food-image-classification
Comment: Accepted at IEEE/CVF Winter Conference on Applications of Computer
Vision (WACV 2024)
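ER methods maintain a small fixed-capacity memory over a single-pass stream; a standard way to do this is reservoir sampling (a common ER baseline, not necessarily this paper's exact policy — the proposed DMU module would sit on top, filtering which images enter training):

```python
import random

class ReservoirBuffer:
    """Fixed-capacity memory whose contents are a uniform random
    sample of everything seen so far in the stream."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.n_seen = 0

    def add(self, item):
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Each stream item survives with probability capacity / n_seen.
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.items[j] = item

    def replay(self, k):
        # Minibatch of stored samples, mixed with incoming data at train time.
        return random.sample(self.items, min(k, len(self.items)))
```

Because each stored item is a uniform sample of the stream, an imbalanced real-world stream yields an imbalanced buffer, which is exactly the failure mode the abstract targets.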
Personalized Food Image Classification: Benchmark Datasets and New Baseline
Food image classification is a fundamental step of image-based dietary
assessment, enabling automated nutrient analysis from food images. Many current
methods employ deep neural networks to train on generic food image datasets
that do not reflect the dynamism of real-life food consumption patterns, in
which food images appear sequentially over time, reflecting the progression of
what an individual consumes. Personalized food classification aims to address
this problem by training a deep neural network using food images that reflect
the consumption pattern of each individual. However, this problem is
under-explored and there is a lack of benchmark datasets with individualized
food consumption patterns due to the difficulty in data collection. In this
work, we first introduce two benchmark personalized datasets including the
Food101-Personal, which is created based on surveys of daily dietary patterns
from participants in the real world, and the VFNPersonal, which is developed
based on a dietary study. In addition, we propose a new framework for
personalized food image classification by leveraging self-supervised learning
and temporal image feature information. Our method is evaluated on both
benchmark datasets and shows improved performance compared to existing works.
The dataset has been made available at:
https://skynet.ecn.purdue.edu/~pan161/dataset_personal.html
Comment: Accepted by IEEE Asilomar conference (2023)
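The abstract does not detail how temporal image feature information is used; one illustrative assumption (not the paper's stated method) is to weight each of a user's past food images by recency, e.g. with an exponential decay:

```python
import numpy as np

def temporal_weights(timestamps, now, half_life_days=7.0):
    """Per-image weights that halve every `half_life_days`, so recent
    meals dominate the personalized classifier's training signal.
    Hypothetical scheme; timestamps and `now` are in days."""
    age = now - np.asarray(timestamps, dtype=float)
    return 0.5 ** (age / half_life_days)
```

Such weights could scale per-sample losses so that the model tracks the progression of what an individual currently consumes.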
Multi-Stage Hierarchical Food Classification
Food image classification serves as a fundamental and critical step in
image-based dietary assessment, facilitating nutrient intake analysis from
captured food images. However, existing works in food classification
predominantly focus on predicting 'food types', which do not contain direct
nutritional composition information. This limitation arises from the inherent
discrepancies in nutrition databases, which are tasked with associating each
'food item' with its respective information. Therefore, in this work we aim to
classify food items to align with nutrition databases. To this end, we first
introduce VFN-nutrient dataset by annotating each food image in VFN with a food
item that includes nutritional composition information. Such annotation of food
items, being more discriminative than food types, creates a hierarchical
structure within the dataset. However, since the food item annotations are
solely based on nutritional composition information, they do not always show
visual relations with each other, which poses significant challenges when
applying deep learning-based techniques for classification. To address this
issue, we then propose a multi-stage hierarchical framework for food item
classification by iteratively clustering and merging food items during the
training process, which allows the deep model to extract image features that
are discriminative across labels. Our method is evaluated on the VFN-nutrient
dataset and achieves promising results compared with existing work in terms of
both food type and food item classification.
Comment: accepted for ACM MM 2023 Madim
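The merging step can be sketched as follows, under the assumption that food items are compared via the mean embedding of their training images: items whose class means fall within a distance threshold are grouped (here with a simple union-find), yielding coarser, more visually coherent clusters for the next training stage:

```python
import numpy as np

def merge_food_items(class_means, threshold):
    """Greedily merge item classes whose mean embeddings lie within
    `threshold` (Euclidean distance); returns a cluster id per class.
    Illustrative sketch, not the paper's exact clustering procedure."""
    n = len(class_means)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(class_means[i] - class_means[j]) < threshold:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive cluster ids.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Iterating this between training rounds lets visually confusable items share a label early on, while the final stage still predicts the fine-grained, nutrition-aligned food item.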
Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs
Recent studies reveal a significant theoretical link between variational
autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to
estimate the theoretical upper bound of the information rate-distortion
function of images. Such estimated theoretical bounds substantially exceed the
performance of existing neural image codecs (NICs). To narrow this gap, we
propose a theoretical bound-guided hierarchical VAE (BG-VAE) for NIC. The
proposed BG-VAE leverages the theoretical bound to guide the NIC model towards
enhanced performance. We implement the BG-VAE using Hierarchical VAEs and
demonstrate its effectiveness through extensive experiments. Along with
advanced neural network blocks, we provide a versatile, variable-rate NIC that
outperforms existing methods when considering both rate-distortion performance
and computational complexity. The code is available at BG-VAE.
Comment: 2024 IEEE International Conference on Multimedia and Expo (ICME2024)
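The abstract does not spell out the guidance mechanism; one plausible reading (an assumption, not the paper's confirmed formulation) is a standard rate-distortion objective augmented with a term pulling the codec's features toward those of the bound-estimating teacher VAE:

```python
import torch
import torch.nn.functional as F

def bound_guided_loss(rate_bpp, x, x_hat, student_feat, teacher_feat,
                      lmbda=0.01, beta=0.1):
    """Rate-distortion loss (R + lmbda * D) plus an MSE guidance term
    toward the frozen teacher's features. Hypothetical formulation;
    lmbda trades off rate vs. distortion, beta scales the guidance."""
    distortion = F.mse_loss(x_hat, x)
    guidance = F.mse_loss(student_feat, teacher_feat.detach())
    return rate_bpp + lmbda * distortion + beta * guidance
```

Detaching the teacher features ensures gradients only update the student codec, which is the usual pattern when distilling from a fixed reference model.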