Egocentric vision-based passive dietary intake monitoring
Egocentric (first-person) perception captures and reveals how people perceive their surroundings. This unique perceptual view enables passive and objective monitoring of human-centric activities and behaviours. Egocentric visual data are captured with wearable cameras. Recent advances in wearable technologies have made wearable cameras lightweight and accurate, with long battery life, making long-term passive monitoring a promising solution for healthcare and human behaviour understanding. In addition, recent progress in deep learning provides an opportunity to accelerate the development of passive methods for pervasive and accurate monitoring, as well as comprehensive modelling of human-centric behaviours.
This thesis investigates and proposes innovative egocentric technologies for passive dietary intake monitoring and human behaviour analysis.
Compared to conventional dietary assessment methods in nutritional epidemiology, such as 24-hour dietary recall (24HR) and food frequency questionnaires (FFQs), which rely heavily on subjects’ memory to recall dietary intake and on trained dietitians to collect, interpret, and analyse the dietary data, passive dietary intake monitoring can ease this burden and provide a more accurate and objective assessment of dietary intake. Egocentric vision-based passive monitoring uses wearable cameras to continuously record human-centric activities with a close-up view. This passive mode of monitoring does not require active participation from the subject, and it records rich spatiotemporal details for fine-grained analysis. Based on egocentric vision and passive dietary intake monitoring, this thesis proposes: 1) a novel network structure, PAR-Net, that achieves accurate food recognition by mining discriminative food regions; PAR-Net has been evaluated on food intake images captured by wearable cameras, as well as on non-egocentric food images, to validate its effectiveness for food recognition; 2) an end-to-end deep learning solution that recognises consumed food items and counts the number of bites taken by the subject from egocentric videos; 3) in light of privacy concerns around egocentric data, a privacy-preserving solution for passive dietary intake monitoring that uses image captioning to summarise image content and then combines captioning with 3D container reconstruction to report the actual food volume consumed. Furthermore, a novel framework integrating food recognition, hand tracking, and face recognition has been developed to tackle the challenge of assessing individual dietary intake in food-sharing scenarios with a panoramic camera. Extensive experiments have been conducted.
Tested on both laboratory data (captured in London) and field-study data (captured in Africa), the proposed solutions demonstrate the feasibility and accuracy of using egocentric camera technologies with deep learning methods for individual dietary assessment and human behaviour analysis.
DPF-Nutrition: Food Nutrition Estimation via Depth Prediction and Fusion
A reasonable and balanced diet is essential for maintaining good health. With the advancements in deep learning, automated nutrition estimation based on food images offers a promising solution for monitoring daily nutritional intake and promoting dietary health. While monocular image-based nutrition estimation is convenient, efficient, and economical, its limited accuracy remains a significant concern. To tackle this issue, we propose DPF-Nutrition, an end-to-end nutrition estimation method using monocular images. In DPF-Nutrition, we introduce a depth prediction module that generates depth maps, thereby improving the accuracy of food portion estimation. Additionally, we design an RGB-D fusion module that combines monocular images with the predicted depth information, resulting in better performance for nutrition estimation. To the best of our knowledge, this is the first effort to integrate depth prediction and RGB-D fusion techniques in food nutrition estimation. Comprehensive experiments on Nutrition5k demonstrate the effectiveness and efficiency of DPF-Nutrition.
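The depth-prediction-plus-fusion idea above can be sketched minimally as follows. This is illustrative only: the real DPF-Nutrition modules are learned networks, whereas here a hypothetical brightness-based stand-in produces a pseudo-depth map, and fusion is shown as simple channel concatenation of RGB with the predicted depth.

```python
import numpy as np

def predict_depth(rgb):
    # Stand-in for a learned monocular depth network: maps image
    # brightness to a pseudo-depth value (hypothetical range in metres).
    gray = rgb.mean(axis=-1, keepdims=True)        # (H, W, 1)
    return 0.3 + 0.5 * (gray / 255.0)

def rgbd_fuse(rgb, depth):
    # Channel-level fusion: stack the predicted depth map onto the RGB
    # channels so a downstream regressor sees a 4-channel RGB-D input.
    return np.concatenate([rgb / 255.0, depth], axis=-1)  # (H, W, 4)

rgb = np.random.randint(0, 256, (64, 64, 3)).astype(np.float32)
fused = rgbd_fuse(rgb, predict_depth(rgb))
print(fused.shape)  # (64, 64, 4)
```

A learned fusion module would replace the concatenation with convolutional layers operating on the stacked RGB-D tensor, but the data-flow shape is the same.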
CaloriNet: From silhouettes to calorie estimation in private environments
We propose a novel deep fusion architecture, CaloriNet, for the online estimation of energy expenditure for free-living monitoring in private environments, where RGB data is discarded and replaced by silhouettes. Our fused convolutional neural network architecture is trainable end-to-end to estimate calorie expenditure, using temporal foreground silhouettes alongside accelerometer data. The network is trained and cross-validated on a publicly available dataset, SPHERE_RGBD + Inertial_calorie. Results show state-of-the-art minimum error on the estimation of energy expenditure (calories per minute), outperforming alternative standard and single-modal techniques.
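The two-stream fusion described above can be sketched as follows. This is a hand-crafted stand-in, not the CaloriNet architecture: the real branches are convolutional and trained end-to-end, whereas here each modality is reduced to a small hypothetical feature vector and fused with a linear head.

```python
import numpy as np

def silhouette_features(sil_stack):
    # sil_stack: (T, H, W) binary foreground silhouettes over a time window.
    # Crude motion descriptor: mean occupancy plus frame-to-frame change.
    occupancy = sil_stack.mean()
    motion = np.abs(np.diff(sil_stack, axis=0)).mean()
    return np.array([occupancy, motion])

def accel_features(accel):
    # accel: (T, 3) accelerometer samples; magnitude statistics as features.
    mag = np.linalg.norm(accel, axis=1)
    return np.array([mag.mean(), mag.std()])

def fused_calorie_estimate(sil_stack, accel, w, b):
    # Late fusion: concatenate both feature vectors, apply a linear head
    # that outputs calories per minute (weights here are arbitrary).
    x = np.concatenate([silhouette_features(sil_stack), accel_features(accel)])
    return float(x @ w + b)

rng = np.random.default_rng(0)
sil = (rng.random((8, 32, 32)) > 0.5).astype(np.float32)
acc = rng.standard_normal((100, 3))
est = fused_calorie_estimate(sil, acc, w=np.ones(4), b=0.5)
```

The privacy argument holds in either form: only binary silhouettes and inertial signals enter the pipeline, never RGB frames.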
Partially Supervised Multi-Task Network for Single-View Dietary Assessment
Food volume estimation is an essential step in the pipeline of dietary assessment and demands precise depth estimation of the food surface and the table plane. Existing computer vision methods require either multi-image input or additional depth maps, reducing the convenience of implementation and practical significance. Despite recent advances in unsupervised depth estimation from a single image, the achieved performance on large texture-less areas needs to be improved. In this paper, we propose a network architecture that jointly performs geometric understanding (i.e., depth prediction and 3D plane estimation) and semantic prediction on a single food image, enabling robust and accurate food volume estimation regardless of the texture characteristics of the target plane. For training, only monocular videos with semantic ground truth are required; depth map and 3D plane ground truth are no longer needed. Experimental results on two separate food image databases demonstrate that our method performs robustly in texture-less scenarios and is superior to unsupervised networks and structure-from-motion-based approaches, while achieving performance comparable to fully supervised methods.
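Once depth and the table plane are estimated, the final volume step reduces to integrating the food's height above the plane over the image. A minimal sketch, assuming depth and plane maps are already in common units and a known per-pixel footprint (all names hypothetical):

```python
import numpy as np

def food_volume(depth, plane_depth, pixel_area_cm2):
    # depth: (H, W) predicted depth to the food surface (cm).
    # plane_depth: (H, W) depth of the fitted table plane at each pixel (cm).
    # Height of the food above the table; clipped at zero outside the food.
    height = np.clip(plane_depth - depth, 0.0, None)
    return float(height.sum() * pixel_area_cm2)    # cm^3

# Toy example: flat table at 50 cm, a 10x10-pixel item raised 2 cm,
# each pixel covering 0.25 cm^2 of the table.
plane = np.full((20, 20), 50.0)
depth = plane.copy()
depth[5:15, 5:15] -= 2.0
print(food_volume(depth, plane, pixel_area_cm2=0.25))  # 10*10*2*0.25 = 50.0
```

In practice the per-pixel area varies with depth and camera intrinsics, and the semantic prediction masks out non-food pixels before integration; the clipped height-above-plane integral is the core geometric step.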
Deep Cooking: Predicting Relative Food Ingredient Amounts from Images
In this paper, we study the novel problem of not only predicting the ingredients in a food image, but also predicting the relative amounts of the detected ingredients. We propose two deep learning prediction models that output sparse and dense predictions, coupled with an important semi-automatic, multi-database integrative data pre-processing step, to solve the problem. Experiments on a dataset of recipes collected from the Internet show that the models generate encouraging results.
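A sparse relative-amount output of the kind described above can be sketched as a post-processing step: raw per-ingredient scores are zeroed for ingredients not predicted present, then normalised into proportions. This is an illustrative stand-in, not the paper's models; the function and variable names are hypothetical.

```python
import numpy as np

def relative_amounts(scores, detected):
    # scores: raw per-ingredient amount predictions from a network.
    # detected: boolean mask of ingredients predicted present (sparse output).
    amounts = np.where(detected, np.maximum(scores, 0.0), 0.0)
    total = amounts.sum()
    # Normalise to relative proportions that sum to 1 (when anything remains).
    return amounts / total if total > 0 else amounts

scores = np.array([3.0, 1.0, -0.5, 2.0])
detected = np.array([True, True, False, True])
props = relative_amounts(scores, detected)  # proportions summing to 1
```

A dense variant would skip the mask and keep a (possibly tiny) proportion for every ingredient in the vocabulary.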