SSDB-Net: a single-step dual branch network for weakly supervised semantic segmentation of food images
Food image segmentation, a critical task in food and nutrition research, supports application domains such as calorie and nutrition estimation, food recommender systems, and daily food monitoring. Most current research focuses on food/non-food segmentation, which simply separates food regions from the background. In contrast, semantic food segmentation identifies the specific food ingredients in an image and provides more detailed and accurate information such as object location, shape and class. This is a more challenging but more meaningful task, because the same food may appear with completely different colours, shapes and textures in different dishes, and it is correspondingly less researched. From an implementation perspective, most previous work relies on deep learning methods trained on pixel-level labelled data; however, annotating pixel-level labels carries extremely high labour costs. In this paper, a novel single-step dual branch network (SSDB-Net) is proposed to achieve weakly supervised semantic food segmentation. To our knowledge, this is the first work to propose weakly supervised semantic food segmentation from image-level labels based on convolutional neural networks (CNNs), and it may serve as a benchmark for future food segmentation research. The proposed method achieved an mIoU of 14.79% over the 104 categories of the FoodSeg103 dataset, compared with 11.49% for the state-of-the-art WSSS method applied in the food domain.
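The mIoU figure reported above can be made concrete with a minimal sketch of how mean Intersection-over-Union is typically computed over a predicted and a ground-truth label map (the arrays and class count here are invented for illustration, not taken from the paper):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union across classes present in either map."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Tiny 2x2 example with three classes (illustrative only)
pred = np.array([[0, 0], [1, 2]])
target = np.array([[0, 1], [1, 2]])
# class 0: 1/2, class 1: 1/2, class 2: 1/1 -> mean = 2/3
print(mean_iou(pred, target, 3))  # 0.666...
```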
Vision-language models boost food composition compilation
Nutrition information plays a central role in clinical dietary practice, precision nutrition, and the food industry. Currently, food composition compilation serves as the standard paradigm for estimating food nutrition information from food ingredient information. Within this paradigm, however, conventional approaches are laborious and highly dependent on the experience of data managers; they cannot keep pace with the dynamic consumer market, resulting in lagging and missing nutrition data. Earlier machine learning methods either could not fully understand food ingredient statements or ignored the characteristics of food images. To this end, we developed a novel vision-language AI model, UMDFood-VL, which uses front-of-package labeling and product images to accurately estimate food composition profiles. To drive the training of such a large model, we established UMDFood-90k, the most comprehensive multimodal food database to date. The UMDFood-VL model significantly outperformed convolutional neural networks (CNNs) and recurrent neural networks (RNNs) on a variety of nutrition value estimations. For instance, we achieved a macro-AUCROC of up to 0.921 for fat value estimation, which satisfies the practical requirements of food composition compilation. This performance sheds light on generalizing to other food- and nutrition-related data compilation and can catalyze the evolution of other food applications.
Comment: 31 pages, 5 figures
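As background for the macro-AUCROC figure above, here is a minimal sketch of a macro-averaged one-vs-rest AUROC using the rank (Mann-Whitney) formulation. The score and label matrices are invented for illustration; this is not the authors' evaluation code:

```python
import numpy as np

def auroc(scores, labels):
    """AUROC for one binary task via the Mann-Whitney pair-ranking form."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    # fraction of (positive, negative) pairs ranked correctly; ties count 0.5
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def macro_auroc(score_matrix, label_matrix):
    """Macro average: one-vs-rest AUROC per class, then unweighted mean."""
    return float(np.mean([auroc(score_matrix[:, c], label_matrix[:, c])
                          for c in range(score_matrix.shape[1])]))

# Four samples, two classes (illustrative data)
scores = np.array([[0.9, 0.1],
                   [0.4, 0.6],
                   [0.3, 0.7],
                   [0.2, 0.8]])
labels = np.array([[1, 0],
                   [0, 1],
                   [1, 0],
                   [0, 1]])
print(macro_auroc(scores, labels))  # 0.75
```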
The Nutritional Content of Meal Images in Free-Living Conditions: Automatic Assessment with goFOOD™.
A healthy diet can help to prevent or manage many important conditions and diseases, particularly obesity, malnutrition, and diabetes. Recent advancements in artificial intelligence and smartphone technologies have enabled applications that conduct automatic nutritional assessment from meal images, providing a convenient, efficient, and accurate method for continuous diet evaluation. We now extend the goFOOD™ automatic system to perform food segmentation, recognition, and volume estimation, as well as calorie and macronutrient estimation, from single images captured by a smartphone. To assess our system's performance, we conducted a feasibility study with 50 participants from Switzerland. We recorded their meals for one day, and dietitians then carried out a 24 h recall. We retrospectively analysed the collected images to assess the nutritional content of the meals. By comparing our results with the dietitians' estimations, we demonstrated that the newly introduced system has energy and macronutrient estimation performance comparable to the previous method, while requiring only a single image instead of two. The system can be applied in real-life scenarios and can easily be used to assess dietary intake. It could help individuals gain a better understanding of their dietary consumption, serve as a valuable resource for dietitians, and contribute to nutritional research.
Power efficient machine learning-based hardware architectures for biomedical applications
The future of critical health diagnosis will involve intelligent and smart devices that are low-cost, wearable, and lightweight, requiring low-power, energy-efficient hardware platforms. Various machine learning models, such as deep learning architectures, have been employed to design intelligent healthcare systems. However, deploying these sophisticated and intelligent devices in real-time embedded systems with limited hardware resources and power budgets is complex, because achieving a high accuracy rate requires high computational power. As a result, a significant gap has opened between the advancement of computing technology and the associated device technologies for healthcare applications. In this work, power-efficient machine learning-based digital hardware design techniques are introduced to realize a compact design solution while maintaining optimal prediction accuracy. Two hardware design approaches, DeepSAC and SABiNN, are proposed and analyzed. DeepSAC is a shift-accumulator-based technique, whereas SABiNN is a 2's complement-based binarized digital hardware technique. Neural network models, such as feedforward networks, convolutional neural nets, residual networks, and other popular machine learning and deep neural networks, are selected to benchmark the proposed model architectures. Various deep compression techniques, such as pruning, n-bit (n = 8, 16) integer quantization, and binarization of hyper-parameters, are also employed. These techniques reduced the power consumption rate by 5x and model size by 13x, and improved model latency. To demonstrate the models in a biomedical application, a sleep apnea (SA) detection device for adults is developed to detect SA events in real time. The input to the system consists of two physiological signals: an ECG signal from chest movement and an SpO2 measurement from a pulse oximeter, used to predict the occurrence of SA episodes.
In the training phase, actual patient data are used, and the network model is converted into the proposed hardware models to achieve medically backed accuracy. After achieving an acceptable accuracy of 88 percent, all parameters are extracted for inference on the edge. In the inference phase, the extracted parameters are validated on reconfigurable hardware for model precision and power consumption rate before being translated onto silicon. This research implements the final model on CMOS platforms using 130 nm and 180 nm commercial CMOS processes.
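The compression steps named above (n-bit integer quantization and weight binarization) can be sketched generically. This is an illustrative example of the generic operations only, not the actual DeepSAC or SABiNN designs; the weights and inputs are invented:

```python
import numpy as np

def quantize_int8(w):
    """Uniform symmetric 8-bit quantization: w ~ scale * q, q in [-127, 127].

    Assumes w contains at least one nonzero value."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def binarize(w):
    """Sign binarization: each weight becomes +1 or -1."""
    return np.where(w >= 0, 1, -1).astype(np.int8)

def binarized_dot(x, wb):
    """With +/-1 weights, a dot product needs only adds and subtracts,
    which is what lets binarized hardware drop its multipliers."""
    return x[wb == 1].sum() - x[wb == -1].sum()

w = np.array([0.42, -0.17, 0.08, -0.93])  # toy weight vector
q, s = quantize_int8(w)                   # 8-bit ints plus one float scale
x = np.array([1.0, 2.0, 3.0, 4.0])        # toy activation vector
wb = binarize(w)                          # [+1, -1, +1, -1]
print(binarized_dot(x, wb))               # 1 - 2 + 3 - 4 = -2.0
```

The quantized form stores one `int8` per weight plus a single scale factor, which is where the reported size reduction in such schemes comes from; the binarized dot product shows why a 2's complement add/subtract datapath suffices once weights are constrained to +/-1.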