
43 research outputs found

ResSeg: Residual encoder-decoder convolutional neural network for food segmentation

Author: Jiménez-Moreno Robinson
Pachón-Suescún César G.
Pinzón-Arenas Javier O.
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/02/2020

This paper presents the implementation and evaluation of different convolutional neural network architectures for food segmentation. The task covers the recognition of 6 categories: the main food groups (protein, grains, fruit, vegetables) and two additional groups, rice and drink or juice. To make the recognition more challenging, the networks are tested on dishes that have already been started, i.e. at different moments from serving to finishing, in order to verify that they can detect when no food is left on the plate. Finally, the two best resulting networks are compared: a SegNet with a VGG-16 architecture and a network proposed in this work, called Residual Segmentation Convolutional Neural Network or ResSeg, with which accuracies greater than 90% and intersection-over-union greater than 75% were obtained. This demonstrates not only the suitability of SegNet architectures for food segmentation, but also that residual layers improve the segmentation contours and the segmentation of dishes that have a complex food distribution or are partly eaten. This opens the way for this type of network to be used in feeding assistants or automated restaurants, as well as in dietary control of the amount of food consumed.
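The two figures quoted in the abstract, accuracy and intersection-over-union (IoU), can be illustrated with a minimal sketch. It assumes label maps stored as integer NumPy arrays with one class index per pixel. The function names, the tiny example arrays, and the choice to report `nan` for absent classes are assumptions made for illustration, not the authors' evaluation code.

```python
import numpy as np

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    pred, target = np.asarray(pred), np.asarray(target)
    return float((pred == target).mean())

def per_class_iou(pred, target, num_classes):
    """IoU per class: |pred AND target| / |pred OR target| over that class's pixels."""
    pred, target = np.asarray(pred), np.asarray(target)
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            # Class absent from both maps: IoU is undefined, reported as NaN here.
            ious.append(float("nan"))
        else:
            intersection = np.logical_and(pred_c, target_c).sum()
            ious.append(float(intersection / union))
    return ious

# Hypothetical 2x3 label maps with three classes (0, 1, 2).
example_pred = np.array([[0, 0, 1], [1, 1, 2]])
example_target = np.array([[0, 1, 1], [1, 1, 2]])
example_accuracy = pixel_accuracy(example_pred, example_target)
example_ious = per_class_iou(example_pred, example_target, num_classes=3)
```

A mean IoU over classes could then be taken with `np.nanmean(example_ious)`, which skips classes that appear in neither map.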

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Sketch-based Video Object Segmentation: Benchmark and Analysis

Author: Hospedales Timothy
Hu Conghui
Li Da
Song Yi-Zhe
Yang Ruolin
Zhang Honggang
Publication date: 13/11/2023

Reference-based video object segmentation is an emerging topic which aims to segment, in each video frame, the target object referred to by a given reference, such as a language expression or a photo mask. However, language expressions can sometimes be vague in conveying an intended concept and ambiguous when similar objects in one frame are hard to distinguish by language. Meanwhile, photo masks are costly to annotate and less practical to provide in a real application. This paper introduces a new task of sketch-based video object segmentation, an associated benchmark, and a strong baseline. Our benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and Sketch-YouTube-VOS, which exploit human-drawn sketches as an informative yet low-cost reference for video object segmentation. We take advantage of STCN, a popular baseline for the semi-supervised VOS task, and evaluate which design is most effective for incorporating a sketch reference. Experimental results show that a sketch is more effective, yet annotation-efficient, compared with other references, such as photo masks, language and scribbles. Comment: BMVC 202

arXiv.org e-Print Archive

RVOS: end-to-end recurrent network for video object segmentation

Author: Bellver Míriam
Girbau Xalabarder Andreu
Giró Nieto Xavier
Marqués Acosta Fernando
Salvador Aguilera Amaia
Ventura Royo Carles
Publication date: 15/06/2019

Multiple object video object segmentation is a challenging task, especially in the zero-shot case, when no object mask is given at the initial frame and the model has to find the objects to be segmented along the sequence. In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. Our model incorporates recurrence on two different domains: (i) the spatial, which allows discovering the different object instances within a frame, and (ii) the temporal, which allows keeping the coherence of the segmented objects over time. We train RVOS for zero-shot video object segmentation and are the first to report quantitative results for the DAVIS-2017 and YouTube-VOS benchmarks. Further, we adapt RVOS for one-shot video object segmentation by using the masks obtained in previous time steps as inputs to be processed by the recurrent module. Our model reaches results comparable to state-of-the-art techniques on the YouTube-VOS benchmark and outperforms all previous video object segmentation methods not using online learning on the DAVIS-2017 benchmark. Moreover, our model achieves faster inference runtimes than previous methods, reaching 44ms/frame on a P100 GPU. This research was supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (TIN2015-66951-C2-2-R, TIN2015-65316-P & TEC2016-75976-R), the BSC-CNS Severo Ochoa SEV-2015-0493 and La Caixa-Severo Ochoa International Doctoral Fellowship programs, the 2017 SGR 1414 and the Industrial Doctorates 2017-DI-064 & 2017-DI-028 from the Government of Catalonia. Peer Reviewed. Postprint (published version
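The idea of recurrence over two domains can be pictured with a toy loop. The abstract does not say how the two recurrences are wired, so this sketch assumes each state depends on the previous object slot in the same frame (spatial) and on the same slot in the previous frame (temporal). The `step` function is a hypothetical stand-in for a learned recurrent cell, and scalar features replace real feature maps; none of this is the authors' implementation.

```python
def step(frame_feature, spatial_state, temporal_state):
    """Hypothetical stand-in for a learned recurrent cell: a fixed mix of its inputs."""
    return 0.5 * frame_feature + 0.25 * spatial_state + 0.25 * temporal_state

def run_sequence(frame_features, num_instances):
    """Return states[t][i], the state for frame t and object slot i."""
    states = []
    prev_frame_states = [0.0] * num_instances  # no history before the first frame
    for feature in frame_features:             # temporal recurrence: frame by frame
        spatial_state = 0.0                    # reset at the start of each frame
        frame_states = []
        for i in range(num_instances):         # spatial recurrence: one object slot at a time
            h = step(feature, spatial_state, prev_frame_states[i])
            frame_states.append(h)
            spatial_state = h                  # passed to the next slot in this frame
        states.append(frame_states)
        prev_frame_states = frame_states       # passed to the same slots in the next frame
    return states

# Hypothetical input: two frames with scalar features, two object slots.
example_states = run_sequence([1.0, 1.0], num_instances=2)
```

In this toy setup, the state of slot `i` in frame `t` sees both what earlier slots in the same frame produced and what slot `i` produced in the previous frame, which is one way to read the abstract's description of finding separate instances while keeping them coherent over time.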

UPCommons. Portal del coneixement obert de la UPC