Ontology-Driven Food Category Classification in Images
The self-management of chronic diseases related to dietary habits requires tracking what people eat. Most of the approaches proposed in the literature classify food pictures by labels describing the whole recipe. The main drawback of this kind of strategy is that a wrong prediction of the recipe leads to a wrong prediction of every ingredient of that recipe. In this paper we present a multi-label food classification approach, exploiting deep neural networks, where each food picture is classified with labels describing the food categories of the ingredients in each recipe. The aim of our approach is to support the detection of food categories in order to identify which ones might be dangerous for a user affected by a chronic disease. Our approach relies on background knowledge where recipes, food categories, and their relatedness with chronic diseases are modeled within a state-of-the-art ontology. Experiments conducted on a new publicly released dataset demonstrated the effectiveness of the proposed approach with respect to state-of-the-art classification strategies.
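The multi-label idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the category names, the `RISKY_FOR` table (a stand-in for the ontology), the 0.5 threshold, and the function names are all assumptions made for illustration. Each category is accepted or rejected independently from its own score, so one wrong score does not invalidate the other labels.

```python
import math

# Hypothetical category list; the paper's actual label set is not reproduced here.
CATEGORIES = ["dairy", "red_meat", "vegetables", "sweets"]

# Hypothetical stand-in for the ontology linking food categories to diseases.
RISKY_FOR = {
    "diabetes": {"sweets"},
    "hypertension": {"red_meat"},
}


def sigmoid(x):
    """Map a raw network score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_categories(logits, threshold=0.5):
    """Multi-label decision: every category is thresholded independently."""
    return {c for c, z in zip(CATEGORIES, logits) if sigmoid(z) >= threshold}


def risky_categories(predicted, disease):
    """Return the predicted categories flagged as risky for the given disease."""
    return predicted & RISKY_FOR.get(disease, set())


# Example with made-up logits, one per category in CATEGORIES.
predicted = predict_categories([2.0, -1.0, 0.5, 3.0])
flagged = risky_categories(predicted, "diabetes")
```

In this sketch a single-label recipe classifier would be replaced by one independent output per food category, which is the property the abstract contrasts with whole-recipe labels.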
Joint Video and Text Parsing for Understanding Events and Answering Queries
We propose a framework for parsing video and text jointly for understanding
events and answering user queries. Our framework produces a parse graph that
represents the compositional structures of spatial information (objects and
scenes), temporal information (actions and events) and causal information
(causalities between events and fluents) in the video and text. The knowledge
representation of our framework is based on a spatial-temporal-causal And-Or
graph (S/T/C-AOG), which jointly models possible hierarchical compositions of
objects, scenes and events as well as their interactions and mutual contexts,
and specifies the prior probabilistic distribution of the parse graphs. We
present a probabilistic generative model for joint parsing that captures the
relations between the input video/text, their corresponding parse graphs and
the joint parse graph. Based on the probabilistic model, we propose a joint
parsing system consisting of three modules: video parsing, text parsing and
joint inference. Video parsing and text parsing produce two parse graphs from
the input video and text respectively. The joint inference module produces a
joint parse graph by performing matching, deduction and revision on the video
and text parse graphs. The proposed framework has the following objectives:
first, we aim at deep semantic parsing of video and text that goes beyond the
traditional bag-of-words approaches; second, we perform parsing and reasoning
across the spatial, temporal and causal dimensions based on the joint S/T/C-AOG
representation; third, we show that deep joint parsing facilitates subsequent
applications such as generating narrative text descriptions and answering
queries in the forms of who, what, when, where and why. We empirically
evaluated our system by comparison against ground truth as well as by the
accuracy of query answering, and obtained satisfactory results.
Place Categorization and Semantic Mapping on a Mobile Robot
In this paper we focus on the challenging problem of place categorization and
semantic mapping on a robot without environment-specific training. Motivated by
the ongoing success of convolutional networks in various visual recognition
tasks, we build our system upon a state-of-the-art convolutional network. We
overcome its closed-set
limitations by complementing the network with a series of one-vs-all
classifiers that can learn to recognize new semantic classes online. Prior
domain knowledge is incorporated by embedding the classification system into a
Bayesian filter framework that also ensures temporal coherence. We evaluate the
classification accuracy of the system on a robot that maps a variety of places
on our campus in real-time. We show how semantic information can boost robotic
object detection performance and how the semantic map can be used to modulate
the robot's behaviour during navigation tasks. The system is made available to
the community as a ROS module.
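The Bayesian filtering step mentioned in this abstract can be sketched as a generic discrete Bayes filter over place categories. This is an illustrative assumption, not the authors' ROS module: the function name, the two-category example, the transition matrix, and the likelihood values are all hypothetical. The transition matrix plays the role of prior domain knowledge and temporal coherence, and the likelihood stands in for per-frame classifier scores.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One predict-and-update step of a discrete Bayes filter.

    belief[i]        : probability of being in category i before this frame
    transition[i][j] : probability of moving from category i to category j
    likelihood[j]    : classifier score for category j on the current frame
    """
    n = len(belief)
    # Prediction: propagate the previous belief through the transition model.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Update: weight the prediction by the current observation likelihood.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        # The observation rules out every category; keep the prediction.
        return predicted
    return [u / total for u in unnormalized]


# Hypothetical example with two categories, e.g. "corridor" and "office".
belief = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]  # places tend to persist between frames
likelihood = [0.8, 0.2]                # made-up classifier scores for one frame
belief = bayes_filter_step(belief, transition, likelihood)
```

A transition matrix with large diagonal entries makes the belief change slowly, which is one simple way to obtain the temporal coherence the abstract refers to.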
Towards Bottom-Up Analysis of Social Food
in ACM Digital Health Conference 201