229,905 research outputs found
Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding
Recent trends in image understanding have pushed for holistic scene
understanding models that jointly reason about various tasks such as object
detection, scene recognition, shape analysis, contextual reasoning, and local
appearance based classifiers. In this work, we are interested in understanding
the roles of these different tasks in improved scene understanding, in
particular semantic segmentation, object detection and scene recognition.
Towards this goal, we "plug-in" human subjects for each of the various
components in a state-of-the-art conditional random field model. Comparisons
among various hybrid human-machine CRFs give us indications of how much "head
room" there is to improve scene understanding by focusing research efforts on
various individual tasks
A Novel Approach to Multimedia Ontology Engineering for Automated Reasoning over Audiovisual LOD Datasets
Multimedia reasoning, which is suitable for, among others, multimedia content
analysis and high-level video scene interpretation, relies on the formal and
comprehensive conceptualization of the represented knowledge domain. However,
most multimedia ontologies are not exhaustive in terms of role definitions, and
do not incorporate complex role inclusions and role interdependencies. In fact,
most multimedia ontologies do not have a role box at all, and implement only a
basic subset of the available logical constructors. Consequently, their
application in multimedia reasoning is limited. To address the above issues,
VidOnt, the very first multimedia ontology with SROIQ(D) expressivity and a
DL-safe ruleset has been introduced for next-generation multimedia reasoning.
In contrast to the common practice, the formal grounding has been set in one of
the most expressive description logics, and the ontology validated with
industry-leading reasoners, namely HermiT and FaCT++. This paper also presents
best practices for developing multimedia ontologies, based on my ontology
engineering approach
CINet: A Learning Based Approach to Incremental Context Modeling in Robots
There have been several attempts at modeling context in robots. However,
either these attempts assume a fixed number of contexts or use a rule-based
approach to determine when to increment the number of contexts. In this paper,
we pose the task of when to increment as a learning problem, which we solve
using a Recurrent Neural Network. We show that the network successfully (with
98\% testing accuracy) learns to predict when to increment, and demonstrate, in
a scene modeling problem (where the correct number of contexts is not known),
that the robot increments the number of contexts in an expected manner (i.e.,
the entropy of the system is reduced). We also present how the incremental
model can be used for various scene reasoning tasks.Comment: The first two authors have contributed equally, 6 pages, 8 figures,
International Conference on Intelligent Robots (IROS 2018
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns
visual concepts, words, and semantic parsing of sentences without explicit
supervision on any of them; instead, our model learns by simply looking at
images and reading paired questions and answers. Our model builds an
object-based scene representation and translates sentences into executable,
symbolic programs. To bridge the learning of two modules, we use a
neuro-symbolic reasoning module that executes these programs on the latent
scene representation. Analogical to human concept learning, the perception
module learns visual concepts based on the language description of the object
being referred to. Meanwhile, the learned visual concepts facilitate learning
new words and parsing new sentences. We use curriculum learning to guide the
searching over the large compositional space of images and language. Extensive
experiments demonstrate the accuracy and efficiency of our model on learning
visual concepts, word representations, and semantic parsing of sentences.
Further, our method allows easy generalization to new object attributes,
compositions, language concepts, scenes and questions, and even new program
domains. It also empowers applications including visual question answering and
bidirectional image-text retrieval.Comment: ICLR 2019 (Oral). Project page: http://nscl.csail.mit.edu
- …