A Novel Approach to Multimedia Ontology Engineering for Automated Reasoning over Audiovisual LOD Datasets
Multimedia reasoning, which supports tasks such as multimedia content analysis and high-level video scene interpretation, relies on a formal and comprehensive conceptualization of the represented knowledge domain. However, most multimedia ontologies are not exhaustive in terms of role definitions, and do not incorporate complex role inclusions and role interdependencies. In fact, most multimedia ontologies do not have a role box at all, and implement only a basic subset of the available logical constructors. Consequently, their application in multimedia reasoning is limited. To address these issues, VidOnt, the very first multimedia ontology with SROIQ(D) expressivity and a DL-safe ruleset, has been introduced for next-generation multimedia reasoning. In contrast to common practice, the formal grounding has been set in one of the most expressive description logics, and the ontology has been validated with industry-leading reasoners, namely HermiT and FaCT++. This paper also presents best practices for developing multimedia ontologies, based on my ontology engineering approach.
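
To make the role-box point concrete, here is the kind of complex role inclusion (role chain) axiom that SROIQ(D) permits but most multimedia ontologies omit; the roles depicts and hasPart below are illustrative stand-ins, not axioms quoted from VidOnt:

    depicts ∘ hasPart ⊑ depicts

Read as a role chain, the axiom says that if a media resource depicts an object and that object has a part, the resource also depicts the part. A DL-safe ruleset complements such axioms by restricting rule variables to named individuals, which keeps reasoning with systems such as HermiT and FaCT++ decidable.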
Thick 2D Relations for Document Understanding
We use a propositional language of qualitative rectangle relations to detect the reading order from document images. To this end, we define the notion of a document encoding rule and we analyze possible formalisms to express document encoding rules, such as LaTeX and SGML. Document encoding rules expressed in the propositional language of rectangles are used to build a reading order detector for document images. To achieve robustness and avoid brittleness when applying the system to real-life document images, we introduce the notion of a thick boundary interpretation for a qualitative relation. The framework is tested on a collection of heterogeneous document images, showing recall rates of up to 89%.
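
A minimal sketch of the thick boundary idea, assuming axis-aligned block bounding boxes and a hand-picked tolerance; the function names and the simple reading-order test are my illustration, not the paper's detector:

    # Thick-boundary sketch: coordinates that differ by at most T pixels
    # are treated as coinciding, so a noisy segmentation does not flip
    # the qualitative relation between two document blocks.
    T = 5  # assumed boundary thickness, in pixels

    def precedes(a_end, b_start, t=T):
        # interval relation "a ends before b starts", up to thickness t
        return a_end <= b_start + t

    def before_in_reading_order(a, b, t=T):
        # a, b: (x1, y1, x2, y2) bounding boxes, y grows downwards
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        # b starts on a later line, or on the same line to the right of a
        return precedes(ay2, by1, t) or (abs(ay1 - by1) <= t and precedes(ax2, bx1, t))

Without the thickness t, two blocks on the same visual line whose boxes overlap by a pixel or two would fail the crisp relation, which is exactly the brittleness the thick interpretation avoids.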
Learning the Semantics of Manipulation Action
In this paper we present a formal computational framework for modeling manipulation actions. The introduced formalism leads to semantics of manipulation action and has applications both to observing and understanding human manipulation actions and to executing them with a robotic mechanism (e.g. a humanoid robot). It is based on a Combinatory Categorial Grammar. The goal of the introduced framework is threefold: (1) represent manipulation actions with both syntactic and semantic parts, where the semantic part employs λ-calculus; (2) enable a probabilistic semantic parsing schema to learn the λ-calculus representation of manipulation actions from an annotated action corpus of videos; (3) use (1) and (2) to develop a system that visually observes manipulation actions and understands their meaning, while reasoning beyond observations using propositional logic and axiom schemata. Experiments conducted on a publicly available large manipulation action dataset validate the theoretical framework and our implementation.
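
As a hypothetical illustration of the syntax-semantics pairing (a toy lexical entry of mine, not one taken from the paper's action grammar), a transitive action such as grasping can carry a CCG category together with a λ-calculus term:

    grasp := (S\NP)/NP : λx.λy.grasp(y, x)

Combining this entry first with an object argument cup and then with a subject argument hand β-reduces to grasp(hand, cup), a logical form over which the axiom schemata mentioned above can reason.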
Extending the Foundational Model of Anatomy with Automatically Acquired Spatial Relations
Formal ontologies have made a significant impact in bioscience over the last ten years. Among them, the Foundational Model of Anatomy Ontology (FMA) is the most comprehensive model for the spatio-structural representation of human anatomy. In the research project MEDICO we use the FMA as our main source of background knowledge about human anatomy. Our ultimate goals are to use spatial knowledge from the FMA (1) to improve automatic parsing algorithms for 3D volume datasets generated by Computed Tomography and Magnetic Resonance Imaging and (2) to generate semantic annotations using the concepts from the FMA to allow semantic search on medical image repositories. We argue that in this context more spatial relation instances are needed than those currently available in the FMA. In this publication we present a technique for the automatic inductive acquisition of spatial relation instances by generalizing from expert-annotated volume datasets.
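
A minimal sketch of the inductive acquisition step, under assumed data structures (boolean 3D segmentation masks keyed by FMA concept name; left_of is an illustrative relation, not FMA vocabulary): a relation instance is asserted only if it holds in every expert-annotated volume containing both structures.

    import numpy as np

    def centroid(mask):
        # mask: boolean 3D array (z, y, x) marking one structure's voxels
        return np.argwhere(mask).mean(axis=0)

    def left_of(mask_a, mask_b):
        # compare centroids along the x axis (assumed to be axis 2)
        return centroid(mask_a)[2] < centroid(mask_b)[2]

    def induce_relation(volumes, a, b, relation=left_of):
        # volumes: list of dicts mapping FMA concept name -> segmentation mask
        annotated = [v for v in volumes if a in v and b in v]
        return bool(annotated) and all(relation(v[a], v[b]) for v in annotated)

Generalizing over all annotated volumes in this way favors precision over recall: a candidate relation contradicted by even one dataset is not added.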
Layered Interpretation of Street View Images
We propose a layered street view model to encode both depth and semantic information in street view images for autonomous driving. Recently, stixels, stix-mantics, and tiered scene labeling methods have been proposed to model street view images. We propose a 4-layer street view model, a compact representation over the recently proposed stix-mantics model. Our layers encode semantic classes like ground, pedestrians, vehicles, buildings, and sky in addition to their depths. The only input to our algorithm is a pair of stereo images. We use a deep neural network to extract the appearance features for semantic classes, and a simple and efficient inference algorithm to jointly estimate both semantic classes and layered depth values. Our method outperforms other competing approaches on the Daimler urban scene segmentation dataset. Our algorithm is massively parallelizable, allowing a GPU implementation with a processing speed of about 9 fps.

Comment: The paper will be presented at the 2015 Robotics: Science and Systems Conference (RSS).
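
A minimal sketch, under assumed inputs, of what column-wise layered inference could look like: given per-row class scores from the deep network for one image column and a fixed top-to-bottom layer order sky / building / object / ground, a small dynamic program picks the three row boundaries that maximize the total score. This is my simplification for illustration; it omits the depth term and the paper's actual model.

    import numpy as np

    def best_layering(scores):
        # scores: (H, 4) per-row class scores for one image column,
        # classes ordered top to bottom: 0=sky, 1=building, 2=object, 3=ground
        H, L = scores.shape
        csum = np.vstack([np.zeros((1, L)), np.cumsum(scores, axis=0)])
        dp = np.full((L, H + 1), -np.inf)
        choice = np.zeros((L, H + 1), dtype=int)
        dp[0] = csum[:, 0]  # layer 0 alone covers rows [0, r)
        for l in range(1, L):
            for r in range(H + 1):
                # choose boundary b so that layer l covers rows [b, r)
                cand = dp[l - 1, :r + 1] + (csum[r, l] - csum[:r + 1, l])
                b = int(np.argmax(cand))
                dp[l, r], choice[l, r] = cand[b], b
        bounds, r = [], H  # backtrack the three layer boundaries
        for l in range(L - 1, 0, -1):
            r = choice[l, r]
            bounds.append(r)
        return dp[L - 1, H], bounds[::-1]

Because each column is processed independently, the procedure parallelizes across columns, which is consistent with the GPU-friendly design the abstract mentions.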