1,414 research outputs found
Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization
A deeper understanding of video activities extends beyond recognition of
underlying concepts such as actions and objects: constructing deep semantic
representations requires reasoning about the semantic relationships among these
concepts, often beyond what is directly observed in the data. To this end, we
propose an energy minimization framework that leverages large-scale commonsense
knowledge bases, such as ConceptNet, to provide contextual cues to establish
semantic relationships among entities directly hypothesized from video signal.
We mathematically express this using the language of Grenander's canonical
pattern generator theory. We show that the use of prior encoded commonsense
knowledge alleviate the need for large annotated training datasets and help
tackle imbalance in training through prior knowledge. Using three different
publicly available datasets - Charades, Microsoft Visual Description Corpus and
Breakfast Actions datasets, we show that the proposed model can generate video
interpretations whose quality is better than those reported by state-of-the-art
approaches, which have substantial training needs. Through extensive
experiments, we show that the use of commonsense knowledge from ConceptNet
allows the proposed approach to handle various challenges such as training data
imbalance, weak features, and complex semantic relationships and visual scenes.Comment: Accepted to WACV 201
Automatic Multimedia Creation Enriched with Dynamic Conceptual Data
There is a growing gap between the multimedia production and the context centric multimedia services. The main problem is the under-exploitation of the content creation design. The idea is to support dynamic content generation adapted to the user or display profile. Our work is an implementation of a web platform for automatic generation of multimedia presentations based on SMIL (Synchronized Multimedia Integration Language) standard. The system is able to produce rich media with dynamic multimedia content retrieved automatically from different content databases matching the semantic context. For this purpose, we extend the standard interpretation of SMIL tags in order to accomplish a semantic translation of multimedia objects in database queries. This permits services to take benefit of production process to create customized content enhanced with real time information fed from databases. The described system has been successfully deployed to create advanced context centric weather forecasts
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of 6 core technical challenges: representation, alignment, reasoning,
generation, transference, and quantification covering historical and recent
trends. Recent technical achievements will be presented through the lens of
this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research as identified by our taxonomy
prototypical implementations ; working packages in project phase II
In this technical report, we present the concepts and first prototypical
imple- mentations of innovative tools and methods for personalized and
contextualized (multimedia) search, collaborative ontology evolution, ontology
evaluation and cost models, and dynamic access and trends in distributed
(semantic) knowledge. The concepts and prototypes are based on the state of
art analysis and identified requirements in the CSW report IV
- …