
    Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization

    A deeper understanding of video activities extends beyond recognition of underlying concepts such as actions and objects: constructing deep semantic representations requires reasoning about the semantic relationships among these concepts, often beyond what is directly observed in the data. To this end, we propose an energy minimization framework that leverages large-scale commonsense knowledge bases, such as ConceptNet, to provide contextual cues for establishing semantic relationships among entities hypothesized directly from the video signal. We express this mathematically using the language of Grenander's canonical pattern generator theory. We show that prior-encoded commonsense knowledge alleviates the need for large annotated training datasets and helps tackle imbalance in the training data. Using three publicly available datasets (Charades, the Microsoft Visual Description Corpus, and Breakfast Actions), we show that the proposed model generates video interpretations whose quality is better than those reported by state-of-the-art approaches, which have substantial training needs. Through extensive experiments, we show that the use of commonsense knowledge from ConceptNet allows the proposed approach to handle challenges such as training data imbalance, weak features, and complex semantic relationships and visual scenes.
    Comment: Accepted to WACV 201
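    The abstract describes the approach only at a high level; the minimal Python sketch below (not taken from the paper) illustrates the general shape of an energy minimization over candidate interpretations, combining a data term from concept detectors with a contextual term derived from commonsense relatedness such as ConceptNet scores. All names, weights, and relatedness values here are illustrative assumptions, not the authors' formulation.

        # Hedged sketch: score candidate interpretations by combining detector
        # confidence (data term) with commonsense relatedness (context term).
        # The relatedness values stand in for scores one might derive from
        # ConceptNet; they are made up for illustration.
        from itertools import combinations
        from typing import Dict, FrozenSet

        def interpretation_energy(
            labels: FrozenSet[str],
            detector_scores: Dict[str, float],          # P(label | video evidence)
            relatedness: Dict[FrozenSet[str], float],   # pairwise commonsense score
            alpha: float = 1.0,                         # weight of the data term
            beta: float = 0.5,                          # weight of the context term
        ) -> float:
            """Lower energy means a more plausible interpretation."""
            data_term = -sum(detector_scores.get(l, 1e-6) for l in labels)
            context_term = -sum(
                relatedness.get(frozenset(pair), 0.0) for pair in combinations(labels, 2)
            )
            return alpha * data_term + beta * context_term

        # Usage: choose the candidate with minimum energy. Weak detector evidence
        # for "coffee" is compensated by its strong commonsense link to "pour".
        candidates = [frozenset({"person", "pour", "coffee"}),
                      frozenset({"person", "pour", "horse"})]
        scores = {"person": 0.9, "pour": 0.7, "coffee": 0.4, "horse": 0.4}
        rel = {frozenset({"pour", "coffee"}): 0.8, frozenset({"pour", "horse"}): 0.05,
               frozenset({"person", "pour"}): 0.6, frozenset({"person", "coffee"}): 0.5,
               frozenset({"person", "horse"}): 0.3}
        best = min(candidates, key=lambda c: interpretation_energy(c, scores, rel))
        print(sorted(best))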

    Automatic Multimedia Creation Enriched with Dynamic Conceptual Data

    There is a growing gap between multimedia production and context-centric multimedia services. The main problem is the under-exploitation of the content creation design. The idea is to support dynamic content generation adapted to the user or display profile. Our work is an implementation of a web platform for the automatic generation of multimedia presentations based on the SMIL (Synchronized Multimedia Integration Language) standard. The system is able to produce rich media with dynamic multimedia content retrieved automatically from different content databases matching the semantic context. For this purpose, we extend the standard interpretation of SMIL tags in order to accomplish a semantic translation of multimedia objects into database queries. This permits services to take advantage of the production process to create customized content enhanced with real-time information fed from databases. The described system has been successfully deployed to create advanced context-centric weather forecasts.
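    As a rough illustration of the abstract's core mechanism (extended SMIL tags translated into database queries), the Python sketch below shows one plausible way such a translation could work. The attribute name semantic-query, the table schema, and the lookup logic are assumptions made for this example and are not taken from the paper.

        # Hedged sketch: resolve an extended SMIL attribute into a concrete media
        # reference by querying a content database, then emit standard SMIL.
        import sqlite3
        import xml.etree.ElementTree as ET

        SMIL_TEMPLATE = """<smil><body>
          <img semantic-query="weather-map:north" dur="5s"/>
        </body></smil>"""

        def resolve_semantic_tags(smil_xml, con):
            # Walk the SMIL tree and replace each hypothetical 'semantic-query'
            # attribute with a 'src' URI looked up in the content database.
            root = ET.fromstring(smil_xml)
            for node in root.iter():
                query = node.attrib.pop("semantic-query", None)
                if query is None:
                    continue
                row = con.execute(
                    "SELECT uri FROM media WHERE semantic_tag = ? "
                    "ORDER BY rowid DESC LIMIT 1",
                    (query,),
                ).fetchone()
                if row:
                    node.set("src", row[0])   # standard SMIL attribute
            return ET.tostring(root, encoding="unicode")

        # Demo with an in-memory table standing in for the content databases.
        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE media (uri TEXT, semantic_tag TEXT)")
        con.execute("INSERT INTO media VALUES "
                    "('http://example.org/maps/north_latest.png', 'weather-map:north')")
        print(resolve_semantic_tags(SMIL_TEMPLATE, con))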

    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning by integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community, given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper provides an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining the two key principles of modality heterogeneity and interconnections that have driven subsequent innovations, and propose a taxonomy of six core technical challenges, covering historical and recent trends: representation, alignment, reasoning, generation, transference, and quantification. Recent technical achievements are presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.

    Prototypical Implementations; Working Packages in Project Phase II

    In this technical report, we present the concepts and first prototypical implementations of innovative tools and methods for personalized and contextualized (multimedia) search, collaborative ontology evolution, ontology evaluation and cost models, and dynamic access and trends in distributed (semantic) knowledge. The concepts and prototypes are based on the state-of-the-art analysis and the requirements identified in the CSW report IV.