Search CORE

26 research outputs found

Visual Concept Detection in Images and Videos

Author: Mühling Markus
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2014
Field of study

The rapidly increasing proliferation of digital images and videos leads to a situation where content-based search in multimedia databases becomes more and more important. A prerequisite for effective image and video search is to analyze and index media content automatically. Current approaches in the field of image and video retrieval focus on semantic concepts serving as an intermediate description to bridge the “semantic gap” between the data representation and the human interpretation. Due to the large complexity and variability in the appearance of visual concepts, the detection of arbitrary concepts represents a very challenging task. In this thesis, the following aspects of visual concept detection systems are addressed: First, enhanced local descriptors for mid-level feature coding are presented. Based on the observation that scale-invariant feature transform (SIFT) descriptors with different spatial extents yield large performance differences, a novel concept detection system is proposed that combines feature representations for different spatial extents using multiple kernel learning (MKL). A multi-modal video concept detection system is presented that relies on Bag-of-Words representations for visual and in particular for audio features. Furthermore, a method for the SIFT-based integration of color information, called color moment SIFT, is introduced. Comparative experimental results demonstrate the superior performance of the proposed systems on the Mediamill and on the VOC Challenge. Second, an approach is presented that systematically utilizes results of object detectors. Novel object-based features are generated based on object detection results using different pooling strategies. For videos, detection results are assembled to object sequences and a shot-based confidence score as well as further features, such as position, frame coverage or movement, are computed for each object class. These features are used as additional input for the support vector machine (SVM)-based concept classifiers. Thus, other related concepts can also profit from object-based features. Extensive experiments on the Mediamill, VOC and TRECVid Challenge show significant improvements in terms of retrieval performance not only for the object classes, but also in particular for a large number of indirectly related concepts. Moreover, it has been demonstrated that a few object-based features are beneficial for a large number of concept classes. On the VOC Challenge, the additional use of object-based features led to a superior performance for the image classification task of 63.8% mean average precision (AP). Furthermore, the generalization capabilities of concept models are investigated. It is shown that different source and target domains lead to a severe loss in concept detection performance. In these cross-domain settings, object-based features achieve a significant performance improvement. Since it is inefficient to run a large number of single-class object detectors, it is additionally demonstrated how a concurrent multi-class object detection system can be constructed to speed up the detection of many object classes in images. Third, a novel, purely web-supervised learning approach for modeling heterogeneous concept classes in images is proposed. Tags and annotations of multimedia data in the WWW are rich sources of information that can be employed for learning visual concepts. The presented approach is aimed at continuous long-term learning of appearance models and improving these models periodically. For this purpose, several components have been developed: a crawling component, a multi-modal clustering component for spam detection and subclass identification, a novel learning component, called “random savanna”, a validation component, an updating component, and a scalability manager. Only a single word describing the visual concept is required to initiate the learning process. Experimental results demonstrate the capabilities of the individual components. Finally, a generic concept detection system is applied to support interdisciplinary research efforts in the field of psychology and media science. The psychological research question addressed in the field of behavioral sciences is, whether and how playing violent content in computer games may induce aggression. Therefore, novel semantic concepts most notably “violence” are detected in computer game videos to gain insights into the interrelationship of violent game events and the brain activity of a player. Experimental results demonstrate the excellent performance of the proposed automatic concept detection approach for such interdisciplinary research

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg

TRECVID 2014 -- An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics

Author: Awad George
Fiscus Jon
Joy David
Kraaij Wessel
Michel Martial
Over Paul
Quénot Georges
Sanders Greg
Smeaton Alan,
Publication venue: HAL CCSD
Publication date: 01/01/2014
Field of study

International audienceThe TREC Video Retrieval Evaluation (TRECVID) 2014 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last dozen years this effort has yielded a better under- standing of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID is funded by the NIST with support from other US government agencies. Many organizations and individuals worldwide contribute significant time and effort

Hal - Université Grenoble Alpes

Objects2action: Classifying and localizing actions without any video example

Author: Jain Mihir
Mensink Thomas
Snoek Cees G. M.
van Gemert Jan C.
Publication venue
Publication date: 01/01/2015
Field of study

The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate for the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. And finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach

arXiv.org e-Print Archive

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings

Author: G Salton
J Lokoč
J Lokoč
J Zahálka
J Zahálka
KU Barthel
L Rossetto
Marcel Worring
Maura Conway
P Bojanowski
Publication venue
Publication date: 07/05/2019
Field of study

In this paper we present a novel interactive multimodal learning system, which facilitates search and exploration in large networks of social multimedia users. It allows the analyst to identify and select users of interest, and to find similar users in an interactive learning setting. Our approach is based on novel multimodal representations of users, words and concepts, which we simultaneously learn by deploying a general-purpose neural embedding model. We show these representations to be useful not only for categorizing users, but also for automatically generating user and community profiles. Inspired by traditional summarization approaches, we create the profiles by selecting diverse and representative content from all available modalities, i.e. the text, image and user modality. The usefulness of the approach is evaluated using artificial actors, which simulate user behavior in a relevance feedback scenario. Multiple experiments were conducted in order to evaluate the quality of our multimodal representations, to compare different embedding strategies, and to determine the importance of different modalities. We demonstrate the capabilities of the proposed approach on two different multimedia collections originating from the violent online extremism forum Stormfront and the microblogging platform Twitter, which are particularly interesting due to the high semantic level of the discussions they feature

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

TRECVID 2015 – An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics

Author: Aly Robin
Awad George
Fiscus Jon
Joy David
Kraaij Wessel
Michel Martial
Ordelman Roeland
Over Paul
Quénot Georges
Smeaton Alan,
Publication venue: HAL CCSD
Publication date: 16/11/2015
Field of study

International audienc

Multimodal Classification of Violent Online Political Extremism Content with Graph Convolutional Networks

Author: Gornishka I.
Rudinac S.
Worring M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Multimodal Classification of Violent Online Political Extremism Content with Graph Convolutional Networks

Author: Gornishka I.
Rudinac S.
Worring M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

International Migration, Integration and Social Cohesion online publications

Finding Semantically Related Videos in Closed Collections

Author: A Argyriou
CGM Snoek
Christos Tzelepis
Christos Tzelepis
DG Lowe
F Markatopoulou
F Markatopoulou
G Csurka
Herbert Bay
Jia Deng
KEA Sande Van de
LE Sucar
M Baumgartner
MF Weng
Nikiforos Pittaras
O Russakovsky
P Dollár
P. Sidiropoulos
V Ferrari
X Wang
X Zhao
Y Wei
Y Yang
Zhanpeng Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

Modern newsroom tools offer advanced functionality for automatic and semi-automatic content collection from the web and social media sources to accompany news stories. However, the content collected in this way often tends to be unstructured and may include irrelevant items. An important step in the verification process is to organize this content, both with respect to what it shows, and with respect to its origin. This chapter presents our efforts in this direction, which resulted in two components. One aims to detect semantic concepts in video shots, to help annotation and organization of content collections. We implement a system based on deep learning, featuring a number of advances and adaptations of existing algorithms to increase performance for the task. The other component aims to detect logos in videos in order to identify their provenance. We present our progress from a keypoint-based detection system to a system based on deep learning

Crossref

Queen Mary Research Online