Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings

Author: G Salton
J Lokoč
J Zahálka
KU Barthel
L Rossetto
Marcel Worring
Maura Conway
P Bojanowski
Publication date: 07/05/2019

In this paper, we present a novel interactive multimodal learning system that facilitates search and exploration in large networks of social multimedia users. It allows the analyst to identify and select users of interest, and to find similar users in an interactive learning setting. Our approach is based on novel multimodal representations of users, words and concepts, which we learn simultaneously by deploying a general-purpose neural embedding model. We show these representations to be useful not only for categorizing users, but also for automatically generating user and community profiles. Inspired by traditional summarization approaches, we create the profiles by selecting diverse and representative content from all available modalities, i.e., the text, image and user modalities. The usefulness of the approach is evaluated using artificial actors, which simulate user behavior in a relevance feedback scenario. Multiple experiments were conducted to evaluate the quality of our multimodal representations, to compare different embedding strategies, and to determine the importance of different modalities. We demonstrate the capabilities of the proposed approach on two different multimedia collections originating from the violent online extremism forum Stormfront and the microblogging platform Twitter, which are particularly interesting due to the high semantic level of the discussions they feature.
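The relevance feedback loop mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' system: it swaps in a classic Rocchio-style query update over plain embedding vectors, and the function names, the weights `a`, `b`, `c`, and the toy user vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors, dim):
    """Component-wise mean; a zero vector when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def rocchio_update(query, relevant, non_relevant, a=1.0, b=0.75, c=0.15):
    """Move the query toward relevant examples and away from non-relevant ones.

    The weights a, b, c are illustrative defaults, not values from the paper.
    """
    dim = len(query)
    pos = mean_vector(relevant, dim)
    neg = mean_vector(non_relevant, dim)
    return [a * query[k] + b * pos[k] - c * neg[k] for k in range(dim)]


def rank_users(query, user_embeddings):
    """Return user names ordered by decreasing cosine similarity to the query."""
    return sorted(user_embeddings,
                  key=lambda name: cosine(query, user_embeddings[name]),
                  reverse=True)


# Hypothetical two-dimensional user embeddings.
users = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0]}
# The simulated analyst marks u1 as relevant and u3 as not relevant.
updated = rocchio_update([0.5, 0.5], [users["u1"]], [users["u3"]])
ranking = rank_users(updated, users)
```

In each feedback round the updated query would be used to re-rank the collection, and the newly judged items would feed the next update.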

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View

Author: Ah-Pine Julien
Clinchant Stéphane
Csurka Gabriela
Publication date: 27/01/2014

Multimedia collections are growing in size and diversity more than ever. Effective multimedia retrieval systems are thus critical to access these datasets from the end-user perspective and in a scalable way. We are interested in repositories of image/text multimedia objects and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval. We focus on graph-based methods, which have proven to provide state-of-the-art performance. We particularly examine two such methods: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework which encompasses the two aforementioned approaches. Our proposal allows us to highlight the core features one should consider when using a graph-based technique for the combination of visual and textual information. We compare cross-media and random-walk-based results using three different real-world datasets. From a practical standpoint, our extended empirical analysis allows us to provide insights and guidelines about the use of graph-based methods for multimodal information fusion in content-based multimedia information retrieval.

Comment: An extended version of the paper "Visual and Textual Information Fusion in Multimedia Retrieval using Semantic Filtering and Graph based Methods", by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM Transactions on Information Systems.
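To make the idea of a random-walk-based score over fused visual and textual graphs more concrete, here is a minimal sketch. It is a generic random walk with restart over a convex combination of two row-normalized similarity matrices, not the unifying framework proposed in the paper; the parameters `alpha`, `restart`, and `steps`, and the toy matrices, are assumptions.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 so it can act as transition probabilities."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # A row of zeros has no outgoing edges; fall back to a uniform row.
        if total == 0:
            normalized.append([1.0 / len(row)] * len(row))
        else:
            normalized.append([value / total for value in row])
    return normalized


def fused_random_walk(visual_sim, textual_sim, query_scores,
                      alpha=0.5, restart=0.15, steps=50):
    """Random walk with restart over a graph mixing two similarity matrices.

    alpha, restart and steps are illustrative values, not taken from the paper.
    query_scores must contain at least one positive entry.
    """
    visual = row_normalize(visual_sim)
    textual = row_normalize(textual_sim)
    n = len(query_scores)
    # Convex combination of the two transition matrices (stays row-stochastic).
    transition = [[alpha * visual[i][j] + (1 - alpha) * textual[i][j]
                   for j in range(n)] for i in range(n)]
    total = sum(query_scores)
    prior = [score / total for score in query_scores]
    scores = prior[:]
    for _ in range(steps):
        # One walk step: push the current mass along the outgoing edges.
        walked = [sum(scores[i] * transition[i][j] for i in range(n))
                  for j in range(n)]
        # Mix in the restart distribution so the query keeps its influence.
        scores = [restart * prior[j] + (1 - restart) * walked[j]
                  for j in range(n)]
    return scores


# Hypothetical similarities between three image/text objects.
visual = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
textual = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
# Initial relevance comes from object 0 only.
fused_scores = fused_random_walk(visual, textual, [1, 0, 0])
```

Setting `alpha` to 1 or 0 reduces the walk to a purely visual or purely textual graph, which is one simple way to compare the contribution of each modality.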

arXiv.org e-Print Archive

CiteSeerX

Exquisitor: Breaking the Interaction Barrier for Exploration of 100 Million Images

Author: Amsaleg Laurent
Guðmundsson Gylfi Þór
Jónsson Björn Thór
Khan Omar Shahbaz
Ragnarsdóttir Hanna
Rudinac Stevan
Worring Marcel
Zahálka Jan
Þorleiksdóttir Þórhildur
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2019

In this demonstration, we present Exquisitor, a media explorer capable of learning user preferences in real-time during interactions with the 99.2 million images of YFCC100M. Exquisitor owes its efficiency to innovations in data representation, compression, and indexing. Exquisitor can complete each interaction round, including learning preferences and presenting the most relevant results, in less than 30 ms using only a single CPU core and modest RAM. In short, Exquisitor can bring large-scale interactive learning to standard desktops and laptops, and even high-end mobile devices.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

The IT University of Copenhagen's Repository

International Migration, Integration and Social Cohesion online publications

UvA-DARE

HAL-Rennes 1

Exquisitor at the Lifelog Search Challenge 2019

Author: Jónsson Björn Thór
Khan Omar Shahbaz
Rudinac Stevan
Worring Marcel
Zahálka Jan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2019

Crossref

The IT University of Copenhagen's Repository

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Learning Social Image Embedding with Deep Multimodal Attention Networks

Author: He Yueying
Huang Feiran
Li Zhoujun
Mei Tao
Zhang Xiaoming
Zhao Zhonghua
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2017

Learning embeddings of social media data with deep models has attracted extensive research interest and has enabled many applications, such as link prediction, classification, and cross-modal search. However, for social images, which contain both link information and multimodal contents (e.g., text description and visual content), simply employing the embedding learnt from network structure or data content results in a sub-optimal social image representation. In this paper, we propose a novel social image embedding approach called Deep Multimodal Attention Networks (DMAN), which employs a deep model to jointly embed multimodal contents and link information. Specifically, to effectively capture the correlations between multimodal contents, we propose a multimodal attention network to encode the fine-grained relation between image regions and textual words. To leverage the network structure for embedding learning, a novel Siamese-Triplet neural network is proposed to model the links among images. With the joint deep model, the learnt embedding can capture both the multimodal contents and the nonlinear network information. Extensive experiments are conducted to investigate the effectiveness of our approach in the applications of multi-label classification and cross-modal search. Compared to state-of-the-art image embeddings, our proposed DMAN achieves significant improvement in the tasks of multi-label classification and cross-modal search.
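The triplet idea behind modelling links among images can be sketched with a standard triplet margin loss. This is a generic formulation, not the DMAN Siamese-Triplet network itself; the Euclidean distance, the `margin` value, and the toy vectors are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on embedding vectors.

    The loss is zero once the linked (positive) image is closer to the
    anchor than the unlinked (negative) image by at least the margin.
    """
    return max(0.0,
               euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical embeddings: the first triplet already satisfies the margin,
# the second does not and would produce a training signal.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In a trained model the three vectors would come from the same embedding network applied to an image, one of its linked images, and an unlinked image.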

arXiv.org e-Print Archive

Crossref

Integration of Exploration and Search: A Case Study of the M3 Model

Author: A Babenko
AM Arigon
B Shneiderman
BÞ Jónsson
Jan Zahalka
L Amsaleg
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Effective support for multimedia analytics applications requires exploration and search to be integrated seamlessly into a single interaction model. Media metadata can be seen as defining a multidimensional media space, casting multimedia analytics tasks as exploration, manipulation and augmentation of that space. We present an initial case study of integrating exploration and search within this multidimensional media space. We extend the M3 model, initially proposed as a pure exploration tool, and show that it can be elegantly extended to allow searching within an exploration context and exploring within a search context. We then evaluate the suitability of relational database management systems, as representatives of today’s data management technologies, for implementing the extended M3 model. Based on our results, we finally propose some research directions for scalability of multimedia analytics.

Crossref

INRIA a CCSD electronic archive server

The IT University of Copenhagen's Repository

HAL-Rennes 1

Combining Language and Vision with a Multimodal Skip-gram Model

Author: Baroni Marco
Lazaridou Angeliki
Pham Nghia The
Publication date: 01/01/2015

We extend the SKIP-GRAM model of Mikolov et al. (2013a) by taking visual information into account. Like SKIP-GRAM, our multimodal models (MMSKIP-GRAM) build vector-based word representations by learning to predict linguistic contexts in text corpora. However, for a restricted set of words, the models are also exposed to visual representations of the objects they denote (extracted from natural images), and must predict linguistic and visual features jointly. The MMSKIP-GRAM models achieve good performance on a variety of semantic benchmarks. Moreover, since they propagate visual information to all words, we use them to improve image labeling and retrieval in the zero-shot setup, where the test concepts are never seen during model training. Finally, the MMSKIP-GRAM models discover intriguing visual properties of abstract words, paving the way to realistic implementations of embodied theories of meaning.

Comment: accepted at NAACL 2015, camera-ready version, 11 pages

arXiv.org e-Print Archive

Crossref