Search CORE

266,323 research outputs found

Multimodal Sparse Coding for Event Detection

Author: Brady Kevin
Campbell William
Cha Miriam
Gwon Youngjune
Kung H. T.
Sturim Douglas
Publication venue
Publication date: 17/05/2016
Field of study

Unsupervised feature learning methods have proven effective for classification tasks based on a single modality. We present multimodal sparse coding for learning feature representations shared across multiple modalities. The shared representations are applied to multimedia event detection (MED) and evaluated in comparison to unimodal counterparts, as well as other feature learning methods such as GMM supervectors and sparse RBM. We report the cross-validated classification accuracy and mean average precision of the MED system trained on features learned from our unimodal and multimodal settings for a subset of the TRECVID MED 2014 dataset.Comment: Multimodal Machine Learning Workshop at NIPS 201

arXiv.org e-Print Archive

CiteSeerX

Using Visual Journals as a Reflective Worldview Window into Educator Identity

Author: Belcher Christina
Loerts Terry
Publication venue: Digital Commons @ George Fox University
Publication date: 04/03/2020
Field of study

This ethnographic case study research and content analysis presents the conclusion of a three-year study involving 37 teacher candidate participants across a three-year study within a two year (2 semester program) Bachelor of Education program at a university in Ontario, Canada. Each academic year participants were intentionally given time over two semesters of literacy courses to engage in literacy practices and knowledge of self through the use of multimodal visual journals. Candidates reflect on their conceptions of literacy, teaching, identity and worldview within an institution grounded in the Christian faith. Findings, philosophical ponderings and content analysis suggest that the identity of the teacher candidate filters learning through visual and multimodal ways. The findings raise questions about the place of multimodal learning, self-reflection, faith and worldview in the learning process, and in identity formation of educators. We suggest that this study may inform current multimodal and visual literacy research while generating enriching discussions on how multimodal forms of literacy instruction may assist in acknowledgement of worldview recognition and self-identity awareness. Keywords: Multiliteracies, visual journals, self-knowledge, worldview, identity, visual literacy, multimodal literacy, teacher educatio

Digital Commons @ George Fox University

Recommended from our members

Situating multimodal learning analytics

Author: Abrahamson D
Blikstein P
Grover S
Schneider B
Tissenbaum M
Worsley M
Publication venue: eScholarship, University of California
Publication date: 01/01/2016
Field of study

The digital age has introduced a host of new challenges and opportunities for the learning sciences community. These challenges and opportunities are particularly abundant in multimodal learning analytics (MMLA), a research methodology that aims to extend work from Educational Data Mining (EDM) and Learning Analytics (LA) to multimodal learning environments by treating multimodal data. Recognizing the short-term opportunities and longterm challenges will help develop proof cases and identify grand challenges that will help propel the field forward. To support the field's growth, we use this paper to describe several ways that MMLA can potentially advance learning sciences research and touch upon key challenges that researchers who utilize MMLA have encountered over the past few years

eScholarship - University of California

BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection

Author: Ben-younes Hedi
Cadene Rémi
Cord Matthieu
Thome Nicolas
Publication venue
Publication date: 27/01/2019
Field of study

Multimodal representation learning is gaining more and more interest within the deep learning community. While bilinear models provide an interesting framework to find subtle combination of modalities, their number of parameters grows quadratically with the input dimensions, making their practical implementation within classical deep learning pipelines challenging. In this paper, we introduce BLOCK, a new multimodal fusion based on the block-superdiagonal tensor decomposition. It leverages the notion of block-term ranks, which generalizes both concepts of rank and mode ranks for tensors, already used for multimodal fusion. It allows to define new ways for optimizing the tradeoff between the expressiveness and complexity of the fusion model, and is able to represent very fine interactions between modalities while maintaining powerful mono-modal representations. We demonstrate the practical interest of our fusion model by using BLOCK for two challenging tasks: Visual Question Answering (VQA) and Visual Relationship Detection (VRD), where we design end-to-end learnable architectures for representing relevant interactions between modalities. Through extensive experiments, we show that BLOCK compares favorably with respect to state-of-the-art multimodal fusion models for both VQA and VRD tasks. Our code is available at https://github.com/Cadene/block.bootstrap.pytorch

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

Association for the Advancement of Artificial Intelligence: AAAI Publications

Learning Social Image Embedding with Deep Multimodal Attention Networks

Author: He Yueying
Huang Feiran
Li Zhoujun
Mei Tao
Zhang Xiaoming
Zhao Zhonghua
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

Learning social media data embedding by deep models has attracted extensive research interest as well as boomed a lot of applications, such as link prediction, classification, and cross-modal search. However, for social images which contain both link information and multimodal contents (e.g., text description, and visual content), simply employing the embedding learnt from network structure or data content results in sub-optimal social image representation. In this paper, we propose a novel social image embedding approach called Deep Multimodal Attention Networks (DMAN), which employs a deep model to jointly embed multimodal contents and link information. Specifically, to effectively capture the correlations between multimodal contents, we propose a multimodal attention network to encode the fine-granularity relation between image regions and textual words. To leverage the network structure for embedding learning, a novel Siamese-Triplet neural network is proposed to model the links among images. With the joint deep model, the learnt embedding can capture both the multimodal contents and the nonlinear network information. Extensive experiments are conducted to investigate the effectiveness of our approach in the applications of multi-label classification and cross-modal search. Compared to state-of-the-art image embeddings, our proposed DMAN achieves significant improvement in the tasks of multi-label classification and cross-modal search

arXiv.org e-Print Archive

Crossref