Multimodal Sparse Coding for Event Detection
Unsupervised feature learning methods have proven effective for
classification tasks based on a single modality. We present multimodal sparse
coding for learning feature representations shared across multiple modalities.
The shared representations are applied to multimedia event detection (MED) and
evaluated in comparison to unimodal counterparts, as well as other feature
learning methods such as GMM supervectors and sparse RBM. We report the
cross-validated classification accuracy and mean average precision of the MED
system trained on features learned from our unimodal and multimodal settings
for a subset of the TRECVID MED 2014 dataset.
Comment: Multimodal Machine Learning Workshop at NIPS 2015
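As a rough, hedged illustration of the shared-representation idea (not the authors' pipeline), one common formulation learns a single dictionary over concatenated per-modality features so that both modalities share the same sparse codes. The sketch below uses scikit-learn with placeholder feature dimensions and random data standing in for real audio/video descriptors.

# Minimal sketch of joint (multimodal) sparse coding: learn one dictionary over
# concatenated audio and video features so both modalities share sparse codes.
# Illustration only, not the paper's exact pipeline; all sizes are placeholders.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_clips, d_audio, d_video = 500, 128, 256

# Placeholder features; in practice these come from per-clip descriptors.
X_audio = rng.standard_normal((n_clips, d_audio))
X_video = rng.standard_normal((n_clips, d_video))
X_joint = np.hstack([X_audio, X_video])     # shape (n_clips, d_audio + d_video)

# Learn a shared dictionary; the sparse codes serve as the multimodal features.
dico = DictionaryLearning(n_components=64, alpha=1.0, max_iter=200,
                          transform_algorithm="lasso_lars", random_state=0)
shared_codes = dico.fit_transform(X_joint)  # shape (n_clips, 64)

# shared_codes can then be fed to a linear classifier for event detection.
print(shared_codes.shape)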
Using Visual Journals as a Reflective Worldview Window into Educator Identity
This ethnographic case study and content analysis presents the conclusions of a three-year study involving 37 teacher candidate participants in a two-year (two semesters per year) Bachelor of Education program at a university in Ontario, Canada. Each academic year, participants were intentionally given time across two semesters of literacy courses to engage in literacy practices and knowledge of self through multimodal visual journals. Candidates reflected on their conceptions of literacy, teaching, identity and worldview within an institution grounded in the Christian faith. The findings, philosophical ponderings and content analysis suggest that the identity of the teacher candidate filters learning in visual and multimodal ways. They also raise questions about the place of multimodal learning, self-reflection, faith and worldview in the learning process and in the identity formation of educators. We suggest that this study may inform current multimodal and visual literacy research while generating enriching discussions on how multimodal forms of literacy instruction may assist in worldview recognition and self-identity awareness.
Keywords: Multiliteracies, visual journals, self-knowledge, worldview, identity, visual literacy, multimodal literacy, teacher education
Situating multimodal learning analytics
The digital age has introduced a host of new challenges and opportunities for the learning sciences community. These challenges and opportunities are particularly abundant in multimodal learning analytics (MMLA), a research methodology that aims to extend work from Educational Data Mining (EDM) and Learning Analytics (LA) to multimodal learning environments by working with multimodal data. Recognizing the short-term opportunities and long-term challenges will help the community develop proof cases and identify grand challenges that can propel the field forward. To support the field's growth, we use this paper to describe several ways that MMLA can potentially advance learning sciences research, and we touch upon key challenges that researchers who utilize MMLA have encountered over the past few years.
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
Multimodal representation learning is attracting growing interest within
the deep learning community. While bilinear models provide an interesting
framework for finding subtle combinations of modalities, their number of parameters
grows quadratically with the input dimensions, making their practical
implementation within classical deep learning pipelines challenging. In this
paper, we introduce BLOCK, a new multimodal fusion approach based on the
block-superdiagonal tensor decomposition. It leverages the notion of block-term
ranks, which generalizes both concepts of rank and mode ranks for tensors,
already used for multimodal fusion. It allows us to define new ways of optimizing
the trade-off between the expressiveness and complexity of the fusion model, and
is able to represent very fine interactions between modalities while
maintaining powerful mono-modal representations. We demonstrate the practical
interest of our fusion model by using BLOCK for two challenging tasks: Visual
Question Answering (VQA) and Visual Relationship Detection (VRD), where we
design end-to-end learnable architectures for representing relevant
interactions between modalities. Through extensive experiments, we show that
BLOCK compares favorably with respect to state-of-the-art multimodal fusion
models for both VQA and VRD tasks. Our code is available at
https://github.com/Cadene/block.bootstrap.pytorch
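To give a feel for the block-superdiagonal idea without reading the repository, here is a simplified, hedged PyTorch sketch: each input is projected and split into chunks, and each pair of corresponding chunks is fused by its own small bilinear map (the block-diagonal core), which keeps the parameter count far below a full bilinear model over the projected inputs. All layer sizes are placeholders; this is not the authors' implementation (see the URL above for that).

import torch
import torch.nn as nn

class BlockFusionSketch(nn.Module):
    # Simplified block-superdiagonal bilinear fusion (illustrative only).
    def __init__(self, dim_x, dim_y, dim_out, rank=64, n_blocks=8, chunk=20):
        super().__init__()
        self.chunk = chunk
        self.proj_x = nn.Linear(dim_x, n_blocks * chunk)
        self.proj_y = nn.Linear(dim_y, n_blocks * chunk)
        # One small bilinear map per block instead of one huge bilinear map.
        self.blocks = nn.ModuleList(
            [nn.Bilinear(chunk, chunk, rank) for _ in range(n_blocks)])
        self.proj_out = nn.Linear(n_blocks * rank, dim_out)

    def forward(self, x, y):
        xs = self.proj_x(x).split(self.chunk, dim=-1)  # n_blocks chunks each
        ys = self.proj_y(y).split(self.chunk, dim=-1)
        fused = [blk(xc, yc) for blk, xc, yc in zip(self.blocks, xs, ys)]
        return self.proj_out(torch.cat(fused, dim=-1))

# Example: fuse a 2048-d image feature with a 310-d question embedding.
fusion = BlockFusionSketch(dim_x=2048, dim_y=310, dim_out=512)
z = fusion(torch.randn(4, 2048), torch.randn(4, 310))  # z has shape (4, 512)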
Learning Social Image Embedding with Deep Multimodal Attention Networks
Learning social media data embeddings with deep models has attracted extensive
research interest and has enabled a wide range of applications, such as link
prediction, classification, and cross-modal search. However, for social images,
which contain both link information and multimodal contents (e.g., text
description and visual content), simply employing the embedding learnt from
network structure or data content results in sub-optimal social image
representation. In this paper, we propose a novel social image embedding
approach called Deep Multimodal Attention Networks (DMAN), which employs a deep
model to jointly embed multimodal contents and link information. Specifically,
to effectively capture the correlations between multimodal contents, we propose
a multimodal attention network to encode fine-grained relations between
image regions and textual words. To leverage the network structure for
embedding learning, a novel Siamese-Triplet neural network is proposed to model
the links among images. With the joint deep model, the learnt embedding can
capture both the multimodal contents and the nonlinear network information.
Extensive experiments are conducted to investigate the effectiveness of our
approach in the applications of multi-label classification and cross-modal
search. Compared to state-of-the-art image embeddings, our proposed DMAN
achieves significant improvement in the tasks of multi-label classification and
cross-modal search.
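The Siamese-Triplet component can be pictured as a standard triplet-loss setup over linked images: an anchor image, a linked (positive) image, and an unlinked (negative) image are embedded by a shared encoder, and the loss pulls linked pairs closer. The hedged sketch below uses a toy encoder and PyTorch's built-in triplet margin loss with placeholder dimensions; it is not the DMAN architecture, and the multimodal attention over image regions and words is omitted.

import torch
import torch.nn as nn

# Stand-in encoder; DMAN's actual multimodal encoder is more elaborate.
encoder = nn.Sequential(
    nn.Linear(4096, 1024), nn.ReLU(), nn.Linear(1024, 256))
triplet_loss = nn.TripletMarginLoss(margin=1.0)

anchor_feat   = torch.randn(32, 4096)  # image plus its text description
positive_feat = torch.randn(32, 4096)  # image linked to the anchor
negative_feat = torch.randn(32, 4096)  # image with no link to the anchor

# Pull linked images together and push unlinked ones apart in embedding space.
loss = triplet_loss(encoder(anchor_feat),
                    encoder(positive_feat),
                    encoder(negative_feat))
loss.backward()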
