Common Representation Learning Using Step-based Correlation Multi-Modal CNN
Deep learning techniques have been successfully used in learning a common
representation for multi-view data, wherein the different modalities are
projected onto a common subspace. More broadly, the techniques used
to investigate common representation learning fall under the categories of
canonical correlation-based approaches and autoencoder-based approaches. In
this paper, we investigate the performance of deep autoencoder-based methods on
multi-view data. We propose a novel step-based correlation multi-modal CNN
(CorrMCNN) which reconstructs one view of the data given the other while
increasing the interaction between the representations at each hidden layer or
every intermediate step. Finally, we evaluate the performance of the proposed
model on two benchmark datasets - MNIST and XRMB. Through extensive
experiments, we find that the proposed model achieves better performance than
the current state-of-the-art techniques on joint common representation learning
and transfer learning tasks.
Comment: Accepted in Asian Conference of Pattern Recognition (ACPR-2017)
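The abstract above does not give the exact form of the correlation term, so the following is only a minimal sketch of one common choice: the sum of per-dimension Pearson correlations between the paired hidden representations of the two views, accumulated over every intermediate layer. The function names `correlation_term` and `stepwise_correlation`, the `eps` parameter, and the use of NumPy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def correlation_term(h_x, h_y, eps=1e-8):
    """Sum of per-dimension Pearson correlations between two batches.

    h_x, h_y: arrays of shape (batch, dim) holding the hidden
    representations of the two views at the same layer (hypothetical inputs).
    """
    h_x = np.asarray(h_x, dtype=float)
    h_y = np.asarray(h_y, dtype=float)
    # Centre each dimension over the batch.
    xc = h_x - h_x.mean(axis=0)
    yc = h_y - h_y.mean(axis=0)
    # Pearson correlation per dimension; eps guards against zero variance.
    num = (xc * yc).sum(axis=0)
    den = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum(axis=0)) + eps
    return float((num / den).sum())


def stepwise_correlation(layers_x, layers_y):
    """Accumulate the correlation term over paired layers of both views."""
    return sum(correlation_term(hx, hy) for hx, hy in zip(layers_x, layers_y))
```

In a training loop, a term like this would typically be subtracted from the reconstruction loss so that minimizing the total loss increases the correlation at each step; how the paper weights the two parts is not stated in the abstract.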
Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations
Deep CCA is a recently proposed deep neural network extension to the
traditional canonical correlation analysis (CCA), and has been successful for
multi-view representation learning in several domains. However, stochastic
optimization of the deep CCA objective is not straightforward, because it does
not decouple over training examples. Previous optimizers for deep CCA are
either batch-based algorithms or stochastic optimization using large
minibatches, which can have high memory consumption. In this paper, we tackle
the problem of stochastic optimization for deep CCA with small minibatches,
based on an iterative solution to the CCA objective, and show that we can
match the performance of previous optimizers while alleviating the
memory requirement.
Comment: in 2015 Annual Allerton Conference on Communication, Control and
Computing
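To illustrate why the objective does not decouple over training examples, here is a minimal sketch of the total canonical correlation between two batches of network outputs, computed by whitening the cross-covariance and summing its singular values. This is the standard linear CCA quantity evaluated on a batch, not the nonlinear orthogonal iterations algorithm of the paper. The names `inv_sqrt` and `total_canonical_correlation` and the regularizer `reg` are assumptions made for illustration.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def total_canonical_correlation(x, y, reg=1e-4):
    """Sum of canonical correlations between two views.

    x: array of shape (n, dx), y: array of shape (n, dy).
    reg is a small ridge term (an assumption) that keeps the
    covariance estimates invertible.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Covariances are estimated from ALL n samples, so the value below
    # cannot be written as a sum of independent per-example losses.
    sxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    syy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    sxy = xc.T @ yc / (n - 1)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    t = inv_sqrt(sxx) @ sxy @ inv_sqrt(syy)
    return float(np.linalg.svd(t, compute_uv=False).sum())
```

Because the covariance estimates degrade as the batch shrinks, evaluating this quantity on a small minibatch gives a noisy estimate of the objective, which is the difficulty the abstract refers to.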
Decoding Kinematic Information From Primary Motor Cortex Ensemble Activities Using a Deep Canonical Correlation Analysis
The control of arm movements through intracortical brain-machine interfaces (BMIs) mainly relies on the activities of primary motor cortex (M1) neurons and on mathematical models that decode those activities. Recent research on the decoding process attempts not only to improve performance but also to understand the relationships between neural activity and behavior. In this study, we propose an efficient decoding algorithm using deep canonical correlation analysis (DCCA), which maximizes correlations between canonical variables while using deep learning to approximate the non-linear mappings from neuronal to canonical variables. We investigate the effectiveness of DCCA for finding a relationship between M1 activities and kinematic information while non-human primates performed a reaching task with one arm. We then examine whether neural activity representations from DCCA improve decoding performance with linear and non-linear decoders: a linear Kalman filter (LKF) and a long short-term memory recurrent neural network (LSTM-RNN). Neural representations of M1 activities estimated by DCCA resulted in more accurate decoding of velocity than those estimated by linear canonical correlation analysis, principal component analysis, factor analysis, and a linear dynamical system. Decoding with DCCA yielded better performance than decoding the original firing rates (FRs) using LSTM-RNN (average improvements of 6.6% and 16.0% for velocity and position, respectively; Wilcoxon rank sum test, p < 0.05). Thus, DCCA can identify the kinematics-related canonical variables of M1 activities, thereby improving decoding performance. Our results may help advance the design of decoding models for intracortical BMIs.
Learning Deep Latent Spaces for Multi-Label Classification
Multi-label classification is a practical yet challenging task in machine
learning related fields, since it requires the prediction of more than one
label category for each input instance. We propose a novel deep neural network
(DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this
task. Aiming at better relating feature and label domain data for improved
classification, we uniquely perform joint feature and label embedding by
deriving a deep latent space, followed by the introduction of a label-correlation
sensitive loss function for recovering the predicted label outputs. Our C2AE is
achieved by integrating the DNN architectures of canonical correlation analysis
and autoencoder, which allows end-to-end learning and prediction with the
ability to exploit label dependency. Moreover, our C2AE can be easily extended
to address the learning problem with missing labels. Our experiments on
multiple datasets with different scales confirm the effectiveness and
robustness of our proposed method, which is shown to perform favorably against
state-of-the-art methods for multi-label classification.
Comment: published in AAAI-201
Learning Social Image Embedding with Deep Multimodal Attention Networks
Learning embeddings of social media data with deep models has attracted extensive
research interest and has enabled many applications, such as link
prediction, classification, and cross-modal search. However, for social images
which contain both link information and multimodal contents (e.g., text
descriptions and visual content), simply employing the embedding learnt from
network structure or data content results in sub-optimal social image
representation. In this paper, we propose a novel social image embedding
approach called Deep Multimodal Attention Networks (DMAN), which employs a deep
model to jointly embed multimodal contents and link information. Specifically,
to effectively capture the correlations between multimodal contents, we propose
a multimodal attention network to encode the fine-grained relations between
image regions and textual words. To leverage the network structure for
embedding learning, a novel Siamese-Triplet neural network is proposed to model
the links among images. With the joint deep model, the learnt embedding can
capture both the multimodal contents and the nonlinear network information.
Extensive experiments are conducted to investigate the effectiveness of our
approach in the applications of multi-label classification and cross-modal
search. Compared to state-of-the-art image embeddings, our proposed DMAN
achieves significant improvement in the tasks of multi-label classification and
cross-modal search.
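The abstract does not specify the loss used by the Siamese-Triplet network, so the following is only a sketch of a standard triplet hinge loss with squared Euclidean distance, under the assumption that a linked image serves as the positive example and an unlinked image as the negative. The function name `triplet_loss` and the `margin` parameter are illustrative and not taken from the paper.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean triplet hinge loss over a batch of embeddings.

    anchor, positive, negative: arrays of shape (batch, dim).
    Assumption: `positive` embeds an image linked to the anchor,
    `negative` embeds an image with no link to it.
    """
    anchor, positive, negative = (
        np.asarray(a, dtype=float) for a in (anchor, positive, negative)
    )
    # Squared Euclidean distances from the anchor.
    d_pos = ((anchor - positive) ** 2).sum(axis=1)
    d_neg = ((anchor - negative) ** 2).sum(axis=1)
    # Penalize triplets where the negative is not at least `margin`
    # farther from the anchor than the positive.
    return float(np.maximum(0.0, margin + d_pos - d_neg).mean())
```

With this form, the loss is zero once linked images sit closer to the anchor than unlinked ones by at least the margin, which pushes the embedding to reflect the link structure.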