
    Common Representation Learning Using Step-based Correlation Multi-Modal CNN

    Deep learning techniques have been successfully used to learn a common representation for multi-view data, wherein the different modalities are projected onto a common subspace. From a broader perspective, the techniques used to investigate common representation learning fall into two categories: canonical correlation-based approaches and autoencoder-based approaches. In this paper, we investigate the performance of deep autoencoder-based methods on multi-view data. We propose a novel step-based correlation multi-modal CNN (CorrMCNN), which reconstructs one view of the data given the other while increasing the interaction between the representations at each hidden layer, i.e., at every intermediate step. Finally, we evaluate the performance of the proposed model on two benchmark datasets - MNIST and XRMB. Through extensive experiments, we find that the proposed model achieves better performance than the current state-of-the-art techniques on joint common representation learning and transfer learning tasks. Comment: Accepted at the Asian Conference on Pattern Recognition (ACPR 2017).
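
    A minimal PyTorch sketch of the kind of layer-wise correlation term described above: two view-specific encoders whose hidden activations are tied by a correlation penalty while each view is reconstructed from the other. The module names, layer sizes, and the exact loss weighting are illustrative assumptions, not the paper's code.

        import torch
        import torch.nn as nn

        def correlation_loss(h1, h2, eps=1e-8):
            # Negative mean per-dimension correlation between paired hidden activations.
            h1 = h1 - h1.mean(dim=0, keepdim=True)
            h2 = h2 - h2.mean(dim=0, keepdim=True)
            cov = (h1 * h2).mean(dim=0)
            std = h1.std(dim=0) * h2.std(dim=0) + eps
            return -(cov / std).mean()

        class TwoViewAutoencoder(nn.Module):
            # Each view is encoded to a hidden layer of the same size; the correlation
            # term couples the two representations at this intermediate step.
            def __init__(self, d1, d2, hidden=128):
                super().__init__()
                self.enc1 = nn.Sequential(nn.Linear(d1, hidden), nn.ReLU())
                self.enc2 = nn.Sequential(nn.Linear(d2, hidden), nn.ReLU())
                self.dec12 = nn.Linear(hidden, d2)   # reconstruct view 2 from view 1
                self.dec21 = nn.Linear(hidden, d1)   # reconstruct view 1 from view 2

            def forward(self, x1, x2):
                h1, h2 = self.enc1(x1), self.enc2(x2)
                recon = nn.functional.mse_loss(self.dec12(h1), x2) + \
                        nn.functional.mse_loss(self.dec21(h2), x1)
                return recon + correlation_loss(h1, h2)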

    Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations

    Deep CCA is a recently proposed deep neural network extension of traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains. However, stochastic optimization of the deep CCA objective is not straightforward, because the objective does not decouple over training examples. Previous optimizers for deep CCA are either batch algorithms or stochastic methods that use large minibatches, which can have high memory consumption. In this paper, we tackle the problem of stochastic optimization for deep CCA with small minibatches, based on an iterative solution to the CCA objective, and show that we can achieve performance as good as previous optimizers while alleviating the memory requirement. Comment: in the 2015 Annual Allerton Conference on Communication, Control and Computing.
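
    As a rough illustration of the memory issue and one way around it, the sketch below keeps exponentially weighted estimates of the auto- and cross-covariances across small minibatches and evaluates the CCA correlation objective from those running estimates. This is a simplified stand-in for intuition only, not the paper's nonlinear orthogonal iteration; the class name, decay constant, and regularizer are assumptions.

        import torch

        class RunningCCAStats:
            # Exponentially weighted covariance estimates, so the CCA objective can be
            # evaluated from small minibatches instead of one large batch.
            def __init__(self, d1, d2, rho=0.9, eps=1e-4):
                self.rho, self.eps = rho, eps
                self.S11 = torch.eye(d1)
                self.S22 = torch.eye(d2)
                self.S12 = torch.zeros(d1, d2)

            def update(self, h1, h2):
                # h1: (batch, d1), h2: (batch, d2) -- outputs of the two view networks.
                h1 = h1 - h1.mean(dim=0, keepdim=True)
                h2 = h2 - h2.mean(dim=0, keepdim=True)
                n = h1.shape[0]
                self.S11 = self.rho * self.S11 + (1 - self.rho) * (h1.T @ h1) / n
                self.S22 = self.rho * self.S22 + (1 - self.rho) * (h2.T @ h2) / n
                self.S12 = self.rho * self.S12 + (1 - self.rho) * (h1.T @ h2) / n

            def correlation_objective(self):
                # Sum of singular values of the whitened cross-covariance
                # (the canonical correlations).
                d1, d2 = self.S11.shape[0], self.S22.shape[0]
                L1 = torch.linalg.cholesky(self.S11 + self.eps * torch.eye(d1))
                L2 = torch.linalg.cholesky(self.S22 + self.eps * torch.eye(d2))
                T = torch.linalg.inv(L1) @ self.S12 @ torch.linalg.inv(L2).T
                return torch.linalg.svdvals(T).sum()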

    Decoding Kinematic Information From Primary Motor Cortex Ensemble Activities Using a Deep Canonical Correlation Analysis

    The control of arm movements through intracortical brain-machine interfaces (BMIs) mainly relies on the activities of primary motor cortex (M1) neurons and on mathematical models that decode those activities. Recent research on the decoding process attempts not only to improve performance but also to simultaneously understand neural and behavioral relationships. In this study, we propose an efficient decoding algorithm using deep canonical correlation analysis (DCCA), which maximizes correlations between canonical variables through non-linear mappings from neuronal to canonical variables learned via deep learning. We investigate the effectiveness of using DCCA to find a relationship between M1 activities and kinematic information when non-human primates performed a reaching task with one arm. Then, we examine whether using neural activity representations from DCCA improves decoding performance with linear and non-linear decoders: a linear Kalman filter (LKF) and a long short-term memory recurrent neural network (LSTM-RNN). We found that neural representations of M1 activities estimated by DCCA resulted in more accurate decoding of velocity than those estimated by linear canonical correlation analysis, principal component analysis, factor analysis, and a linear dynamical system. Decoding with DCCA also yielded better performance than decoding the original firing rates (FRs) using the LSTM-RNN (6.6% and 16.0% improvement on average for velocity and position, respectively; Wilcoxon rank-sum test, p < 0.05). Thus, DCCA can identify the kinematics-related canonical variables of M1 activities and thereby improve decoding performance. Our results may help advance the design of decoding models for intracortical BMIs.
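
    A rough sketch of the decoding pipeline described above: project binned firing rates through a trained DCCA view network, then fit a sequence decoder on the projected features. The encoder, data shapes, and layer sizes here are assumptions for illustration, not the study's actual models.

        import torch
        import torch.nn as nn

        class LSTMDecoder(nn.Module):
            # Maps a sequence of DCCA-projected neural features to 2-D hand kinematics.
            def __init__(self, in_dim, hidden=64, out_dim=2):
                super().__init__()
                self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
                self.readout = nn.Linear(hidden, out_dim)

            def forward(self, x):               # x: (batch, time, in_dim)
                h, _ = self.lstm(x)
                return self.readout(h)          # (batch, time, out_dim)

        def decode(dcca_encoder, firing_rates, decoder):
            # firing_rates: (batch, time, n_neurons); dcca_encoder is the trained
            # neural-view network of DCCA, mapping n_neurons -> canonical dimensions.
            with torch.no_grad():
                z = dcca_encoder(firing_rates)  # kinematics-related canonical variables
            return decoder(z)                   # predicted velocity trajectory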

    Learning Deep Latent Spaces for Multi-Label Classification

    Multi-label classification is a practical yet challenging task in machine learning and related fields, since it requires predicting more than one label category for each input instance. We propose a novel deep neural network (DNN)-based model, the Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming to better relate feature- and label-domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of a label-correlation-sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and the autoencoder, which allows end-to-end learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets of different scales confirm the effectiveness and robustness of our proposed method, which is shown to perform favorably against state-of-the-art methods for multi-label classification. Comment: published in AAAI-2017.
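
    A condensed sketch of the joint feature/label embedding the abstract describes: a feature encoder and a label encoder are aligned in a shared latent space, and a decoder recovers labels under a pairwise ranking term that pushes positive labels above negative ones. The alignment term, layer sizes, and loss weights are simplified assumptions and do not reproduce the paper's exact formulation.

        import torch
        import torch.nn as nn

        class C2AESketch(nn.Module):
            def __init__(self, feat_dim, n_labels, latent=64):
                super().__init__()
                self.fx = nn.Sequential(nn.Linear(feat_dim, latent), nn.ReLU(),
                                        nn.Linear(latent, latent))   # feature encoder
                self.fe = nn.Linear(n_labels, latent)                # label encoder
                self.fd = nn.Linear(latent, n_labels)                # label decoder

            def forward(self, x, y, alpha=1.0):
                # x: (batch, feat_dim); y: (batch, n_labels) with 0/1 entries,
                # assumed to contain at least one positive and one negative label per row.
                zx, zy = self.fx(x), self.fe(y)
                align = nn.functional.mse_loss(zx, zy)   # latent alignment (stand-in for the CCA term)
                scores = self.fd(zx)
                pos = scores.masked_fill(y == 0, float('-inf')).amax(dim=1)
                neg = scores.masked_fill(y == 1, float('-inf')).amax(dim=1)
                rank = torch.relu(1.0 + neg - pos).mean()  # positives should outscore negatives
                return align + alpha * rank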

    Learning Social Image Embedding with Deep Multimodal Attention Networks

    Learning social media data embeddings with deep models has attracted extensive research interest and has spurred many applications, such as link prediction, classification, and cross-modal search. However, for social images that contain both link information and multimodal content (e.g., text descriptions and visual content), simply employing an embedding learnt from the network structure or from the data content alone results in sub-optimal social image representations. In this paper, we propose a novel social image embedding approach called Deep Multimodal Attention Networks (DMAN), which employs a deep model to jointly embed multimodal content and link information. Specifically, to effectively capture the correlations between multimodal contents, we propose a multimodal attention network to encode the fine-grained relations between image regions and textual words. To leverage the network structure for embedding learning, a novel Siamese-Triplet neural network is proposed to model the links among images. With the joint deep model, the learnt embedding can capture both the multimodal content and the nonlinear network information. Extensive experiments are conducted to investigate the effectiveness of our approach for multi-label classification and cross-modal search. Compared to state-of-the-art image embeddings, our proposed DMAN achieves significant improvements on both tasks.
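
    A small sketch of the two ingredients the abstract names: attention between image regions and textual words, and a triplet loss over linked versus unlinked images for the network-structure term. The dimensions, pooling scheme, and scoring function are illustrative assumptions, not the paper's architecture.

        import torch
        import torch.nn as nn

        class RegionWordAttention(nn.Module):
            # Scores every (image region, word) pair and pools word features into a
            # per-region context vector, a simple form of fine-grained attention.
            def __init__(self, region_dim, word_dim, joint=128):
                super().__init__()
                self.pr = nn.Linear(region_dim, joint)
                self.pw = nn.Linear(word_dim, joint)

            def forward(self, regions, words):
                # regions: (batch, R, region_dim); words: (batch, W, word_dim)
                r, w = self.pr(regions), self.pw(words)
                attn = torch.softmax(r @ w.transpose(1, 2), dim=-1)   # (batch, R, W)
                context = attn @ w                                     # (batch, R, joint)
                return torch.cat([r, context], dim=-1).mean(dim=1)    # pooled image embedding

        triplet = nn.TripletMarginLoss(margin=1.0)
        # link_loss = triplet(anchor_emb, linked_emb, unlinked_emb)   # network-structure term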