Search CORE

366,826 research outputs found

Common Representation Learning Using Step-based Correlation Multi-Modal CNN

Author: Bhatt Gaurav
Jha Piyush
Raman Balasubramanian
Publication venue
Publication date: 31/10/2017
Field of study

Deep learning techniques have been successfully used in learning a common representation for multi-view data, wherein the different modalities are projected onto a common subspace. In a broader perspective, the techniques used to investigate common representation learning falls under the categories of canonical correlation-based approaches and autoencoder based approaches. In this paper, we investigate the performance of deep autoencoder based methods on multi-view data. We propose a novel step-based correlation multi-modal CNN (CorrMCNN) which reconstructs one view of the data given the other while increasing the interaction between the representations at each hidden layer or every intermediate step. Finally, we evaluate the performance of the proposed model on two benchmark datasets - MNIST and XRMB. Through extensive experiments, we find that the proposed model achieves better performance than the current state-of-the-art techniques on joint common representation learning and transfer learning tasks.Comment: Accepted in Asian Conference of Pattern Recognition (ACPR-2017

arXiv.org e-Print Archive

Crossref

Context-Aware Deep Sequence Learning with Multi-View Factor Pooling for Time Series Classification

Author: Cho Isaac
Das Bhattacharjee Sreyasee
Djorgovski George
Elshambakey Mohammed
Mahabal Ashish
Tolone William J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2018
Field of study

In this paper, we propose an effective, multi-view, multivariate deep classification model for time-series data. Multi-view methods show promise in their ability to learn correlation and exclusivity properties across different independent information resources. However, most current multi-view integration schemes employ only a linear model and, therefore, do not extensively utilize the relationships observed across different view-specific representations. Moreover, the majority of these methods rely exclusively on sophisticated, handcrafted features to capture local data patterns and, thus, depend heavily on large collections of labeled data. The multi-view, multivariate deep classification model for time-series data proposed in this paper makes important contributions to address these limitations. The proposed model derives a LSTM-based, deep feature descriptor to model both the view-specific data characteristics and cross-view interaction in an integrated deep architecture while driving the learning phase in a data-driven manner. The proposed model employs a compact context descriptor to exploit view-specific affinity information to design a more insightful context representation. Finally, the model uses a multi-view factor-pooling scheme for a context-driven attention learning strategy to weigh the most relevant feature dimensions while eliminating noise from the resulting fused descriptor. As shown by experiments, compared to the existing multi-view methods, the proposed multi-view deep sequential learning approach improves classification performance by roughly 4% in the UCI multi-view activity recognition dataset, while also showing significantly robust generalized representation capacity against its single-view counterparts, in classifying several large-scale multi-view light curve collections

Crossref

Caltech Authors

Improving Representation Learning for Deep Clustering and Few-shot Learning

Author: Trosten Daniel Johansen
Publication venue: UiT The Arctic University of Norway
Publication date: 23/08/2023
Field of study

The amounts of data in the world have increased dramatically in recent years, and it is quickly becoming infeasible for humans to label all these data. It is therefore crucial that modern machine learning systems can operate with few or no labels. The introduction of deep learning and deep neural networks has led to impressive advancements in several areas of machine learning. These advancements are largely due to the unprecedented ability of deep neural networks to learn powerful representations from a wide range of complex input signals. This ability is especially important when labeled data is limited, as the absence of a strong supervisory signal forces models to rely more on intrinsic properties of the data and its representations. This thesis focuses on two key concepts in deep learning with few or no labels. First, we aim to improve representation quality in deep clustering - both for single-view and multi-view data. Current models for deep clustering face challenges related to properly representing semantic similarities, which is crucial for the models to discover meaningful clusterings. This is especially challenging with multi-view data, since the information required for successful clustering might be scattered across many views. Second, we focus on few-shot learning, and how geometrical properties of representations influence few-shot classification performance. We find that a large number of recent methods for few-shot learning embed representations on the hypersphere. Hence, we seek to understand what makes the hypersphere a particularly suitable embedding space for few-shot learning. Our work on single-view deep clustering addresses the susceptibility of deep clustering models to find trivial solutions with non-meaningful representations. To address this issue, we present a new auxiliary objective that - when compared to the popular autoencoder-based approach - better aligns with the main clustering objective, resulting in improved clustering performance. Similarly, our work on multi-view clustering focuses on how representations can be learned from multi-view data, in order to make the representations suitable for the clustering objective. Where recent methods for deep multi-view clustering have focused on aligning view-specific representations, we find that this alignment procedure might actually be detrimental to representation quality. We investigate the effects of representation alignment, and provide novel insights on when alignment is beneficial, and when it is not. Based on our findings, we present several new methods for deep multi-view clustering - both alignment and non-alignment-based - that out-perform current state-of-the-art methods. Our first work on few-shot learning aims to tackle the hubness problem, which has been shown to have negative effects on few-shot classification performance. To this end, we present two new methods to embed representations on the hypersphere for few-shot learning. Further, we provide both theoretical and experimental evidence indicating that embedding representations as uniformly as possible on the hypersphere reduces hubness, and improves classification accuracy. Furthermore, based on our findings on hyperspherical embeddings for few-shot learning, we seek to improve the understanding of representation norms. In particular, we ask what type of information the norm carries, and why it is often beneficial to discard the norm in classification models. We answer this question by presenting a novel hypothesis on the relationship between representation norm and the number of a certain class of objects in the image. We then analyze our hypothesis both theoretically and experimentally, presenting promising results that corroborate the hypothesis

Munin - Open Research Archive

Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning

Author: Chen Xiaoyu
Du Changde
He Huiguang
Zhou Qiongyi
Publication venue
Publication date: 08/08/2023
Field of study

The human brain can easily focus on one speaker and suppress others in scenarios such as a cocktail party. Recently, researchers found that auditory attention can be decoded from the electroencephalogram (EEG) data. However, most existing deep learning methods are difficult to use prior knowledge of different views (that is attended speech and EEG are task-related views) and extract an unsatisfactory representation. Inspired by Broadbent's filter model, we decode auditory attention in a multi-view paradigm and extract the most relevant and important information utilizing the missing view. Specifically, we propose an auditory attention decoding (AAD) method based on multi-view VAE with task-related multi-view contrastive (TMC) learning. Employing TMC learning in multi-view VAE can utilize the missing view to accumulate prior knowledge of different views into the fusion of representation, and extract the approximate task-related representation. We examine our method on two popular AAD datasets, and demonstrate the superiority of our method by comparing it to the state-of-the-art method

arXiv.org e-Print Archive