REPRESENTATION LEARNING WITH ADDITIONAL STRUCTURES

Abstract

The ability to learn meaningful representations of complex, high-dimensional data, such as images and text, for various downstream tasks has been the cornerstone of the modern deep learning success story. Most approaches that succeed in learning meaningful representations of the input data rely on prior knowledge of the underlying data structure to inject appropriate inductive biases into their frameworks. Prime examples range from convolutional neural networks (CNNs) for images and recurrent neural networks (RNNs) for sequences to the recent trend of attention-based models (e.g., transformers) for incorporating relational information. However, most traditional approaches focus on a learning setup with a single input (and a single output in the supervised setting). With the rapidly growing variety of data being collected and the increasing complexity of the structures that underlie them, approaches that can take advantage of these additional data structures for better representation learning are needed.

To this end, we introduce frameworks for learning better representations of complex data with additional structures in four settings, gradually shifting from supervised learning, to ``pseudo-supervised'' learning, and finally to unsupervised learning. More specifically, we first propose a supervised approach that exploits relational information among set elements for learning representations of set-structured data. We then propose a clustering approach that utilizes side-information, i.e. information that is related to the final clustering goal but not directly indicative of the clustering results (hence ``pseudo-supervised'' learning), for learning representations that are better suited for clustering. Next, we introduce another clustering approach that leverages the structural assumption that the data samples in each cluster form a trajectory. Lastly, we propose a general representation learning framework for learning interpretable representations of multimodal data.
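As a point of reference for the relational inductive bias mentioned above, the following is a minimal sketch (not the thesis's method) of scaled dot-product self-attention applied to a set of element embeddings; it illustrates how pairwise relations among set elements can be incorporated while remaining permutation-equivariant. All names and dimensions here are illustrative assumptions.

```python
# Illustrative sketch: self-attention over a set of element embeddings.
# This is a generic example of a relational inductive bias, not the
# specific framework proposed in the dissertation.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (n_elements, d) matrix of set-element embeddings."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # pairwise relations
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # relation-aware embeddings

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                       # a set of 5 elements
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
perm = rng.permutation(5)
out_perm = self_attention(X[perm], Wq, Wk, Wv)
assert np.allclose(out[perm], out_perm)           # permutation-equivariant
```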
