Multiview Learning with Sparse and Unannotated Data

Abstract

Obtaining annotated training data for supervised learning is a bottleneck in many contemporary machine learning applications. The increasing prevalence of multi-modal and multi-view data creates both new opportunities for circumventing this issue and new application challenges. In this thesis we explore several approaches to alleviating annotation issues in multi-view scenarios. We start by studying the problem of zero-shot learning (ZSL) for image recognition, where class-level annotations are eliminated by transferring information from the text modality instead. We next look at cross-modal matching, where paired instances across views provide the supervised label information for learning. We develop methodology for unsupervised and semi-supervised learning of pairing, thus eliminating annotation requirements. We first apply these ideas to unsupervised multi-view matching in the context of bilingual dictionary induction (BLI), where instances are words in two languages and finding a correspondence between the words produces a cross-lingual word translation model. We then return to vision and language and look at learning unsupervised pairing between images and text, which can be seen as a limiting case of ZSL in which text-image pairing annotation requirements are eliminated entirely. Overall, these contributions in multi-view learning provide a suite of methods for reducing annotation requirements, both in conventional classification and in cross-view matching settings.