126 research outputs found

    Learning Multimodal Graph-to-Graph Translation for Molecular Optimization

    Full text link
    We view molecular optimization as a graph-to-graph translation problem. The goal is to learn to map from one molecular graph to another with better properties based on an available corpus of paired molecules. Since molecules can be optimized in different ways, there are multiple viable translations for each input graph. A key challenge is therefore to model diverse translation outputs. Our primary contributions include a junction tree encoder-decoder for learning diverse graph translations along with a novel adversarial training method for aligning distributions of molecules. Diverse output distributions in our model are explicitly realized by low-dimensional latent vectors that modulate the translation process. We evaluate our model on multiple molecular optimization tasks and show that our model outperforms previous state-of-the-art baselines

    Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

    Full text link
    Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space. As labeled images are expensive, one direction is to augment the dataset by generating either images or image features. However, the former misses fine-grained details and the latter requires learning a mapping associated with class embeddings. In this work, we take feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders. This leaves us with the required discriminative information about the image and classes in the latent features, on which we train a softmax classifier. The key to our approach is that we align the distributions learned from images and from side-information to construct latent features that contain the essential multi-modal information associated with unseen classes. We evaluate our learned latent features on several benchmark datasets, i.e. CUB, SUN, AWA1 and AWA2, and establish a new state of the art on generalized zero-shot as well as on few-shot learning. Moreover, our results on ImageNet with various zero-shot splits show that our latent features generalize well in large-scale settings.Comment: Accepted at CVPR 201

    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Full text link
    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles of modality heterogeneity and interconnections that have driven subsequent innovations, and propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy

    Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives

    Get PDF
    Over the past few years, adversarial training has become an extremely active research topic and has been successfully applied to various Artificial Intelligence (AI) domains. As a potentially crucial technique for the development of the next generation of emotional AI systems, we herein provide a comprehensive overview of the application of adversarial training to affective computing and sentiment analysis. Various representative adversarial training algorithms are explained and discussed accordingly, aimed at tackling diverse challenges associated with emotional AI systems. Further, we highlight a range of potential future research directions. We expect that this overview will help facilitate the development of adversarial training for affective computing and sentiment analysis in both the academic and industrial communities

    Multiview Learning with Sparse and Unannotated data.

    Get PDF
    PhD ThesisObtaining annotated training data for supervised learning, is a bottleneck in many contemporary machine learning applications. The increasing prevalence of multi-modal and multi-view data creates both new opportunities for circumventing this issue, and new application challenges. In this thesis we explore several approaches to alleviating annotation issues in multi-view scenarios. We start by studying the problem of zero-shot learning (ZSL) for image recognition, where class-level annotations for image recognition are eliminated by transferring information from text modality instead. We next look at cross-modal matching, where paired instances across views provide the supervised label information for learning. We develop methodology for unsupervised and semi-supervised learning of pairing, thus eliminating the need for annotation requirements. We rst apply these ideas to unsupervised multi-view matching in the context of bilingual dictionary induction (BLI), where instances are words in two languages and nding a correspondence between the words produces a cross-lingual word translation model. We then return to vision and language and look at learning unsupervised pairing between images and text. We will see that this can be seen as a limiting case of ZSL where text-image pairing annotation requirements are completely eliminated. Overall these contributions in multi-view learning provide a suite of methods for reducing annotation requirements: both in conventional classi cation and cross-view matching settings

    A Chronological Survey of Theoretical Advancements in Generative Adversarial Networks for Computer Vision

    Full text link
    Generative Adversarial Networks (GANs) have been workhorse generative models for last many years, especially in the research field of computer vision. Accordingly, there have been many significant advancements in the theory and application of GAN models, which are notoriously hard to train, but produce good results if trained well. There have been many a surveys on GANs, organizing the vast GAN literature from various focus and perspectives. However, none of the surveys brings out the important chronological aspect: how the multiple challenges of employing GAN models were solved one-by-one over time, across multiple landmark research works. This survey intends to bridge that gap and present some of the landmark research works on the theory and application of GANs, in chronological order

    Deep Visual Unsupervised Domain Adaptation for Classification Tasks:A Survey

    Get PDF
    corecore