Learning Multimodal Graph-to-Graph Translation for Molecular Optimization
We view molecular optimization as a graph-to-graph translation problem. The
goal is to learn to map from one molecular graph to another with better
properties based on an available corpus of paired molecules. Since molecules
can be optimized in different ways, there are multiple viable translations for
each input graph. A key challenge is therefore to model diverse translation
outputs. Our primary contributions include a junction tree encoder-decoder for
learning diverse graph translations along with a novel adversarial training
method for aligning distributions of molecules. Diverse output distributions in
our model are explicitly realized by low-dimensional latent vectors that
modulate the translation process. We evaluate our model on multiple molecular
optimization tasks and show that our model outperforms previous
state-of-the-art baselines.
Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
Many approaches in generalized zero-shot learning rely on cross-modal mapping
between the image feature space and the class embedding space. As labeled
images are expensive, one direction is to augment the dataset by generating
either images or image features. However, the former misses fine-grained
details and the latter requires learning a mapping associated with class
embeddings. In this work, we take feature generation one step further and
propose a model where a shared latent space of image features and class
embeddings is learned by modality-specific aligned variational autoencoders.
This leaves us with the required discriminative information about the image and
classes in the latent features, on which we train a softmax classifier. The key
to our approach is that we align the distributions learned from images and from
side-information to construct latent features that contain the essential
multi-modal information associated with unseen classes. We evaluate our learned
latent features on several benchmark datasets, i.e. CUB, SUN, AWA1 and AWA2,
and establish a new state of the art on generalized zero-shot as well as on
few-shot learning. Moreover, our results on ImageNet with various zero-shot
splits show that our latent features generalize well in large-scale settings.
Comment: Accepted at CVPR 201
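The abstract above describes aligning the latent distributions produced by two modality-specific VAE encoders. As an illustrative sketch (not the paper's exact objective), one common way to align two diagonal-Gaussian posteriors is to minimize the closed-form 2-Wasserstein distance between them; the function name and toy values below are hypothetical:

```python
import numpy as np

def gaussian_w2_distance(mu1, logvar1, mu2, logvar2):
    """2-Wasserstein distance between two diagonal Gaussians.

    For N(mu1, diag(s1^2)) and N(mu2, diag(s2^2)):
        W2^2 = ||mu1 - mu2||^2 + ||s1 - s2||^2
    Minimizing this pulls the two latent distributions together.
    """
    s1 = np.exp(0.5 * logvar1)
    s2 = np.exp(0.5 * logvar2)
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((s1 - s2) ** 2))

# Toy example: posteriors from an image encoder and a class-embedding
# encoder for the same class (made-up numbers, for illustration only)
mu_img, logvar_img = np.array([0.5, -0.2]), np.array([0.0, 0.0])
mu_cls, logvar_cls = np.array([0.4, -0.1]), np.array([0.1, -0.1])
loss = gaussian_w2_distance(mu_img, logvar_img, mu_cls, logvar_cls)
```

Driving such a term toward zero forces the image-feature and class-embedding encoders to place matching classes at the same region of the shared latent space, which is the property the softmax classifier is then trained on.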
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of 6 core technical challenges: representation, alignment, reasoning,
generation, transference, and quantification, covering historical and recent
trends. Recent technical achievements will be presented through the lens of
this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research as identified by our taxonomy.
Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives
Over the past few years, adversarial training has become an extremely active
research topic and has been successfully applied to various Artificial
Intelligence (AI) domains. As a potentially crucial technique for the
development of the next generation of emotional AI systems, we herein provide a
comprehensive overview of the application of adversarial training to affective
computing and sentiment analysis. Various representative adversarial training
algorithms are explained and discussed accordingly, aimed at tackling diverse
challenges associated with emotional AI systems. Further, we highlight a range
of potential future research directions. We expect that this overview will help
facilitate the development of adversarial training for affective computing and
sentiment analysis in both the academic and industrial communities.
Multiview Learning with Sparse and Unannotated Data
PhD Thesis
Obtaining annotated training data for supervised learning is a bottleneck in many
contemporary machine learning applications. The increasing prevalence of multi-modal
and multi-view data creates both new opportunities for circumventing this issue, and
new application challenges. In this thesis we explore several approaches to alleviating
annotation issues in multi-view scenarios.
We start by studying the problem of zero-shot learning (ZSL) for image recognition,
where class-level annotations for image recognition are eliminated by transferring information
from text modality instead. We next look at cross-modal matching, where
paired instances across views provide the supervised label information for learning. We
develop methodology for unsupervised and semi-supervised learning of pairing, thus
eliminating annotation requirements.
We first apply these ideas to unsupervised multi-view matching in the context of
bilingual dictionary induction (BLI), where instances are words in two languages and
finding a correspondence between the words produces a cross-lingual word translation
model. We then return to vision and language and look at learning unsupervised pairing
between images and text. This can be viewed as a limiting case of ZSL
where text-image pairing annotation requirements are completely eliminated.
Overall these contributions in multi-view learning provide a suite of methods for
reducing annotation requirements: both in conventional classification and cross-view
matching settings.
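The bilingual dictionary induction setting above is often illustrated with a classic baseline (not necessarily the thesis's own method): given embedding matrices for seed word pairs in two languages, orthogonal Procrustes gives the closed-form rotation that best maps one space onto the other. The function and toy data below are assumptions for illustration:

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal map W minimizing ||XW - Y||_F over orthogonal W.

    Closed form (orthogonal Procrustes): with SVD X^T Y = U S V^T,
    the minimizer is W = U V^T. Widely used to align word-embedding
    spaces for bilingual lexicon induction from seed pairs.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy check: Y is an exact rotation of X, so the map should recover it
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # "source language" word vectors
theta = np.pi / 6
R = np.eye(4)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]
Y = X @ R                             # "target language" word vectors
W = procrustes_map(X, Y)              # recovers R up to numerics
```

In the unsupervised variants the thesis targets, the seed correspondence itself must be inferred rather than given, but the same alignment objective typically appears as an inner step.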
A Chronological Survey of Theoretical Advancements in Generative Adversarial Networks for Computer Vision
Generative Adversarial Networks (GANs) have been the workhorse generative
models of recent years, especially in the research field of computer vision.
Accordingly, there have been many significant advancements in the theory and
application of GAN models, which are notoriously hard to train but produce
good results when trained well. Many surveys have organized the vast GAN
literature from various focuses and perspectives.
However, none of the surveys brings out the important chronological aspect: how
the multiple challenges of employing GAN models were solved one-by-one over
time, across multiple landmark research works. This survey intends to bridge
that gap and present some of the landmark research works on the theory and
application of GANs, in chronological order.
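One of the earliest chronological milestones such a survey covers is the original minimax objective and its non-saturating fix. As a minimal numeric sketch (function names are illustrative, operating on discriminator output probabilities rather than a full training loop):

```python
import numpy as np

def bce(p, y):
    """Per-sample binary cross-entropy between probability p and label y."""
    eps = 1e-12
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def discriminator_loss(d_real, d_fake):
    # D is trained to output 1 on real samples and 0 on generated ones.
    return np.mean(bce(d_real, 1.0)) + np.mean(bce(d_fake, 0.0))

def generator_loss_saturating(d_fake):
    # Original minimax form: minimize log(1 - D(G(z))).
    # Nearly flat when D confidently rejects fakes, so gradients vanish
    # early in training.
    return np.mean(np.log(1.0 - d_fake + 1e-12))

def generator_loss_nonsaturating(d_fake):
    # The standard early fix: maximize log D(G(z)) instead, which gives
    # strong gradients exactly when the generator is doing poorly.
    return np.mean(bce(d_fake, 1.0))
```

This pair of generator losses is one of the first "challenges solved over time" a chronological treatment highlights, before moving on to later stabilization milestones.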