SeGMA: Semi-Supervised Gaussian Mixture Auto-Encoder
We propose a semi-supervised generative model, SeGMA, which learns a joint
probability distribution of data and their classes and which is implemented in
a typical Wasserstein auto-encoder framework. We choose a mixture of Gaussians
as a target distribution in latent space, which provides a natural splitting of
data into clusters. To connect Gaussian components with correct classes, we use
a small amount of labeled data and a Gaussian classifier induced by the target
distribution. SeGMA is optimized efficiently due to the use of Cramer-Wold
distance as a maximum mean discrepancy penalty, which yields a closed-form
expression for a mixture of spherical Gaussian components and thus obviates the
need for sampling. While SeGMA preserves all properties of its semi-supervised
predecessors and achieves generative performance at least as good as theirs on
standard benchmark data sets, it offers additional features: (a) interpolation
between any pair of points in the latent space produces realistic-looking samples;
(b) combining the interpolation property with disentangled class and style
variables, SeGMA is able to perform a continuous style transfer from one class
to another; (c) it is possible to change the intensity of class characteristics
in a data point by moving the latent representation of the data point away from
specific Gaussian components.
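The latent-space properties (a) and (c) above can be sketched in a few lines. This is a minimal illustration, not SeGMA's actual code: the function names and the `scale` parameter are hypothetical, and the encoder/decoder that would produce and consume these latent codes are assumed to exist elsewhere.

```python
import numpy as np

def interpolate_latents(z_a, z_b, num_steps=8):
    """Linearly interpolate between two latent codes (property (a)).

    Because SeGMA's latent prior is a mixture of Gaussians covering the
    space between components, intermediate codes tend to decode into
    realistic-looking samples.
    """
    alphas = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - a) * z_a + a * z_b for a in alphas])

def shift_from_component(z, mu_k, scale=0.5):
    """Move a latent code away from the mean mu_k of Gaussian component k
    (property (c)), weakening that class's characteristics in the
    decoded sample; a negative scale moves toward the component instead.
    """
    return z + scale * (z - mu_k)
```

Decoding each row of the interpolation path, or the shifted code, would then yield the images described in the abstract.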
Unsupervised Meta-learning
Deep learning has achieved classification performance matching or exceeding that of humans, as long as plentiful labeled training samples are available. However, performance on few-shot learning, where the classifier has seen only a few, or possibly just one, sample of a class, is still significantly below human performance. Recently, a class of algorithms called meta-learning has achieved impressive few-shot learning performance. However, meta-learning requires a large dataset of labeled tasks closely related to the test task. The work described in this dissertation outlines techniques that significantly reduce the need for expensive and scarce labeled data in the meta-learning phase. Our insight is that meta-training datasets require only in-class samples (samples belonging to the same class) and out-of-class samples; the actual labels associated with the classes are not relevant, as they are not retained in the meta-learning process.
First, we propose an algorithm called UMTRA that generates out-of-class samples by random sampling from an unlabeled dataset, and generates in-class samples by augmentation. We show that UMTRA achieves a large fraction of the accuracy of supervised meta-learning while using orders of magnitude less labeled data.
Second, we note that the augmentation step in UMTRA works best when an augmentation technique specific to the domain is used. In many practical cases it is easier to train a generative model for a domain than to find an augmentation algorithm. From this idea, we design a new unsupervised meta-learning algorithm called LASIUM, in which the in- and out-of-class samples for the meta-learning step are generated by choosing appropriate points in the latent space of a generative model (such as a variational autoencoder or generative adversarial network).
Finally, we describe work that makes progress toward a next step in meta-learning: the ability to draw the meta-training samples from a domain different from the target task's domain.
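The UMTRA task-construction idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the dissertation's implementation: the function name `make_umtra_task`, the array layout, and the user-supplied `augment` callable are all hypothetical, and a real pipeline would feed the resulting tasks into a meta-learner such as MAML.

```python
import numpy as np

def make_umtra_task(unlabeled_x, n_way, k_query, augment, rng):
    """Build one N-way meta-learning task from unlabeled data (UMTRA-style).

    - Out-of-class variation: n_way samples drawn at random from the
      unlabeled pool are treated as n_way distinct synthetic classes
      (with a large pool, drawing two samples of the same true class
      is unlikely).
    - In-class variation: augmented copies of each drawn sample act as
      the query examples for its synthetic class.
    """
    idx = rng.choice(len(unlabeled_x), size=n_way, replace=False)
    support_x = unlabeled_x[idx]            # one "shot" per synthetic class
    support_y = np.arange(n_way)            # synthetic labels 0..n_way-1
    query_x = np.stack([augment(x)
                        for x in support_x
                        for _ in range(k_query)])
    query_y = np.repeat(np.arange(n_way), k_query)
    return support_x, support_y, query_x, query_y
```

Note that the labels here are purely synthetic indices, matching the abstract's insight that the actual class labels are never retained in the meta-learning process.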