3 research outputs found

    SeGMA: Semi-Supervised Gaussian Mixture Auto-Encoder

    We propose a semi-supervised generative model, SeGMA, which learns a joint probability distribution of data and their classes and which is implemented in a typical Wasserstein auto-encoder framework. We choose a mixture of Gaussians as a target distribution in latent space, which provides a natural splitting of data into clusters. To connect Gaussian components with correct classes, we use a small amount of labeled data and a Gaussian classifier induced by the target distribution. SeGMA is optimized efficiently due to the use of Cramer-Wold distance as a maximum mean discrepancy penalty, which yields a closed-form expression for a mixture of spherical Gaussian components and thus obviates the need for sampling. While SeGMA preserves all properties of its semi-supervised predecessors and achieves generative performance at least as good as theirs on standard benchmark data sets, it offers additional features: (a) interpolation between any pair of points in the latent space produces realistic-looking samples; (b) by combining the interpolation property with disentangled class and style variables, SeGMA can perform continuous style transfer from one class to another; (c) the intensity of class characteristics in a data point can be changed by moving its latent representation away from specific Gaussian components.
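    The abstract's central idea is that a Gaussian-mixture target in latent space both induces a classifier and makes class-to-class interpolation meaningful. Below is a minimal sketch of that idea, not the authors' implementation: it assumes equal mixing weights, a shared spherical covariance, and illustrative component means, under which the induced classifier reduces to nearest-mean assignment.

```python
# Sketch only (not the SeGMA code): a Gaussian-mixture target in latent space,
# the classifier it induces, and linear interpolation between latent codes.
import numpy as np

def gaussian_classifier(z, means):
    """Assign latent codes z (N, D) to the most probable mixture component.

    With equal weights and a shared spherical covariance, the posterior is
    maximized by the nearest component mean in Euclidean distance.
    """
    d2 = ((z[:, None, :] - means[None, :, :]) ** 2).sum(-1)  # (N, K) squared distances
    return d2.argmin(axis=1)

def interpolate(z_a, z_b, steps=8):
    """Linear interpolation in latent space between two codes."""
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - ts) * z_a[None, :] + ts * z_b[None, :]

# Illustrative setup: 3 classes in a 2-D latent space.
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
z = np.random.randn(5, 2) + means[1]        # codes scattered around component 1
print(gaussian_classifier(z, means))        # mostly assigned to class 1
path = interpolate(means[0], means[2])      # walk from class 0 toward class 2
print(gaussian_classifier(path, means))     # labels transition along the path
```

    In this toy setting, moving a code along the interpolation path (or away from a component mean) changes which class dominates, which is the mechanism behind the style-transfer and class-intensity properties described above.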

    Unsupervised Meta-learning

    Deep learning has achieved classification performance matching or exceeding that of humans, as long as plentiful labeled training samples are available. However, performance on few-shot learning, where the classifier has seen only a few, or possibly just one, sample of a class, is still significantly below human performance. Recently, a type of algorithm called meta-learning has achieved impressive performance on few-shot learning. However, meta-learning requires a large dataset of labeled tasks closely related to the test task. The work described in this dissertation outlines techniques that significantly reduce the need for expensive and scarce labeled data in the meta-learning phase. Our insight is that meta-training datasets require only in-class samples (samples belonging to the same class) and out-of-class samples. The actual labels associated with the classes are not relevant, as they are not retained in the meta-learning process. First, we propose an algorithm called UMTRA that generates out-of-class samples using random sampling from an unlabeled dataset, and generates in-class samples using augmentation. We show that UMTRA achieves a large fraction of the accuracy of supervised meta-learning while using orders of magnitude less labeled data. Second, we note that the augmentation step in UMTRA works best when an augmentation technique specific to the domain is used. In many practical cases, it is easier to train a generative model for a domain than to find a suitable augmentation algorithm. Building on this idea, we design a new unsupervised meta-learning algorithm called LASIUM, where the in- and out-of-class samples for the meta-learning step are generated by choosing appropriate points in the latent space of a generative model (such as a variational autoencoder or generative adversarial network). Finally, we describe work that makes progress toward a next step in meta-learning: the ability to draw the meta-training samples from a domain different from the target task's domain.
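    The core of the UMTRA idea is that an N-way meta-training task can be built without labels: N distinct unlabeled samples serve as N pseudo-classes, and an augmented copy of each serves as the in-class query. The sketch below illustrates this task construction under stated assumptions; the noise augmentation, array-based dataset, and function names are placeholders (UMTRA itself relies on domain-specific augmentations), and this is not the dissertation's code.

```python
# Sketch only: UMTRA-style unsupervised construction of an N-way 1-shot task.
import numpy as np

def augment(x, noise_scale=0.1, rng=None):
    """Placeholder augmentation: add small Gaussian noise to each sample."""
    rng = rng or np.random.default_rng()
    return x + noise_scale * rng.standard_normal(x.shape)

def make_unsupervised_task(unlabeled, n_way=5, rng=None):
    """Build one N-way 1-shot meta-training task from unlabeled data.

    Returns (support_x, support_y, query_x, query_y) with pseudo-labels 0..N-1:
    the support set is N randomly drawn samples treated as distinct classes,
    and the query set is an augmented copy of each support sample.
    """
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(unlabeled), size=n_way, replace=False)
    support_x = unlabeled[idx]              # one sample per pseudo-class
    query_x = augment(support_x, rng=rng)   # augmented in-class counterparts
    labels = np.arange(n_way)
    return support_x, labels, query_x, labels

# Example: 1000 unlabeled samples flattened to 64-dimensional vectors.
data = np.random.rand(1000, 64)
sx, sy, qx, qy = make_unsupervised_task(data, n_way=5)
print(sx.shape, qx.shape, sy)               # (5, 64) (5, 64) [0 1 2 3 4]
```

    LASIUM replaces the augmentation step with a generative model: in-class samples come from nearby points in the model's latent space, and out-of-class samples from distant ones, but the task structure fed to the meta-learner is the same.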