
    Self-Supervised Feature Learning by Learning to Spot Artifacts

    We introduce a novel self-supervised learning method based on adversarial training. Our objective is to train a discriminator network to distinguish real images from images with synthetic artifacts, and then to extract features from its intermediate layers that can be transferred to other data domains and tasks. To generate images with artifacts, we pre-train a high-capacity autoencoder and then use a damage-and-repair strategy: first, we freeze the autoencoder and damage the output of the encoder by randomly dropping its entries; second, we augment the decoder with a repair network and train it in an adversarial manner against the discriminator. The repair network helps generate more realistic images by inpainting the dropped feature entries. To make the discriminator focus on the artifacts, we also make it predict which entries of the feature map were dropped. We demonstrate experimentally that features learned by creating and spotting artifacts achieve state-of-the-art performance on several benchmarks.
    Comment: CVPR 2018 (spotlight)
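    The damage-and-repair setup lends itself to a short sketch. The following Python/PyTorch code is a minimal illustration under assumed shapes and layer choices (the toy encoder/decoder, channel counts, and layer stacks are placeholders, not the paper's architecture); it shows the frozen autoencoder, random feature dropping, the repair network, and a discriminator with a second head that predicts which entries were dropped.

```python
# Minimal sketch of the damage-and-repair strategy (assumed architecture;
# layer sizes and the toy encoder/decoder are illustrative placeholders).
import torch
import torch.nn as nn

def damage(feats, drop_prob=0.5):
    """Randomly zero encoder-feature entries; return damaged feats and the mask."""
    mask = (torch.rand_like(feats) > drop_prob).float()
    return feats * mask, mask

class RepairNet(nn.Module):
    """Inpaints the dropped feature entries before decoding."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, feats):
        return self.net(feats)

class Discriminator(nn.Module):
    """Adversarial head plus a head predicting which feature entries were dropped."""
    def __init__(self, ch=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.real_fake = nn.Conv2d(ch, 1, 1)    # real vs. artifact score
        self.mask_pred = nn.Conv2d(ch, ch, 1)   # dropped-entry prediction
        # NOTE: aligning mask_pred's resolution with the true mask is glossed over.

    def forward(self, img):
        h = self.trunk(img)
        return self.real_fake(h), self.mask_pred(h)

# Toy frozen autoencoder standing in for the pre-trained high-capacity one.
encoder = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh())
for p in list(encoder.parameters()) + list(decoder.parameters()):
    p.requires_grad_(False)  # the autoencoder stays frozen

x = torch.randn(8, 3, 64, 64)
feats, mask = damage(encoder(x))             # step 1: damage the encoder output
fake = decoder(RepairNet(64)(feats))         # step 2: repair, then decode
score, mask_hat = Discriminator(64)(fake)    # spot artifacts + dropped entries
```

    Training would alternate adversarial updates between the repair network and the discriminator, with the mask-prediction head supervised by the true drop mask.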

    Variance Loss in Variational Autoencoders

    In this article, we highlight what appears to be a major issue of Variational Autoencoders, evinced by extensive experimentation with different network architectures and datasets: the variance of the generated data is significantly lower than that of the training data. Since generative models are usually evaluated with metrics such as the Fréchet Inception Distance (FID), which compare the distributions of (features of) real versus generated images, this variance loss typically results in degraded scores. The problem is particularly relevant in a two-stage setting, where a second VAE is used to sample in the latent space of the first VAE: the reduced variance creates a mismatch between the actual distribution of latent variables and the one generated by the second VAE, which hinders the beneficial effects of the second stage. By renormalizing the output of the second VAE towards the expected spherical normal distribution, we obtain a marked improvement in the quality of generated samples, as also reflected in the FID.
    Comment: Article accepted at the Sixth International Conference on Machine Learning, Optimization, and Data Science, July 19-23, 2020, Certosa di Pontignano, Siena, Italy
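    A minimal sketch of the renormalization step, assuming the straightforward reading that latents sampled from the second VAE are standardized per dimension to match the expected N(0, I) prior; the VAE names in the usage comment are hypothetical.

```python
# Minimal sketch: standardize second-stage latents toward N(0, I).
import torch

def renormalize_latents(z):
    """Shift/scale each latent dimension to zero mean and unit variance."""
    mean = z.mean(dim=0, keepdim=True)
    std = z.std(dim=0, keepdim=True).clamp_min(1e-6)
    return (z - mean) / std

# Under-dispersed latents, mimicking the variance loss discussed above.
z = torch.randn(1000, 64) * 0.6 + 0.2
z_fixed = renormalize_latents(z)
print(z_fixed.mean().item(), z_fixed.std().item())   # ~0.0, ~1.0
# Usage (hypothetical names):
#   x = first_vae.decode(renormalize_latents(second_vae.sample(n)))
```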

    Background modeling for video sequences by stacked denoising autoencoders

    Nowadays, the analysis and extraction of relevant information from visual data flows is of paramount importance. These image sequences can last for hours, which implies that the model must adapt to all kinds of circumstances so that the performance of the system does not decay over time. In this paper we propose a methodology for background modeling and foreground detection whose main characteristic is its robustness against stationary noise. Stacked denoising autoencoders are applied to generate a set of robust features for each region or patch of the image, and these features are the input of a probabilistic model that determines whether the region is background or foreground. Evaluation on a set of heterogeneous sequences shows that, while our proposal performs comparably to the classical methods in the literature, the inclusion of noise in the sequences causes drastic performance drops in the competing methods, whereas our performance is maintained or falls only slightly.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
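    As a rough sketch of the per-patch pipeline described in the abstract, the code below pairs SDAE features (here a random stand-in) with a simple running-Gaussian model per patch; the feature dimension, update rate, threshold, and the choice of a diagonal Gaussian are all assumptions, and the paper's actual probabilistic model may differ.

```python
# Minimal sketch: per-patch background model over denoising-autoencoder features.
import numpy as np

class PatchBackgroundModel:
    """Running diagonal Gaussian over the SDAE features of one image patch."""
    def __init__(self, dim, alpha=0.01, thresh=3.0):
        self.mean = np.zeros(dim)    # background feature mean
        self.var = np.ones(dim)      # background feature variance
        self.alpha = alpha           # exponential update rate
        self.thresh = thresh         # normalized-distance threshold

    def is_foreground(self, feat):
        # Flag the patch when its features deviate strongly from the model.
        d = np.abs(feat - self.mean) / np.sqrt(self.var + 1e-6)
        return d.mean() > self.thresh

    def update(self, feat):
        # Exponentially weighted update of the background statistics.
        delta = feat - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * self.var + self.alpha * delta ** 2

model = PatchBackgroundModel(dim=16)
feat = np.random.randn(16) * 0.1      # stand-in for SDAE features of one patch
print(model.is_foreground(feat))      # background-like features -> False
model.update(feat)
```

    The robustness to stationary noise would come from the denoising features themselves, while the running Gaussian simply tracks slow background drift.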