31 research outputs found

    LAVAE: Disentangling Location and Appearance

    We propose a probabilistic generative model for unsupervised learning of structured, interpretable, object-based representations of visual scenes. We use amortized variational inference to train the generative model end-to-end. The learned representations of object location and appearance are fully disentangled, and objects are represented independently of each other in the latent space. Unlike previous approaches that disentangle location and appearance, ours generalizes seamlessly to scenes with many more objects than encountered in the training regime. We evaluate the proposed model on multi-MNIST and multi-dSprites datasets.

    Optimal Variance Control of the Score Function Gradient Estimator for Importance Weighted Bounds

    This paper introduces novel results for the score function gradient estimator of the importance weighted variational bound (IWAE). We prove that in the limit of large K (the number of importance samples) one can choose the control variate such that the signal-to-noise ratio (SNR) of the estimator grows as √K. This is in contrast to the standard pathwise gradient estimator, where the SNR decreases as 1/√K. Based on our theoretical findings, we develop a novel control variate that extends VIMCO. Empirically, for the training of both continuous and discrete generative models, the proposed method yields superior variance reduction, resulting in an SNR for IWAE that increases with K without relying on the reparameterization trick. The novel estimator is competitive with state-of-the-art reparameterization-free gradient estimators such as Reweighted Wake-Sleep (RWS) and the thermodynamic variational objective (TVO) when training generative models.
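    For context, the importance weighted bound in question is the standard K-sample IWAE objective; a sketch in common notation (ours, not necessarily the paper's):

```latex
% K-sample importance weighted bound (IWAE):
\mathcal{L}_K \;=\; \mathbb{E}_{z_1,\dots,z_K \sim q(z \mid x)}
  \left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k \mid x)} \right]
```

    The claimed scalings then read SNR ∝ √K for the score-function estimator with the proposed control variate, versus SNR ∝ 1/√K for the standard pathwise estimator of the inference-network gradient.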

    Semi-Supervised Variational Autoencoder for Survival Prediction

    In this paper we propose a semi-supervised variational autoencoder for classification of overall survival groups from tumor segmentation masks. The model can use the output of any tumor segmentation algorithm, removing all assumptions on the scanning platform and the specific type of pulse sequences used, thereby increasing its generalization properties. Due to its semi-supervised nature, the method can learn to classify survival time using a relatively small number of labeled subjects. We validate our model on the publicly available dataset from the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2019.
    Comment: Published in the pre-conference proceedings of the 2019 International MICCAI BraTS Challenge.

    DiffEnc: Variational Diffusion with a Learned Encoder

    Diffusion models may be viewed as hierarchical variational autoencoders (VAEs) with two improvements: parameter sharing for the conditional distributions in the generative process and efficient computation of the loss as independent terms over the hierarchy. We consider two changes to the diffusion model that retain these advantages while adding flexibility to the model. Firstly, we introduce a data- and depth-dependent mean function in the diffusion process, which leads to a modified diffusion loss. Our proposed framework, DiffEnc, achieves state-of-the-art likelihood on CIFAR-10. Secondly, we let the ratio of the noise variance of the reverse encoder process and the generative process be a free weight parameter rather than being fixed to 1. This leads to theoretical insights: for a finite-depth hierarchy, the evidence lower bound (ELBO) can be used as an objective for a weighted diffusion loss approach and for optimizing the noise schedule specifically for inference. For the infinite-depth hierarchy, on the other hand, the weight parameter has to be 1 to have a well-defined ELBO.
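    The first change can be sketched as follows, in hypothetical notation (e_φ denotes a learned, time-conditioned encoder; the symbols are ours, not necessarily the paper's):

```latex
% Standard VDM-style forward (diffusion) process with a fixed mean:
q(z_t \mid x) = \mathcal{N}\!\big(\alpha_t \, x,\; \sigma_t^2 I\big)
% DiffEnc: the mean becomes data- and depth-dependent via a learned encoder:
q(z_t \mid x) = \mathcal{N}\!\big(\alpha_t \, e_\phi(x, t),\; \sigma_t^2 I\big)
```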

    Assessing Neural Network Robustness via Adversarial Pivotal Tuning

    The robustness of image classifiers is essential to their deployment in the real world. The ability to assess this resilience to manipulations or deviations from the training data is thus crucial. These modifications have traditionally consisted of minimal changes that still manage to fool classifiers, and modern approaches are increasingly robust to them. Semantic manipulations that modify elements of an image in meaningful ways have thus gained traction for this purpose. However, they have primarily been limited to style, color, or attribute changes. While expressive, these manipulations do not make use of the full capabilities of a pretrained generative model. In this work, we aim to bridge this gap. We show how a pretrained image generator can be used to semantically manipulate images in a detailed, diverse, and photorealistic way while still preserving the class of the original image. Inspired by recent GAN-based image inversion methods, we propose a method called Adversarial Pivotal Tuning (APT). Given an image, APT first finds a pivot latent space input that reconstructs the image using a pretrained generator. It then adjusts the generator's weights to create small yet semantic manipulations in order to fool a pretrained classifier. APT preserves the full expressive editing capabilities of the generative model. We demonstrate that APT is capable of a wide range of class-preserving semantic image manipulations that fool a variety of pretrained classifiers. Finally, we show that classifiers that are robust to other benchmarks are not robust to APT manipulations and suggest a method to improve them. Code available at: https://captaine.github.io/apt/
    Comment: Major changes include new experiments in Table 1 on page 5 and Tables 2-4 on page 6, and a new Figure 5 on page 8. Paper accepted at WACV (oral).
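    The two-stage structure (invert to a pivot latent, then tune the generator to flip the classifier) can be illustrated with a toy numerical sketch. Everything below is a hypothetical stand-in: a 1-D linear "generator" G(w; theta) = theta * w and a threshold "classifier", not the pretrained GAN and image classifier used in the paper.

```python
def generator(w, theta):
    """Stand-in generator: maps latent w to an 'image' (here, a scalar)."""
    return theta * w

def classify(y, boundary=1.0):
    """Stand-in classifier: class 1 above the boundary, class 0 below."""
    return 1 if y > boundary else 0

def invert(x, theta, lr=0.1, steps=200):
    """Stage 1 (inversion): find a pivot latent w minimising (G(w) - x)^2
    by plain gradient descent, with the generator weights frozen."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (generator(w, theta) - x) * theta  # d/dw of squared error
        w -= lr * grad
    return w

def pivotal_tune(w, theta, lr=0.01, max_steps=1000):
    """Stage 2 (pivotal tuning): keep the pivot w fixed and nudge the
    generator weight theta until the classifier's decision flips."""
    for _ in range(max_steps):
        y = generator(w, theta)
        if classify(y) == 0:   # classifier fooled: stop at a minimal change
            break
        theta -= lr * w        # gradient of y = theta * w w.r.t. theta
    return theta
```

    With x = 1.2 and an initial theta = 2.0, inversion recovers w ≈ 0.6, and tuning lowers theta just enough for the output to cross the decision boundary while staying close to the original x.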

    Generalization and Robustness Implications in Object-Centric Learning

    The idea behind object-centric representation learning is that natural scenes can better be modeled as compositions of objects and their relations, as opposed to distributed representations. This inductive bias can be injected into neural networks to potentially improve systematic generalization and learning efficiency of downstream tasks in scenes with multiple objects. In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets and evaluate segmentation accuracy and downstream object property prediction. In addition, we study systematic generalization and robustness by investigating the settings where either single objects are out-of-distribution -- e.g., having unseen colors, textures, and shapes -- or global properties of the scene are altered -- e.g., by occlusions, cropping, or increasing the number of objects. From our experimental study, we find object-centric representations to be generally useful for downstream tasks and robust to shifts in the data distribution, especially if shifts affect single objects.

    On the Transfer of Disentangled Representations in Realistic Settings

    Learning meaningful representations that disentangle the underlying structure of the data generating process is considered to be of key importance in machine learning. While disentangled representations were found to be useful for diverse tasks such as abstract reasoning and fair classification, their scalability and real-world impact remain questionable. We introduce a new high-resolution dataset with 1M simulated images and over 1,800 annotated real-world images of the same setup. In contrast to previous work, this new dataset exhibits correlations and a complex underlying structure, and makes it possible to evaluate transfer to unseen simulated and real-world settings where the encoder i) remains in distribution or ii) is out of distribution. We propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings and conduct a large-scale empirical study of disentangled representations on this dataset. We observe that disentanglement is a good predictor for out-of-distribution (OOD) task performance.
    Comment: Published at ICLR 202

    Assaying Out-Of-Distribution Generalization in Transfer Learning

    Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs, resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and providing recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions. We fine-tune over 31k networks from nine different architectures in the many- and few-shot settings. Our findings confirm that in-distribution and out-of-distribution accuracies tend to increase jointly, but show that their relation is largely dataset-dependent, and in general more nuanced and more complex than posited by previous, smaller-scale studies.

    Use of high-sensitivity cardiac troponins in the emergency department for the early rule-in and rule-out of acute myocardial infarction without persistent ST-segment elevation (NSTEMI) in Italy

    Serial measurements of cardiac troponin have been recommended by international guidelines to diagnose myocardial infarction (MI) since 2000. However, some relevant differences exist between the three international guidelines published between 2020 and 2021 for the management of patients with chest pain and no ST-segment elevation. In particular, there is no agreement on the cut-offs or absolute change values to diagnose non-ST-segment elevation MI (NSTEMI). Other controversial issues concern the diagnostic accuracy and cost-effectiveness of cut-off values for the most rapid algorithms (0 h/1 h or 0 h/2 h) to rule in and rule out NSTEMI. Finally, another important point is the possible differences between the demographic and clinical characteristics of patients enrolled in multicenter trials and those routinely admitted to the Emergency Department in Italy. The Study Group of Cardiac Biomarkers, supported by the Italian scientific societies Società Italiana di Biochimica Clinica, Italian Society of the European Ligand Assay Society, and Società Italiana di Patologia Clinica e Medicina di Laboratorio, decided to revise the document previously published in 2013 on the management of patients with suspected NSTEMI, and to provide some suggestions for the use of these biomarkers in clinical practice, with a particular focus on the Italian setting.
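    The overall shape of a 0 h/1 h rule-in/rule-out algorithm can be sketched as follows. ALL cut-off values below are hypothetical placeholders for illustration only: as the text stresses, the guidelines disagree on cut-offs, and real thresholds are assay-specific and must come from validated guideline tables, never from a sketch like this.

```python
def triage_0h_1h(trop_0h, trop_1h,
                 very_low=5.0,     # hypothetical single-sample rule-out (ng/L)
                 low=12.0,         # hypothetical 0 h rule-out band (ng/L)
                 delta_low=3.0,    # hypothetical 1 h change for rule-out (ng/L)
                 high=52.0,        # hypothetical single-sample rule-in (ng/L)
                 delta_high=5.0):  # hypothetical 1 h change for rule-in (ng/L)
    """Return 'rule-out', 'rule-in', or 'observe' for a suspected-NSTEMI
    patient given 0 h and 1 h high-sensitivity troponin values."""
    delta = abs(trop_1h - trop_0h)
    # Rule-out: very low baseline, or low baseline with a small 1 h change.
    if trop_0h < very_low or (trop_0h < low and delta < delta_low):
        return "rule-out"
    # Rule-in: high baseline, or a large 1 h change.
    if trop_0h >= high or delta >= delta_high:
        return "rule-in"
    # Neither criterion met: the patient stays in the observation zone.
    return "observe"
```

    The point of the sketch is only the three-way structure (rule-out / observe / rule-in driven by a baseline value plus an absolute 1 h change), which is where the guideline disagreements over specific cut-offs arise.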