
    Modeling Latent Variable Uncertainty for Loss-based Learning

    We consider the problem of parameter estimation using weakly supervised datasets, where a training sample consists of the input and a partially specified annotation, which we refer to as the output. The missing information in the annotation is modeled using latent variables. Previous methods overburden a single distribution with two separate tasks: (i) modeling the uncertainty in the latent variables during training; and (ii) making accurate predictions for the output and the latent variables during testing. We propose a novel framework that separates the demands of the two tasks using two distributions: (i) a conditional distribution to model the uncertainty of the latent variables for a given input-output pair; and (ii) a delta distribution to predict the output and the latent variables for a given input. During learning, we encourage agreement between the two distributions by minimizing a loss-based dissimilarity coefficient. Our approach generalizes latent SVM in two important ways: (i) it models the uncertainty over latent variables instead of relying on a pointwise estimate; and (ii) it allows the use of loss functions that depend on latent variables, which greatly increases its applicability. We demonstrate the efficacy of our approach on two challenging problems---object detection and action detection---using publicly available datasets. Comment: ICML201
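    The core of the learning objective can be read as an expected loss between the pointwise prediction and samples from the conditional latent distribution. Below is a minimal sketch of that agreement term only (the full dissimilarity coefficient also involves a self-dissimilarity term, omitted here); the function names and the toy loss are illustrative assumptions, not the authors' code.

    ```python
    # Sketch: encourage agreement between (i) a conditional distribution
    # q(h | x, y) over latent variables h and (ii) a pointwise prediction
    # (y_pred, h_pred) by minimizing the expected loss under q.
    def expected_disagreement(loss, q_probs, latent_values, y_true, y_pred, h_pred):
        """E_{h ~ q(. | x, y_true)}[ loss((y_true, h), (y_pred, h_pred)) ]."""
        return sum(p * loss((y_true, h), (y_pred, h_pred))
                   for p, h in zip(q_probs, latent_values))

    # Toy example: h is a discrete latent location; the loss penalizes label
    # mismatch plus a down-weighted latent mismatch.
    loss = lambda a, b: float(a[0] != b[0]) + 0.5 * float(a[1] != b[1])
    q = [0.7, 0.2, 0.1]  # modeled uncertainty over 3 latent values
    print(expected_disagreement(loss, q, [0, 1, 2], y_true=1, y_pred=1, h_pred=0))
    ```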

    Uncertainty-aware Salient Object Detection

    Saliency detection models are trained to discover the region(s) of an image that attract human attention. According to whether depth data is used, static image saliency detection models can be divided into RGB image saliency detection models and RGB-D image saliency detection models. The former predict salient regions of the RGB image, while the latter take both the RGB image and the depth data as input. Following the supervised learning pipeline, conventional saliency prediction models typically learn a deterministic mapping from images to the corresponding ground-truth saliency maps without modeling the uncertainty of predictions. This thesis is dedicated to learning a conditional distribution over saliency maps, given an input image, and to modeling the uncertainty of predictions.

    For RGB-D saliency detection, we present the first generative-model-based framework to achieve uncertainty-aware prediction. Our framework includes two main models: 1) a generator model and 2) an inference model. The generator model is an encoder-decoder saliency network. To infer the latent variable, we introduce two different solutions: i) a conditional variational auto-encoder with an extra encoder to approximate the posterior distribution of the latent variable; and ii) an alternating back-propagation technique, which directly samples the latent variable from the true posterior distribution. One drawback of the above model is that it fails to explicitly model the connection between the RGB image and the depth data to achieve effective cooperative learning. We further introduce a novel latent-variable-model-based complementary learning framework to explicitly model the complementary information between the two modes, namely the RGB mode and the depth mode. Specifically, we first design a regularizer using mutual-information minimization to reduce the redundancy between appearance features from RGB and geometric features from depth in the latent space. We then fuse the latent features of each mode to achieve multi-modal feature fusion. Extensive experiments on benchmark RGB-D saliency datasets illustrate the effectiveness of our framework.

    For RGB saliency detection, we propose a generative saliency prediction model based on the conditional generative cooperative network, where a conditional latent variable model and a conditional energy-based model are jointly trained to predict saliency in a cooperative manner. The latent variable model serves as a coarse saliency model to produce a fast initial prediction, which is then refined by Langevin revision of the energy-based model, which serves as a fine saliency model.

    Apart from the fully supervised learning framework, we also investigate weakly supervised learning and propose the first scribble-based weakly supervised salient object detection model. In doing so, we first relabel an existing large-scale salient object detection dataset with scribbles, namely the S-DUTS dataset. To mitigate the structure information missing from scribble annotations, we propose an auxiliary edge detection task to localize object edges explicitly, and a gated structure-aware loss to place constraints on the scope of the structure to be recovered. To further reduce the labeling burden, we introduce a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples, where the noisy labels are generated by unsupervised handcrafted-feature-based methods. The whole model that represents the noisy labels is a sum of the two sub-models. The goal of training is to estimate the parameters of both sub-models and simultaneously to infer the corresponding latent vector of each noisy label. We propose to train the model using an alternating back-propagation algorithm. To prevent the network from converging to trivial solutions, we utilize an edge-aware smoothness loss that regularizes hidden saliency maps to have structures similar to those of their corresponding images.
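    As one concrete piece of the RGB-D framework above, the conditional variational auto-encoder solution infers a latent code z whose posterior is predicted by an extra encoder and regularized toward an image-conditioned prior. The sketch below shows only the reparameterized sampling and the Gaussian KL term; the shapes, names, and NumPy setting are assumptions for illustration, not the thesis code.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def reparameterize(mu, logvar):
        """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
        return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

    def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
        """KL( N(mu_q, var_q) || N(mu_p, var_p) ), summed over dimensions."""
        var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
        return 0.5 * np.sum(logvar_p - logvar_q
                            + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

    mu_q, logvar_q = np.zeros(8), np.full(8, -1.0)  # posterior from (image, gt)
    mu_p, logvar_p = np.zeros(8), np.zeros(8)       # prior from the image alone
    z = reparameterize(mu_q, logvar_q)              # one stochastic saliency code
    print(z.shape, kl_gaussians(mu_q, logvar_q, mu_p, logvar_p))
    ```

    Decoding several such samples of z yields multiple plausible saliency maps, whose variance can serve as an uncertainty estimate.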

    Recurrent Latent Variable Networks for Session-Based Recommendation

    In this work, we attempt to ameliorate the impact of data sparsity in the context of session-based recommendation. Specifically, we seek to devise a machine learning mechanism capable of extracting subtle and complex underlying temporal dynamics in the observed session data, so as to inform the recommendation algorithm. To this end, we improve upon systems that utilize deep learning techniques with recurrently connected units; we do so by adopting concepts from the field of Bayesian statistics, namely variational inference. Our proposed approach consists in treating the network recurrent units as stochastic latent variables with a prior distribution imposed over them. On this basis, we proceed to infer the corresponding posteriors; these can be used for prediction and recommendation generation in a way that accounts for the uncertainty in the available sparse training data. To allow our approach to scale easily to large real-world datasets, we perform inference under an approximate amortized variational inference (AVI) setup, whereby the learned posteriors are parameterized via (conventional) neural networks. We perform an extensive experimental evaluation of our approach using challenging benchmark datasets and illustrate its superiority over existing state-of-the-art techniques.
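    To make the recurrent-units-as-latent-variables idea concrete, the sketch below samples the hidden state of one recurrent step from an amortized Gaussian posterior whose parameters are produced by a small network. All dimensions, weight shapes, and names are illustrative assumptions, not the paper's architecture.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def amortized_step(h_prev, x_t, W_mu, W_lv):
        """One recurrent step: sample h_t ~ q(h_t) = N(mu, diag(var)),
        with mu and log-variance amortized by the linear maps W_mu, W_lv."""
        inp = np.concatenate([h_prev, x_t])
        mu, logvar = W_mu @ inp, W_lv @ inp
        return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

    d_h, d_x = 4, 3                                  # state and item-embedding sizes
    W_mu = 0.1 * rng.standard_normal((d_h, d_h + d_x))
    W_lv = 0.1 * rng.standard_normal((d_h, d_h + d_x))
    h = np.zeros(d_h)
    for x_t in rng.standard_normal((5, d_x)):        # a toy 5-item session
        h = amortized_step(h, x_t, W_mu, W_lv)       # stochastic state per step
    print(h)                                         # would feed recommendation scores
    ```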

    Hierarchical Implicit Models and Likelihood-Free Variational Inference

    Implicit probabilistic models are a flexible class of models defined by a simulation process for data. They form the basis for theories which encompass our understanding of the physical world. Despite this fundamental nature, the use of implicit models remains limited due to challenges in specifying complex latent structure in them, and in performing inference in such models with large data sets. In this paper, we first introduce hierarchical implicit models (HIMs). HIMs combine the idea of implicit densities with hierarchical Bayesian modeling, thereby defining models via simulators of data with rich hidden structure. Next, we develop likelihood-free variational inference (LFVI), a scalable variational inference algorithm for HIMs. Key to LFVI is specifying a variational family that is also implicit. This matches the model's flexibility and allows for accurate approximation of the posterior. We demonstrate diverse applications: a large-scale physical simulator for predator-prey populations in ecology; a Bayesian generative adversarial network for discrete data; and a deep implicit model for text generation. Comment: Appears in Neural Information Processing Systems, 201
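    The notion of an implicit model is easy to state in code: data are defined only through a simulator, so densities are never written down. Below is a minimal, assumed sketch of such a simulator, a noisy discretized predator-prey (Lotka-Volterra) process loosely matching the ecology application mentioned above; parameter names and values are illustrative, not the paper's setup.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    def simulate(theta, steps=100, dt=0.01):
        """Draw one trajectory from an implicit model: a noisy predator-prey process."""
        alpha, beta, gamma, delta = theta
        x, y = 1.0, 0.5                              # prey, predator populations
        traj = []
        for _ in range(steps):
            x += dt * (alpha * x - beta * x * y) + 0.01 * rng.standard_normal()
            y += dt * (delta * x * y - gamma * y) + 0.01 * rng.standard_normal()
            traj.append((x, y))
        return np.array(traj)

    data = simulate(theta=(1.5, 1.0, 3.0, 1.0))      # likelihood of `data` is intractable
    print(data[-1])
    ```

    Because no likelihood is available, methods such as LFVI compare simulated and observed samples directly, e.g. via density-ratio estimation, rather than evaluating densities.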