36 research outputs found

    Learning Generative Models with Visual Attention

    Attention has long been proposed by psychologists as important for effectively dealing with the enormous amount of sensory stimulus available in the neocortex. Inspired by visual attention models in computational neuroscience and by the need for object-centric data in generative modeling, we describe a generative learning framework that uses attentional mechanisms. Attentional mechanisms can propagate signals from a region of interest in a scene to an aligned canonical representation, where generative modeling takes place. By ignoring background clutter, generative models can concentrate their resources on the object of interest. Our model is a proper graphical model in which the 2D similarity transformation is part of the top-down process. A ConvNet provides good initializations for posterior inference, which is based on Hamiltonian Monte Carlo. After learning on images of faces, our model can robustly attend to the face regions of novel test subjects. More importantly, it can learn generative models of new faces from a novel dataset of large images in which the face locations are not known. Comment: In the proceedings of Neural Information Processing Systems, 2014
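
    To make the attentional read-out concrete, below is a minimal sketch of how a 2D similarity transform (translation, rotation, and isotropic scale) can map a region of interest onto an aligned canonical frame. The function name, the nearest-neighbour sampling, and the fixed window size are illustrative assumptions; in the paper the transform parameters are latent variables inferred with Hamiltonian Monte Carlo from a ConvNet initialization.

        import numpy as np

        def similarity_crop(image, center, scale, theta, out_size=24):
            # Hypothetical helper: read an aligned canonical window out of
            # `image` through a 2D similarity transform. Nearest-neighbour
            # sampling keeps the sketch short; a real model would use
            # differentiable bilinear sampling.
            h, w = image.shape
            c, s = np.cos(theta), np.sin(theta)
            A = scale * np.array([[c, -s], [s, c]])  # rotation + isotropic scale
            # Canonical-frame pixel grid, centred at the origin.
            ys, xs = np.mgrid[0:out_size, 0:out_size] - (out_size - 1) / 2.0
            # Map canonical coordinates into image coordinates.
            xy = A @ np.stack([xs.ravel(), ys.ravel()]) + np.asarray(center)[:, None]
            cols = np.clip(np.round(xy[0]).astype(int), 0, w - 1)
            rows = np.clip(np.round(xy[1]).astype(int), 0, h - 1)
            return image[rows, cols].reshape(out_size, out_size)

        # Attend to a hypothesized face location in a larger image.
        img = np.random.rand(100, 100)
        window = similarity_crop(img, center=(55.0, 40.0), scale=1.5, theta=0.1)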

    Robust Visual Recognition Using Multilayer Generative Neural Networks

    Deep generative neural networks such as Deep Belief Networks (DBNs) and Deep Boltzmann Machines have been used successfully to model high-dimensional visual data. However, they are not robust to common variations such as occlusion and random noise. In this thesis, we explore two strategies for improving the robustness of DBNs. First, we show that a DBN with sparse connections in the first layer is more robust to variations that are not in the training set. Second, we develop a probabilistic denoising algorithm that determines a subset of hidden-layer nodes to unclamp; this algorithm can be applied to any feedforward network classifier with localized first-layer connections. By using the already available generative model for denoising prior to recognition, we obtain significantly better performance than standard DBN implementations for various sources of noise on the standard MNIST and Variations MNIST databases.
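
    As a rough illustration of denoising with an already available generative model before recognition, the sketch below runs mean-field up-down passes through a generic RBM so that the top-down weights can fill in corrupted pixels. This is a standard RBM reconstruction, not the thesis's unclamping algorithm, and all names and parameters here are assumptions.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def rbm_denoise(v, W, b_vis, b_hid, n_steps=5):
            # Mean-field up-down passes: infer hidden activations from the
            # (possibly corrupted) input, then reconstruct the input using
            # the generative, top-down weights.
            for _ in range(n_steps):
                h = sigmoid(v @ W + b_hid)    # bottom-up inference
                v = sigmoid(h @ W.T + b_vis)  # top-down reconstruction
            return v

        # Toy usage with random, untrained parameters.
        rng = np.random.default_rng(0)
        W = rng.normal(scale=0.01, size=(784, 500))
        v_denoised = rbm_denoise(rng.random(784), W, np.zeros(784), np.zeros(500))

    The denoised vector would then be passed to the feedforward classifier in place of the raw input.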

    Challenges in Representation Learning: A report on three machine learning contests

    The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions. Comment: 8 pages, 2 figures

    Learning Generative Models Using Structured Latent Variables

    Recent machine learning advances in computer vision and speech recognition have been largely driven by the application of supervised neural networks to large labeled datasets, leveraging effective regularization techniques and architectural design. With more data and computational resources, performance is likely to continue improving. Despite these strengths, supervised neural networks are sometimes criticized because their internal representations are opaque and lack the kind of interpretability that seems evident in human perception. For example, detecting a dog hidden in the bushes by looking at its exposed tail is a task not yet solved by discriminative neural networks. Another class of challenging tasks is one-shot learning, where only one training example of a new concept is available to the model. It is widely believed that learned prior knowledge must be utilized to tackle this problem. My dissertation addresses some of these concerns by introducing domain-specific knowledge into standard deep learning models. This knowledge is used to specify a meaningful, structured latent representation, which forces the model to generalize better in certain scenarios. For example, a generative model with latent gating variables that "switch off" noisy pixels should perform better when encountering noise at test time, and a generative model with latent variables representing 3D surface normal vectors should do better at modeling illumination variations. These are the kinds of relatively simple domain-specific modifications explored in this thesis. Of course, we should not rely too heavily on manual engineering and should learn as much as possible; we take this principle seriously and strike a balance between laborious engineering on the one hand and learning everything from scratch on the other. Adding structure to deep generative models is not only helpful for computer vision applications but is also very effective for unsupervised density learning tasks. Ph.D. dissertation.
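
    As a toy sketch of the first modification, the code below computes per-pixel posteriors for binary gating variables that decide whether each pixel is explained by the model's reconstruction or by noise; gates near zero effectively switch the pixel off. The mixture densities, prior, and scale are illustrative assumptions rather than the dissertation's exact model.

        import numpy as np

        def gate_posterior(x, recon, clean_scale=0.05, prior=0.9):
            # Toy mixture: a "clean" pixel follows N(recon_i, clean_scale^2);
            # a "noisy" pixel follows Uniform(0, 1), which has density 1.
            lik_clean = np.exp(-0.5 * ((x - recon) / clean_scale) ** 2) \
                        / (clean_scale * np.sqrt(2.0 * np.pi))
            p_on = prior * lik_clean          # gate on: model explains the pixel
            p_off = (1.0 - prior) * 1.0       # gate off: pixel treated as noise
            return p_on / (p_on + p_off)

        # A pixel far from its reconstruction receives a near-zero gate.
        print(gate_posterior(np.array([0.51, 0.95]), np.array([0.50, 0.20])))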