43 research outputs found

    Manifold Relevance Determination

    In this paper we present a fully Bayesian latent variable model which exploits conditional nonlinear (in)dependence structures to learn an efficient latent representation. The latent space is factorized to represent shared and private information from multiple views of the data. In contrast to previous approaches, we introduce a relaxation of the discrete segmentation and allow for a "softly" shared latent space. Further, Bayesian techniques allow us to automatically estimate the dimensionality of the latent spaces. The model is capable of capturing structure underlying extremely high-dimensional spaces. This is illustrated by modelling unprocessed images with tens of thousands of pixels, which also allows us to directly generate novel images from the trained model by sampling from the discovered latent spaces. We also demonstrate the model on prediction of human pose in an ambiguous setting. Our Bayesian framework allows us to perform disambiguation in a principled manner by including latent space priors which incorporate the dynamic nature of the data. Comment: ICML 2012
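    As a rough illustration of the "softly" shared factorization described above, the sketch below shows how per-view ARD relevance weights over a common latent space can be read as a soft split into shared and private dimensions. This is not the authors' code: the weight values and the min-based sharedness score are assumptions made purely for illustration.

    ```python
    # Minimal sketch (assumed weights, not the paper's implementation) of reading a
    # soft shared/private split off per-view ARD relevance weights.
    import numpy as np

    # Hypothetical ARD relevance weights, one value per latent dimension for each
    # view; larger means the dimension matters more for generating that view.
    w_view1 = np.array([5.2, 4.8, 0.9, 0.01, 3.5])
    w_view2 = np.array([4.9, 0.02, 1.1, 4.3, 0.03])

    def normalise(w):
        """Scale relevance weights to [0, 1] so the two views are comparable."""
        return w / w.max()

    a, b = normalise(w_view1), normalise(w_view2)

    # Soft "sharedness" per dimension: high only if both views rely on it.
    shared = np.minimum(a, b)
    private_1 = a - shared   # relevance unique to view 1
    private_2 = b - shared   # relevance unique to view 2

    for q in range(len(a)):
        print(f"dim {q}: shared={shared[q]:.2f}, "
              f"private view1={private_1[q]:.2f}, private view2={private_2[q]:.2f}")
    ```

    Dimensions with a high sharedness score would be treated as (softly) shared across views, while the rest encode view-specific variation; in the paper this structure is inferred within the Bayesian model rather than computed post hoc as here.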

    Unsupervised Learning of Multiple Objects in Images

    Institute for Adaptive and Neural Computation

    Developing computer vision algorithms that can learn from unsegmented images containing multiple objects is important, since this is how humans constantly learn from visual experience. In this thesis we consider images containing views of multiple objects, and our task is to learn about each of the objects present. This can be approached as a factorial learning problem, where each image is explained by instantiating a model for each object present with the correct instantiation parameters. A major problem with learning a factorial model is that, as the number of objects increases, the number of configurations that must be considered grows combinatorially. We develop a greedy algorithm that extracts object models sequentially from the data using a robust statistical method, thus avoiding the combinatorial explosion. When video data are available, we greatly speed up the greedy algorithm by carrying out approximate tracking of the multiple objects in the scene. This method is applied to raw image sequence data and extracts the objects one at a time: first the (possibly moving) background is learned, and moving objects are found at later stages. The algorithm recursively updates an appearance model so that occlusion is taken into account, and matches this model to the frames of the sequence. We apply this method to learn multiple objects in image sequences as well as articulated parts of the human body. Additionally, we learn a distribution over parts undergoing full affine transformations that expresses the relative movements of the parts.

    The idea of fitting a model to data sequentially using robust statistics is quite general and can be applied to other models. We describe a method for training mixture models by learning one component at a time, thus building the mixture model sequentially. We do this by incorporating an outlier component into the mixture model, which allows us to fit a single data cluster while "ignoring" the remaining clusters. Once a component is fitted, we remove from consideration all the data it explains and repeat the operation. This algorithm can also be used to provide a sensible initialization of the components when training a full mixture model.
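    The sequential, robust fitting idea in the last paragraph can be sketched as follows. This is a minimal illustration under assumed details (1-D data, one Gaussian per component, a fixed uniform outlier density, and a crude mode-seeking initialisation), not the thesis implementation: fit one component plus an outlier component by EM, remove the data that component explains, and repeat.

    ```python
    # Sketch: sequential mixture fitting with a uniform outlier component.
    import numpy as np

    def densest_point(X, radius=1.0):
        """Crude mode estimate: the point with the most neighbours within `radius`."""
        counts = (np.abs(X[:, None] - X[None, :]) < radius).sum(axis=1)
        return X[np.argmax(counts)]

    def fit_one_component(X, outlier_density, n_iter=50):
        """EM for one 1-D Gaussian mixed with a fixed uniform outlier density."""
        mu, var, pi = densest_point(X), 1.0, 0.5
        for _ in range(n_iter):
            # E-step: responsibility of the Gaussian component for each point.
            gauss = np.exp(-0.5 * (X - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
            r = pi * gauss / (pi * gauss + (1 - pi) * outlier_density)
            # M-step: refit the Gaussian and its mixing weight to the weighted data.
            mu = np.sum(r * X) / np.sum(r)
            var = np.sum(r * (X - mu) ** 2) / np.sum(r) + 1e-6
            pi = r.mean()
        return mu, var, r

    rng = np.random.default_rng(0)
    # Three well-separated 1-D clusters; the data range defines the uniform density.
    X = np.concatenate([rng.normal(-5, 0.5, 100),
                        rng.normal(0, 0.5, 100),
                        rng.normal(6, 0.5, 100)])
    outlier_density = 1.0 / (X.max() - X.min())

    remaining, components, min_points = X.copy(), [], 20
    while remaining.size >= min_points:
        mu, var, r = fit_one_component(remaining, outlier_density)
        explained = r > 0.5
        if explained.sum() < min_points:   # component explains too little: stop
            break
        components.append((mu, var))
        remaining = remaining[~explained]  # discard the data this component explains

    print("Recovered component means:", sorted(round(m, 2) for m, _ in components))
    ```

    The outlier component is what lets a single Gaussian lock onto one cluster while treating the others as background; removing the explained points and refitting then recovers the clusters one at a time, which is the same pattern the thesis applies to object models in images.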