
    Partial membership latent Dirichlet allocation

    Dissertation supervisor: Dr. Alina Zare. Includes vita. For many years, topic models (e.g., pLSA, LDA, SLDA) have been widely used for simultaneously segmenting and recognizing objects in imagery. However, these models are confined to the analysis of categorical data, forcing a visual word to belong to one and only one topic. In many images, some regions cannot be assigned a crisp categorical label (e.g., transition regions between a foggy sky and the ground, or between sand and water at a beach). In these cases, a visual word is best represented with partial memberships across multiple topics. To address this, a partial membership latent Dirichlet allocation (PM-LDA) model and an associated parameter estimation algorithm are presented. PM-LDA defines a novel partial membership model for word and document generation. Unlike the standard LDA model, which assumes that each word belongs to one and only one topic, PM-LDA allows words to have partial membership in multiple topics. This model is useful for image/video documents in which a visual word (an image patch) may be a mixture of multiple topics. For example, in SONAR imagery where gradually vanishing sand ripples blur the boundary between a sand ripple region and a flat sand region, it is impossible to tell where the sand ripple ends and the flat sand starts. In the proposed PM-LDA model, these visual words are represented with partial memberships in both the "sand ripple" and "flat sand" topics, which is more reasonable than assigning them to one and only one topic as in the standard LDA model. A Gibbs sampler is employed for parameter estimation. Experimental results on simulated data, a SONAR image dataset, and natural image datasets show that PM-LDA can produce both crisp and soft semantic image segmentations, a capability existing methods do not have. Includes bibliographical references (pages 147-157).
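The difference between crisp and partial topic assignment can be illustrated with a minimal Python sketch. The abstract does not spell out PM-LDA's generative process, so the per-word parameterization below (a membership vector drawn from Dirichlet(concentration * theta)) and the names `crisp_assignment`, `partial_assignment`, and `concentration` are hypothetical, chosen only to contrast a one-hot topic label with a membership vector spread over all topics.

```python
import random

def dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def crisp_assignment(theta, rng):
    """Standard LDA style: the word takes exactly one topic index drawn from theta."""
    k = rng.choices(range(len(theta)), weights=theta)[0]
    membership = [0.0] * len(theta)
    membership[k] = 1.0  # one-hot: full membership in a single topic
    return membership

def partial_assignment(theta, concentration, rng):
    """Partial membership style: the word gets a membership vector over all topics.
    Hypothetical parameterization: memberships ~ Dirichlet(concentration * theta)."""
    return dirichlet([concentration * t for t in theta], rng)

rng = random.Random(0)
# Document-level topic proportions, e.g. over ["sand ripple", "flat sand"].
theta = dirichlet([1.0, 1.0], rng)
print(crisp_assignment(theta, rng))          # a one-hot vector
print(partial_assignment(theta, 5.0, rng))   # non-negative weights summing to one
```

In this sketch a larger `concentration` pulls each word's memberships toward the document proportions, while a smaller one makes them more extreme; the actual PM-LDA model may parameterize this differently.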

    Hyperspectral Unmixing with Endmember Variability using Partial Membership Latent Dirichlet Allocation

    The application of Partial Membership Latent Dirichlet Allocation (PM-LDA) to hyperspectral endmember estimation and spectral unmixing is presented. PM-LDA provides a model for hyperspectral image analysis that accounts for spectral variability and incorporates spatial information through the use of superpixel-based 'documents.' In our application of PM-LDA, we employ the Normal Compositional Model, in which endmembers are represented as Normal distributions to account for spectral variability and proportion vectors are modeled as random variables governed by a Dirichlet distribution. The use of the Dirichlet distribution enforces positivity and sum-to-one constraints on the proportion values. Results of the algorithm on real hyperspectral data indicate that PM-LDA produces endmember distributions that represent the ground truth classes and their associated variability.
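The Normal Compositional Model described above can be sketched as a generative step for a single pixel: draw proportions from a Dirichlet (so they are positive and sum to one), draw each endmember from its own Normal distribution, and mix. This is a minimal sketch, not the authors' implementation; it assumes independent bands (diagonal covariance), and the function name `sample_ncm_pixel` and the 3-band endmember values are hypothetical and purely illustrative.

```python
import random

def dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_ncm_pixel(means, stds, alpha, rng):
    """Generate one pixel under a Normal Compositional Model sketch.
    means[k][b], stds[k][b]: mean and std of endmember k in band b
    (assumes independent bands, i.e. diagonal covariance).
    alpha: Dirichlet parameters for the proportion vector."""
    proportions = dirichlet(alpha, rng)  # positive and sum-to-one by construction
    n_bands = len(means[0])
    pixel = [0.0] * n_bands
    for p, mu, sd in zip(proportions, means, stds):
        # A fresh random draw of this endmember models spectral variability.
        endmember = [rng.gauss(m, s) for m, s in zip(mu, sd)]
        for b in range(n_bands):
            pixel[b] += p * endmember[b]
    return proportions, pixel

# Hypothetical 3-band endmembers; the numbers are illustrative only.
means = [[0.05, 0.08, 0.45], [0.12, 0.13, 0.15]]
stds = [[0.01, 0.01, 0.05], [0.02, 0.02, 0.02]]
rng = random.Random(42)
props, pixel = sample_ncm_pixel(means, stds, alpha=[1.0, 1.0], rng=rng)
print(props, pixel)
```

Setting every standard deviation to zero collapses this to the linear mixing model, where the pixel is exactly the proportion-weighted sum of the endmember means.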

    Map-guided hyperspectral image superpixel segmentation using semi-supervised partial membership latent Dirichlet allocation

    Many superpixel segmentation algorithms suitable for regular color images with three channels (red, green, and blue; RGB images) have been developed in the literature. However, because of the high dimensionality of hyperspectral imagery, these regular superpixel segmentation algorithms often perform poorly on hyperspectral imagery. Although some authors have modified regular superpixel segmentation algorithms to fit hyperspectral images, many of these still underperform on complex data. In this thesis, to solve this problem, we introduce a hyperspectral unmixing-based superpixel segmentation that leverages map information. We call this approach map-guided semi-supervised PM-LDA superpixel segmentation. The approach uses auxiliary map information to guide segmentation, and it leverages spectral unmixing results to improve on segmentation based on raw data. We test our proposed method on two real hyperspectral datasets, University of Pavia and MUUFL Gulfport Hyperspectral Data. In these experiments, our proposed method achieves better results than other state-of-the-art algorithms. We also develop new cluster validity metrics to evaluate the results.

    The latent process decomposition of cDNA microarray data sets

    We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data, and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called latent process decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, it can objectively assess the optimal number of sample clusters. Second, it represents samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced-space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster; thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show that this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets, it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision, in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. To illustrate its wider applicability, we also demonstrate its performance on a microarray data set for yeast.
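The idea that a sample is a mixture over latent processes, rather than a member of one cluster, can be shown by scoring a single sample under such a mixture. This is a minimal sketch under stated assumptions: it assumes each gene's expression under process k is Gaussian with a gene- and process-specific mean and variance, that `theta` holds the sample's mixing weights, and it omits the variational parameter estimation entirely. The function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate Normal(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def lpd_sample_log_likelihood(expr, theta, mu, var):
    """Log-likelihood of one sample under a latent-process mixture (sketch).
    expr[g]             : expression level of gene g in this sample
    theta[k]            : the sample's mixing weight on latent process k (sums to one)
    mu[g][k], var[g][k] : assumed Gaussian mean and variance of gene g under process k
    Each gene is explained by a theta-weighted combination of all processes,
    rather than by a single hard cluster assignment."""
    total = 0.0
    for g, x in enumerate(expr):
        p = sum(theta[k] * normal_pdf(x, mu[g][k], var[g][k]) for k in range(len(theta)))
        total += math.log(p)
    return total

# Two genes, two hypothetical latent processes; values are illustrative only.
theta = [0.7, 0.3]
mu = [[2.0, -1.0], [0.5, 0.5]]
var = [[0.25, 0.25], [1.0, 1.0]]
print(lpd_sample_log_likelihood([1.8, 0.4], theta, mu, var))
```

When `theta` puts all its weight on one process, the score reduces to an ordinary single-cluster Gaussian likelihood; spreading the weight is what lets one sample draw on several processes at once.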