28 research outputs found

    On scalable inference and learning in spike-and-slab sparse coding

    Sparse coding is a widely applied latent variable analysis technique. Its standard formulation assumes a Laplace distribution as the prior for the activations of latent components. In this work we study sparse coding with a spike-and-slab distribution as the prior on latent activity. A spike-and-slab distribution places its probability mass on a 'spike' at zero and a 'slab' spreading over a continuous range. Because it can induce exact zeros with higher likelihood, a spike-and-slab prior constitutes a more accurate model for sparse coding. As a prior it also allows the sparseness of latent activity to be inferred directly from observed data, which makes spike-and-slab sparse coding more flexible and self-adaptive across a wide range of data distributions. By modeling the slab with a Gaussian distribution, we furthermore show that, in contrast to the standard approach to sparse coding, we can indeed derive closed-form analytical expressions for exact inference and learning in linear spike-and-slab sparse coding. However, because the posterior landscape under a spike-and-slab prior turns out to be highly multimodal, with a prohibitive exploration cost, we also develop subspace and Gibbs sampling based approximate inference techniques for scalable applications of the linear model, and we contrast these approximation methods with variational approximation for scalable posterior inference in linear spike-and-slab sparse coding. We further combine the Gaussian spike-and-slab prior with a nonlinear generative model that assumes a point-wise maximum combination rule for the generation of observed data. We analyze the model as a precise encoder of low-level features in visual data, such as edges and their occlusions. We again combine subspace selection with Gibbs sampling to overcome the analytical intractability of performing exact inference in the model.
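The generative side of the linear model described above can be made concrete with a short sketch. The parameter names here (`pi` for the spike probability, `slab_std` for the slab's standard deviation) are illustrative choices, not notation from the thesis:

```python
import random

def sample_spike_and_slab(H, pi=0.2, slab_std=1.0, rng=None):
    """Draw a latent vector of length H from a spike-and-slab prior:
    each component is exactly zero with probability 1 - pi (the "spike")
    and Gaussian with standard deviation slab_std otherwise (the "slab")."""
    rng = rng or random.Random()
    return [rng.gauss(0.0, slab_std) if rng.random() < pi else 0.0
            for _ in range(H)]

def linear_generate(W, s, noise_std=0.1, rng=None):
    """Generate one observation y = W s + Gaussian noise under the
    linear sparse coding model; W is a list of rows (one per observed
    dimension)."""
    rng = rng or random.Random()
    return [sum(W_dh * s_h for W_dh, s_h in zip(row, s)) + rng.gauss(0.0, noise_std)
            for row in W]
```

Note that a sampled latent vector contains exact zeros with high probability, which is the property the abstract highlights over a Laplace prior.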
We numerically analyze our methods on both synthetic and real data, verifying them and comparing them with other approaches. We assess the linear spike-and-slab approach on source separation and image denoising benchmarks. In most experiments we obtain competitive or state-of-the-art results, and we find that spike-and-slab sparse coding overall outperforms other comparable approaches. By extracting thousands of latent components from a large amount of training data we further demonstrate that our subspace Gibbs sampler is among the most scalable posterior inference methods for a linear sparse coding approach. For the nonlinear model we experiment with artificial and real images to demonstrate that the components learned by the model lie closer to the ground truth and are easily interpretable as the underlying generative causes of the input. We find that, in comparison to standard sparse coding, the nonlinear spike-and-slab approach can compressively encode images using naturally sparse and discernible compositions of latent components. We also demonstrate that the components the model infers from natural image patches are statistically more consistent, in structure and distribution, with the response patterns of simple cells in the primary visual cortex. This work thereby contributes novel methods for sophisticated inference and learning in spike-and-slab sparse coding, and it empirically showcases their functional efficacy through a variety of applications.
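The point-wise maximum combination rule assumed by the nonlinear model (each observed dimension is generated by the strongest single latent contribution, which is what lets the model represent occlusions) can be sketched in a few lines; the linear model would use a sum where this uses a max:

```python
def max_combine(W, s):
    """Nonlinear generative step: each observed dimension takes the
    point-wise maximum over the weighted latent contributions W[d][h] * s[h],
    instead of their sum as in the linear model. Intuitively, the component
    "in front" wins, mimicking occlusion in images."""
    H = len(s)
    return [max(row[h] * s[h] for h in range(H)) for row in W]
```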

    ProSper -- A Python Library for Probabilistic Sparse Coding with Non-Standard Priors and Superpositions

    ProSper is a Python library containing probabilistic algorithms for learning dictionaries. Given a set of data points, the implemented algorithms seek to learn the elementary components that generated the data. The library widens the scope of dictionary learning beyond implementations of standard approaches such as ICA, NMF, or standard L1 sparse coding. The implemented algorithms are especially well suited when the data consist of components that combine non-linearly and/or when the data require flexible prior distributions. Furthermore, the implemented algorithms go beyond standard approaches by inferring the prior and noise parameters of the data, and they provide rich a posteriori approximations for inference. The library is designed to be extendable and currently includes: Binary Sparse Coding (BSC), Ternary Sparse Coding (TSC), Discrete Sparse Coding (DSC), Maximal Causes Analysis (MCA), Maximum Magnitude Causes Analysis (MMCA), and Gaussian Sparse Coding (GSC, a recent spike-and-slab sparse coding approach). The algorithms are scalable thanks to a combination of variational approximations and parallelization. Implementations of all algorithms allow for parallel execution on multiple CPUs and multiple machines for medium- to large-scale applications. Typical large-scale runs can use hundreds of CPUs to learn hundreds of dictionary elements from data with tens of millions of floating-point numbers, such that models with several hundred thousand parameters can be optimized. The library is designed to have minimal dependencies and to be easy to use. It targets users of dictionary learning algorithms and machine learning researchers.
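To fix ideas, the dictionary learning task these algorithms solve can be illustrated with a deliberately tiny sketch. This is not ProSper's API: it is a toy alternating-minimization scheme with 1-sparse codes (essentially k-means), whereas the library's algorithms use full probabilistic inference over much richer priors:

```python
def learn_dictionary(data, n_atoms, n_iter=10):
    """Toy dictionary learning by alternating minimization with 1-sparse
    codes: each data point is explained by its single best-matching atom
    (inference step), and each atom is re-estimated as the mean of the
    points assigned to it (learning step). Atoms are initialized from the
    first n_atoms data points for determinism."""
    atoms = [list(data[k]) for k in range(n_atoms)]
    for _ in range(n_iter):
        # inference: index of the nearest atom for each data point
        assign = [min(range(n_atoms),
                      key=lambda k: sum((x - a) ** 2 for x, a in zip(p, atoms[k])))
                  for p in data]
        # learning: update each atom as the mean of its assigned points
        for k in range(n_atoms):
            pts = [p for p, a in zip(data, assign) if a == k]
            if pts:
                atoms[k] = [sum(c) / len(pts) for c in zip(*pts)]
    return atoms
```

Real dictionary learning replaces the hard 1-sparse assignment with a posterior over sparse codes, which is exactly where the non-standard priors and approximations listed above come in.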


    A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce

    Personalized size and fit recommendations bear crucial significance for any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred due to size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems from the extreme sparsity of customer-article orders. To alleviate this problem, we propose a deep learning based content-collaborative methodology for personalized size and fit recommendation. Our proposed method can ingest arbitrary customer and article data and can model multiple individuals or intents behind a single account. The method optimizes a global set of parameters to learn population-level abstractions of size- and fit-relevant information from observed customer-article interactions. It further employs customer- and article-specific embedding variables to learn their properties. Together with the learned entity embeddings, the method maps additional customer and article attributes into a latent space to derive personalized recommendations. Applying our method to two publicly available datasets demonstrates an improvement over the state-of-the-art published results. On two proprietary datasets, one containing fit feedback from fashion experts and the other involving customer purchases, we further outperform comparable methodologies, including a recent Bayesian approach for size recommendation. Comment: Published at the Thirteenth ACM Conference on Recommender Systems (RecSys '19), September 16-20, 2019, Copenhagen, Denmark.
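As a rough illustration of the input/output structure described above (not the published architecture), a single linear layer over concatenated embeddings and attributes, normalized over three fit classes, might look like this. All names, dimensions, and the weight matrix `W` are hypothetical:

```python
import math

def predict_fit(cust_emb, art_emb, cust_attrs, art_attrs, W):
    """Hypothetical scoring sketch: concatenate learned customer and
    article embeddings with their attribute vectors, apply one linear
    layer W (one row per class), and softmax-normalize over the three
    fit classes (too small / good fit / too large). The published model
    is a deep network; this only shows the input/output structure."""
    x = cust_emb + art_emb + cust_attrs + art_attrs
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

In a trained system the embeddings and `W` would be learned jointly from observed customer-article interactions rather than fixed by hand.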