2,153 research outputs found

    Collaborative Filtering with Side Information: a Gaussian Process Perspective

    We tackle the problem of collaborative filtering (CF) with side information, through the lens of Gaussian Process (GP) regression. Driven by the idea of using the kernel to explicitly model user-item similarities, we formulate the GP in a way that allows the incorporation of low-rank matrix factorisation, arriving at our model, the Tucker Gaussian Process (TGP). Consequently, TGP generalises classical Bayesian matrix factorisation models, and goes beyond them to give a natural and elegant method for incorporating side information, giving enhanced predictive performance for CF problems. Moreover, we show that it is a novel model for regression, especially well-suited to grid-structured data and problems where the dependence on covariates is close to being separable.

    Temporal Matrix Factorization for Tracking Concept Drift in Individual User Preferences

    The matrix factorization (MF) technique has been widely adopted for solving the rating prediction problem in recommender systems. The MF technique utilizes the latent factor model to obtain static user preferences (user latent vectors) and item characteristics (item latent vectors) based on historical rating data. However, in the real world, user preferences are not static but change over time. Although several previous works have addressed time-varying user preferences, to the best of our knowledge none of them is specifically designed for tracking concept drift in individual user preferences. Motivated by this, we develop a Temporal Matrix Factorization approach (TMF) for tracking concept drift in each individual user latent vector. There are two key innovative steps in our approach: (i) we develop a modified stochastic gradient descent method to learn an individual user latent vector at each time step, and (ii) we use Lasso regression to learn a linear model for the transition of the individual user latent vectors. We test our method on a synthetic dataset and several real datasets. In comparison with the original MF, our experimental results show that our temporal method achieves lower root mean square errors (RMSE) on both the synthetic and real datasets. One interesting finding is that the performance gain in RMSE comes mostly from those users who indeed have concept drift in their user latent vectors at the time of prediction. In particular, for the synthetic dataset and the Ciao dataset, there are quite a few users with that property, and the performance gains for these two datasets are roughly 20% and 5%, respectively.
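For orientation, the static MF baseline that this abstract compares against can be sketched as plain stochastic gradient descent over observed ratings. This is a generic illustration only, not the TMF method itself (the modified SGD step and the Lasso transition model are not reproduced here). The function names `fit_mf` and `rmse`, the hyperparameters, and the toy ratings are all assumptions made for this sketch.

```python
import numpy as np

def fit_mf(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit user and item latent vectors by SGD on (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, rank))  # user latent vectors
    Q = 0.1 * rng.standard_normal((n_items, rank))  # item latent vectors
    for _ in range(epochs):
        # Visit the observed ratings in a fresh random order each epoch.
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()  # keep the pre-update user vector for the item step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

def rmse(ratings, P, Q):
    """Root mean square error over the given rating triples."""
    errs = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errs))))

# Hypothetical toy data: (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = fit_mf(ratings, n_users=3, n_items=3)
train_rmse = rmse(ratings, P, Q)
```

A temporal variant in the spirit of the abstract would refit each user vector per time step and then model how those vectors evolve; the sketch above covers only the static case.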

    Variational Collaborative Learning for User Probabilistic Representation

    Collaborative filtering (CF) has been successfully employed by many modern recommender systems. Conventional CF-based methods use the user-item interaction data as the sole information source to recommend items to users. However, CF-based methods are known to suffer from cold start and data sparsity problems. Hybrid models that utilize auxiliary information on top of interaction data have increasingly gained attention. A few "collaborative learning"-based models, which tightly bridge two heterogeneous learners through mutual regularization, have recently been proposed for hybrid recommendation. However, the "collaboration" in the existing methods is actually asynchronous due to the alternating optimization of the two learners. Leveraging recent advances in variational autoencoders (VAEs), we here propose a model consisting of two streams of mutually linked VAEs, named the variational collaborative model (VCM). Unlike the mutual regularization used in previous works, where two learners are optimized asynchronously, VCM enables a synchronous collaborative learning mechanism. Besides, the two-stream VAE setup allows VCM to fully leverage the Bayesian probabilistic representations in collaborative learning. Extensive experiments on three real-life datasets have shown that VCM outperforms several state-of-the-art methods. Comment: 8 pages, 4 figures

    Color Image and Multispectral Image Denoising Using Block Diagonal Representation

    Filtering images of more than one channel is challenging in terms of both efficiency and effectiveness. By grouping similar patches to utilize the self-similarity and sparse linear approximation of natural images, recent nonlocal and transform-domain methods have been widely used in color and multispectral image (MSI) denoising. Many related methods focus on the modeling of group level correlation to enhance sparsity, which often resorts to a recursive strategy with a large number of similar patches. The importance of the patch level representation is understated. In this paper, we mainly investigate the influence and potential of representation at patch level by considering a general formulation with a block diagonal matrix. We further show that by training a proper global patch basis, along with a local principal component analysis transform in the grouping dimension, a simple transform-threshold-inverse method can produce very competitive results. A fast implementation is also developed to reduce computational complexity. Extensive experiments on both simulated and real datasets demonstrate its robustness, effectiveness and efficiency.

    An Iterative Reweighted Method for Tucker Decomposition of Incomplete Multiway Tensors

    We consider the problem of low-rank decomposition of incomplete multiway tensors. Since many real-world data lie on an intrinsically low-dimensional subspace, tensor low-rank decomposition with missing entries has applications in many data analysis problems such as recommender systems and image inpainting. In this paper, we focus on Tucker decomposition, which represents an Nth-order tensor in terms of N factor matrices and a core tensor via multilinear operations. To exploit the underlying multilinear low-rank structure in high-dimensional datasets, we propose a group-based log-sum penalty functional to place structural sparsity over the core tensor, which leads to a compact representation with the smallest core tensor. The method for Tucker decomposition is developed by iteratively minimizing a surrogate function that majorizes the original objective function, which results in an iterative reweighted process. In addition, to reduce the computational complexity, an over-relaxed monotone fast iterative shrinkage-thresholding technique is adapted and embedded in the iterative reweighted process. The proposed method is able to determine the model complexity (i.e., the multilinear rank) automatically. Simulation results show that the proposed algorithm offers competitive performance compared with other existing algorithms.
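The Tucker representation mentioned in this abstract (a core tensor multiplied by one factor matrix along each mode) can be illustrated with a short sketch. It shows only the reconstruction step, not the iterative reweighted fitting procedure or the handling of missing entries. The helper name `tucker_reconstruct` and the tensor sizes are assumptions chosen for illustration.

```python
import numpy as np

def tucker_reconstruct(core, factors):
    """Multiply a core tensor by one factor matrix along each mode."""
    out = core
    for mode, U in enumerate(factors):
        # Mode-n product: contract U's columns with the tensor's mode-n axis,
        # then move the new axis back to position `mode`.
        out = np.moveaxis(np.tensordot(U, out, axes=(1, mode)), 0, mode)
    return out

rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 2))        # multilinear rank (2, 3, 2)
factors = [rng.standard_normal((4, 2)),      # mode-1 factor matrix
           rng.standard_normal((5, 3)),      # mode-2 factor matrix
           rng.standard_normal((6, 2))]      # mode-3 factor matrix
X = tucker_reconstruct(core, factors)

# Cross-check against an explicit third-order einsum formulation.
X_ref = np.einsum("abc,ia,jb,kc->ijk", core, *factors)
```

Here a 4 x 5 x 6 tensor is described by a 2 x 3 x 2 core plus three factor matrices, which is the kind of compact representation a smaller core tensor provides.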

    Learning Tensors in Reproducing Kernel Hilbert Spaces with Multilinear Spectral Penalties

    We present a general framework to learn functions in tensor product reproducing kernel Hilbert spaces (TP-RKHSs). The methodology is based on a novel representer theorem suitable for existing as well as new spectral penalties for tensors. In particular, when the functions in the TP-RKHS are defined on the Cartesian product of finite discrete sets, our main problem formulation admits existing tensor completion problems as a special case. Other special cases include transfer learning with multimodal side information and multilinear multitask learning. For the latter case, our kernel-based view is instrumental in deriving nonlinear extensions of existing model classes. We give a novel algorithm and show in experiments the usefulness of the proposed extensions.

    Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures

    The natural habitat of most Bayesian methods is data represented by exchangeable sequences of observations, for which de Finetti's theorem provides the theoretical foundation. Dirichlet process clustering, Gaussian process regression, and many other parametric and nonparametric Bayesian models fall within the remit of this framework; many problems arising in modern data analysis do not. This article provides an introduction to Bayesian models of graphs, matrices, and other data that can be modeled by random structures. We describe results in probability theory that generalize de Finetti's theorem to such data and discuss their relevance to nonparametric Bayesian modeling. With the basic ideas in place, we survey example models available in the literature; applications of such models include collaborative filtering, link prediction, and graph and network analysis. We also highlight connections to recent developments in graph theory and probability, and sketch the more general mathematical foundation of Bayesian methods for other types of data beyond sequences and arrays.

    Learnable Bernoulli Dropout for Bayesian Deep Learning

    In this work, we propose learnable Bernoulli dropout (LBD), a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters. By probabilistic modeling of Bernoulli dropout, our method enables more robust prediction and uncertainty quantification in deep models. In particular, when combined with variational auto-encoders (VAEs), LBD enables flexible semi-implicit posterior representations, leading to new semi-implicit VAE (SIVAE) models. We solve the optimization for training with respect to the dropout parameters using Augment-REINFORCE-Merge (ARM), an unbiased and low-variance gradient estimator. Our experiments on a range of tasks show the superior performance of our approach compared with other commonly used dropout schemes. Overall, LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation. Moreover, using SIVAE, we can achieve state-of-the-art performance on collaborative filtering for implicit feedback on several public datasets. Comment: To appear in AISTATS 202

    Meta-Learning surrogate models for sequential decision making

    We introduce a unified probabilistic framework for solving sequential decision making problems ranging from Bayesian optimisation to contextual bandits and reinforcement learning. This is accomplished by a probabilistic model-based approach that explains observed data while capturing predictive uncertainty during the decision making process. Crucially, this probabilistic model is chosen to be a meta-learning system that learns from a distribution of related problems, allowing data-efficient adaptation to a target task. As a suitable instantiation of this framework, we explore the use of Neural Processes due to statistical and computational desiderata. We apply our framework to a broad range of problem domains, such as control problems, recommender systems and adversarial attacks on RL agents, demonstrating an efficient and general black-box learning approach.

    Tensor Decomposition for Signal Processing and Machine Learning

    Tensors or multi-way arrays are functions of three or more indices (i, j, k, …), similar to matrices (two-way arrays), which are functions of two indices (r, c) for (row, column). Tensors have a rich history, stretching over almost a century, and touching upon numerous disciplines; but they have only recently become ubiquitous in signal and data analytics at the confluence of signal processing, statistics, data mining and machine learning. This overview article aims to provide a good starting point for researchers and practitioners interested in learning about and working with tensors. As such, it focuses on fundamentals and motivation (using various application examples), aiming to strike an appropriate balance of breadth and depth that will enable someone having taken first graduate courses in matrix algebra and probability to get started doing research and/or developing tensor algorithms and software. Some background in applied optimization is useful but not strictly required. The material covered includes tensor rank and rank decomposition; basic tensor factorization models and their relationships and properties (including fairly good coverage of identifiability); broad coverage of algorithms ranging from alternating optimization to stochastic gradient; statistical performance analysis; and applications ranging from source separation to collaborative filtering, mixture and topic modeling, classification, and multilinear subspace learning. Comment: revised version, overview article