Collaborative Filtering with Side Information: a Gaussian Process Perspective
We tackle the problem of collaborative filtering (CF) with side information,
through the lens of Gaussian Process (GP) regression. Driven by the idea of
using the kernel to explicitly model user-item similarities, we formulate the
GP in a way that allows the incorporation of low-rank matrix factorisation,
arriving at our model, the Tucker Gaussian Process (TGP). Consequently, TGP
generalises classical Bayesian matrix factorisation models, and goes beyond
them to give a natural and elegant method for incorporating side information,
giving enhanced predictive performance for CF problems. Moreover, we show that
it is a novel model for regression, especially well-suited to grid-structured
data and problems where the dependence on covariates is close to being
separable.
Temporal Matrix Factorization for Tracking Concept Drift in Individual User Preferences
The matrix factorization (MF) technique has been widely adopted for solving
the rating prediction problem in recommender systems. The MF technique utilizes
the latent factor model to obtain static user preferences (user latent vectors)
and item characteristics (item latent vectors) based on historical rating data.
However, in the real world user preferences are not static but change over
time. Although several previous works have addressed the time-varying nature
of user preferences, it seems (to the best of our knowledge) that
none of them is specifically designed for tracking concept drift in individual
user preferences. Motivated by this, we develop a Temporal Matrix Factorization
approach (TMF) for tracking concept drift in each individual user latent
vector. There are two key innovative steps in our approach: (i) we develop a
modified stochastic gradient descent method to learn an individual user latent
vector at each time step, and (ii) we use Lasso regression to learn a linear
model for the transition of the individual user latent vectors. We test our
method on a synthetic dataset and several real datasets. In comparison with the
original MF, our experimental results show that our temporal method is able to
achieve lower root mean square errors (RMSE) for both the synthetic and real
datasets. One interesting finding is that the performance gain in RMSE is
mostly from those users who indeed have concept drift in their user latent
vectors at the time of prediction. In particular, for the synthetic dataset and
the Ciao dataset, there are quite a few users with that property and the
performance gains for these two datasets are roughly 20% and 5%, respectively.
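As a rough illustration of the static MF baseline that the abstract above starts from, the following sketch fits user and item latent vectors with plain stochastic gradient descent. It is not the authors' TMF method (there is no per-time-step update and no Lasso transition model); the function names, the toy ratings, and the hyperparameters `rank`, `lr`, `reg`, and `epochs` are all assumptions chosen for illustration.

```python
import random


def train_mf(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.02,
             epochs=200, seed=0):
    """Fit latent vectors P (users) and Q (items) by SGD on squared error.

    ratings: list of (user_index, item_index, rating) triples.
    """
    rng = random.Random(seed)
    # Small random initialisation so the factors can move away from zero.
    P = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][k] * Q[i][k] for k in range(rank))
            err = r - pred
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step with L2 regularisation on both factors.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean square error of the predictions P[u] . Q[i]."""
    se = 0.0
    for u, i, r in ratings:
        pred = sum(a * b for a, b in zip(P[u], Q[i]))
        se += (r - pred) ** 2
    return (se / len(ratings)) ** 0.5


# Hypothetical toy data: 3 users, 3 items, 6 observed ratings.
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = train_mf(toy_ratings, 3, 3, epochs=0)   # untrained factors
P1, Q1 = train_mf(toy_ratings, 3, 3, epochs=200)  # trained factors
```

Comparing `rmse(toy_ratings, P0, Q0)` with `rmse(toy_ratings, P1, Q1)` shows the training error before and after fitting; a temporal variant would, under the abstract's description, re-estimate each user vector per time step.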
Variational Collaborative Learning for User Probabilistic Representation
Collaborative filtering (CF) has been successfully employed by many modern
recommender systems. Conventional CF-based methods use the user-item
interaction data as the sole information source to recommend items to users.
However, CF-based methods are known for suffering from cold start problems and
data sparsity problems. Hybrid models that utilize auxiliary information on top
of interaction data have increasingly gained attention. A few "collaborative
learning"-based models, which tightly bridge two heterogeneous learners
through mutual regularization, have recently been proposed for hybrid
recommendation. However, the "collaboration" in the existing methods is
actually asynchronous due to the alternating optimization of the two learners.
Leveraging recent advances in variational autoencoders (VAEs), we here
propose a model consisting of two streams of mutually linked VAEs, named
variational collaborative model (VCM). Unlike the mutual regularization used in
previous works where two learners are optimized asynchronously, VCM enables a
synchronous collaborative learning mechanism. In addition, the two-stream VAE
setup allows VCM to fully leverage the Bayesian probabilistic representations
in collaborative learning. Extensive experiments on three real-life datasets
have shown that VCM outperforms several state-of-the-art methods. Comment: 8 pages, 4 figures
Color Image and Multispectral Image Denoising Using Block Diagonal Representation
Filtering images of more than one channel is challenging in terms of both
efficiency and effectiveness. By grouping similar patches to utilize the
self-similarity and sparse linear approximation of natural images, recent
nonlocal and transform-domain methods have been widely used in color and
multispectral image (MSI) denoising. Many related methods focus on the modeling
of group level correlation to enhance sparsity, which often resorts to a
recursive strategy with a large number of similar patches. The importance of
the patch level representation is understated. In this paper, we mainly
investigate the influence and potential of representation at patch level by
considering a general formulation with block diagonal matrix. We further show
that by training a proper global patch basis, along with a local principal
component analysis transform in the grouping dimension, a simple
transform-threshold-inverse method could produce very competitive results. Fast
implementation is also developed to reduce computational complexity. Extensive
experiments on both simulated and real datasets demonstrate its robustness,
effectiveness and efficiency.
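The transform-threshold-inverse idea mentioned in the abstract above can be sketched in a few lines. This is only an illustrative sketch and not the authors' method: it uses a fixed 2-point orthonormal Haar basis as a stand-in for the trained global patch basis and the local PCA transform, and the names `HAAR_2`, `transform_threshold_inverse`, and `tau` are hypothetical.

```python
import math

# Assumption: a fixed orthonormal 2-point Haar basis (rows are basis vectors).
S = 1.0 / math.sqrt(2.0)
HAAR_2 = [[S, S], [S, -S]]


def transform_threshold_inverse(x, basis, tau):
    """Denoise x by hard-thresholding its coefficients in an orthonormal basis."""
    n = len(x)
    m = len(basis)
    # Forward transform: coefficient r is the inner product of x with row r.
    coeffs = [sum(basis[r][j] * x[j] for j in range(n)) for r in range(m)]
    # Hard threshold: zero out coefficients whose magnitude is below tau.
    kept = [c if abs(c) >= tau else 0.0 for c in coeffs]
    # Inverse transform: for an orthonormal basis this is the transpose.
    return [sum(basis[r][j] * kept[r] for r in range(m)) for j in range(n)]


# Two nearly equal samples: the small difference coefficient is removed.
denoised = transform_threshold_inverse([1.0, 1.05], HAAR_2, tau=0.1)
unchanged = transform_threshold_inverse([1.0, 1.05], HAAR_2, tau=0.0)
```

With `tau=0.1` the small difference coefficient is suppressed and both samples collapse to their mean; with `tau=0.0` nothing is thresholded and the input is recovered up to rounding.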
An Iterative Reweighted Method for Tucker Decomposition of Incomplete Multiway Tensors
We consider the problem of low-rank decomposition of incomplete multiway
tensors. Since many real-world datasets lie in an intrinsically low-dimensional
subspace, tensor low-rank decomposition with missing entries has applications
in many data analysis problems such as recommender systems and image
inpainting. In this paper, we focus on Tucker decomposition which represents an
Nth-order tensor in terms of N factor matrices and a core tensor via
multilinear operations. To exploit the underlying multilinear low-rank
structure in high-dimensional datasets, we propose a group-based log-sum
penalty functional to place structural sparsity over the core tensor, which
leads to a compact representation with the smallest core tensor. The method for
Tucker decomposition is developed by iteratively minimizing a surrogate
function that majorizes the original objective function, which results in an
iterative reweighted process. In addition, to reduce the computational
complexity, an over-relaxed monotone fast iterative shrinkage-thresholding
technique is adapted and embedded in the iterative reweighted process. The
proposed method is able to determine the model complexity (i.e. multilinear
rank) in an automatic way. Simulation results show that the proposed algorithm
offers competitive performance compared with other existing algorithms.
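To make the Tucker model in the abstract above concrete, the sketch below rebuilds a third-order tensor (N = 3) from a core tensor and three factor matrices via the multilinear product. It illustrates only the decomposition format, not the iterative reweighted algorithm or the log-sum penalty; the function name `tucker_reconstruct` and the toy values are assumptions.

```python
def tucker_reconstruct(core, A, B, C):
    """Rebuild X[i][j][k] = sum_{a,b,c} core[a][b][c] * A[i][a] * B[j][b] * C[k][c].

    core: R1 x R2 x R3 nested list (the core tensor).
    A, B, C: factor matrices of shapes I x R1, J x R2, K x R3.
    """
    I, J, K = len(A), len(B), len(C)
    R1, R2, R3 = len(core), len(core[0]), len(core[0][0])
    X = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                X[i][j][k] = sum(
                    core[a][b][c] * A[i][a] * B[j][b] * C[k][c]
                    for a in range(R1)
                    for b in range(R2)
                    for c in range(R3)
                )
    return X


# Hypothetical toy example with multilinear rank (1, 1, 1).
core = [[[2.0]]]
A = [[1.0], [2.0]]   # 2 x 1
B = [[3.0]]          # 1 x 1
C = [[1.0], [0.5]]   # 2 x 1
X = tucker_reconstruct(core, A, B, C)
```

The multilinear rank is the shape of the core, here (1, 1, 1); a smaller core means a more compact representation, which is what the abstract's sparsity penalty on the core aims for.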
Learning Tensors in Reproducing Kernel Hilbert Spaces with Multilinear Spectral Penalties
We present a general framework to learn functions in tensor product
reproducing kernel Hilbert spaces (TP-RKHSs). The methodology is based on a
novel representer theorem suitable for existing as well as new spectral
penalties for tensors. When the functions in the TP-RKHS are defined on the
Cartesian product of finite discrete sets, in particular, our main problem
formulation admits as a special case existing tensor completion problems. Other
special cases include transfer learning with multimodal side information and
multilinear multitask learning. For the latter case, our kernel-based view is
instrumental to derive nonlinear extensions of existing model classes. We give
a novel algorithm and show in experiments the usefulness of the proposed
extensions.
Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures
The natural habitat of most Bayesian methods is data represented by
exchangeable sequences of observations, for which de Finetti's theorem provides
the theoretical foundation. Dirichlet process clustering, Gaussian process
regression, and many other parametric and nonparametric Bayesian models fall
within the remit of this framework; many problems arising in modern data
analysis do not. This article provides an introduction to Bayesian models of
graphs, matrices, and other data that can be modeled by random structures. We
describe results in probability theory that generalize de Finetti's theorem to
such data and discuss their relevance to nonparametric Bayesian modeling. With
the basic ideas in place, we survey example models available in the literature;
applications of such models include collaborative filtering, link prediction,
and graph and network analysis. We also highlight connections to recent
developments in graph theory and probability, and sketch the more general
mathematical foundation of Bayesian methods for other types of data beyond
sequences and arrays.
Learnable Bernoulli Dropout for Bayesian Deep Learning
In this work, we propose learnable Bernoulli dropout (LBD), a new
model-agnostic dropout scheme that considers the dropout rates as parameters
jointly optimized with other model parameters. By probabilistic modeling of
Bernoulli dropout, our method enables more robust prediction and uncertainty
quantification in deep models. In particular, when combined with variational
auto-encoders (VAEs), LBD enables flexible semi-implicit posterior
representations, leading to new semi-implicit VAE (SIVAE) models. We solve the
optimization for training with respect to the dropout parameters using
Augment-REINFORCE-Merge (ARM), an unbiased and low-variance gradient estimator.
Our experiments on a range of tasks show the superior performance of our
approach compared with other commonly used dropout schemes. Overall, LBD leads
to improved accuracy and uncertainty estimates in image classification and
semantic segmentation. Moreover, using SIVAE, we can achieve state-of-the-art
performance on collaborative filtering for implicit feedback on several public
datasets. Comment: To appear in AISTATS 202
Meta-Learning surrogate models for sequential decision making
We introduce a unified probabilistic framework for solving sequential
decision making problems ranging from Bayesian optimisation to contextual
bandits and reinforcement learning. This is accomplished by a probabilistic
model-based approach that explains observed data while capturing predictive
uncertainty during the decision making process. Crucially, this probabilistic
model is chosen to be a Meta-Learning system that allows learning from a
distribution of related problems, enabling data-efficient adaptation to a
target task. As a suitable instantiation of this framework, we explore the use
of Neural processes due to statistical and computational desiderata. We apply
our framework to a broad range of problem domains, such as control problems,
recommender systems and adversarial attacks on RL agents, demonstrating an
efficient and general black-box learning approach.
Tensor Decomposition for Signal Processing and Machine Learning
Tensors or multi-way arrays are functions of three or more indices,
similar to matrices (two-way arrays), which are functions
of two indices (row, column). Tensors have a rich history,
stretching over almost a century, and touching upon numerous disciplines; but
they have only recently become ubiquitous in signal and data analytics at the
confluence of signal processing, statistics, data mining and machine learning.
This overview article aims to provide a good starting point for researchers and
practitioners interested in learning about and working with tensors. As such,
it focuses on fundamentals and motivation (using various application examples),
aiming to strike an appropriate balance of breadth and depth that will
enable someone having taken first graduate courses in matrix algebra and
probability to get started doing research and/or developing tensor algorithms
and software. Some background in applied optimization is useful but not
strictly required. The material covered includes tensor rank and rank
decomposition; basic tensor factorization models and their relationships and
properties (including fairly good coverage of identifiability); broad coverage
of algorithms ranging from alternating optimization to stochastic gradient;
statistical performance analysis; and applications ranging from source
separation to collaborative filtering, mixture and topic modeling,
classification, and multilinear subspace learning. Comment: revised version, overview article