Recurrent Latent Variable Networks for Session-Based Recommendation
In this work, we attempt to ameliorate the impact of data sparsity in the
context of session-based recommendation. Specifically, we seek to devise a
machine learning mechanism capable of extracting subtle and complex underlying
temporal dynamics in the observed session data, so as to inform the
recommendation algorithm. To this end, we improve upon systems that utilize
deep learning techniques with recurrently connected units; we do so by adopting
concepts from the field of Bayesian statistics, namely variational inference.
Our proposed approach consists in treating the network recurrent units as
stochastic latent variables with a prior distribution imposed over them. On
this basis, we proceed to infer corresponding posteriors; these can be used for
prediction and recommendation generation, in a way that accounts for the
uncertainty in the available sparse training data. To allow for our approach to
easily scale to large real-world datasets, we perform inference under an
approximate amortized variational inference (AVI) setup, whereby the learned
posteriors are parameterized via (conventional) neural networks. We perform an
extensive experimental evaluation of our approach using challenging benchmark
datasets, and illustrate its superiority over existing state-of-the-art
techniques.
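The core idea, treating recurrent units as stochastic latent variables whose approximate posteriors are produced by an inference network, can be illustrated in a few lines. The sketch below is a toy NumPy illustration of reparameterized sampling and the Gaussian KL term used in amortized variational inference; the weight matrices `W_mu`, `W_sig` and the `tanh` transition are placeholder choices, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(h_prev, x_t, W_mu, W_sig):
    """Amortized inference network: maps the previous hidden state and the
    current item embedding to the mean and std of the approximate posterior."""
    inp = np.concatenate([h_prev, x_t])
    mu = W_mu @ inp
    sigma = np.exp(0.5 * (W_sig @ inp))  # log-variance parameterization
    return mu, sigma

def sample_latent(mu, sigma):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_to_standard_normal(mu, sigma):
    """KL(q(z) || N(0, I)) for a diagonal Gaussian posterior."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))

# Toy dimensions: hidden size 4, item-embedding size 3.
H, D = 4, 3
W_mu = rng.standard_normal((H, H + D)) * 0.1
W_sig = rng.standard_normal((H, H + D)) * 0.1

h = np.zeros(H)
session = [rng.standard_normal(D) for _ in range(5)]  # fake 5-item session
kl_total = 0.0
for x_t in session:
    mu, sigma = encoder(h, x_t, W_mu, W_sig)
    z = sample_latent(mu, sigma)   # stochastic recurrent state
    h = np.tanh(z)                 # deterministic transition from the sample
    kl_total += kl_to_standard_normal(mu, sigma)
```

In training, `kl_total` would enter the evidence lower bound alongside a reconstruction term for the next-item prediction; here it only demonstrates that the posterior regularizer is computable per step.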
Naive Feature Selection: Sparsity in Naive Bayes
Due to its linear complexity, naive Bayes classification remains an
attractive supervised learning method, especially in very large-scale settings.
We propose a sparse version of naive Bayes, which can be used for feature
selection. This leads to a combinatorial maximum-likelihood problem, for which
we provide an exact solution in the case of binary data, or a bound in the
multinomial case. We prove that our bound becomes tight as the marginal
contribution of additional features decreases. Both binary and multinomial
sparse models are solvable in time almost linear in problem size, representing
a very small extra relative cost compared to the classical naive Bayes.
Numerical experiments on text data show that the naive Bayes feature selection
method is as statistically effective as state-of-the-art feature selection
methods such as recursive feature elimination, ℓ1-penalized logistic
regression and LASSO, while being orders of magnitude faster. For a large data
set with millions of training points and features, and with a non-optimized
CPU implementation, our sparse naive Bayes model can be trained in less than
15 seconds.
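The idea of feature selection inside naive Bayes can be sketched with a simple surrogate: score each feature by how much its class-conditional counts improve the log-likelihood over pooled (class-shared) counts, and keep the top-k. This is a hypothetical simplification for illustration, not the paper's exact combinatorial formulation or its bound.

```python
import numpy as np

def sparse_nb_select(X, y, k):
    """Rank features by the log-likelihood gain of class-specific multinomial
    parameters over pooled ones; return the indices of the top-k features."""
    c1 = X[y == 1].sum(axis=0) + 1.0  # Laplace smoothing
    c0 = X[y == 0].sum(axis=0) + 1.0
    p1 = c1 / c1.sum()
    p0 = c0 / c0.sum()
    pooled = (c1 + c0) / (c1 + c0).sum()
    gain = (c1 * (np.log(p1) - np.log(pooled))
            + c0 * (np.log(p0) - np.log(pooled)))
    return np.argsort(gain)[::-1][:k]

# Toy corpus: feature 0 is strongly class-1 specific, so it should rank first.
X = np.array([[5, 1, 1],
              [4, 0, 1],
              [0, 1, 2],
              [0, 2, 1]], dtype=float)
y = np.array([1, 1, 0, 0])
top = sparse_nb_select(X, y, 1)
```

Each feature's score is computed independently, which is what keeps the selection almost linear in problem size.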
A Relational Tucker Decomposition for Multi-Relational Link Prediction
We propose the Relational Tucker3 (RT) decomposition for multi-relational
link prediction in knowledge graphs. We show that many existing knowledge graph
embedding models are special cases of the RT decomposition with certain
predefined sparsity patterns in its components. In contrast to these prior
models, RT decouples the sizes of entity and relation embeddings, allows
parameter sharing across relations, and does not make use of a predefined
sparsity pattern. We use the RT decomposition as a tool to explore whether it
is possible and beneficial to automatically learn sparsity patterns, and
whether dense models can outperform sparse models (using the same number of
parameters). Our experiments indicate that, depending on the dataset, both
questions can be answered affirmatively.
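A Tucker-style score for a knowledge-graph triple is a trilinear contraction of a shared core tensor with the subject, relation, and object embeddings. The sketch below shows that contraction with toy sizes; the embeddings and core are random placeholders, and the decoupled entity/relation dimensions (`de`, `dr`) illustrate the decoupling the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ent, n_rel, de, dr = 4, 2, 3, 2       # toy sizes; de and dr are decoupled
E = rng.standard_normal((n_ent, de))    # entity embeddings
R = rng.standard_normal((n_rel, dr))    # relation embeddings
W = rng.standard_normal((de, dr, de))   # dense core tensor shared by all relations

def score(s, r, o):
    """Trilinear Tucker score: contract the core with subject, relation, object."""
    return float(np.einsum('i,ijk,j,k->', E[s], W, R[r], E[o]))

# Many embedding models correspond to a fixed sparsity pattern in W; e.g., with
# de == dr, a core that is nonzero only on its main diagonal (i == j == k)
# recovers a DistMult-like trilinear product.
```

Because the core is shared across relations, parameters are reused; imposing (or learning) zeros in `W` is what connects dense and sparse model families.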
Efficient Regularized Least-Squares Algorithms for Conditional Ranking on Relational Data
In domains like bioinformatics, information retrieval and social network
analysis, one can find learning tasks where the goal consists of inferring a
ranking of objects, conditioned on a particular target object. We present a
general kernel framework for learning conditional rankings from various types
of relational data, where rankings can be conditioned on unseen data objects.
We propose efficient algorithms for conditional ranking by optimizing squared
regression and ranking loss functions. We show theoretically that learning
with the ranking loss is likely to generalize better than with the regression
loss. Further, we prove that symmetry or reciprocity properties of relations
can be efficiently enforced in the learned models. Experiments on synthetic and
real-world data illustrate that the proposed methods deliver state-of-the-art
performance in terms of predictive power and computational efficiency.
Moreover, we also show empirically that incorporating symmetry or reciprocity
properties can improve the generalization performance.
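A minimal regularized least-squares backbone for conditional ranking can be sketched as kernel ridge regression over (target, object) pairs: fit dual coefficients on observed relation values, then rank candidate objects for a new target by predicted score. The feature construction and synthetic labels below are illustrative assumptions, not the paper's kernels or its ranking-loss variant.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def fit_kernel_ridge(K, y, lam=0.1):
    """Regularized least squares in the dual: alpha = (K + lam*I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(len(K)), y)

# Toy relational data: each row concatenates (target, object) features; the
# label is a synthetic relation strength.
rng = np.random.default_rng(2)
pairs = rng.standard_normal((20, 4))
labels = pairs[:, 0] * pairs[:, 2]
K = rbf(pairs, pairs)
alpha = fit_kernel_ridge(K, labels)

def predict(query_pairs):
    return rbf(query_pairs, pairs) @ alpha

# Conditional ranking: score candidate pairs for one target and sort.
cands = rng.standard_normal((5, 4))
ranking = np.argsort(-predict(cands))
```

Symmetry or reciprocity of a relation would be enforced by constraining the kernel over pairs (e.g., symmetrizing over the two arguments), which this sketch omits.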
One-Pass Ranking Models for Low-Latency Product Recommendations
Purchase logs collected in e-commerce platforms provide rich information about customer preferences. These logs can be leveraged to improve the quality of product recommendations by feeding them to machine-learned ranking models. However, a variety of deployment constraints limit the naive applicability of machine learning to this problem. First, the amount and the dimensionality of the data make in-memory learning simply not possible. Second, the drift of customers' preferences over time requires retraining the ranking model regularly with freshly collected data. This limits the time that is available for training to prohibitively short intervals. Third, ranking in real-time is necessary whenever the query complexity prevents us from caching the predictions. This constraint requires minimizing prediction time (or equiva
- …
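The one-pass, low-memory setting the abstract describes can be illustrated with a generic online pairwise ranker: stream (preferred, skipped) feature pairs once, applying a hinge-style update on margin violations, with memory cost independent of the log size. This is a generic sketch of single-pass pairwise SGD, not the paper's model; the stream construction and `true_w` are synthetic.

```python
import numpy as np

def one_pass_rank(stream, dim, lr=0.1, margin=1.0):
    """Single pass of pairwise SGD: each event yields (preferred, skipped)
    feature vectors; update w only on margin violations. O(dim) memory."""
    w = np.zeros(dim)
    for pos, neg in stream:
        if w @ (pos - neg) < margin:   # hinge violation
            w += lr * (pos - neg)
    return w

rng = np.random.default_rng(3)
d = 5
true_w = rng.standard_normal(d)        # hidden preference direction

def make_stream(n):
    """Synthetic purchase log: each pair is ordered by the true preference."""
    for _ in range(n):
        a, b = rng.standard_normal(d), rng.standard_normal(d)
        yield (a, b) if true_w @ a > true_w @ b else (b, a)

w = one_pass_rank(make_stream(2000), d)
```

Every applied update vector has positive projection onto the true preference direction by construction, so the learned `w` aligns with it after one pass; prediction is a single dot product, which keeps ranking latency low.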