155 research outputs found
Cross-Domain Multitask Learning with Latent Probit Models
Learning multiple tasks across heterogeneous domains is a challenging problem since the feature space may not be the same for different tasks. We assume the data in multiple tasks are generated from a latent common domain via sparse domain transforms and propose a latent probit model (LPM) to jointly learn the domain transforms, and the shared probit classifier in the common domain. To learn meaningful task relatedness and avoid over-fitting in classification, we introduce sparsity in the domain transforms matrices, as well as in the common classifier. We derive theoretical bounds for the estimation error of the classifier in terms of the sparsity of domain transforms. An expectation-maximization algorithm is derived for learning the LPM. The effectiveness of the approach is demonstrated on several real datasets
Deep Learning for Recommender Systems
The widespread adoption of the Internet has led to an explosion in the number of choices available to consumers. Users begin to expect personalized content in modern E-commerce, entertainment and social media platforms. Recommender Systems (RS) provide a critical solution to this problem by maintaining user engagement and satisfaction with personalized content.
Traditional RS techniques are often linear limiting the expressivity required to model complex user-item interactions and require extensive handcrafted features from domain experts. Deep learning demonstrated significant breakthroughs in solving problems that have alluded the artificial intelligence community for many years advancing state-of-the-art results in domains such as computer vision and natural language processing.
The recommender domain consists of heterogeneous and semantically rich data such as unstructured text (e.g. product descriptions), categorical attributes (e.g. genre of a movie), and user-item feedback (e.g. purchases). Deep learning can automatically capture the intricate structure of user preferences by encoding learned feature representations from high dimensional data.
In this thesis, we explore five novel applications of deep learning-based techniques to address top-n recommendation. First, we propose Collaborative Memory Network, which unifies the strengths of the latent factor model and neighborhood-based methods inspired by Memory Networks to address collaborative filtering with implicit feedback. Second, we propose Neural Semantic Personalized Ranking, a novel probabilistic generative modeling approach to integrate deep neural network with pairwise ranking for the item cold-start problem. Third, we propose Attentive Contextual Denoising Autoencoder augmented with a context-driven attention mechanism to integrate arbitrary user and item attributes. Fourth, we propose a flexible encoder-decoder architecture called Neural Citation Network, embodying a powerful max time delay neural network encoder augmented with an attention mechanism and author networks to address context-aware citation recommendation. Finally, we propose a generic framework to perform conversational movie recommendations which leverages transfer learning to infer user preferences from natural language. Comprehensive experiments validate the effectiveness of all five proposed models against competitive baseline methods and demonstrate the successful adaptation of deep learning-based techniques to the recommendation domain
Neural Collaborative Filtering
In recent years, deep neural networks have yielded immense success on speech
recognition, computer vision and natural language processing. However, the
exploration of deep neural networks on recommender systems has received
relatively less scrutiny. In this work, we strive to develop techniques based
on neural networks to tackle the key problem in recommendation -- collaborative
filtering -- on the basis of implicit feedback. Although some recent work has
employed deep learning for recommendation, they primarily used it to model
auxiliary information, such as textual descriptions of items and acoustic
features of musics. When it comes to model the key factor in collaborative
filtering -- the interaction between user and item features, they still
resorted to matrix factorization and applied an inner product on the latent
features of users and items. By replacing the inner product with a neural
architecture that can learn an arbitrary function from data, we present a
general framework named NCF, short for Neural network-based Collaborative
Filtering. NCF is generic and can express and generalize matrix factorization
under its framework. To supercharge NCF modelling with non-linearities, we
propose to leverage a multi-layer perceptron to learn the user-item interaction
function. Extensive experiments on two real-world datasets show significant
improvements of our proposed NCF framework over the state-of-the-art methods.
Empirical evidence shows that using deeper layers of neural networks offers
better recommendation performance.Comment: 10 pages, 7 figure
Recommended from our members
Knowledge transfer using latent variable models
textIn several applications, scarcity of labeled data is a challenging problem that hinders the predictive capabilities of machine learning algorithms. Additionally, the distribution of the data changes over time, rendering models trained with older data less capable of discovering useful structure from the newly available data. Transfer learning is a convenient framework to overcome such problems where the learning of a model specific to a domain can benefit the learning of other models in other domains through either simultaneous training of domains or sequential transfer of knowledge from one domain to the others. This thesis explores the opportunities of knowledge transfer in the context of a few applications pertaining to object recognition from images, text analysis, network modeling and recommender systems, using probabilistic latent variable models as building blocks. Both simultaneous and sequential knowledge transfer are achieved through the latent variables, either by sharing these across multiple related domains (for simultaneous learning) or by adapting their distributions to fit data from a new domain (for sequential learning).Electrical and Computer Engineerin
Kernels for Vector-Valued Functions: a Review
Kernel methods are among the most popular techniques in machine learning.
From a frequentist/discriminative perspective they play a central role in
regularization theory as they provide a natural choice for the hypotheses space
and the regularization functional through the notion of reproducing kernel
Hilbert spaces. From a Bayesian/generative perspective they are the key in the
context of Gaussian processes, where the kernel function is also known as the
covariance function. Traditionally, kernel methods have been used in supervised
learning problem with scalar outputs and indeed there has been a considerable
amount of work devoted to designing and learning kernels. More recently there
has been an increasing interest in methods that deal with multiple outputs,
motivated partly by frameworks like multitask learning. In this paper, we
review different methods to design or learn valid kernel functions for multiple
outputs, paying particular attention to the connection between probabilistic
and functional methods
Bayesian Learning in the Counterfactual World
Recent years have witnessed a surging interest towards the use of machine learning tools for causal inference. In contrast to the usual large data settings where the primary goal is prediction, many disciplines, such as health, economic and social sciences, are instead interested in causal questions. Learning individualized responses to an intervention is a crucial task in many applied fields (e.g., precision medicine, targeted advertising, precision agriculture, etc.) where the ultimate goal is to design optimal and highly-personalized policies based on individual features. In this work, I thus tackle the problem of estimating causal effects of an intervention that are heterogeneous across a population of interest and depend on an individual set of characteristics (e.g., a patient's clinical record, user's browsing history, etc..) in high-dimensional observational data settings. This is done by utilizing Bayesian Nonparametric or Probabilistic Machine Learning tools that are specifically adjusted for the causal setting and have desirable uncertainty quantification properties, with a focus on the issues of interpretability/explainability and inclusion of domain experts' prior knowledge. I begin by introducing terminology and concepts from causality and causal reasoning in the first chapter. Then I include a literature review of some of the state-of-the-art regression-based methods for heterogeneous treatment effects estimation, with an attempt to build a unifying taxonomy and lay down the finite-sample empirical properties of these models. The chapters forming the core of the dissertation instead present some novel methods addressing existing issues in individualized causal effects estimation: Chapter 3 develops both a Bayesian tree ensemble method and a deep learning architecture to tackle interpretability, uncertainty coverage and targeted regularization; Chapter 4 instead introduces a novel multi-task Deep Kernel Learning method particularly suited for multi-outcome | multi-action scenarios. The last chapter concludes with a discussion
- …