Seamlessly Unifying Attributes and Items: Conversational Recommendation for Cold-Start Users
Static recommendation methods like collaborative filtering suffer from an
inherent limitation: they cannot perform real-time personalization for
cold-start users. Online recommendation methods, e.g., multi-armed bandit
approaches, address this limitation by interactively exploring user
preferences online and pursuing the exploration-exploitation (EE) trade-off.
However, existing bandit-based methods model recommendation actions
homogeneously. Specifically, they consider only the items as the arms and
cannot handle item attributes, which naturally provide interpretable
information about a user's current demands and can effectively filter out
undesired items. In this work, we consider conversational recommendation for
cold-start users, where a system can both ask a user about attributes and
recommend items interactively. This important scenario was studied in a
recent work, which, however, employs a hand-crafted function to decide when
to ask about attributes and when to make recommendations. Such separate
modeling of attributes and items makes the effectiveness of the system rely
heavily on the choice of the hand-crafted function, introducing fragility
into the system. To address this limitation, we seamlessly unify attributes
and items in the same arm space and achieve their EE trade-offs automatically
within the framework of Thompson Sampling. Our Conversational Thompson
Sampling (ConTS) model holistically solves all questions in conversational
recommendation (what item to recommend, what attribute to ask about, and
whether to ask or recommend) by choosing the arm with the maximal reward to
play. Extensive experiments on three benchmark datasets show that ConTS
outperforms the state-of-the-art methods Conversational UCB (ConUCB) and the
Estimation-Action-Reflection model in terms of both success rate and average
number of conversation turns.
Comment: TOIS 2021
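As a rough sketch of the unified-arm idea (not the authors' implementation;
the dimensions, arm features, and feedback model below are illustrative
assumptions), a linear Thompson Sampling agent can place item arms and
attribute arms in one posterior and let the sampled reward decide, in a
single step, whether the next turn asks or recommends:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8                                       # latent dimension (assumed)
# One feature vector per arm; items and attributes share ONE arm space.
arms = {f"item_{i}": rng.normal(size=dim) for i in range(20)}
arms |= {f"attr_{j}": rng.normal(size=dim) for j in range(5)}

# Gaussian posterior over the user preference vector u: N(mu, B^{-1}).
B = np.eye(dim)                               # posterior precision
f = np.zeros(dim)                             # precision-weighted mean

for turn in range(15):
    mu = np.linalg.solve(B, f)
    u = rng.multivariate_normal(mu, np.linalg.inv(B))   # Thompson draw
    name, x = max(arms.items(), key=lambda kv: u @ kv[1])
    action = "ask" if name.startswith("attr_") else "recommend"
    reward = float(rng.random() < 0.3)        # stand-in for user feedback
    B += np.outer(x, x)                       # Bayesian linear-regression
    f += reward * x                           # posterior update
```

Because attribute arms compete in the same reward space as item arms, no
hand-crafted ask-vs-recommend policy is needed; that is precisely the
fragility the abstract says ConTS removes.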
Multi-step Reinforcement Learning: A Unifying Algorithm
Unifying seemingly disparate algorithmic ideas to produce better performing
algorithms has been a longstanding goal in reinforcement learning. As a primary
example, TD(\lambda) elegantly unifies one-step TD prediction with Monte
Carlo methods through the use of eligibility traces and the trace-decay
parameter \lambda. Currently, there are a multitude of algorithms that can be
used to perform TD control, including Sarsa, Q-learning, and Expected Sarsa.
These methods are often studied in the one-step case, but they can be extended
across multiple time steps to achieve better performance. Each of these
algorithms is seemingly distinct, and no one dominates the others for all
problems. In this paper, we study a new multi-step action-value algorithm
called Q(\sigma) which unifies and generalizes these existing algorithms,
while subsuming them as special cases. A new parameter, \sigma, is introduced
to allow the degree of sampling performed by the algorithm at each step during
its backup to be continuously varied, with Sarsa existing at one extreme (full
sampling) and Expected Sarsa existing at the other (pure expectation).
Q(\sigma) is generally applicable to both on- and off-policy learning, but in
this work we focus on experiments in the on-policy case. Our results show that
an intermediate value of \sigma, which results in a mixture of the existing
algorithms, performs better than either extreme. The mixture can also be varied
dynamically, which can result in even greater performance.
Comment: Appeared at the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)
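A minimal sketch of the one-step backup the abstract describes (the paper's
multi-step returns and control loop are omitted; the function name is my own):

```python
import numpy as np

def q_sigma_target(r, gamma, q_next, pi_next, a_next, sigma):
    """One-step Q(sigma) backup target.

    sigma = 1.0 -> Sarsa target (full sampling of the next action),
    sigma = 0.0 -> Expected Sarsa target (pure expectation under pi),
    intermediate sigma mixes the two components.
    """
    sampled = q_next[a_next]        # bootstrap from the sampled next action
    expected = pi_next @ q_next     # expectation over the target policy
    return r + gamma * (sigma * sampled + (1.0 - sigma) * expected)

# Tiny usage example with three actions.
q_next = np.array([1.0, 0.5, -0.2])
pi_next = np.array([0.6, 0.3, 0.1])
target = q_sigma_target(r=0.1, gamma=0.99, q_next=q_next,
                        pi_next=pi_next, a_next=0, sigma=0.5)
```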
Bandit Models of Human Behavior: Reward Processing in Mental Disorders
Drawing inspiration from behavioral studies of human decision making, we
propose here a general parametric framework for the multi-armed bandit problem,
which extends the standard Thompson Sampling approach to incorporate reward
processing biases associated with several neurological and psychiatric
conditions, including Parkinson's and Alzheimer's diseases,
attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain.
We demonstrate empirically that the proposed parametric approach can often
outperform the baseline Thompson Sampling on a variety of datasets. Moreover,
from the behavioral modeling perspective, our parametric framework can be
viewed as a first step towards a unifying computational model capturing reward
processing abnormalities across multiple mental conditions.
Comment: Conference on Artificial General Intelligence, AGI-17
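A hedged sketch of the idea: standard Bernoulli Thompson Sampling, except
that positive and negative outcomes update the Beta posterior with different
weights, standing in for a reward-processing bias (the weight values and
names are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5
true_p = rng.uniform(0.1, 0.9, size=K)        # unknown arm probabilities

# Beta(alpha, beta) posterior per arm; w_pos / w_neg scale how strongly
# positive and negative outcomes are absorbed (the bias parameters).
alpha = np.ones(K)
beta = np.ones(K)
w_pos, w_neg = 1.0, 0.4                       # e.g. under-weighting of losses

for t in range(1000):
    k = int(np.argmax(rng.beta(alpha, beta)))  # Thompson draw per arm
    reward = rng.random() < true_p[k]
    alpha[k] += w_pos * reward                 # biased positive update
    beta[k] += w_neg * (1 - reward)            # biased negative update
```

Setting w_pos = w_neg = 1 recovers the standard Thompson Sampling baseline
the abstract compares against.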
Unifying Projected Entangled Pair States contractions
The approximate contraction of a Projected Entangled Pair States (PEPS)
tensor network is a fundamental ingredient of any PEPS algorithm, required for
the optimization of the tensors in ground state search or time evolution, as
well as for the evaluation of expectation values. An exact contraction is in
general impossible, and the choice of the approximating procedure determines
the efficiency and accuracy of the algorithm. We analyze different previous
proposals for this approximation, and show that they can be understood via the
form of their environment, i.e. the operator that results from contracting part
of the network. This provides physical insight into the limitations of various
approaches, and allows us to introduce a new strategy, based on the idea of
clusters, that unifies previous methods. The resulting contraction algorithm
interpolates naturally between the cheapest but least precise method and the
most costly but most precise one. We benchmark the different algorithms with
finite PEPS, and show how the cluster strategy can be used for both the tensor
optimization and the calculation of expectation values. Additionally, we
discuss its applicability to the parallelization of PEPS and to infinite
systems (iPEPS).
Comment: 28 pages, 15 figures, accepted version
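For scale, exact contraction is feasible only for very small finite lattices.
The toy sketch below (my own example, not from the paper) contracts a 2x2
PEPS into the full state vector with einsum; the cost of this exact approach
grows exponentially with system size, which is what motivates the approximate
environments discussed above:

```python
import numpy as np

rng = np.random.default_rng(2)
d, D = 2, 3                       # physical and bond dimensions (assumed)

# One PEPS tensor per site of a 2x2 lattice; trivial boundary legs dropped.
# Leg order: (physical, horizontal bond, vertical bond).
A00 = rng.normal(size=(d, D, D))  # (p, right, down)
A01 = rng.normal(size=(d, D, D))  # (p, left,  down)
A10 = rng.normal(size=(d, D, D))  # (p, right, up)
A11 = rng.normal(size=(d, D, D))  # (p, left,  up)

# Exact contraction of the whole network into the state vector psi[p00,p01,p10,p11]:
# h / g are the top / bottom horizontal bonds, v / w the left / right vertical bonds.
psi = np.einsum('ahv,bhw,cgv,egw->abce', A00, A01, A10, A11)
norm = np.vdot(psi, psi)          # <psi|psi> from the dense state vector
```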
Learning with Options that Terminate Off-Policy
A temporally abstract action, or an option, is specified by a policy and a
termination condition: the policy guides option behavior, and the termination
condition roughly determines its length. Generally, learning with longer
options (like learning with multi-step returns) is known to be more efficient.
However, if the option set for the task is not ideal, and cannot express the
primitive optimal policy exactly, shorter options offer more flexibility and
can yield a better solution. Thus, the termination condition puts learning
efficiency at odds with solution quality. We propose to resolve this dilemma by
decoupling the behavior and target terminations, just as is done with
policies in off-policy learning. To this end, we give a new algorithm,
Q(\beta), that learns the solution with respect to any termination condition,
regardless of how the options actually terminate. We derive Q(\beta) by casting
learning with options into a common framework with well-studied multi-step
off-policy learning. We validate our algorithm empirically, and show that it
holds up to its motivating claims.
Comment: AAAI 2018
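In the spirit of the decoupling described above (a simplified one-step
sketch, not the full Q(\beta) algorithm; all names are illustrative), the
backup can bootstrap using a *target* termination probability regardless of
how the behavior option actually terminated:

```python
import numpy as np

def decoupled_target(r, gamma, q_same_next, v_next, beta_target):
    """One-step option-value target using the target termination beta_target:
    with prob. (1 - beta_target) bootstrap as if the option continues,
    with prob. beta_target bootstrap from the value after termination.
    Applied regardless of whether the behavior option terminated."""
    return r + gamma * ((1.0 - beta_target) * q_same_next
                        + beta_target * v_next)

# Example: option values at the next state for two options.
q_next = np.array([0.8, 0.3])     # q_next[o] = Q(s', o)
current_option = 0
target = decoupled_target(r=0.0, gamma=0.95,
                          q_same_next=q_next[current_option],
                          v_next=q_next.max(),  # greedy value on termination
                          beta_target=0.5)
```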
Langevin and Hamiltonian based Sequential MCMC for Efficient Bayesian Filtering in High-dimensional Spaces
Nonlinear non-Gaussian state-space models arise in numerous applications in
statistics and signal processing. In this context, one of the most successful
and popular approximation techniques is the Sequential Monte Carlo (SMC)
algorithm, also known as particle filtering. Nevertheless, this method tends to
be inefficient when applied to high dimensional problems. In this paper, we
focus on another class of sequential inference methods, namely the Sequential
Markov Chain Monte Carlo (SMCMC) techniques, which represent a promising
alternative to SMC methods. After providing a unifying framework for the class
of SMCMC approaches, we propose novel efficient strategies based on the
principles of Langevin diffusion and Hamiltonian dynamics in order to cope with
the high dimensionality of an increasing number of applications. Simulation
results show that the proposed algorithms achieve significantly better
performance than existing algorithms.
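As one concrete ingredient of such an approach, here is a minimal
Metropolis-adjusted Langevin (MALA) step used as the inner kernel of a
sequential MCMC sweep at a single time step, on a toy one-dimensional
linear-Gaussian model (the model parameters and names are illustrative
assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(3)

def mala_step(x, log_target, grad_log_target, eps):
    """One Metropolis-adjusted Langevin (MALA) step targeting log_target."""
    mean_fwd = x + 0.5 * eps * grad_log_target(x)
    prop = mean_fwd + np.sqrt(eps) * rng.normal()       # Langevin proposal
    mean_bwd = prop + 0.5 * eps * grad_log_target(prop)
    log_q_fwd = -(prop - mean_fwd) ** 2 / (2.0 * eps)   # forward proposal
    log_q_bwd = -(x - mean_bwd) ** 2 / (2.0 * eps)      # reverse proposal
    log_acc = log_target(prop) - log_target(x) + log_q_bwd - log_q_fwd
    return prop if np.log(rng.random()) < log_acc else x

# Toy state-space model at time step t:
# x_t ~ N(0.9 * x_prev, 1),  y_t ~ N(x_t, 0.5).
x_prev, y_t = 0.2, 1.1
log_target = lambda x: -0.5 * (x - 0.9 * x_prev) ** 2 - (x - y_t) ** 2
grad_log_target = lambda x: -(x - 0.9 * x_prev) - 2.0 * (x - y_t)

x = 0.0
for _ in range(200):   # inner chain targeting p(x_t | x_prev, y_t)
    x = mala_step(x, log_target, grad_log_target, eps=0.1)
```

The gradient information in the proposal is what lets such kernels move
efficiently in high dimensions, where plain random-walk proposals degrade.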