Playing 20 Question Game with Policy-Based Reinforcement Learning
The 20 Questions (Q20) game is a well known game which encourages deductive
reasoning and creativity. In the game, the answerer first thinks of an object
such as a famous person or a kind of animal. Then the questioner tries to guess
the object by asking 20 questions. In a Q20 game system, the user is considered
as the answerer while the system itself acts as the questioner which requires a
good strategy of question selection to figure out the correct object and win
the game. However, the optimal policy of question selection is hard to
derive due to the complexity and volatility of the game environment. In this
paper, we propose a novel policy-based Reinforcement Learning (RL) method,
which enables the questioner agent to learn the optimal policy of question
selection through continuous interactions with users. To facilitate training,
we also propose using a reward network to estimate a more informative
reward. Compared to previous methods, our RL method is robust to noisy answers
and does not rely on the Knowledge Base of objects. Experimental results show
that our RL method clearly outperforms an entropy-based engineering system and
has competitive performance in a noise-free simulation environment.
Comment: Accepted by EMNLP 201
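As a rough illustration of what "policy-based" means here, the following is a minimal sketch of a single REINFORCE update for a softmax policy over candidate questions. It is not the paper's method: the paper's policy is conditioned on the game state and uses a learned reward network, neither of which is modeled below. The function names, the stateless logits, and the learning rate are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Turn raw question scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, lr=0.1):
    """Apply one REINFORCE update and return the new logits.

    Assumption: the policy is a plain softmax over one logit per question,
    with no game state. For such a policy, the gradient of
    log pi(action) with respect to logit k is 1[k == action] - pi(k).
    """
    probs = softmax(logits)
    return [
        x + lr * reward * ((1.0 if k == action else 0.0) - probs[k])
        for k, x in enumerate(logits)
    ]
```

With a positive reward, the probability of the question that was asked goes up; with a negative reward, it goes down. A full agent would repeat this step over many simulated or real games.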
Towards Question-based Recommender Systems
Conversational and question-based recommender systems have gained increasing
attention in recent years, with users enabled to converse with the system and
better control recommendations. Nevertheless, research in the field is still
limited, compared to traditional recommender systems. In this work, we propose
a novel Question-based recommendation method, Qrec, to help users find
items interactively, by answering automatically constructed and algorithmically
chosen questions. Previous conversational recommender systems ask users to
express their preferences over items or item facets. Our model, instead, asks
users to express their preferences over descriptive item features. The model is
first trained offline by a novel matrix factorization algorithm, and then
iteratively updates the user and item latent factors online by a closed-form
solution based on the user answers. Meanwhile, our model infers the underlying
user belief and preferences over items to learn an optimal question-asking
strategy by using Generalized Binary Search, so as to ask a sequence of
questions to the user. Our experimental results demonstrate that our proposed
matrix factorization model outperforms the traditional Probabilistic Matrix
Factorization model. Further, our proposed Qrec model can greatly improve on the
performance of state-of-the-art baselines, and it is also effective in the case
of cold-start user and item recommendations.
Comment: accepted by SIGIR 202
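The question-selection step mentioned in this abstract can be sketched generically. Generalized Binary Search picks the question whose answers, weighted by the current belief over items, come closest to an even split. The sketch below assumes binary answers encoded as +1 and -1 and a belief stored as a plain dictionary; the names `gbs_pick_question`, `belief`, and `answers` are hypothetical and are not taken from Qrec.

```python
def gbs_pick_question(belief, answers):
    """Return the question that splits the belief mass most evenly.

    belief:  dict mapping item -> probability (assumed to sum to 1)
    answers: dict mapping question -> dict mapping item -> +1 or -1,
             the answer that item would give (an assumed encoding)
    """
    def imbalance(question):
        # 0 means a perfect 50/50 split of the belief mass;
        # 1 means every item gives the same answer.
        return abs(sum(belief[i] * answers[question][i] for i in belief))

    return min(answers, key=imbalance)
```

A question on which all likely items agree has an imbalance near 1 and is never chosen while a more discriminating question is available.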
Learning to Ask: Question-based Sequential Bayesian Product Search
Product search is generally recognized as the first and foremost stage of
online shopping and is thus significant for e-commerce users and retailers.
Most traditional retrieval methods use similarity functions to
match the user's query and the document that describes a product, either
directly or in a latent vector space. However, user queries are often too
general to capture the minute details of the specific product that a user is
looking for. In this paper, we propose a novel interactive method to
effectively locate the best matching product. The method is based on the
assumption that there is a set of candidate questions for each product to be
asked. In this work, we instantiate this candidate set by making the hypothesis
that products can be discriminated by the entities that appear in the documents
associated with them. We propose a Question-based Sequential Bayesian Product
Search method, QSBPS, which directly queries users on the expected presence of
entities in the relevant product documents. The method learns the product
relevance as well as the reward of the potential questions to be asked to the
user by being trained on the search history and purchase behavior of a specific
user together with that of other users. The experimental results show that the
proposed method can greatly improve the performance of product search compared
to the state-of-the-art baselines.
Comment: This paper is accepted by CIKM 201
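To make the sequential Bayesian idea concrete, here is a minimal sketch of one belief update after the user answers a yes/no question about whether an entity appears in the target product's documents. It is an illustration under stated assumptions, not QSBPS itself: the fixed `noise` rate for wrong answers, the dictionary data layout, and the function name `update_belief` are all hypothetical.

```python
def update_belief(belief, contains_entity, answer_yes, noise=0.1):
    """Return the posterior belief over products after one answer.

    belief:          dict mapping product -> prior probability
    contains_entity: dict mapping product -> bool, whether the entity
                     appears in that product's documents
    answer_yes:      the user's answer to "does the entity apply?"
    noise:           assumed probability that the user answers wrongly
    """
    posterior = {}
    for product, prior in belief.items():
        agrees = contains_entity[product] == answer_yes
        likelihood = (1.0 - noise) if agrees else noise
        posterior[product] = prior * likelihood
    total = sum(posterior.values())
    # Renormalize so the posterior is a probability distribution.
    return {product: value / total for product, value in posterior.items()}
```

Repeating this update after each question concentrates the belief on products consistent with the user's answers, while the nonzero noise term keeps a single mistaken answer from eliminating the correct product.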