Approaching Hanabi with Q-Learning and Evolutionary Algorithm
Hanabi is a cooperative card game with hidden information that requires cooperation and communication between the players. For a machine learning agent to be successful at Hanabi, it must learn how to communicate and how to infer information from the communication of other players. To approach Hanabi, Q-learning and an evolutionary algorithm are proposed as potential solutions. The agents created using these methods are shown not to achieve human levels of communication.
Which Channel to Ask My Question? Personalized Customer Service Request Stream Routing using Deep Reinforcement Learning
Customer service is critical to all companies, as it may directly affect
brand reputation. Owing to their large number of customers, e-commerce
companies often employ multiple communication channels to answer customers'
questions, for example, a chatbot and a hotline. On the one hand, each channel
has limited capacity to respond to customers' requests; on the other hand,
customers have different preferences over these channels. The current
production systems are mainly built on business rules, which merely
consider tradeoffs between resources and customers' satisfaction. To achieve
the optimal tradeoff between resources and customers' satisfaction, we propose
a new framework based on deep reinforcement learning, which directly takes both
resources and user model into account. In addition to the framework, we also
propose a new deep-reinforcement-learning-based routing method: double dueling
deep Q-learning with prioritized experience replay (PER-DoDDQN). We evaluate
our proposed framework and method using both synthetic data and real customer
service log data from a large financial technology company. We show that our
proposed deep-reinforcement-learning-based framework is superior to the
existing production system. Moreover, we show that our proposed PER-DoDDQN
outperforms all other deep Q-learning variants in practice and provides a
better routing plan. These observations suggest that our proposed method
can seek the trade-off where both channel resources and customers' satisfaction
are optimal.
Comment: 13 pages, 7 figures
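The abstract names three components (double Q-learning, a dueling architecture, and prioritized experience replay) without spelling them out. The sketch below shows the standard textbook form of each piece in plain Python. It is not the authors' PER-DoDDQN implementation: the function names, the two-action setup (index 0 for chatbot, index 1 for hotline), and the hyperparameters `alpha` and `eps` are illustrative assumptions.

```python
def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_q_target(reward, gamma, done, next_q_online, next_q_target):
    """Double Q-learning target.

    The online network's values choose the next action; the target
    network's values score that action, which reduces overestimation.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha,
    with priority p_i = |td_error_i| + eps (alpha and eps are assumed values)."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


# Hypothetical routing step with two channels: 0 = chatbot, 1 = hotline.
q_next_online = dueling_q_values(1.0, [0.2, 0.8])
q_next_target = dueling_q_values(0.9, [0.5, 0.1])
target = double_q_target(reward=1.0, gamma=0.9, done=False,
                         next_q_online=q_next_online,
                         next_q_target=q_next_target)
sampling_probs = per_probabilities([0.1, 2.0, 0.5])
```

In a full agent, the lists of Q-values would come from neural networks and the sampling probabilities would drive minibatch selection from the replay buffer; importance-sampling weights, which prioritized replay normally uses to correct the sampling bias, are omitted here for brevity.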