Intrinsically Motivated Reinforcement Learning based Recommendation with Counterfactual Data Augmentation
Deep reinforcement learning (DRL) has proven effective at capturing
users' dynamic interests in recent literature. However, training a DRL agent is
challenging: because of the sparse environment in recommender systems (RS), DRL
agents must spend time either exploring informative user-item interaction
trajectories or exploiting existing trajectories for policy learning. This
exploration-exploitation trade-off significantly affects recommendation
performance when the environment is sparse, and balancing the two is even
harder in DRL-based RS, where the agent needs to explore informative
trajectories deeply and exploit them efficiently in the context of recommender
systems. As a step towards addressing this issue, we design a novel
intrinsically motivated reinforcement learning method to increase the
capability of exploring informative interaction trajectories in the sparse
environment; these trajectories are further enriched via a counterfactual
augmentation strategy for more efficient exploitation. Extensive experiments on
six offline datasets and three online simulation platforms demonstrate the
superiority of our model over a set of existing state-of-the-art methods.
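The core idea of an intrinsic exploration bonus can be sketched as follows. This is a minimal illustration, assuming a count-based novelty bonus; the paper's actual intrinsic reward, the weight `beta`, and the state representation are not specified here and are illustrative choices.

```python
from collections import defaultdict

class IntrinsicRewardShaper:
    """Adds a novelty bonus to the sparse extrinsic reward (illustrative)."""

    def __init__(self, beta=0.1):
        self.beta = beta                      # weight of the intrinsic term (assumed value)
        self.visit_counts = defaultdict(int)  # state -> visit count

    def reward(self, state, extrinsic_reward):
        # Novelty bonus decays as 1/sqrt(count), so rarely seen user-item
        # interaction states yield a larger combined reward, encouraging
        # exploration of informative trajectories.
        self.visit_counts[state] += 1
        intrinsic = 1.0 / self.visit_counts[state] ** 0.5
        return extrinsic_reward + self.beta * intrinsic

shaper = IntrinsicRewardShaper(beta=0.1)
r1 = shaper.reward("state_a", 0.0)  # first visit: full novelty bonus
r2 = shaper.reward("state_a", 0.0)  # repeat visit: smaller bonus
```

The bonus vanishes as states become familiar, so the agent gradually shifts from exploration to exploiting the collected (and, in the paper, counterfactually augmented) trajectories.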
Crawling in Rogue's dungeons with (partitioned) A3C
Rogue is a famous dungeon-crawling video game of the 1980s, the ancestor of
its genre. Rogue-like games are known for requiring the exploration of
partially observable, randomly generated labyrinths that differ on every run,
preventing any form of level replay. As such, they serve as a very natural and
challenging task for reinforcement learning, requiring the acquisition of
complex, non-reactive behaviors involving memory and planning. In this article
we show how, exploiting a version of A3C partitioned over different
situations, the agent is able to reach the stairs and descend to the next
level in 98% of cases.

Comment: Accepted at the Fourth International Conference on Machine Learning,
Optimization, and Data Science (LOD 2018)
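The "partitioned" idea, a dispatcher that routes each game situation to a dedicated sub-policy, can be sketched as below. The situation labels, the classifier, and the policies are hypothetical stand-ins, not the paper's actual partition or trained A3C networks.

```python
def classify_situation(state):
    # Toy classifier: in the real setting this would inspect the dungeon map,
    # e.g. whether the stairs are currently visible or a corridor remains
    # to be explored. The dict-based state is an assumption for illustration.
    if state.get("stairs_visible"):
        return "descend"
    return "explore"

class PartitionedAgent:
    """Routes each state to the sub-policy trained for its situation."""

    def __init__(self, policies):
        self.policies = policies  # situation label -> policy function

    def act(self, state):
        situation = classify_situation(state)
        return self.policies[situation](state)

# Each sub-policy would be a separately trained network in the A3C setting;
# lambdas stand in for them here.
agent = PartitionedAgent({
    "descend": lambda s: "move_to_stairs",
    "explore": lambda s: "explore_corridor",
})
action = agent.act({"stairs_visible": True})  # -> "move_to_stairs"
```

Splitting the policy this way lets each sub-network specialize on one situation instead of forcing a single network to cover all of them.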
Bio-Inspired Virtual Populations: Adaptive Behavior with Affective Feedback
In this paper, we describe an agency model for generative populations of humanoid characters, based upon temporal variation of affective states. We have built on an existing agent framework from Sequeira et al. [17], and adapted it to be susceptible to temperamental and emotive states in the context of cooperative and non-cooperative interactions based on trading activity. More specifically, this model operates within two existing frameworks: a) intrinsically motivated reinforcement learning, structured upon affective appraisals in the relationship of the agents with their environment [19,17]; b) a multi-temporal representation of individual psychology, common in the field of affective computing, structuring individual psychology as a tripartite relationship: emotions-moods-personality [7,15]. Results show a population of agents that express their individuality and autonomy through a high degree of heterogeneous and spontaneous behavior, while simultaneously adapting to and overcoming their perceptual limitations.
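The tripartite emotions-moods-personality relationship can be illustrated with a multi-timescale update: fast emotions react to appraisals and decay quickly, mood tracks emotion slowly, and both relax toward a fixed personality baseline. The scalar state and all decay rates below are assumed values for illustration, not the paper's model.

```python
class AffectiveState:
    """Three nested timescales of affect (illustrative sketch)."""

    def __init__(self, personality=0.0):
        self.personality = personality  # fixed long-term disposition
        self.mood = personality         # slow-moving state, starts at baseline
        self.emotion = 0.0              # fast transient state

    def step(self, appraisal):
        # The appraisal of the latest interaction (e.g. a trade) drives
        # emotion directly.
        self.emotion += appraisal
        # Mood tracks emotion slowly, and also relaxes toward the
        # personality baseline; the rates 0.05 and 0.01 are assumptions.
        self.mood += 0.05 * (self.emotion - self.mood)
        self.mood += 0.01 * (self.personality - self.mood)
        # Emotion decays quickly between interactions.
        self.emotion *= 0.5

agent = AffectiveState(personality=0.2)
agent.step(appraisal=1.0)  # a positive trading interaction
```

Because the three quantities change at different rates, two agents with the same recent appraisals but different personalities will diverge in behavior over time, which is one way such populations can exhibit heterogeneous conduct.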