
    Evaluating Player Strategies in the Design of a Hot Hand Game

    The user’s strategy and their approach to decision-making are two important concerns when designing user-centric software. While decision-making and strategy are key factors in a wide range of business systems, from stock market trading to medical diagnosis, in this paper we focus on the role these factors play in a serious computer game. Players may adopt individual strategies when playing a computer game. Furthermore, different approaches to playing the game may impact on the effectiveness of the core mechanics designed into the game play. In this paper we investigate player strategy in relation to two serious games designed for studying the ‘hot hand’. The ‘hot hand’ is an interesting psychological phenomenon originally studied in sports such as basketball. The study of the ‘hot hand’ promises to shed further light on cognitive decision-making tasks applicable to domains beyond sport. The ‘hot hand’ suggests that players sometimes display above-average performance, get on a hot streak, or develop ‘hot hands’. Although this is a widely held belief, analysis of data in a number of sports has produced mixed findings. While this lack of evidence may indicate that belief in the hot hand is a cognitive fallacy, alternative views have suggested that the player’s strategy, confidence, and risk-taking may account for the difficulty of measuring the hot hand. Unfortunately, it is difficult to objectively measure and quantify the amount of risk-taking in a sporting contest. Therefore, to investigate this phenomenon more closely, we developed novel, tailor-made computer games that allow rigorous empirical study of ‘hot hands’. The design of such games has some specific requirements. The gameplay needs to allow players to perform a sequence of repeated challenges, where they either fail or succeed with about equal likelihood. Importantly, the design also needs to allow players to choose a strategy entailing more or less risk in response to their current performance.
    In this paper we compare two hot hand game designs by collecting empirical data that captures player performance in terms of success and level of difficulty (as gauged by response time). We then use a variety of analytical and visualization techniques to study player strategies in these games. This allows us to detect a key design flaw in the first game and validate the design of the second game for use in further studies of the hot hand phenomenon.
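The repeated-challenge design described above lends itself to a standard streak analysis: compare a player's overall success rate with their success rate immediately after a run of successes. A minimal sketch of that statistic (the function names and the simulated i.i.d. player are illustrative, not the games' actual telemetry):

```python
import random

def conditional_success_rate(outcomes, streak_len=3):
    """P(success | the preceding `streak_len` trials all succeeded)."""
    hits = total = 0
    for i in range(streak_len, len(outcomes)):
        if all(outcomes[i - streak_len:i]):
            total += 1
            hits += outcomes[i]
    return hits / total if total else None

def overall_rate(outcomes):
    return sum(outcomes) / len(outcomes)

# Simulated player with no hot hand: i.i.d. trials at ~50% success,
# matching the "fail or succeed with about equal likelihood" requirement.
random.seed(0)
trials = [random.random() < 0.5 for _ in range(100_000)]
base = overall_rate(trials)
after_streak = conditional_success_rate(trials, streak_len=3)
# For i.i.d. play the two rates are close; a genuine hot hand would
# push after_streak noticeably above base.
```

A real analysis would also compare against shuffled versions of the same player's sequence, to control for session-level drift in difficulty or fatigue.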

    Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning

    Social dilemmas have been widely studied to explain how humans are able to cooperate in society. Considerable effort has been invested in designing artificial agents for social dilemmas that incorporate explicit agent motivations chosen to favor coordinated or cooperative responses. The prevalence of this general approach points towards the importance of understanding both an agent's internal design and the external environment dynamics that facilitate cooperative behavior. In this paper, we investigate how partner selection can promote cooperative behavior between agents who are trained to maximize a purely selfish objective function. Our experiments reveal that agents trained with this dynamic learn a strategy that retaliates against defectors while promoting cooperation with other agents, resulting in a prosocial society.
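The partner-selection dynamic can be illustrated with a deliberately simplified, non-learning sketch: purely selfish agents remember how rewarding each partner was, and pairs form by mutual preference. That alone is enough to ostracize a defector. The payoff matrix, memory rule, and agent design here are illustrative assumptions, not the paper's trained RL agents:

```python
# Payoffs for a one-shot prisoner's dilemma: (my_payoff, partner_payoff).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def greedy_pairs(mem, n):
    """Pair agents greedily by mutual remembered payoff (partner selection)."""
    scored = sorted(((min(mem[i][j], mem[j][i]), i, j)
                     for i in range(n) for j in range(i + 1, n)),
                    reverse=True)
    used, pairs = set(), []
    for _, i, j in scored:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs

policies = ['C', 'C', 'C', 'C', 'D']      # four cooperators, one defector
n = len(policies)
mem = [[3.0] * n for _ in range(n)]       # optimistic initial memory of partners
score = [0.0] * n
for _ in range(200):
    for i, j in greedy_pairs(mem, n):
        pi, pj = PAYOFF[(policies[i], policies[j])]
        score[i] += pi
        score[j] += pj
        mem[i][j] = 0.9 * mem[i][j] + 0.1 * pi   # running average of payoff
        mem[j][i] = 0.9 * mem[j][i] + 0.1 * pj

# Once every cooperator has been exploited once, its mutual score with the
# defector falls below the cooperator-cooperator score, and the defector is
# left unpaired: selfish partner choice alone sustains cooperation.
```

The paper's agents learn their action policy as well; this sketch fixes the policies to isolate the selection mechanism itself.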

    Real-time scheduling of renewable power systems through planning-based reinforcement learning

    The growing share of renewable energy sources has posed significant challenges to traditional power scheduling. It is difficult for operators to obtain accurate day-ahead forecasts of renewable generation, so future scheduling systems must make real-time scheduling decisions that align with ultra-short-term forecasts. Restricted by computation speed, traditional optimization-based methods cannot solve this problem. Recent developments in reinforcement learning (RL) have demonstrated the potential to meet this challenge; however, existing RL methods are inadequate in terms of constraint complexity, algorithm performance, and environment fidelity. We are the first to propose a systematic solution based on a state-of-the-art reinforcement learning algorithm and a real power grid environment. The proposed approach enables planning and finer-time-resolution adjustment of power generators, including unit commitment and economic dispatch, thus increasing the grid's ability to admit more renewable energy. The well-trained scheduling agent significantly reduces renewable curtailment and load shedding, the issues that arise from traditional scheduling's reliance on inaccurate day-ahead forecasts. High-frequency control decisions exploit the existing units' flexibility, reducing the power grid's dependence on hardware transformations and saving investment and operating costs, as demonstrated in experimental results. This research exhibits the potential of reinforcement learning to promote low-carbon, intelligent power systems and represents a solid step toward sustainable electricity generation.
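The scheduling objective can be made concrete with a toy single-step dispatch model: renewables serve load first, dispatchable units cover the residual, and the reward an RL agent would maximize penalizes exactly the two failure modes named above, curtailment and load shedding. The unit model and quantities are illustrative assumptions, not the paper's grid environment:

```python
def dispatch_step(demand, renewable, unit_caps):
    """Greedy real-time dispatch: renewables first, then dispatchable units.

    demand, renewable: power for this interval (e.g. MW).
    unit_caps: maximum output of each dispatchable unit.
    Returns (outputs, curtailment, shedding, reward).
    """
    used_renewable = min(renewable, demand)
    curtailment = renewable - used_renewable      # renewable energy wasted
    residual = demand - used_renewable            # load left for other units
    outputs = []
    for cap in unit_caps:
        p = min(residual, cap)
        outputs.append(p)
        residual -= p
    shedding = max(residual, 0.0)                 # unserved load
    reward = -(curtailment + shedding)            # what an RL agent maximizes
    return outputs, curtailment, shedding, reward
```

A planning-based RL agent would instead choose the unit outputs directly at each ultra-short-term interval, with forecasts as part of its state; this greedy rule only serves to pin down the reward definition.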

    Challenges in Energy Awareness: a Swedish case of heating consumption in households

    An efficient and sustainable energy system is an important factor in minimising the environmental impact caused by cities. We have worked with questions of how to construct a more direct connection between customer-citizens and a provider of district heating for negotiating notions of comfort in relation to heating and hot tap water use. In this paper we present visualisation concepts for such connections and reflect on the outcomes in terms of the type of data needed for sustainability assessment, as well as the methods explored for channelling information on individual consumption and environmental impact between customers and the provider of district heating. We have defined three challenges in sustainable design for consumer behaviour change in the case of reducing heat and hot water consumption in individual households: (1) the problematic relation between individual behaviour steering and system-level district heating, (2) the complexity of environmental impact as an indicator for behaviour change, and (3) ethical considerations concerning the role of the designer.

    Imitation learning based on entropy-regularized forward and inverse reinforcement learning

    This paper proposes Entropy-Regularized Imitation Learning (ERIL), a combination of forward and inverse reinforcement learning under the framework of the entropy-regularized Markov decision process. ERIL minimizes the reverse Kullback-Leibler (KL) divergence between two probability distributions induced by a learner and an expert. Inverse reinforcement learning (RL) in ERIL evaluates the log-ratio between the two distributions using the density ratio trick, which is widely used in generative adversarial networks. More specifically, the log-ratio is estimated by building two binary discriminators. The first discriminator is a state-only function that tries to distinguish states generated by the forward RL step from the expert's states. The second discriminator is a function of the current state, action, and transitioned state, and it distinguishes generated experiences from those provided by the expert. Since the second discriminator shares hyperparameters with the forward RL step, it can be used to control the discriminator's ability. The forward RL step minimizes the reverse KL estimated by the inverse RL step. We show that minimizing the reverse KL divergence is equivalent to finding an optimal policy under entropy regularization. Consequently, a new policy is derived from an algorithm that resembles Dynamic Policy Programming and Soft Actor-Critic. Our experimental results on MuJoCo-simulated environments show that ERIL is more sample-efficient than these previous methods. We further apply the method to human behaviors in a pole-balancing task and show that the estimated reward functions reveal how each subject achieves the goal.
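The density-ratio trick at the heart of the inverse-RL step can be shown in one dimension: a logistic discriminator trained to separate expert samples (label 1) from learner samples (label 0) recovers the log-ratio of the two densities as its logit. A self-contained sketch on Gaussian toy data (pure-Python gradient descent; nothing here comes from the ERIL implementation):

```python
import math
import random

random.seed(0)
expert  = [random.gauss(1.0, 1.0) for _ in range(2000)]   # "expert" states
learner = [random.gauss(0.0, 1.0) for _ in range(2000)]   # "learner" states

# For these two unit-variance Gaussians the true log-ratio is
#   log p_expert(x) - log p_learner(x) = x - 0.5,
# so the optimal discriminator logit w*x + b has w = 1, b = -0.5.
data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in learner]
w = b = 0.0
lr = 1.0
for _ in range(300):                                 # full-batch gradient descent
    gw = gb = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))     # D(x) = P(expert | x)
        gw += (p - y) * x                            # logistic-loss gradients
        gb += (p - y)
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

def log_ratio(x):
    """Estimated log p_expert(x) / p_learner(x): the discriminator's logit."""
    return w * x + b
```

In ERIL, logits like this (from the state-only and state-action-state discriminators) feed the reverse-KL estimate that the forward RL step then minimizes.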