Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi
Hanabi is a cooperative game that brings the problem of modeling other
players to the forefront. In this game, coordinated groups of players can
leverage pre-established conventions to great effect, but playing in an ad-hoc
setting requires agents to adapt to their partners' strategies with no previous
coordination. Evaluating an agent in this setting requires a diverse population
of potential partners, but so far, the behavioral diversity of agents has not
been considered in a systematic way. This paper proposes Quality Diversity
algorithms as a promising class of algorithms to generate diverse populations
for this purpose, and generates a population of diverse Hanabi agents using
MAP-Elites. We also postulate that agents can benefit from a diverse population
during training and implement a simple "meta-strategy" for adapting to an
agent's perceived behavioral niche. We show this meta-strategy can work better
than generalist strategies even outside the population it was trained with if
its partner's behavioral niche can be correctly inferred, but in practice a
partner's behavior depends on and interferes with the meta-agent's own behavior,
suggesting an avenue for future research in characterizing another agent's
behavior during gameplay. Comment: arXiv admin note: text overlap with arXiv:1907.0384
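MAP-Elites, as used above, keeps an archive that holds the best solution found in each behavioral niche. The sketch below is a generic MAP-Elites loop on a hypothetical toy problem, not the authors' Hanabi setup. Solutions are two numbers in [0, 1], the behavior descriptor is their cell on a 5-by-5 grid, and fitness is closeness to the center. The helper names (`evaluate`, `random_solution`, `mutate`) and all parameter values are assumptions chosen for illustration.

```python
import random

GRID = 5  # hypothetical: each behavior dimension is split into 5 bins


def evaluate(solution):
    """Toy evaluation: returns (fitness, niche) for a 2-D solution in [0, 1]^2."""
    x, y = solution
    fitness = -((x - 0.5) ** 2 + (y - 0.5) ** 2)  # best at the center
    niche = (min(int(x * GRID), GRID - 1), min(int(y * GRID), GRID - 1))
    return fitness, niche


def random_solution(rng):
    return [rng.random(), rng.random()]


def mutate(solution, rng, sigma=0.1):
    # Gaussian perturbation, clipped back into [0, 1]
    return [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in solution]


def map_elites(n_iterations, n_initial=10, seed=0):
    rng = random.Random(seed)
    archive = {}  # niche -> (fitness, solution); one elite per niche
    for i in range(n_iterations):
        if i < n_initial:
            candidate = random_solution(rng)
        else:
            # pick a random elite from the archive and mutate it
            _, parent = rng.choice(list(archive.values()))
            candidate = mutate(parent, rng)
        fitness, niche = evaluate(candidate)
        # keep the candidate only if its niche is empty or it beats the elite
        if niche not in archive or fitness > archive[niche][0]:
            archive[niche] = (fitness, candidate)
    return archive


archive = map_elites(2000)
print(f"{len(archive)} of {GRID * GRID} niches filled")
```

The result is a spread of solutions, one per niche, rather than a single optimum. A spread of this kind is what makes such a population useful as a pool of behaviorally diverse partners.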
Reward and Punishment in Minigames
Minigames capturing the essence of Public Goods experiments show that even in the absence of rationality assumptions, both punishment and reward will fail to bring about prosocial behavior. This holds in particular for the well-known Ultimatum Game, which emerges as a special case. But reputation can induce fairness and cooperation in populations adapting through learning or imitation. Indeed, the inclusion of reputation effects in the corresponding dynamical models leads to the evolution of economically productive behavior, with agents contributing to the public good and either punishing those who don't or rewarding those who do. Reward and punishment correspond to two types of bifurcation with intriguing complementarity. The analysis suggests that reputation is essential for fostering social behavior among selfish agents, and that it is considerably more effective with punishment than with rewards.
Bio-Inspired Virtual Populations: Adaptive Behavior with Affective Feedback
In this paper, we describe an agency model for generative populations of humanoid characters, based upon temporal variation of affective states. We have built on an existing agent framework from Sequeira et al. [17] and adapted it to be susceptible to temperamental and emotive states in the context of cooperative and non-cooperative interactions based on trading activity. More specifically, this model operates within two existing frameworks: a) intrinsically motivated reinforcement learning, structured upon affective appraisals in the relationship of the agents with their environment [19,17]; b) a multi-temporal representation of individual psychology, common in the field of affective computing, structuring individual psychology as a tripartite relationship: emotions-moods-personality [7,15]. Results show a population of agents that express their individuality and autonomy with a high level of heterogeneous and spontaneous behaviors, while simultaneously adapting to and overcoming their perceptual limitations.
Evolutionary macroeconomic assessment of employment and innovation impacts of climate policy packages
Climate policy has been mainly studied with economic models that assume representative, rational agents. Such policy aims, though, at changing carbon-intensive consumption and production patterns driven by bounded rationality and other-regarding preferences, such as status and imitation. To examine climate policy under such alternative behavioral assumptions, we develop a model tool by adapting an existing general-purpose macroeconomic multi-agent model. The resulting tool allows testing various climate policies in terms of combined climate and economic performance. The model is particularly suitable for addressing the distributional impacts of climate policies, not only because populations of many agents are included, but also because these are composed of different classes of households. The approach accounts for two types of innovations, which improve either the carbon or labor intensity of production. We simulate policy scenarios with distinct combinations of carbon taxation, a reduction of labor taxes, subsidies for green innovation, a price subsidy to consumers for less carbon-intensive products, and green government procurement. The results differ markedly from those obtained by rational-agent model studies. It turns out that a supply-oriented subsidy for green innovation, funded by the revenues of a carbon tax, results in a significant reduction of carbon emissions without causing negative effects on employment. By contrast, demand-oriented subsidies for adopting greener technologies, funded in the same manner, result in either no reduction or a considerably smaller reduction of carbon emissions and may even lead to higher unemployment. Our study also contributes insight into a potential double dividend of shifting taxes from labor to carbon.
Comparison of Selection Methods in On-line Distributed Evolutionary Robotics
In this paper, we study the impact of selection methods in the context of
on-line on-board distributed evolutionary algorithms. We propose a variant of
the mEDEA algorithm in which we add a selection operator, and we apply it in a
task-driven scenario. We evaluate four selection methods that induce different
intensities of selection pressure in a multi-robot navigation with obstacle
avoidance task and a collective foraging task. Experiments show that a small
intensity of selection pressure is sufficient to rapidly obtain good
performance on the tasks at hand. We introduce different measures to compare
the selection methods, and show that the higher the selection pressure, the
better the performance obtained, especially for the more challenging food
foraging task.
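The abstract does not name the four selection methods that were compared. One common way to dial selection pressure up or down is tournament selection, sketched below over a robot's local list of genomes. With tournament size k = 1 the choice is uniformly random (no pressure), and with k equal to the list size the best genome is always picked (maximum pressure). The genome list, its fitness values, and the function names are hypothetical, and this is not the authors' mEDEA variant.

```python
import random


def tournament_select(genomes, fitnesses, k, rng):
    """Sample k distinct genomes uniformly and return the fittest one.

    Larger k means stronger selection pressure; k=1 is random selection.
    """
    contestants = rng.sample(range(len(genomes)), k)
    winner = max(contestants, key=lambda i: fitnesses[i])
    return genomes[winner]


def mean_selected_fitness(fitnesses, k, trials=10_000, seed=0):
    # Estimate how fit the selected genome is, on average, for a given k
    rng = random.Random(seed)
    genomes = list(range(len(fitnesses)))  # genome i is identified by its index
    total = 0.0
    for _ in range(trials):
        total += fitnesses[tournament_select(genomes, fitnesses, k, rng)]
    return total / trials


# hypothetical local list: ten genomes with fitness 0..9
local_fitness = [float(f) for f in range(10)]
for k in (1, 2, 5, 10):
    print(k, round(mean_selected_fitness(local_fitness, k), 2))
```

As k grows, the average fitness of the selected genome rises toward the best value available in the list. This is the sense in which k controls selection pressure.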
About the Power to Enforce and Prevent Consensus by Manipulating Communication Rules
We explore the possibilities of enforcing and preventing consensus in
continuous opinion dynamics that result from modifications in the communication
rules. We refer to the model of Weisbuch and Deffuant, where agents adjust
their continuous opinions as a result of random pairwise encounters whenever
their opinions differ by no more than a given bound of confidence ε. A high
ε leads to consensus, while a lower ε leads to a fragmentation into
several opinion clusters. We drop the random encounter assumption and ask: How
small may ε be such that consensus is still possible with a certain
communication plan for the entire group? Mathematical analysis shows that
ε may be significantly smaller than in the random pairwise case. On the
other hand, we ask: How large may ε be such that preventing consensus is
still possible? In answering this question we prove Fortunato's simulation
result that consensus cannot be prevented for ε > 0.5 for large groups.
Next we consider opinion dynamics under different individual strategies and
examine their power to increase the chances of consensus. One result is that
balancing agents increase chances of consensus, especially if the agents are
cautious in adapting their opinions. However, curious agents increase chances
of consensus only if those agents are not cautious in adapting their opinions. Comment: 21 pages, 6 figures
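The random pairwise baseline that the abstract starts from can be sketched in a few lines. In each step two agents are drawn at random, and if their opinions differ by at most the bound of confidence (ε, `eps` in the code) they move toward each other. The convergence parameter `mu` (how far each agent moves, with 0.5 meaning both jump to their midpoint) comes from the usual formulation of the Deffuant-Weisbuch model and is not mentioned in the abstract. The step count, agent count, and cluster tolerance are likewise arbitrary choices. This sketch covers only random encounters, not the communication plans or individual strategies the paper analyzes.

```python
import random


def deffuant_step(opinions, eps, mu, rng):
    """One random pairwise encounter of the bounded confidence model."""
    i, j = rng.sample(range(len(opinions)), 2)
    xi, xj = opinions[i], opinions[j]
    if abs(xi - xj) <= eps:  # only agents within the confidence bound interact
        opinions[i] = xi + mu * (xj - xi)
        opinions[j] = xj + mu * (xi - xj)


def simulate(n_agents, eps, mu=0.5, steps=100_000, seed=0):
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n_agents)]  # uniform on [0, 1]
    for _ in range(steps):
        deffuant_step(opinions, eps, mu, rng)
    return opinions


def count_clusters(opinions, tol=1e-3):
    """Count groups of opinions separated by gaps larger than tol."""
    ordered = sorted(opinions)
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > tol)


for eps in (1.0, 0.1):
    print(eps, count_clusters(simulate(100, eps)))
```

With `eps = 1.0` every pair interacts and the population collapses to a single opinion. With `eps = 0.1` the same procedure leaves several separate clusters, which is the contrast between consensus and fragmentation described above. Each encounter preserves the sum of the two opinions, so the population mean never changes.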
- …