Assessing the Potential of Classical Q-learning in General Game Playing
After the recent groundbreaking results of AlphaGo and AlphaZero, we have
seen strong interest in deep reinforcement learning and artificial general
intelligence (AGI) in game playing. However, deep learning is
resource-intensive and the theory is not yet well developed. For small games,
simple classical table-based Q-learning might still be the algorithm of choice.
General Game Playing (GGP) provides a good testbed for reinforcement learning
to research AGI. Q-learning is one of the canonical reinforcement learning
methods, and has been used by (Banerjee & Stone, IJCAI 2007) in GGP. In this
paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe,
Connect Four, Hex; source code: https://github.com/wh1992v/ggp-rl), to
allow comparison to Banerjee et al. We find that Q-learning converges to a
high win rate in GGP. For the ε-greedy strategy, we propose a first
enhancement, the dynamic ε algorithm. In addition, inspired by (Gelly &
Silver, ICML 2007) we add online search (Monte Carlo Search) to
enhance offline learning, and propose QM-learning for GGP. Both enhancements
improve the performance of classical Q-learning. In this work, GGP allows us to
show, if augmented by appropriate enhancements, that classical table-based
Q-learning can perform well in small games.
Comment: arXiv admin note: substantial text overlap with arXiv:1802.0594
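As a rough illustration of table-based Q-learning with an ε-greedy policy whose ε shrinks over training, here is a minimal sketch. It is not the paper's implementation: the four-cell corridor task, the linear decay schedule in `decayed_epsilon`, and all function names and parameter values are assumptions made for this example, and the paper's dynamic ε algorithm and GGP setup may differ.

```python
import random
from collections import defaultdict

ACTIONS = (+1, -1)  # step right or left (hypothetical toy task, not a GGP game)
GOAL = 3            # reaching cell 3 ends the episode with reward 1


def decayed_epsilon(episode, total_episodes, start=1.0, end=0.05):
    """Linearly anneal epsilon from start to end (an assumed schedule)."""
    frac = min(episode / max(total_episodes - 1, 1), 1.0)
    return start + frac * (end - start)


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning backup toward reward + gamma * max_a Q(s', a)."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # the Q-table: (state, action) -> value
    for ep in range(episodes):
        eps = decayed_epsilon(ep, episodes)
        state = 0
        while state != GOAL:
            action = choose_action(q, state, ACTIONS, eps, rng)
            next_state = max(0, state + action)  # wall at cell 0
            reward = 1.0 if next_state == GOAL else 0.0
            next_actions = () if next_state == GOAL else ACTIONS
            q_update(q, state, action, reward, next_state, next_actions)
            state = next_state
    return q


q_table = train()
greedy_policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)])
                 for s in range(GOAL)}
print(greedy_policy)
```

After training, the greedy policy read from the table steps right in every cell. Early episodes explore almost uniformly, while later ones mostly exploit the learned values, which is the general idea behind shrinking ε over time.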
10 simple rules to create a serious game, illustrated with examples from structural biology
Serious scientific games are games whose purpose is not only fun. In the
field of science, the serious goals include crucial activities for scientists:
outreach, teaching and research. The number of serious games is increasing
rapidly, in particular citizen science games, games that allow people to
produce and/or analyze scientific data. Interestingly, it is possible to build
a set of rules providing a guideline to create or improve serious games. We
present arguments gathered from our own experience (Phylo, DocMolecules,
HiRE-RNA contest and Pangu) as well as examples from the growing literature on
scientific serious games.
Allocation in Practice
How do we allocate scarce resources? How do we fairly allocate costs? These
are two pressing challenges facing society today. I discuss two recent projects
at NICTA concerning resource and cost allocation. In the first, we have been
working with FoodBank Local, a social startup working in collaboration with
food bank charities around the world to optimise the logistics of collecting
and distributing donated food. Before we can distribute this food, we must
decide how to allocate it to different charities and food kitchens. This gives
rise to a fair division problem with several new dimensions, rarely considered
in the literature. In the second, we have been looking at cost allocation
within the distribution network of a large multinational company. This also has
several new dimensions rarely considered in the literature.
Comment: To appear in Proc. of 37th edition of the German Conference on
Artificial Intelligence (KI 2014), Springer LNC
False-Name Manipulation in Weighted Voting Games is Hard for Probabilistic Polynomial Time
False-name manipulation refers to the question of whether a player in a
weighted voting game can increase her power by splitting into several players
and distributing her weight among these false identities. Analogously to this
splitting problem, the beneficial merging problem asks whether a coalition of
players can increase their power in a weighted voting game by merging their
weights. Aziz et al. [ABEP11] analyze the problem of whether merging or
splitting players in weighted voting games is beneficial in terms of the
Shapley-Shubik and the normalized Banzhaf index, and so do Rey and Rothe [RR10]
for the probabilistic Banzhaf index. All these results provide merely
NP-hardness lower bounds for these problems, leaving the question about their
exact complexity open. For the Shapley-Shubik and the probabilistic Banzhaf
index, we raise these lower bounds to hardness for PP, "probabilistic
polynomial time", and provide matching upper bounds for beneficial merging and,
whenever the number of false identities is fixed, also for beneficial
splitting, thus resolving previous conjectures in the affirmative. It follows
from our results that beneficial merging and splitting for these two power
indices cannot be solved in NP, unless the polynomial hierarchy collapses,
which is considered highly unlikely.
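To make the splitting question concrete, the following sketch computes exact Shapley-Shubik indices by brute force and compares a player's power before and after she splits her weight. The example game (weights 2, 1, 1 with quota 4) and the function name `shapley_shubik` are illustrative assumptions, not taken from the paper. Enumerating all orderings takes factorial time, so this only works for tiny games.

```python
from fractions import Fraction
from itertools import permutations


def shapley_shubik(weights, quota):
    """Exact Shapley-Shubik indices by enumerating all player orderings."""
    n = len(weights)
    pivots = [0] * n
    total = 0
    for order in permutations(range(n)):
        running = 0
        for player in order:
            running += weights[player]
            if running >= quota:
                # This player turns the growing coalition into a winning one.
                pivots[player] += 1
                break
        total += 1
    return [Fraction(p, total) for p in pivots]


# Hypothetical game: quota 4, weights (2, 1, 1); player 0 holds weight 2.
before = shapley_shubik([2, 1, 1], 4)
# Player 0 splits into two false identities of weight 1 each.
after = shapley_shubik([1, 1, 1, 1], 4)

power_before = before[0]
power_after = after[0] + after[1]  # combined power of both false identities
print(power_before, power_after)   # prints: 1/3 1/2
```

In this unanimity-style game every player is pivotal equally often, so splitting raises the manipulator's combined share from 1/3 to 1/2. Deciding whether such a beneficial split exists in general is the problem whose complexity the abstract addresses.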