An Adaptation of Proof-Planning to Declarer Play in Bridge
We present Finesse, a system that forms plans for declarer play in the game of Bridge. Finesse generalises the technique of proof-planning, developed at Edinburgh University in the context of mathematical theorem-proving, to deal with the disjunctive choice encountered when planning under uncertainty, and with the context-dependency of actions produced by the presence of an opposition. In its domain of planning for individual suits, it correctly identified the proper lines of play found in many examples from the Bridge literature, supporting its decisions with probabilistic and qualitative information. Cases were even discovered in which Finesse revealed errors in the analyses presented by recognised authorities.
Search and planning under incomplete information: a study using Bridge card play
This thesis investigates problem-solving in domains featuring incomplete information and multiple agents with opposing goals. In particular, we describe Finesse --- a system that forms plans for the problem of declarer play in the game of Bridge. We begin by examining the problem of search. We formalise a best defence model of incomplete information games in which equilibrium point strategies can be identified, and identify two specific problems that can affect algorithms in such domains. In Bridge, we show that the best defence model corresponds to the typical model analysed in expert texts, and examine search algorithms which overcome the problems we have identified. Next, we look at how planning algorithms can be made to cope with the difficulties of such domains. This calls for the development of new techniques for representing uncertainty and actions with disjunctive effects, for coping with an opposition, and for reasoning about compound actions. We tackle these problems with a…
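The best defence model the abstract describes can be sketched as a maximin evaluation: declarer scores each candidate line of play against defenders who are assumed to see the full deal, then picks the line with the best guaranteed outcome. The function names, toy layouts, and payoffs below are illustrative assumptions, not code from the thesis.

```python
# Hedged sketch of the best defence assumption: for a fixed declarer
# line, the defence (knowing the hidden layout) inflicts the worst
# layout-specific result; declarer then maximises over lines (maximin).

def best_defence_value(payoffs_by_layout):
    # Worst case for declarer over all possible hidden card layouts.
    return min(payoffs_by_layout)

def choose_line(lines):
    # Pick the line whose guaranteed (worst-case) value is largest.
    return max(lines, key=lambda line: best_defence_value(line[1]))

# Two hypothetical lines, with a payoff per possible hidden layout.
lines = [
    ("finesse", [1, 0]),   # succeeds only if the key card is onside
    ("drop",    [1, 1]),   # succeeds in both toy layouts
]
print(choose_line(lines)[0])  # -> "drop"
```

Under this pessimistic model, a line that is merely probable can lose to one that is guaranteed, which matches how expert texts evaluate safety plays.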
A Planning Approach to Declarer Play in Contract Bridge
Although game-tree search works well in perfect-information games, it is less suitable for imperfect-information games such as contract bridge. The lack of knowledge about the opponents' possible moves gives the game tree a very large branching factor, making it impossible to search a significant portion of this tree in a reasonable amount of time. This paper describes our approach for overcoming this problem. We represent information about bridge in a task network that is extended to represent multi-agency and uncertainty. Our game-playing procedure uses this task network to generate game trees in which the set of alternative choices is determined not by the set of possible actions, but by the set of available tactical and strategic schemes.

We have tested this approach on declarer play in the game of bridge, in an implementation called Tignum 2. On 5000 randomly generated notrump deals, Tignum 2 beat the strongest commercially available program by 1394 to 1302, with 2304 ties. These results are statistically significant at the alpha = 0.05 level. Tignum 2 searched an average of only 8745.6 moves per deal in an average time of only 27.5 seconds per deal on a Sun SPARCstation 10. Further enhancements to Tignum 2 are currently underway.

(Also cross-referenced as UMIACS-TR-95-85.)
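The key idea above, branching over tactical schemes rather than over every legal card, can be sketched as follows. All names, holdings, and schemes here are hypothetical illustrations, not Tignum 2's actual representation.

```python
# Illustrative sketch (not Tignum 2's code): compare naive branching
# over all legal cards with branching over applicable tactical schemes.

def legal_card_moves(hand):
    # Naive branching: every card in hand is a candidate move.
    return list(hand)

def scheme_moves(hand, schemes):
    # Scheme-based branching: each applicable scheme contributes at
    # most one representative line of play, shrinking the game tree.
    moves = []
    for name, applies, pick_card in schemes:
        if applies(hand):
            moves.append((name, pick_card(hand)))
    return moves

# Hypothetical single-suit holding for declarer (spades).
hand = ["SA", "SQ", "S3"]

# Each scheme: (name, applicability test, representative card to lead).
schemes = [
    ("cash_ace", lambda h: "SA" in h, lambda h: "SA"),
    ("finesse_queen", lambda h: "SQ" in h and "SA" in h, lambda h: "SQ"),
]

print(len(legal_card_moves(hand)))  # 3 branches, one per card
print(scheme_moves(hand, schemes))  # 2 branches, one per scheme
```

With full hands of thirteen cards and unknown opponent holdings, the gap between the two branching factors grows enormously, which is what makes scheme-level search tractable.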
Opponent Modelling in Multi-Agent Systems
Reinforcement Learning (RL) formalises a problem in which an intelligent agent learns to achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in the non-stationary environments created by adaptive opponents. Partial observation, caused by agents' differing private observations, introduces high variance during training, which exacerbates data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to poor performance against another set. Non-stationarity, partial observation and an unclear learning objective are three critical problems in MARL that hinder agents' learning, and all three share a common cause: the lack of knowledge of the other agents. In this thesis, we therefore propose to address these problems with opponent modelling methods, tailoring our solutions by combining opponent modelling with other techniques according to the characteristics of each problem. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate non-stationarity in cooperative games. We then study the partial observation problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate solutions to the non-stationarity and unclear-learning-objective problems in zero-sum games, proposing EPSOM, which aims to find safe exploitation strategies for playing against non-stationary opponents. We verify the proposed methods through varied experiments and show that they achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.
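The core loop of opponent modelling can be illustrated with a minimal sketch: estimate the opponent's action distribution from observed play, then best-respond to that estimate. The game (matching pennies), payoffs, and function names below are toy assumptions for illustration, not the algorithms (ROMMEO, PBL, EPSOM) developed in the thesis.

```python
# Minimal opponent-modelling sketch in matching pennies: the modelling
# agent wins (+1) when actions match and loses (-1) otherwise.

from collections import Counter

PAYOFF = {("H", "H"): 1, ("T", "T"): 1, ("H", "T"): -1, ("T", "H"): -1}

def best_response(observed_opponent_actions):
    # Empirical opponent model: relative frequency of each action.
    counts = Counter(observed_opponent_actions)
    total = len(observed_opponent_actions)
    model = {a: counts[a] / total for a in ("H", "T")}
    # Best-respond: choose our action maximising expected payoff
    # against the modelled opponent distribution.
    return max(("H", "T"), key=lambda ours: sum(
        prob * PAYOFF[(ours, theirs)] for theirs, prob in model.items()))

history = ["H", "H", "T", "H"]   # opponent plays H 75% of the time
print(best_response(history))    # -> "H" (exploit the modelled bias)
```

A frequency model like this is exactly what breaks against an adaptive opponent, which is why the thesis pairs opponent modelling with safe-exploitation objectives rather than naive best responses.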