Last Round Convergence and No-Instant Regret in Repeated Games with Asymmetric Information
This paper considers repeated games in which one player has more information
about the game than the other players. In particular, we investigate repeated
two-player zero-sum games where only the column player knows the payoff matrix
A of the game. Suppose that while repeatedly playing this game, the row player
chooses her strategy at each round by using a no-regret algorithm to minimize
her (pseudo) regret. We develop a no-instant-regret algorithm for the column
player to exhibit last round convergence to a minimax equilibrium. We show that
our algorithm is efficient against a large set of popular no-regret algorithms
of the row player, including the multiplicative weight update algorithm, the
online mirror descent method/follow-the-regularized-leader, the linear
multiplicative weight update algorithm, and the optimistic multiplicative
weight update
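As an illustration of the row player's side of this setting, here is a minimal sketch of the multiplicative weight update rule in Python. It is not the paper's column-player algorithm. The function name `mwu_row_player`, the learning rate `eta`, and the convention that `A[i][j]` is the row player's loss are assumptions made for the example.

```python
import math

def mwu_row_player(A, column_plays, eta=0.1):
    """Return the row player's mixed strategy at each round under MWU.

    A[i][j] is treated as the LOSS to the row player when row i meets
    column j (an assumed sign convention). column_plays lists the column
    index chosen by the opponent at each round.
    """
    n = len(A)
    weights = [1.0] * n
    history = []
    for j in column_plays:
        total = sum(weights)
        # Strategy played this round: weights normalised to a distribution.
        history.append([w / total for w in weights])
        # Shrink each row's weight exponentially in the loss it just incurred.
        weights = [w * math.exp(-eta * A[i][j]) for i, w in enumerate(weights)]
    return history

# Matching-pennies-style losses in [0, 1]; the column player always plays column 0.
A = [[1.0, 0.0],
     [0.0, 1.0]]
strategies = mwu_row_player(A, [0] * 50, eta=0.5)
final = strategies[-1]
```

Against a fixed column, the row player's strategy concentrates on the row with the smaller loss, which is the adaptive behaviour a strategic column player can anticipate.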
Exploiting No-Regret Algorithms in System Design
We investigate a repeated two-player zero-sum game setting where the column
player is also a designer of the system, and has full control over the design of
the payoff matrix. In addition, the row player uses a no-regret algorithm to
efficiently learn how to adapt their strategy to the column player's behaviour
over time in order to achieve good total payoff. The goal of the column player
is to guide her opponent to pick a mixed strategy which is favourable for the
system designer. Therefore, she needs to: (i) design an appropriate payoff
matrix whose unique minimax solution contains the desired mixed strategy of
the row player; and (ii) strategically interact with the row player during a
sequence of plays in order to guide her opponent to converge to that desired
behaviour. To design such a payoff matrix, we propose a novel solution that
provably has a unique minimax solution with the desired behaviour. We also
investigate a relaxation of this problem where uniqueness is not required, but
all the minimax solutions have the same mixed strategy for the row player.
Finally, we propose a new game playing algorithm for the system designer and
prove that it can guide the row player, who may play a \emph{stable} no-regret
algorithm, to converge to a minimax solution
Online Double Oracle
Solving strategic games with huge action spaces is a critical yet
under-explored topic in economics, operations research and artificial
intelligence. This paper proposes new learning algorithms for solving
two-player zero-sum normal-form games where the number of pure strategies is
prohibitively large. Specifically, we combine no-regret analysis from online
learning with Double Oracle (DO) methods from game theory. Our method --
\emph{Online Double Oracle (ODO)} -- is provably convergent to a Nash
equilibrium (NE). Most importantly, unlike normal DO methods, ODO is
\emph{rational} in the sense that each agent in ODO can exploit a strategic
adversary with a regret bound of $\mathcal{O}(\sqrt{T k \log(k)})$, where $k$ is
not the total number of pure strategies, but rather the size of \emph{effective
strategy set} that is linearly dependent on the support size of the NE. On tens
of different real-world games, ODO outperforms DO, PSRO methods, and no-regret
algorithms such as Multiplicative Weight Update by a significant margin, both
in terms of convergence rate to a NE and average payoff against strategic
adversaries.
Decentralized Learning in Online Queuing Systems
Motivated by packet routing in computer networks, online queuing systems are
composed of queues receiving packets at different rates. Repeatedly, they send
packets to servers, each of which handles at most one packet at a time. In
the centralized case, the number of accumulated packets remains bounded (i.e.,
the system is \textit{stable}) as long as the ratio between service rates and
arrival rates is larger than $1$. In the decentralized case, individual
no-regret strategies ensure stability when this ratio is larger than $2$. Yet,
myopically minimizing regret disregards the long term effects due to the
carryover of packets to further rounds. On the other hand, minimizing long term
costs leads to stable Nash equilibria as soon as the ratio exceeds
$\frac{e}{e-1}$. Stability with decentralized learning strategies with a ratio
below $2$ was a major remaining question. We first argue that for ratios up to
$2$, cooperation is required for stability of learning strategies, as selfish
minimization of policy regret, a \textit{patient} notion of regret, might
indeed still be unstable in this case. We therefore consider cooperative queues
and propose the first decentralized learning algorithm guaranteeing stability
of the system as long as the ratio of rates is larger than $1$, thus reaching
performance comparable to centralized strategies. Comment: NeurIPS 2021 camera read
A Free Exchange e-Marketplace for Digital Services
The digital era is witnessing a remarkable evolution of digital services. While the prospects are countless, the e-marketplaces of digital services are encountering inherent game-theoretic and computational challenges that restrict the rational choices of bidders. Our work examines the limited bidding scope and the inefficiencies of present exchange e-marketplaces. To meet these challenges, a free exchange e-marketplace is proposed that follows the free market economy. The free exchange model includes a new bidding language and a double auction mechanism. The rule-based bidding language enables the flexible expression of preferences and strategic conduct. The bidding message holds the attribute-valuations and bidding rules of the selected services. The free exchange deliberates on attributes and logical bidding rules for automatic deduction and formation of elicited services and bids that result in more rapid, self-managed multiple exchange trades. The double auction uses forward and reverse generalized second price auctions for the symmetric matching of multiple digital services of identical attributes and different quality levels. The proposed double auction uses tractable heuristics that secure exchange profitability, improve truthful bidding and deliver stable social efficiency. While the strongest properties of symmetric exchanges are infeasible game-theoretically, the free exchange converges rapidly to the social efficiency, Nash truthful stability, and weak budget balance by multiple quality-levels cross-matching, constant learning and informs at repetitive thick trades. The empirical findings validate the soundness and viability of the free exchange
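For readers unfamiliar with the generalized second price (GSP) rule mentioned above, the following Python sketch shows the textbook forward version only: bidders are ranked by bid, and each winner pays the next-highest bid. It is not the paper's double auction. The function name `gsp_forward`, the zero price for a winner with no lower bid, and the sample bids are assumptions made for the example.

```python
def gsp_forward(bids, n_slots):
    """Allocate n_slots by textbook forward GSP.

    bids maps a bidder name to a bid. Returns a list of (bidder, price)
    pairs for the winning bidders, highest bid first. A winner with no
    lower bid beneath it pays 0.0 (an assumed reserve price).
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for k in range(min(n_slots, len(ranked))):
        bidder = ranked[k][0]
        # Each winner pays the bid of the bidder ranked immediately below.
        price = ranked[k + 1][1] if k + 1 < len(ranked) else 0.0
        results.append((bidder, price))
    return results

# Hypothetical bids for two available slots.
outcome = gsp_forward({"a": 10, "b": 7, "c": 4}, 2)
```

A reverse GSP for sellers would mirror this by ranking asks from lowest to highest; how the two sides are matched in the proposed exchange is not specified in the abstract.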
Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems
A Multi-Agent Cooperative Learning (MACL) system is an artificial
intelligence (AI) system where multiple learning agents work together to
complete a common task. Recent empirical success of MACL systems in various
domains (e.g. traffic control, cloud computing, robotics) has sparked active
research into the design and analysis of MACL systems for sequential decision
making problems. One important metric of the learning algorithm for decision
making problems is its regret, i.e. the difference between the highest
achievable reward and the actual reward that the algorithm gains. The design
and development of a MACL system with low-regret learning algorithms can create
huge economic values. In this thesis, I analyze MACL systems for different
sequential decision making problems. Concretely, Chapters 3 and 4
investigate the cooperative multi-agent multi-armed bandit problems, with
full-information or bandit feedback, in which multiple learning agents can
exchange their information through a communication network and the agents can
only observe the rewards of the actions they choose. Chapter 5 considers the
communication-regret trade-off for online convex optimization in the
distributed setting. Chapter 6 discusses how to form highly productive teams for
agents based on their unknown but fixed types using adaptive incremental
matchings. For the above problems, I present the regret lower bounds for
feasible learning algorithms and provide efficient algorithms that achieve
these bounds. The regret bounds I present in Chapters 3, 4 and 5 quantify how the
regret depends on the connectivity of the communication network and the
communication delay, thus giving useful guidance on design of the communication
protocol in MACL systems. Comment: Thesis submitted to London School of Economics and Political Science
for PhD in Statistic
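The notion of regret described in this abstract can be made concrete with a small Python sketch. It assumes that the "highest achievable reward" means the total reward of the best single action in hindsight, which is one common reading and not necessarily the thesis's exact definition. The function name `external_regret` and the sample rewards are hypothetical.

```python
def external_regret(reward_table, chosen):
    """Regret against the best fixed action in hindsight.

    reward_table[t][a] is the reward of action a at round t, and
    chosen[t] is the action the learner actually played at round t.
    """
    n_actions = len(reward_table[0])
    # Total reward the learner actually collected.
    earned = sum(reward_table[t][a] for t, a in enumerate(chosen))
    # Total reward of the best single action, chosen with hindsight.
    best_fixed = max(sum(row[a] for row in reward_table) for a in range(n_actions))
    return best_fixed - earned

# Three rounds, two actions; the learner always plays action 1.
rewards = [[1, 0], [1, 0], [0, 1]]
regret = external_regret(rewards, [1, 1, 1])
```

Here action 0 would have earned 2 in total while the learner earned 1, so the regret is 1.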
Sunk cost accounting and entrapment in corporate acquisitions and financial markets : an experimental analysis
Sunk cost accounting refers to the empirical finding that individuals tend to let their
decisions be influenced by costs made at an earlier time in such a way that they are
more risk seeking than they would be had they not incurred these costs. Such
behaviour violates the axioms of economic theory, which state that individuals should
only consider incremental costs and benefits when executing investments. This
dissertation is concerned with whether the pervasive sunk cost phenomenon extends to
corporate acquisitions and financial markets. 122 students from the University of St
Andrews participated in three experiments exploring the use of sunk costs in
interactive negotiation contexts and financial markets. Experiment I elucidates that
subjects value the sunk cost issue higher than other issues in a multi-issue negotiation.
Experiment II illustrates that bidders are influenced by the sunk costs of competing
bidders in a first price, sealed-bid, common-value auction. In financial markets there
exists an analogous concept to sunk cost accounting known as the disposition effect.
This explains the tendency of investors to sell “winning” stocks and hold “losing”
stocks. Experiment III demonstrates that trading strategies in an experimental equity
market are influenced by a pre-trading brokerage cost. Not only are subjects
influenced in the direction that reduces the disposition effect but also trading is
diminished. Without the brokerage cost there was a significant disposition effect.
JEL-Classifications
C70, C90, D44, D80, D81, G1