working paper

Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning

Abstract

We study a sequential decision-making problem between a principal and an agent with incomplete information on both sides. In this model, the principal and the agent interact in a stochastic environment, and each is privy to observations about the state not available to the other. The principal has the power of commitment, both to elicit information from the agent and to provide signals about her own information. The principal and the agent communicate their signals to each other, and select their actions independently based on this communication. Each player receives a payoff based on the state and their joint actions, and the environment moves to a new state. The interaction continues over a finite time horizon, and both players act to optimize their own total payoffs over the horizon. Our model encompasses as special cases stochastic games of incomplete information and POMDPs, as well as sequential Bayesian persuasion and mechanism design problems. We study both computation of optimal policies and learning in our setting. While the general problems are computationally intractable, we study algorithmic solutions under a conditional independence assumption on the underlying state-observation distributions. We present a polynomial-time algorithm to compute the principal's optimal policy up to an additive approximation. Additionally, we give an efficient learning algorithm for the case where the transition probabilities are not known beforehand; the algorithm guarantees sublinear regret for both players.
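To make the per-round interaction concrete, here is a minimal sketch (not taken from the paper) of one plausible reading of the protocol described above: private observations, a communication phase, independent actions, payoffs, and a state transition, repeated over a finite horizon. All names (Environment, principal_policy, agent_policy, and their methods) are hypothetical placeholders rather than the authors' notation.

    def run_episode(env, principal_policy, agent_policy, horizon):
        """Simulate the finite-horizon principal-agent interaction (illustrative only)."""
        state = env.reset()
        principal_total, agent_total = 0.0, 0.0
        for t in range(horizon):
            # Each side privately observes part of the state.
            obs_p = env.observe_principal(state)
            obs_a = env.observe_agent(state)

            # Communication phase: the agent reports to the principal, and the
            # principal (who has commitment power) sends a signal back.
            report = agent_policy.report(t, obs_a)
            signal = principal_policy.signal(t, obs_p, report)

            # Action phase: each player acts independently, conditioning on its
            # own observation and the exchanged messages.
            a_p = principal_policy.act(t, obs_p, report)
            a_a = agent_policy.act(t, obs_a, signal)

            # Payoffs depend on the state and the joint action; the environment
            # then moves to a new state.
            r_p, r_a = env.payoffs(state, a_p, a_a)
            principal_total += r_p
            agent_total += r_a
            state = env.step(state, a_p, a_a)
        return principal_total, agent_total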
