When markets are well behaved, we expect firms to produce at the point where marginal revenue matches marginal cost. Collusive behavior, on the other hand, arises when firms produce less than this, leading to elevated prices, lower social welfare and higher industry profits.
It is interesting, then, that collusive behavior has been observed between reinforcement learning (RL) agents that set prices for goods across repeated interactions in simple, simulated markets. Here, collusive behavior means convergence toward a market outcome that has lower social welfare or higher industry profits than the Nash equilibrium for the reinforcement learning agents. In this project, I create a simplified model of an electricity market to confirm the collusive behavior of RL agents, comparing theoretical baselines of profit and welfare to the results obtained with Q-Learning agents. I then study the effect of various market interventions, in both this simplified model and Abada and Lambin’s model \cite{Abada-Lambin}. The interventions I consider include a) the introduction of a welfare-maximizing agent, b) setting limits on battery and output capacity, c) the use of taxation, and d) a reward-punishment scheme.
To assess the suitability of each intervention, a game-theoretic equilibrium is calculated under that intervention and compared to the theoretical baselines. The equilibrium is computed using quadratic program solvers and SciPy optimization packages. The intervention is then implemented in an OpenAI Gym environment to confirm or reject the improvements predicted by the game-theoretic analysis. The welfare-maximizing agent intervention is also implemented on the Abada and Lambin model to explore how agents react to it in a more complex environment.
A first result, in both the simplified model and Abada and Lambin’s model, is that the introduction of a welfare-maximizing agent fails to provide the desired improvement in social welfare. Likewise, restricting battery and output capacity fails to provide the desired improvement. Rather, I show that a promising direction is a suitable taxation or reward-punishment scheme, which is able to improve social welfare in both models.