In multi-agent reinforcement learning systems, the actions of one agent can
have a negative impact on the rewards of other agents. One way to mitigate this
problem is to let agents trade rewards with one another. Motivated by this, we
apply such a trading approach to a simulated scheduling environment in which the
agents are responsible for assigning incoming jobs to compute cores. In this
environment, reinforcement learning agents learn
to trade successfully: they can trade the usage rights of computational cores so
that high-priority, high-reward jobs are processed faster than low-priority,
low-reward jobs. However, due to combinatorial effects, the action and
observation spaces of a simple reinforcement learning agent in this environment
scale exponentially with key parameters of the problem size. This exponential
scaling behavior can be transformed into a linear one if the agent
is split into several independent sub-units. We further improve this
distributed architecture using agent-internal parameter sharing. Moreover, the
architecture can be extended so that the agents set the exchange prices
autonomously. We show that in our scheduling environment, the distributed agent
architecture clearly outperforms more aggregated approaches. We demonstrate that
the distributed agent architecture becomes even more performant with
agent-internal parameter sharing. Finally, we investigate how two different
reward functions affect autonomous pricing and the corresponding scheduling.

Comment: Accepted at ABMHuB 2022 workshop
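
To illustrate the scaling argument, the following minimal sketch (not taken from the paper; the parameter names num_cores and actions_per_core are illustrative assumptions) contrasts the size of a joint action space, in which a single agent selects an action for every core at once, with the factored per-core action spaces of independent sub-units:

# Minimal sketch (illustrative only, not from the paper): a single monolithic
# agent choosing an action for every core jointly faces exponential growth,
# while independent per-core sub-units keep the total action space linear.
# The parameters num_cores and actions_per_core are hypothetical.

def joint_action_space_size(num_cores: int, actions_per_core: int) -> int:
    # One agent picks one of `actions_per_core` options for each core
    # simultaneously: actions_per_core ** num_cores joint actions.
    return actions_per_core ** num_cores

def factored_action_space_size(num_cores: int, actions_per_core: int) -> int:
    # One sub-unit per core, each with its own small action space; the total
    # number of actions across sub-units grows linearly with num_cores.
    return num_cores * actions_per_core

if __name__ == "__main__":
    for cores in (2, 4, 8, 16):
        print(cores,
              joint_action_space_size(cores, 3),
              factored_action_space_size(cores, 3))

Under these assumptions, 16 cores with 3 options each yield roughly 43 million joint actions, but only 48 actions in total across the per-core sub-units, which is the intuition behind the linear scaling of the distributed architecture.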