In this preliminary (and unpolished) version of the paper, we study an
asynchronous online learning setting with a network of agents. At each time
step, some of the agents are activated, requested to make a prediction, and pay
the corresponding loss. Some feedback is then revealed to these agents and is
later propagated through the network. We consider the cases of full, bandit, and
semi-bandit feedback. In particular, we construct a reduction to delayed
single-agent learning that applies to both the full and the bandit feedback
cases and yields regret guarantees for both settings. We complement these
results with a near-matching lower bound.