People have the fascinating ability to infer causality by observing other people's actions. We modelled this process using a Bayesian rational agent model and showed how people can reason about another agent's beliefs and, by extension, infer the world's causal structure. We compared the model's predictions against human causal judgements on a novel inference task. Participants (N = 171) were shown a dynamic scene depicting a human agent, a robot agent, or both agents acting on two objects sequentially before observing an outcome. After observing the human agent (vs. the less intentional robot agent), people were more likely to infer that both objects (in sequential order) caused the outcome. When two agents of different intentionality were shown, people favoured the object that the more intentional agent interacted with as the cause of the outcome. Our model captured these inference patterns well and revealed insights into reasoning about semi-intentional agents and multi-agent contexts.