Multi‐state survival models are used to represent the natural history of a disease and can form the basis of a health technology assessment comparing a novel treatment with current practice. Constructing such models for rare diseases is problematic, because evidence sources are typically much sparser and more heterogeneous than for common diseases. This simulation study investigated one‐stage and two‐stage approaches to meta‐analyzing individual patient data (IPD) in a multi‐state survival setting when the studies being meta‐analyzed are few and small. The objective was to assess methods of differing complexity and establish when they are accurate, when they are inaccurate, and when they fail to converge because of data sparsity. Biologically plausible multi‐state IPD were simulated from study‐ and transition‐specific hazard functions. One‐stage frailty models and two‐stage stratified models were estimated and compared with a base case model that did not account for study heterogeneity. Model performance was assessed by convergence and by the bias and coverage of population‐level transition probabilities to, and lengths of stay in, each state. A real‐world application to Duchenne Muscular Dystrophy, a rare neuromuscular disease, was conducted, and a software demonstration is provided. Models not accounting for study heterogeneity were consistently outperformed by two‐stage models. Frailty models struggled to converge, particularly in scenarios of low heterogeneity, and predictions from the models that did converge were also subject to bias. Stratified models may be better suited to meta‐analyzing disparate sources of IPD in rare disease natural history/economic modeling, as they converge more consistently and produce less biased predictions of lengths of stay.
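The two outcome measures used above, transition probabilities to each state and lengths of stay in each state, can be illustrated with a minimal sketch. It assumes a three‐state illness‐death model (state 0 → 1 → 2, plus 0 → 2) with constant (exponential) hazards, where both quantities have closed forms. The hazard values, study sizes, and the size‐weighted averaging of study‐specific curves into a population‐level prediction are hypothetical illustrations only. They are not the paper's data or its meta‐analysis procedure, and the study's own hazard functions need not be constant.

```python
import math

def occupancy(t, a, b, c):
    """State occupancy probabilities at time t for an illness-death model with
    constant hazards a (0 -> 1), b (0 -> 2), c (1 -> 2); everyone starts in 0."""
    r = a + b  # total hazard of leaving state 0
    if math.isclose(r, c):
        raise ValueError("closed form below assumes a + b != c")
    p0 = math.exp(-r * t)
    # Solves dP01/dt = a*P00 - c*P01 with P01(0) = 0
    p1 = a / (r - c) * (math.exp(-c * t) - math.exp(-r * t))
    return p0, p1, 1.0 - p0 - p1

def lengths_of_stay(tau, a, b, c):
    """Expected time spent in states 0, 1, 2 over [0, tau]
    (integrals of the occupancy probabilities, i.e. restricted means)."""
    r = a + b
    if math.isclose(r, c):
        raise ValueError("closed form below assumes a + b != c")
    l0 = (1.0 - math.exp(-r * tau)) / r
    l1 = a / (r - c) * ((1.0 - math.exp(-c * tau)) / c
                        - (1.0 - math.exp(-r * tau)) / r)
    return l0, l1, tau - l0 - l1

def population_average(values, weights):
    """Weighted average of study-specific tuples (hypothetical pooling rule)."""
    total = sum(weights)
    return tuple(sum(w * v[k] for v, w in zip(values, weights)) / total
                 for k in range(len(values[0])))

# Hypothetical study-specific hazards (per year) and study sizes
studies = [
    {"a": 0.10, "b": 0.01, "c": 0.20, "n": 40},
    {"a": 0.15, "b": 0.02, "c": 0.25, "n": 25},
    {"a": 0.08, "b": 0.01, "c": 0.15, "n": 60},
]

if __name__ == "__main__":
    horizon = 10.0
    sizes = [s["n"] for s in studies]
    probs = [occupancy(horizon, s["a"], s["b"], s["c"]) for s in studies]
    stays = [lengths_of_stay(horizon, s["a"], s["b"], s["c"]) for s in studies]
    print("P(state at t=10):", population_average(probs, sizes))
    print("Length of stay over [0, 10]:", population_average(stays, sizes))
```

Averaging study‐specific predictions, rather than fitting one set of hazards to the pooled data, keeps between‐study differences out of the individual hazard estimates; whether this matches the paper's two‐stage stratified approach is not stated here and should not be assumed.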