8 research outputs found

    Impact of methodological choices in comparative effectiveness studies: application in natalizumab versus fingolimod comparison among patients with multiple sclerosis

    Abstract
    Background: Natalizumab and fingolimod are used as high-efficacy treatments in relapsing–remitting multiple sclerosis. Several observational studies comparing these two drugs have reported variable results, using different methods to control for treatment indication bias and to manage censoring. The objective of this empirical study was to elucidate the impact of methods of causal inference on the results of comparative effectiveness studies.
    Methods: Data from three observational multiple sclerosis registries (MSBase, the Danish MS Registry and the French OFSEP registry) were combined. Four clinical outcomes were studied. Propensity scores were used to match or weight the compared groups, allowing estimation of the average treatment effect in the treated (ATT) or the average treatment effect in the entire population (ATE). Analyses were conducted in both intention-to-treat and per-protocol frameworks. The impact of the positivity assumption was also assessed.
    Results: Overall, 5,148 relapsing–remitting multiple sclerosis patients were included. In this well-powered sample, the 95% confidence intervals of the estimates overlapped widely. Propensity score weighting and propensity score matching procedures led to consistent results. Some differences were observed between ATE and ATT estimates. Intention-to-treat analyses were more conservative than per-protocol analyses. The most pronounced irregularities in outcomes and propensity scores were introduced by violations of the positivity assumption.
    Conclusions: This applied study elucidates the influence of methodological decisions on the results of comparative effectiveness studies of treatments for multiple sclerosis. According to our results, there are no material differences between conclusions obtained with propensity score matching and propensity score weighting, provided that the study is sufficiently powered, the models are correctly specified, and the positivity assumption is fulfilled.
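
    To make the estimands concrete: the sketch below contrasts ATE and ATT inverse-probability weights derived from a fitted propensity score, including a crude guard against positivity violations. It is a minimal illustration on synthetic data, assuming a logistic treatment model; the variable names, coefficients, and clipping threshold are assumptions for the example, not the study's actual analysis code.

```python
# Minimal sketch (not the study's code): contrasting ATE and ATT
# inverse-probability weights on synthetic data. All names, coefficients,
# and the clipping threshold below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 3))                              # baseline covariates
logit = x @ np.array([0.8, -0.5, 0.3])                   # true treatment model
t = rng.binomial(1, 1 / (1 + np.exp(-logit)))            # treatment indicator
y = x @ np.array([1.0, 0.5, -0.2]) + 0.4 * t + rng.normal(size=n)  # true effect 0.4

# Estimate propensity scores from the baseline covariates.
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# Crude positivity guard: clip scores away from 0 and 1, where the
# inverse-probability weights would otherwise explode.
ps = np.clip(ps, 0.01, 0.99)

w_ate = np.where(t == 1, 1 / ps, 1 / (1 - ps))   # targets the whole population
w_att = np.where(t == 1, 1.0, ps / (1 - ps))     # targets the treated

def weighted_effect(w):
    treated = np.average(y[t == 1], weights=w[t == 1])
    control = np.average(y[t == 0], weights=w[t == 0])
    return treated - control

print(f"ATE estimate: {weighted_effect(w_ate):.3f}")   # both should be near 0.4
print(f"ATT estimate: {weighted_effect(w_att):.3f}")
```

    Because the simulated treatment effect is homogeneous, both weighting schemes recover roughly the same estimate here; the ATE/ATT differences the abstract reports arise when treatment effects vary with the covariates that drive treatment assignment.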

    Many Labs 2: Investigating Variation in Replicability Across Samples and Settings

    We conducted preregistered replications of 28 classic and contemporary published findings, with protocols that were peer reviewed in advance, to examine variation in effect magnitudes across samples and settings. Each protocol was administered to approximately half of 125 samples that comprised 15,305 participants from 36 countries and territories. Using the conventional criterion of statistical significance (p < .05), we found that 15 (54%) of the replications provided evidence of a statistically significant effect in the same direction as the original finding. With a strict significance criterion (p < .0001), 14 (50%) of the replications still provided such evidence, a reflection of the extremely high-powered design. Seven (25%) of the replications yielded effect sizes larger than the original ones, and 21 (75%) yielded effect sizes smaller than the original ones. The median comparable Cohen’s ds were 0.60 for the original findings and 0.15 for the replications. The effect sizes were small (< 0.20) in 16 of the replications (57%), and 9 effects (32%) were in the direction opposite that of the original effect. Across settings, the Q statistic indicated significant heterogeneity in 11 (39%) of the replication effects, and most of those were among the findings with the largest overall effect sizes; only 1 effect that was near zero in the aggregate showed significant heterogeneity according to this measure. Only 1 effect had a tau value greater than .20, an indication of moderate heterogeneity. Eight others had tau values near or slightly above .10, an indication of slight heterogeneity. Moderation tests indicated that very little heterogeneity was attributable to the order in which the tasks were performed or to whether the tasks were administered in the lab versus online. Exploratory comparisons revealed little heterogeneity between Western, educated, industrialized, rich, and democratic (WEIRD) cultures and less WEIRD cultures (i.e., cultures with relatively high and low WEIRDness scores, respectively). Cumulatively, variability in the observed effect sizes was attributable more to the effect being studied than to the sample or setting in which it was studied.
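
    For readers unfamiliar with the heterogeneity measures named above, the sketch below computes a per-sample Cohen's d, Cochran's Q, and the DerSimonian-Laird tau on synthetic data. It is a minimal illustration, not the Many Labs 2 analysis code; the number of sites, group sizes, and true effect are assumptions chosen for the example.

```python
# Minimal sketch (not the Many Labs 2 analysis): per-site Cohen's d,
# Cochran's Q, and DerSimonian-Laird tau on synthetic data. The number of
# sites, group sizes, and true effect below are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k = 60                 # number of samples (sites)
n = 125                # participants per condition per site
true_d = 0.15          # small underlying effect, homogeneous across sites

d_vals, v_vals = [], []
for _ in range(k):
    a = rng.normal(true_d, 1, n)                       # condition A
    b = rng.normal(0, 1, n)                            # condition B
    sp = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)  # pooled SD
    d = (a.mean() - b.mean()) / sp                     # Cohen's d
    d_vals.append(d)
    v_vals.append(2 / n + d**2 / (4 * n))              # approx. sampling variance of d

d_vals, v_vals = np.array(d_vals), np.array(v_vals)
w = 1 / v_vals                            # inverse-variance weights
d_bar = np.sum(w * d_vals) / np.sum(w)    # fixed-effect pooled estimate

# Cochran's Q: weighted squared deviations from the pooled estimate;
# under homogeneity it follows a chi-square with k - 1 degrees of freedom.
q = np.sum(w * (d_vals - d_bar) ** 2)
p_het = stats.chi2.sf(q, df=k - 1)

# DerSimonian-Laird tau^2: between-site variance beyond sampling error.
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (q - (k - 1)) / c)

print(f"pooled d = {d_bar:.3f}, Q = {q:.1f} (p = {p_het:.3f}), tau = {np.sqrt(tau2):.3f}")
```

    With a homogeneous simulated effect, Q stays near its degrees of freedom and tau near zero; sites drawn with genuinely different effects would inflate both, which is the pattern the abstract reports for the largest replication effects.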