7 research outputs found
An approach to trial design and analysis in the era of non-proportional hazards of the treatment effect
Background: Most randomized controlled trials with a time-to-event outcome are designed and analysed under the proportional hazards assumption, with a target hazard ratio for the treatment effect in mind. However, the hazards may be non-proportional. We address how to design a trial under such conditions, and how to analyse the results. Methods: We propose to extend the usual approach, a logrank test, to also include the Grambsch-Therneau test of proportional hazards. We test the resulting composite null hypothesis using a joint test for the hazard ratio and for time-dependent behaviour of the hazard ratio. We compute the power and sample size for the logrank test under proportional hazards, and from that we compute the power of the joint test. For the estimation of relevant quantities from the trial data, various models could be used; we advocate adopting a pre-specified flexible parametric survival model that supports time-dependent behaviour of the hazard ratio. Results: We present the mathematics for calculating the power and sample size for the joint test. We illustrate the methodology in real data from two randomized trials, one in ovarian cancer and the other in treating cellulitis. We show selected estimates and their uncertainty derived from the advocated flexible parametric model. We demonstrate in a small simulation study that when a treatment effect either increases or decreases over time, the joint test can outperform the logrank test in the presence of both patterns of non-proportional hazards. Conclusions: Those designing and analysing trials in the era of non-proportional hazards need to acknowledge that a more complex type of treatment effect is becoming more common. Our method for the design of the trial retains the tools familiar in the standard methodology based on the logrank test, and extends it to incorporate a joint test of the null hypothesis with power against non-proportional hazards. For the analysis of trial data, we propose the use of a pre-specified flexible parametric model that can represent a time-dependent hazard ratio if one is present
Designs for clinical trials with time-to-event outcomes based on stopping guidelines for lack of benefit
<p>Abstract</p> <p>background</p> <p>The pace of novel medical treatments and approaches to therapy has accelerated in recent years. Unfortunately, many potential therapeutic advances do not fulfil their promise when subjected to randomized controlled trials. It is therefore highly desirable to speed up the process of evaluating new treatment options, particularly in phase II and phase III trials. To help realize such an aim, in 2003, Royston and colleagues proposed a class of multi-arm, two-stage trial designs intended to eliminate poorly performing contenders at a first stage (point in time). Only treatments showing a predefined degree of advantage against a control treatment were allowed through to a second stage. Arms that survived the first-stage comparison on an intermediate outcome measure entered a second stage of patient accrual, culminating in comparisons against control on the definitive outcome measure. The intermediate outcome is typically on the causal pathway to the definitive outcome (i.e. the features that cause an intermediate event also tend to cause a definitive event), an example in cancer being progression-free and overall survival. Although the 2003 paper alluded to multi-arm trials, most of the essential design features concerned only two-arm trials. Here, we extend the two-arm designs to allow an arbitrary number of stages, thereby increasing flexibility by building in several 'looks' at the accumulating data. Such trials can terminate at any of the intermediate stages or the final stage.</p> <p>Methods</p> <p>We describe the trial design and the mathematics required to obtain the timing of the 'looks' and the overall significance level and power of the design. We support our results by extensive simulation studies. As an example, we discuss the design of the STAMPEDE trial in prostate cancer.</p> <p>Results</p> <p>The mathematical results on significance level and power are confirmed by the computer simulations. Our approach compares favourably with methodology based on beta spending functions and on monitoring only a primary outcome measure for lack of benefit of the new treatment.</p> <p>Conclusions</p> <p>The new designs are practical and are supported by theory. They hold considerable promise for speeding up the evaluation of new treatments in phase II and III trials.</p
Type I error rates of multi-arm multi-stage clinical trials: strong control and impact of intermediate outcomes
BACKGROUND: The multi-arm multi-stage (MAMS) design described by Royston et al. [Stat Med. 2003;22(14):2239-56 and Trials. 2011;12:81] can accelerate treatment evaluation by comparing multiple treatments with a control in a single trial and stopping recruitment to arms not showing sufficient promise during the course of the study. To increase efficiency further, interim assessments can be based on an intermediate outcome (I) that is observed earlier than the definitive outcome (D) of the study. Two measures of type I error rate are often of interest in a MAMS trial. Pairwise type I error rate (PWER) is the probability of recommending an ineffective treatment at the end of the study regardless of other experimental arms in the trial. Familywise type I error rate (FWER) is the probability of recommending at least one ineffective treatment and is often of greater interest in a study with more than one experimental arm. METHODS: We demonstrate how to calculate the PWER and FWER when the I and D outcomes in a MAMS design differ. We explore how each measure varies with respect to the underlying treatment effect on I and show how to control the type I error rate under any scenario. We conclude by applying the methods to estimate the maximum type I error rate of an ongoing MAMS study and show how the design might have looked had it controlled the FWER under any scenario. RESULTS: The PWER and FWER converge to their maximum values as the effectiveness of the experimental arms on I increases. We show that both measures can be controlled under any scenario by setting the pairwise significance level in the final stage of the study to the target level. In an example, controlling the FWER is shown to increase considerably the size of the trial although it remains substantially more efficient than evaluating each new treatment in separate trials. CONCLUSIONS: The proposed methods allow the PWER and FWER to be controlled in various MAMS designs, potentially increasing the uptake of the MAMS design in practice. The methods are also applicable in cases where the I and D outcomes are identical