182 research outputs found

    Augmenting the logrank test in the design of clinical trials in which non-proportional hazards of the treatment effect may be anticipated

    © 2016 Royston and Parmar. Background: Most randomized controlled trials with a time-to-event outcome are designed assuming proportional hazards (PH) of the treatment effect. The sample size calculation is based on a logrank test. However, non-proportional hazards are increasingly common. At analysis, the estimated hazard ratio with a confidence interval is usually presented. The estimate is often obtained from a Cox PH model with treatment as a covariate. If non-proportional hazards are present, the logrank and equivalent Cox tests may lose power. To safeguard power, we previously suggested a 'joint test' combining the Cox test with a test of non-proportional hazards. Unfortunately, a larger sample size is needed to preserve power under PH. Here, we describe a novel test that unites the Cox test with a permutation test based on restricted mean survival time. Methods: We propose a combined hypothesis test based on a permutation test of the difference in restricted mean survival time across time. The test involves the minimum of the Cox and permutation test P-values. We approximate its null distribution and correct it for correlation between the two P-values. Using extensive simulations, we assess the type 1 error and power of the combined test under several scenarios and compare with other tests. We investigate powering a trial using the combined test. Results: The type 1 error of the combined test is close to nominal. Power under proportional hazards is slightly lower than for the Cox test. Enhanced power is available when the treatment difference shows an 'early effect', an initial separation of survival curves which diminishes over time. The power is reduced under a 'late effect', when little or no difference in survival curves is seen for an initial period and then a late separation occurs. We propose a method of powering a trial using the combined test. The 'insurance premium' offered by the combined test to safeguard power under non-PH represents about a single-digit percentage increase in sample size. Conclusions: The combined test increases trial power under an early treatment effect and protects power under other scenarios. Use of restricted mean survival time facilitates testing and displaying a generalized treatment effect.
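As a rough illustration of the restricted mean survival time (RMST) ingredient in the abstract above, the sketch below computes RMST as the area under a Kaplan-Meier curve and runs a simple label-permutation test on the RMST difference between two arms. It is a simplified stand-in, not the authors' procedure: it uses a single, fixed time horizon instead of a difference "across time", and it omits both the Cox test and the correlation-corrected minimum-P step. All function names and parameters (`km_rmst`, `rmst_difference`, `permutation_p_value`, `horizon`, `n_perm`, `seed`) are hypothetical.

```python
import random


def km_rmst(times, events, horizon):
    """Area under the Kaplan-Meier curve from 0 to `horizon` (the RMST).

    `events` holds 1 for an observed event and 0 for censoring. If follow-up
    ends before `horizon`, the last survival estimate is carried forward.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, area, prev_t, i = 1.0, 0.0, 0.0, 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        area += surv * (t - prev_t)  # rectangle up to this time point
        if deaths:
            surv *= 1.0 - deaths / at_risk
        at_risk -= leaving
        prev_t = t
    return area + surv * (horizon - prev_t)


def rmst_difference(times, events, arms, horizon):
    """RMST in arm 1 minus RMST in arm 0."""
    def rmst(arm):
        rows = [(t, e) for t, e, a in zip(times, events, arms) if a == arm]
        return km_rmst([t for t, _ in rows], [e for _, e in rows], horizon)
    return rmst(1) - rmst(0)


def permutation_p_value(times, events, arms, horizon, n_perm=1000, seed=1):
    """Two-sided permutation P-value for the RMST difference (assumed setup)."""
    rng = random.Random(seed)
    observed = abs(rmst_difference(times, events, arms, horizon))
    labels = list(arms)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # re-assign arm labels at random
        if abs(rmst_difference(times, events, labels, horizon)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Shuffling the arm labels keeps each patient's time and event indicator together, so the permutation distribution reflects the null hypothesis of no difference between arms.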

    Incorporating Biomarker Stratification into STAMPEDE: an Adaptive Multi-arm, Multi-stage Trial Platform

    The treatment and outcomes for advanced prostate cancer have experienced significant progress over recent years. Importantly, the additional benefits of 'up front' chemotherapy (docetaxel) and abiraterone, over and above conventional androgen deprivation, have been separately demonstrated in the multi-arm, multi-stage (MAMS) STAMPEDE protocol, which continues recruitment to other questions. Alongside this, insights into the underlying molecular biology and, inevitably, the molecular heterogeneity of prostate cancer are opening the door to new therapeutic approaches. Incorporating this understanding and testing these hypotheses within STAMPEDE brings new challenges to the MAMS approach, but has the potential to further improve the outlook for this disease.

    The DURATIONS randomised trial design: Estimation targets, analysis methods and operating characteristics

    Background. Designing trials to reduce treatment duration is important in several therapeutic areas, including TB and antibiotics. We recently proposed a new randomised trial design to overcome some of the limitations of standard two-arm non-inferiority trials. This DURATIONS design involves randomising patients to a number of duration arms, and modelling the so-called duration-response curve. This article investigates the operating characteristics (type-1 and type-2 errors) of different statistical methods of drawing inference from the estimated curve. Methods. Our first estimation target is the shortest duration non-inferior to the control (maximum) duration within a specific risk difference margin. We compare different methods of estimating this quantity, including using model confidence bands, the delta method and bootstrap. We then explore the generalisability of results to estimation targets which focus on absolute event rates, risk ratio and gradient of the curve. Results. We show through simulations that, in most scenarios and for most of the estimation targets, using the bootstrap to estimate variability around the target duration leads to good results for DURATIONS design-appropriate quantities analogous to power and type-1 error. Using model confidence bands is not recommended, while the delta method leads to inflated type-1 error in some scenarios, particularly when the optimal duration is very close to one of the randomised durations. Conclusions. Using the bootstrap to estimate the optimal duration in a DURATIONS design has good operating characteristics in a wide range of scenarios, and can be used with confidence by researchers wishing to design a DURATIONS trial to reduce treatment duration. Uncertainty around several different targets can be estimated with this bootstrap approach
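The first estimation target described above (the shortest duration non-inferior to the control duration within a risk-difference margin) and a percentile bootstrap around it can be sketched roughly as follows. This is only an illustration under strong assumptions: a piecewise-linear curve through the arm-level cure rates stands in for whatever flexible regression model is actually fitted to the duration-response curve, durations are whole numbers, and the data layout (`outcomes` mapping each duration to a list of 0/1 cure indicators) and all function names are hypothetical.

```python
import random


def curve(durations, rates, d):
    """Piecewise-linear duration-response curve through the arm-level rates."""
    points = list(zip(durations, rates))
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if d0 <= d <= d1:
            return r0 + (r1 - r0) * (d - d0) / (d1 - d0)
    raise ValueError("duration outside the randomised range")


def shortest_noninferior(outcomes, margin):
    """Shortest whole-number duration whose estimated cure rate lies within
    `margin` (risk difference) of the longest, i.e. control, duration."""
    durations = sorted(outcomes)
    rates = [sum(outcomes[d]) / len(outcomes[d]) for d in durations]
    control = rates[-1]
    for d in range(durations[0], durations[-1] + 1):
        if control - curve(durations, rates, d) <= margin:
            return d
    return durations[-1]


def bootstrap_interval(outcomes, margin, n_boot=500, seed=1):
    """95% percentile bootstrap interval, resampling patients within each arm."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = {d: rng.choices(y, k=len(y)) for d, y in outcomes.items()}
        estimates.append(shortest_noninferior(resampled, margin))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Made-up example data: four duration arms (days), 100 patients each.
outcomes = {
    8: [1] * 70 + [0] * 30,
    12: [1] * 80 + [0] * 20,
    16: [1] * 88 + [0] * 12,
    20: [1] * 90 + [0] * 10,
}
point_estimate = shortest_noninferior(outcomes, margin=0.05)
lower, upper = bootstrap_interval(outcomes, margin=0.05)
```

Resampling within arms keeps the randomised allocation fixed, so the spread of the bootstrap estimates reflects only sampling variability in the outcomes.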

    Radiotherapy for metastatic prostate cancer – Authors' reply

    The extension of total gain (TG) statistic in survival models: Properties and applications

    Background: The results of multivariable regression models are usually summarized in the form of parameter estimates for the covariates, goodness-of-fit statistics, and the relevant p-values. These statistics do not inform us about whether covariate information will lead to any substantial improvement in prediction. Predictive ability measures can be used for this purpose since they provide important information about the practical significance of prognostic factors. R²-type indices are the most familiar forms of such measures in survival models, but they all have limitations and none is widely used. Methods: In this paper, we extend the total gain (TG) measure, proposed for a logistic regression model, to survival models and explore its properties using simulations and real data. TG is based on the binary regression quantile plot, otherwise known as the predictiveness curve. Standardised TG ranges from 0 (no explanatory power) to 1 ('perfect' explanatory power). Results: The results of our simulations show that unlike many of the other R²-type predictive ability measures, TG is independent of random censoring. It increases as the effect of a covariate increases and can be applied to different types of survival models, including models with time-dependent covariate effects. We also apply TG to quantify the predictive ability of multivariable prognostic models developed in several disease areas. Conclusions: Overall, TG performs well in our simulation studies and can be recommended as a measure to quantify the predictive ability in survival models.
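For orientation, the sketch below shows the binary-outcome form of standardised total gain that the abstract takes as its starting point, assuming the usual definition: the mean absolute distance between each predicted risk and the overall mean risk (the area between the predictiveness curve and a horizontal line at the mean risk), divided by its maximum possible value, 2ρ(1 − ρ). The survival-model extension described in the abstract is not reproduced here, and the function name is hypothetical.

```python
def standardised_total_gain(risks):
    """Standardised total gain from a list of predicted risks in [0, 1].

    Assumed binary-outcome definition: mean |risk - mean risk| divided by
    2 * rho * (1 - rho), where rho is the mean predicted risk.
    """
    n = len(risks)
    rho = sum(risks) / n
    if rho <= 0.0 or rho >= 1.0:
        return 0.0  # degenerate case: no variation in risk is possible
    total_gain = sum(abs(r - rho) for r in risks) / n
    return total_gain / (2.0 * rho * (1.0 - rho))


no_information = standardised_total_gain([0.3] * 10)    # every subject has the same risk
perfect = standardised_total_gain([0.0, 1.0] * 5)       # risks fully separate subjects
intermediate = standardised_total_gain([0.1, 0.9] * 5)  # partial separation
```

The two extreme cases match the range quoted in the abstract: identical risks give a value of 0, and risks of exactly 0 or 1 give a value of 1.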

    Timely and reliable evaluation of the effects of interventions: a framework for adaptive meta-analysis (FAME)

    Most systematic reviews are retrospective and use aggregate data (AD) from publications, meaning they can be unreliable, lag behind therapeutic developments and fail to influence ongoing or new trials. Commonly, the potential influence of unpublished or ongoing trials is overlooked when interpreting results, or determining the value of updating the meta-analysis or the need to collect individual participant data (IPD). Therefore, we developed a Framework for Adaptive Meta-analysis (FAME) to determine prospectively the earliest opportunity for reliable AD meta-analysis. We illustrate FAME using two systematic reviews in men with metastatic (M1) and non-metastatic (M0) hormone-sensitive prostate cancer (HSPC).

    Effectiveness and acceptability of methods of communicating the results of clinical research to lay and professional audiences: protocol for a systematic review

    BACKGROUND: Phase III randomised controlled trials aim not just to increase the sum of human knowledge, but also to improve treatment, care or prevention for future patients through changing policy and practice. To achieve this, the results need to be communicated effectively to several audiences. It is unclear how best to do this while not wasting scarce resources or causing avoidable distress or confusion. The aim of this systematic review is to examine the effectiveness, acceptability and resource implications of different methods of communication of clinical research results to lay or professional audiences, to inform practice. METHODS: We will systematically review the published literature from 2000 to 2018 for reports of approaches for communicating clinical study results to lay audiences (patients, participants, carers and the wider public) or professional audiences (clinicians, policymakers, guideline developers, other medical professionals). We will search Embase, MEDLINE, PsycINFO, ASSIA, the Cochrane Database of Systematic Reviews and grey literature sources. One reviewer will screen titles and abstracts for potential eligibility, discarding only those that are clearly irrelevant. Potentially relevant full texts will then be assessed for inclusion by two reviewers. Data extraction will be carried out by one reviewer using EPPI-Reviewer. Risk of bias will be assessed using the relevant Cochrane Risk of Bias 2.0 tool, ROBINS-I, AXIS Appraisal Tool or Critical Appraisal Skills Programme Qualitative Checklist, depending on study design. We will decide whether to meta-analyse data based on whether the included trials are similar enough in terms of participants, settings, intervention, comparison and outcome measures to allow meaningful conclusions from a statistically pooled result. We will present the data in tables and narratively summarise the results. We will use thematic synthesis for qualitative studies. DISCUSSION: Developing the search strategy for this review has been challenging as many of the concepts (patients, clinicians, clinical studies, and communication) are widely used in literature that is not relevant for inclusion in our review. We expect there will be limited comparative evidence, spread over a wide range of approaches, comparators and populations and, therefore, do not anticipate being able to carry out meta-analysis.