3,424 research outputs found

    On the uses and abuses of regression models: a call for reform of statistical practice and teaching

    When students and users of statistical methods first learn about regression analysis there is an emphasis on the technical details of models and estimation methods that invariably runs ahead of the purposes for which these models might be used. More broadly, statistics is widely understood to provide a body of techniques for "modelling data", underpinned by what we describe as the "true model myth", according to which the task of the statistician/data analyst is to build a model that closely approximates the true data generating process. By way of our own historical examples and a brief review of mainstream clinical research journals, we describe how this perspective leads to a range of problems in the application of regression methods, including misguided "adjustment" for covariates, misinterpretation of regression coefficients and the widespread fitting of regression models without a clear purpose. We then outline an alternative approach to the teaching and application of regression methods, which begins by focussing on clear definition of the substantive research question within one of three distinct types: descriptive, predictive, or causal. The simple univariable regression model may be introduced as a tool for description, while the development and application of multivariable regression models should proceed differently according to the type of question. Regression methods will no doubt remain central to statistical practice as they provide a powerful tool for representing variation in a response or outcome variable as a function of "input" variables, but their conceptualisation and usage should follow from the purpose at hand. Comment: 24 pages main document including 3 figures, plus 15 pages supplementary material. Based on plenary lecture (President's Invited Speaker) delivered to ISCB43, Newcastle, UK, August 2022. Submitted for publication 12-Sep-2

    Neutrophil maturity in cancer

    Neutrophils are implicated in almost every stage of oncogenesis and paradoxically display anti- and pro-tumor properties. Accumulating evidence indicates that neutrophils display diversity in their phenotype resulting from functional plasticity and/or changes to granulopoiesis. In cancer, neutrophils at a range of maturation stages can be identified in the blood and tissues (i.e., outside of their developmental niche). The functional capacity of neutrophils at different states of maturation is poorly understood resulting from challenges in their isolation, identification, and investigation. Thus, the impact of neutrophil maturity on cancer progression and therapy remains enigmatic. In this review, we discuss the identification, prevalence, and function of immature and mature neutrophils in cancer and the potential impact of this on tumor progression and cancer therapy.

    Mediation effects that emulate a target randomised trial: Simulation-based evaluation of ill-defined interventions on multiple mediators

    Many epidemiological questions concern potential interventions to alter the pathways presumed to mediate an association. For example, we consider a study that investigates the benefit of interventions in young adulthood for ameliorating the poorer mid-life psychosocial outcomes of adolescent self-harmers relative to their healthy peers. Two methodological challenges arise. First, mediation methods have hitherto mostly focused on the elusive task of discovering pathways, rather than on the evaluation of mediator interventions. Second, the complexity of such questions is invariably such that there are no well-defined mediator interventions (i.e. actual treatments, programs, etc.) for which data exist on the relevant populations, outcomes and time-spans of interest. Instead, researchers must rely on exposure (non-intervention) data, that is, on mediator measures such as depression symptoms for which the actual interventions that one might implement to alter them are not well defined. We propose a novel framework that addresses these challenges by defining mediation effects that map to a target trial of hypothetical interventions targeting multiple mediators for which we simulate the effects. Specifically, we specify a target trial addressing three policy-relevant questions, regarding the impacts of hypothetical interventions that would shift the mediators' distributions (separately under various interdependence assumptions, jointly or sequentially) to user-specified distributions that can be emulated with the observed data. We then define novel interventional effects that map to this trial, simulating shifts by setting mediators to random draws from those distributions. We show that estimation using a g-computation method is possible under an expanded set of causal assumptions relative to inference with well-defined interventions, which reflects the lower level of evidence that is expected with ill-defined interventions. 
    Application to the self-harm example in the Victorian Adolescent Health Cohort Study illustrates the value of our proposal for informing the design and evaluation of actual interventions in the future.

    Evaluation of a weighting approach for performing sensitivity analysis after multiple imputation

    BACKGROUND: Multiple imputation (MI) is a well-recognised statistical technique for handling missing data. As usually implemented in standard statistical software, MI assumes that data are 'Missing at random' (MAR), an assumption that in many settings is implausible. It is not possible to distinguish whether data are MAR or 'Missing not at random' (MNAR) using the observed data, so it is desirable to discover the impact of departures from the MAR assumption on the MI results by conducting sensitivity analyses. A weighting approach based on a selection model has been proposed for performing MNAR analyses to assess the robustness of results obtained under standard MI to departures from MAR. METHODS: In this article, we use simulation to evaluate the weighting approach as a method for exploring possible departures from MAR, with missingness in a single variable, where the parameters of interest are the marginal mean (and probability) of a partially observed outcome variable and a measure of association between the outcome and a fully observed exposure. The simulation studies compare the weighting-based MNAR estimates for various numbers of imputations in small and large samples, for moderate to large magnitudes of departure from MAR, where the degree of departure from MAR was assumed known. Further, we evaluated a proposed graphical method, which uses the dataset with missing data, for obtaining a plausible range of values for the parameter that quantifies the magnitude of departure from MAR. RESULTS: Our simulation studies confirm that the weighting approach outperformed the MAR approach, but it still suffered from bias. In particular, our findings demonstrate that the weighting approach provides biased parameter estimates, even when a large number of imputations is performed. 
    In the examples presented, the graphical approach for selecting a range of values for the possible departures from MAR did not capture the true parameter value of departure used in generating the data. CONCLUSIONS: Overall, the weighting approach is not recommended for sensitivity analyses following MI, and further research is required to develop more appropriate methods to perform such sensitivity analyses.

    Comparison of methods for imputing limited-range variables: a simulation study

    BACKGROUND: Multiple imputation (MI) was developed as a method to enable valid inferences to be obtained in the presence of missing data rather than to re-create the missing values. Within the applied setting, it remains unclear how important it is that imputed values should be plausible for individual observations. One variable type for which MI may lead to implausible values is a limited-range variable, where imputed values may fall outside the observable range. The aim of this work was to compare methods for imputing limited-range variables, with a focus on those that restrict the range of the imputed values. METHODS: Using data from a study of adolescent health, we consider three variables based on responses to the General Health Questionnaire (GHQ), a tool for detecting minor psychiatric illness. These variables, based on different scoring methods for the GHQ, resulted in three continuous distributions with mild, moderate and severe positive skewness. In an otherwise complete dataset, we set 33% of the GHQ observations to missing completely at random or missing at random, repeating this process to create 1000 datasets with incomplete data for each scenario. For each dataset, we imputed values on the raw scale and following a zero-skewness log transformation using: univariate regression with no rounding; post-imputation rounding; truncated normal regression; and predictive mean matching. We estimated the marginal mean of the GHQ and the association between the GHQ and a fully observed binary outcome, comparing the results with complete data statistics. RESULTS: Imputation with no rounding performed well when applied to data on the raw scale. Post-imputation rounding and imputation using truncated normal regression produced higher marginal means than the complete data estimate when data had a moderate or severe skew, and this was associated with under-coverage of the complete data estimate. 
    Predictive mean matching also produced under-coverage of the complete data estimate. For the estimate of association, all methods produced similar estimates to the complete data. CONCLUSIONS: For data with a limited range, multiple imputation using techniques that restrict the range of imputed values can result in biased estimates for the marginal mean when data are highly skewed.

    The influence of sighing respirations on infant lung function measured using multiple breath washout gas mixing techniques

    There is substantial interest in studying lung function in infants, to better understand the early life origins of chronic lung diseases such as asthma. Multiple breath washout (MBW) is a technique for measuring lung function that has been adapted for use in infants. Respiratory sighs occur frequently in young infants during natural sleep, and in accordance with current MBW guidelines, result in exclusion of data from a substantial proportion of testing cycles. We assessed how sighs during MBW influenced the measurements obtained, using data from 767 tests conducted on 246 infants (50% male; mean age 43 days) as part of a large cohort study. Sighs occurred in 119 (15%) tests. Sighs during the main part of the wash‐in phase (before the last 5 breaths) were not associated with differences in standard MBW measurements compared with tests without sighs. In contrast, sighs that occurred during the washout were associated with a small but discernible increase in magnitude and variability. For example, the mean lung clearance index increased by 0.36 (95% CI: 0.11–0.62) and variance increased by a multiplicative factor of 2 (95% CI: 1.6–2.5). The results suggest it is reasonable to include MBW data from testing cycles where a sigh occurs during the wash‐in phase, but not during washout, of MBW. By recovering data that would otherwise have been excluded, we estimate a boost of about 10% to the final number of acceptable tests and 6% to the number of individuals successfully tested.

    A cost-effectiveness analysis of the management of sore throat in children in Australia

    For the first time a cost-effectiveness analysis of the management of sore throat in Australian children has been conducted using accurate epidemiological data generated from recent Australian studies.