77 research outputs found

    Contributions aux modèles de régression avec réponses manquantes : risques concurrents et données longitudinales

    Get PDF
    Missing data are a common occurrence in medical studies. In regression modeling, missing outcomes limit our ability to draw inferences about the covariate effects of medical interest, which are those describing the distribution of the entire set of planned outcomes. In addition to losing precision, the validity of any method used to draw inferences from the observed data requires that some assumption about the mechanism leading to missing outcomes holds. Rubin (1976, Biometrika, 63:581-592) called the missingness mechanism MAR (for “missing at random”) if the probability of an outcome being missing does not depend on missing outcomes when conditioning on the observed data, and MNAR (for “missing not at random”) otherwise. This distinction has important implications for the modeling requirements to draw valid inferences from the available data, but it is generally not possible to assess from these data whether the missingness mechanism is MAR or MNAR. Hence, sensitivity analyses should be routinely performed to assess the robustness of inferences to assumptions about the missingness mechanism.
    In the field of incomplete multivariate data, in which the outcomes are gathered in a vector for which some components may be missing, MAR methods are widely available and increasingly used, and several MNAR modeling strategies have also been proposed. On the other hand, although some sensitivity analysis methodology has been developed, this is still an active area of research.
    The first aim of this dissertation was to develop a sensitivity analysis approach for continuous longitudinal data with drop-outs, that is, continuous outcomes that are ordered in time and completely observed for each individual up to a certain time-point, at which the individual drops out so that all subsequent outcomes are missing. The proposed approach consists of assessing the inferences obtained across a family of MNAR pattern-mixture models indexed by a so-called sensitivity parameter that quantifies the departure from MAR. The approach was prompted by a randomized clinical trial investigating the benefits of a treatment for sleep-maintenance insomnia, from which 22% of the individuals had dropped out before the study end.
    The second aim was to build on the existing theory for incomplete multivariate data to develop methods for competing risks data with missing causes of failure. The competing risks model is an extension of the standard survival analysis model in which failures from different causes are distinguished. Strategies for modeling competing risks functionals, such as the cause-specific hazards (CSH) and the cumulative incidence function (CIF), generally assume that the cause of failure is known for all patients, but this is not always the case. Some methods for regression with missing causes under the MAR assumption have already been proposed, especially for semi-parametric modeling of the CSH. But other useful models have received little attention, and MNAR modeling and sensitivity analysis approaches have never been considered in this setting. We propose a general framework for semi-parametric regression modeling of the CIF under MAR using inverse probability weighting and multiple imputation ideas. Also under MAR, we propose a direct likelihood approach for parametric regression modeling of the CSH and the CIF. Furthermore, we consider MNAR pattern-mixture models in the context of sensitivity analyses.
    In the competing risks literature, a starting point for methodological developments for handling missing causes was a stage II breast cancer randomized clinical trial in which 23% of the deceased women had missing cause of death. We use these data to illustrate the practical value of the proposed approaches.
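
    The pattern-mixture sensitivity analysis described above can be sketched in code. The fragment below is a minimal illustration of the general delta-adjustment idea (not the thesis's implementation), simplified to a single post-baseline continuous outcome with simulated data: impute missing outcomes under MAR, shift the imputations by a sensitivity parameter delta that quantifies the departure from MAR, and re-estimate the treatment effect over a grid of delta values. All variable names and model choices are illustrative assumptions; a full analysis would generate multiple imputations and pool them with Rubin's rules.

```python
# Illustrative delta-adjustment sensitivity analysis for a continuous outcome
# with dropout (hypothetical data and models, single imputation for brevity).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated trial: treatment arm, baseline score, final outcome y,
# with roughly 20% of outcomes missing because of dropout.
n = 500
treat = rng.integers(0, 2, n)
baseline = rng.normal(50, 10, n)
y = 0.8 * baseline - 3.0 * treat + rng.normal(0, 5, n)
observed = rng.random(n) > 0.2
y_obs = np.where(observed, y, np.nan)

# Step 1: impute the missing outcomes from a model fitted to completers (MAR).
X_comp = sm.add_constant(np.column_stack([treat[observed], baseline[observed]]))
imp_model = sm.OLS(y_obs[observed], X_comp).fit()
X_all = sm.add_constant(np.column_stack([treat, baseline]))
imputed = imp_model.predict(X_all) + rng.normal(0, np.sqrt(imp_model.scale), n)

# Step 2: shift the imputed values by delta (departure from MAR) and
# re-estimate the treatment effect for each value of delta.
for delta in [-5.0, -2.5, 0.0, 2.5, 5.0]:
    y_filled = np.where(observed, y_obs, imputed + delta)
    fit = sm.OLS(y_filled, X_all).fit()
    print(f"delta={delta:+5.1f}  estimated treatment effect={fit.params[1]:+.2f}")
```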

    On the uses and abuses of regression models: a call for reform of statistical practice and teaching

    Full text link
    When students and users of statistical methods first learn about regression analysis there is an emphasis on the technical details of models and estimation methods that invariably runs ahead of the purposes for which these models might be used. More broadly, statistics is widely understood to provide a body of techniques for "modelling data", underpinned by what we describe as the "true model myth", according to which the task of the statistician/data analyst is to build a model that closely approximates the true data generating process. By way of our own historical examples and a brief review of mainstream clinical research journals, we describe how this perspective leads to a range of problems in the application of regression methods, including misguided "adjustment" for covariates, misinterpretation of regression coefficients and the widespread fitting of regression models without a clear purpose. We then outline an alternative approach to the teaching and application of regression methods, which begins by focussing on a clear definition of the substantive research question within one of three distinct types: descriptive, predictive, or causal. The simple univariable regression model may be introduced as a tool for description, while the development and application of multivariable regression models should proceed differently according to the type of question. Regression methods will no doubt remain central to statistical practice as they provide a powerful tool for representing variation in a response or outcome variable as a function of "input" variables, but their conceptualisation and usage should follow from the purpose at hand.
    Comment: 24 pages main document including 3 figures, plus 15 pages of supplementary material. Based on a plenary lecture (President's Invited Speaker) delivered to ISCB43, Newcastle, UK, August 2022. Submitted for publication 12-Sep-2

    Confounding-adjustment methods for the causal difference in medians

    Get PDF
    Background: With continuous outcomes, the average causal effect is typically defined using a contrast of expected potential outcomes. However, in the presence of skewed outcome data, the expectation (population mean) may no longer be meaningful. In practice, the typical approach is to continue defining the estimand this way or to transform the outcome to obtain a more symmetric distribution, although neither approach may be entirely satisfactory. Alternatively, the causal effect can be redefined as a contrast of median potential outcomes, yet discussion of confounding-adjustment methods to estimate the causal difference in medians is limited. In this study we described and compared confounding-adjustment methods to address this gap.
    Methods: The methods considered were multivariable quantile regression, an inverse probability weighted (IPW) estimator, weighted quantile regression (another form of IPW) and two little-known implementations of g-computation for this problem. Methods were evaluated within a simulation study under varying degrees of skewness in the outcome and applied to an empirical study using data from the Longitudinal Study of Australian Children.
    Results: Simulation results indicated that the IPW estimator, weighted quantile regression and g-computation implementations minimised bias across all settings when the relevant models were correctly specified, with g-computation additionally minimising the variance. Multivariable quantile regression, which relies on a constant-effect assumption, consistently yielded biased results. Application to the empirical study illustrated the practical value of these methods.
    Conclusion: The presented methods provide appealing avenues for estimating the causal difference in medians.
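
    As a rough illustration of the IPW estimator described above (not the authors' code), the sketch below fits a propensity score model for the exposure given a confounder and contrasts weighted medians of a skewed outcome between exposure groups. The data and variable names are simulated assumptions; a real analysis would also quantify uncertainty, for example with the bootstrap.

```python
# Illustrative IPW estimator of the causal difference in medians
# (hypothetical simulated data, single confounder).
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_median(values, weights):
    """Return the weighted median of `values` given observation `weights`."""
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    return values[np.searchsorted(cum, 0.5 * cum[-1])]

rng = np.random.default_rng(1)
n = 2000
confounder = rng.normal(size=n)
exposure = rng.binomial(1, 1 / (1 + np.exp(-confounder)))
# Skewed (log-normal) outcome depending on exposure and confounder.
outcome = np.exp(0.5 * confounder + 0.3 * exposure + rng.normal(0, 0.5, n))

# Propensity score model: exposure ~ confounder.
ps_model = LogisticRegression().fit(confounder.reshape(-1, 1), exposure)
p = ps_model.predict_proba(confounder.reshape(-1, 1))[:, 1]

# Inverse probability weights: 1/p for the exposed, 1/(1-p) for the unexposed.
w = np.where(exposure == 1, 1 / p, 1 / (1 - p))

m1 = weighted_median(outcome[exposure == 1], w[exposure == 1])
m0 = weighted_median(outcome[exposure == 0], w[exposure == 0])
print("IPW estimate of the causal difference in medians:", m1 - m0)
```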

    Mediation effects that emulate a target randomised trial: Simulation-based evaluation of ill-defined interventions on multiple mediators

    Get PDF
    Many epidemiological questions concern potential interventions to alter the pathways presumed to mediate an association. For example, we consider a study that investigates the benefit of interventions in young adulthood for ameliorating the poorer mid-life psychosocial outcomes of adolescent self-harmers relative to their healthy peers. Two methodological challenges arise. First, mediation methods have hitherto mostly focused on the elusive task of discovering pathways, rather than on the evaluation of mediator interventions. Second, the complexity of such questions is invariably such that there are no well-defined mediator interventions (i.e. actual treatments, programs, etc.) for which data exist on the relevant populations, outcomes and time-spans of interest. Instead, researchers must rely on exposure (non-intervention) data, that is, on mediator measures such as depression symptoms, where the actual interventions that one might implement to alter them are not well defined. We propose a novel framework that addresses these challenges by defining mediation effects that map to a target trial of hypothetical interventions targeting multiple mediators, for which we simulate the effects. Specifically, we specify a target trial addressing three policy-relevant questions regarding the impacts of hypothetical interventions that would shift the mediators' distributions (separately under various interdependence assumptions, jointly or sequentially) to user-specified distributions that can be emulated with the observed data. We then define novel interventional effects that map to this trial, simulating shifts by setting mediators to random draws from those distributions. We show that estimation using a g-computation method is possible under an expanded set of causal assumptions relative to inference with well-defined interventions, which reflects the lower level of evidence that is expected with ill-defined interventions. Application to the self-harm example in the Victorian Adolescent Health Cohort Study illustrates the value of our proposal for informing the design and evaluation of actual interventions in the future.
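
    The simulation-based g-computation idea can be sketched under strong simplifying assumptions. The fragment below is a hypothetical single-mediator illustration, not the proposed framework itself (which handles multiple mediators under various interdependence assumptions): fit an outcome model, define a target mediator distribution from the unexposed group, draw mediator values for the exposed from that distribution, and average the predicted outcomes. All data and model choices are illustrative.

```python
# Simplified Monte Carlo g-computation for a mediator-shift effect
# (one mediator, linear models, hypothetical simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 3000
conf = rng.normal(size=n)                      # baseline confounder
exposed = rng.binomial(1, 0.3, n)              # exposure indicator
mediator = 0.5 * exposed + 0.3 * conf + rng.normal(0, 1, n)
outcome = -0.7 * mediator - 0.4 * exposed + 0.2 * conf + rng.normal(0, 1, n)

# Outcome model: outcome ~ exposure + mediator + confounder.
X = sm.add_constant(np.column_stack([exposed, mediator, conf]))
out_model = sm.OLS(outcome, X).fit()

# Mediator model among the unexposed defines the target ("shifted") distribution.
med_model = sm.OLS(mediator[exposed == 0],
                   sm.add_constant(conf[exposed == 0])).fit()

# Monte Carlo step: for each exposed individual, draw a mediator value from
# the unexposed distribution (given the confounder) and predict the outcome.
idx = exposed == 1
k = int(idx.sum())
m_draw = (med_model.predict(sm.add_constant(conf[idx]))
          + rng.normal(0, np.sqrt(med_model.scale), k))
# Design matrix: [intercept, exposure=1, drawn mediator, confounder].
X_cf = np.column_stack([np.ones(k), np.ones(k), m_draw, conf[idx]])
y_shifted = out_model.predict(X_cf).mean()     # mean outcome under the shift
y_natural = outcome[idx].mean()                # observed mean among the exposed
print("Interventional effect of the mediator shift:", y_shifted - y_natural)
```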

    Causal inference in multi-cohort studies using the target trial approach

    Full text link
    Longitudinal cohort studies provide the opportunity to examine causal effects of complex exposures on long-term health outcomes. Utilizing data from multiple cohorts has the potential to add further benefit by improving the precision of estimates through data pooling and allowing examination of effect heterogeneity across contexts. However, the interpretation of findings can be complicated by biases that may be compounded when pooling data or may contribute to discrepant findings when analyses are replicated across cohorts. Here we extend the target trial framework, already well established as a powerful tool for causal inference in single-cohort studies, to address the specific challenges that can arise in the multi-cohort setting. The approach considers the target trial as a central point of reference, as opposed to comparing one study to another. This enables clear definition of the target estimand and systematic consideration of sources of bias within each cohort and additional sources of bias arising from data pooling. Consequently, analyses can be designed to reduce these biases and the resulting findings appropriately interpreted. We use a case study to demonstrate the approach and its potential to strengthen causal inference in multi-cohort studies through improved analysis design and clarity in the interpretation of findings.
    Comment: 34 pages, 3 figures

    Electronic media use and academic performance in late childhood: A longitudinal study

    Get PDF
    Introduction: The effects of electronic media use on health have received much attention, but less is known about links with academic performance. This study prospectively examines the effect of media use on academic performance in late childhood.
    Materials and methods: 1239 8- to 9-year-olds and their parents were recruited to take part in a prospective, longitudinal study. Academic performance was measured on a national achievement test at baseline and at 10–11 years of age. Parents reported on their child’s duration of electronic media use.
    Results: After control for baseline reading, watching more than two hours of television per day at 8–9 years of age predicted a 12-point lower performance in reading at 10–11 years, equivalent to the loss of a third of a year of learning. Using a computer for more than one hour a day predicted a similar 12-point lower numeracy performance. Regarding cross-sectional associations (presumed to capture short-term effects) of media use on numeracy, after controlling for prior media exposure, watching more than two hours of television per day at 10–11 years was concurrently associated with a 12-point lower numeracy score, and using a computer for more than one hour per day with a 13-point lower numeracy performance. There was little evidence for concurrent effects on reading. There was no evidence of short- or long-term associations between videogame use and academic performance.
    Discussion: Cumulative television use is associated with poorer reading, and cumulative computer use with poorer numeracy. Beyond any links between heavy media use and health risks related to obesity, physical activity and mental health, these findings raise the possibility of additional risks of both television and computer use for learning in mid-childhood. These findings carry implications for parents, teachers and clinicians to consider the type and timing of media exposure in developing media plans for children.

    Learning outcomes in primary school children with emotional problems: a prospective cohort study

    Get PDF
    BACKGROUND: Academic difficulties are common in adolescents with mental health problems. Although earlier childhood emotional problems, characterised by heightened anxiety and depressive symptoms, are common forerunners to adolescent mental health problems, the degree to which mental health problems in childhood may contribute independently to academic difficulties has been little explored.
    METHODS: Data were drawn from a prospective cohort study of students in Melbourne, Australia (N = 1239). Data were linked with a standardised national assessment of academic performance at baseline (9 years) and wave three (11 years). Depressive and anxiety symptoms were assessed at baseline and wave two (10 years). Regression analyses estimated the association between emotional problems (9 and/or 10 years) and academic performance at 11 years, adjusting for baseline academic performance, sex, age, socioeconomic status, and hyperactivity/inattention symptoms.
    RESULTS: Students with depressive symptoms at 9 years of age had lost nearly 4 months of numeracy learning two years later, after controlling for baseline academic performance and confounders. Results were similar for anxiety symptoms. Regardless of when depressive symptoms occurred, there were consistent associations with poorer numeracy performance at 11 years. The association of depressive symptoms with reading performance was weaker than for numeracy if they were present at wave two. Persistent anxiety symptoms across two waves led to nearly a 4-month loss of numeracy learning at 11 years, but the difference was not meaningful for reading. Findings were similar when including hyperactivity/inattention symptoms.
    CONCLUSIONS: Childhood anxiety and depression are not only forerunners of later mental health problems but also predict academic achievement. Partnerships between education and health systems have the potential to not only improve childhood emotional problems but also improve learning.