40 research outputs found

    Competing risks analyses: objectives and approaches

    Studies in cardiology often record the time to multiple disease events such as death, myocardial infarction, or hospitalization. Competing risks methods allow for the analysis of the time to the first observed event and of the type of the first event. They are also relevant if the time to a specific event is of primary interest but competing events may preclude its occurrence or greatly alter the chances of observing it. We give a non-technical overview of competing risks concepts for descriptive and regression analyses. For descriptive statistics, the cumulative incidence function is the most important tool. For regression modelling, we introduce regression models for the cumulative incidence function and for the cause-specific hazard function. We stress the importance of choosing statistical methods that are appropriate when competing risks are present. We also clarify the role of competing risks in the analysis of composite endpoints.
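    A minimal sketch of the two tools named above, on simulated data: the Aalen-Johansen estimator of the cumulative incidence function and a cause-specific Cox model. The lifelines package and all variable names are assumptions for illustration; the overview itself is software-agnostic.

```python
# Simulated competing-risks data: event 0 = censored, 1 = event of interest,
# 2 = competing event; `age` is a single illustrative covariate.
import numpy as np
import pandas as pd
from lifelines import AalenJohansenFitter, CoxPHFitter

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "time": rng.exponential(10.0, n),
    "event": rng.choice([0, 1, 2], size=n, p=[0.3, 0.4, 0.3]),
    "age": rng.normal(65, 8, n),
})

# Descriptive analysis: cumulative incidence function (Aalen-Johansen estimator).
ajf = AalenJohansenFitter()
ajf.fit(df["time"], df["event"], event_of_interest=1)
print(ajf.cumulative_density_.tail())

# Regression on the cause-specific hazard: competing events are treated as censored.
cs = df.assign(event1=(df["event"] == 1).astype(int))
cph = CoxPHFitter()
cph.fit(cs[["time", "event1", "age"]], duration_col="time", event_col="event1")
cph.print_summary()
```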

    Impact of Model Choice When Studying the Relationship Between Blood Pressure Variability and Risk of Stroke Recurrence

    Long-term blood pressure variability (BPV), an increasingly recognized vascular risk factor, is challenging to analyze. Our objective was to assess the impact of how BPV is modeled on its estimated effect on the risk of stroke. We used data from a secondary stroke prevention trial, PROGRESS (Perindopril Protection Against Recurrent Stroke Study), which included 6105 subjects. The median number of blood pressure (BP) measurements was 12 per patient, and 727 patients experienced a first stroke recurrence over a mean follow-up of 4.3 years. Hazard ratios (HRs) for BPV were estimated from 6 proportional hazards models that modeled BPV in different ways, for comparison purposes. The 3 commonly used methods first derived the SD of the BP measurements observed over a given period of follow-up and then used it as a fixed covariate in a Cox model. The 3 more advanced approaches accounted for changes in BP or BPV over time in a single-stage analysis. While the 3 commonly used methods produced contradictory results (for a 5 mmHg increase in BPV, HR=0.75 [95% CI, 0.68–0.82], HR=0.99 [0.91–1.08], and HR=1.19 [1.10–1.30]), the 3 more advanced approaches resulted in a similarly moderate positive association (HR=1.08 [95% CI, 0.99–1.17]), whether adjusted for BP at randomization or for mean BP over the follow-up. The method used to assess BPV strongly affects its estimated effect on the risk of stroke and should be chosen with caution. Further methodological developments are needed to account for the dynamics of both BP and BPV over time and to clarify the specific role of BPV.
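    A sketch of the commonly used two-stage approach described above (the one the paper cautions about), on simulated data: derive each patient's SD of observed BP measurements, then enter it as a fixed covariate in a Cox model. The lifelines package, the simulated values, and the column names are assumptions; the PROGRESS data are not reproduced here.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 200

# Long-format repeated BP measurements: ~12 visits per patient.
visits = pd.DataFrame({
    "id": np.repeat(np.arange(n), 12),
    "sbp": rng.normal(140, 12, n * 12),
})

# One row per patient: follow-up time and stroke-recurrence indicator.
surv = pd.DataFrame({
    "id": np.arange(n),
    "time": rng.exponential(4.0, n),
    "stroke": rng.integers(0, 2, n),
})

# Stage 1: per-patient BPV as the SD of the observed measurements.
bpv = visits.groupby("id")["sbp"].agg(mean_sbp="mean", bpv_sd="std").reset_index()

# Stage 2: Cox model with BPV (and mean BP) as fixed covariates.
data = surv.merge(bpv, on="id")
cph = CoxPHFitter()
cph.fit(data[["time", "stroke", "bpv_sd", "mean_sbp"]],
        duration_col="time", event_col="stroke")
cph.print_summary()
```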

    XENAIR: air pollution exposure and breast cancer risk in the E3N cohort (study protocol, JMIR Res Protoc)

    Background: Breast cancer is the most frequent cancer in women in industrialized countries. Lifestyle and environmental factors, particularly endocrine-disrupting pollutants, have been suggested to play a role in breast cancer risk. Epidemiological studies, although still scarce and not fully consistent, suggest a positive association of breast cancer risk with exposure to several International Agency for Research on Cancer Group 1 air-pollutant carcinogens, such as particulate matter, polychlorinated biphenyls (PCBs), dioxins, benzo[a]pyrene (BaP), and cadmium. It has been proposed that menopausal status could modify the relationship between pollutants and breast cancer and that the association varies with hormone receptor status. Objective: The XENAIR project will investigate the association of breast cancer risk (overall and by hormone receptor status) with chronic exposure to selected air pollutants, including particulate matter, nitrogen dioxide (NO2), ozone (O3), BaP, dioxins, PCB-153, and cadmium. Methods: Our research is based on a case-control study nested within the French national E3N cohort, with 5222 invasive breast cancer cases identified during follow-up from 1990 to 2011 and 5222 matched controls. A questionnaire was sent to all participants to collect their lifetime residential addresses and information on indoor pollution. We will assess these exposures using complementary land-use regression, atmospheric dispersion, and regional chemistry-transport (CHIMERE) models, via a Geographic Information System. Associations with breast cancer risk will be modeled using conditional logistic regression models. We will also study the impact of exposure on DNA methylation and interactions with genetic polymorphisms. Appropriate statistical methods, including Bayesian modeling, principal component analysis, and cluster analysis, will be used to assess the impact of multipollutant exposure. The fraction of breast cancer cases attributable to air pollution will be estimated. Results: The XENAIR project will contribute to current knowledge on the health effects of air pollution and help identify and understand modifiable environmental risk factors for breast cancer. Conclusions: The results will provide relevant evidence to governments and policy-makers to improve effective public health prevention strategies on air pollution. The XENAIR dataset can be used in future efforts to study the effects of exposure to air pollution associated with other chronic conditions.
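    A minimal sketch of the core analysis step named above, conditional logistic regression on matched case-control sets, using simulated data. The statsmodels ConditionalLogit class, the matched-pair structure, and the variable name `no2` are assumptions for illustration, not the project's actual code.

```python
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

rng = np.random.default_rng(2)
n_sets = 300

# One case and one matched control per set; `no2` stands in for a modelled
# long-term exposure (e.g. at the residential address).
df = pd.DataFrame({
    "matched_set": np.repeat(np.arange(n_sets), 2),
    "case": np.tile([1, 0], n_sets),
    "no2": rng.normal(25, 8, n_sets * 2),
})

# Conditional (stratified) logit: matching factors are absorbed by the strata,
# so no intercept is included.
model = ConditionalLogit(df["case"], df[["no2"]], groups=df["matched_set"])
result = model.fit()
print(result.summary())
```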

    Validity of Internet-Based Longitudinal Study Data: The Elephant in the Virtual Room

    Background: Internet-based data collection relies on well-designed and validated questionnaires. The theory behind designing and validating questionnaires is well described, but few practical examples of how to approach validation are available in the literature. Objective: We aimed to validate data collected in an ongoing Internet-based longitudinal health study through direct visits to participants and recall of their health records. We demonstrate that despite extensive pre-planning, social desirability can still affect data in unexpected ways and that anticipation of poor-quality data may be confounded by positive validation. Methods: Dogslife is a large-scale, Web-based longitudinal study of canine health, in which owners of Labrador Retrievers were recruited and questioned at regular intervals about the lifestyle and health of their dogs using an Internet-based questionnaire. The Dogslife questionnaire predominantly consists of closed-answer questions. In our work, two separate validation methodologies were used: (1) direct interviews with 43 participants during visits to their households and (2) comparison of owner-entered health reports with 139 historical health records. Results: Our results indicate that user-derived measures should not be regarded as a single category; instead, each measurement should be considered separately, as each presents its own challenge to participants. We recommend trying to ascertain the extent of recall decay within a study and, if necessary, using this to guide data collection timepoints and analyses. Finally, we recommend that multiple methods of communication facilitate validation studies and aid cohort engagement. Conclusions: Our study highlighted how the theory underpinning online questionnaire design and validation translates into practical data issues when applied to Internet-based studies. Validation should be regarded as an extension of questionnaire design, and validation work should commence as soon as sufficient data are available. We believe that validation is a crucial step and hope our suggested guidelines will help facilitate validation of other Internet-based cohort studies.

    Latent class approach to lifetime exposure trajectories and lung cancer risk (PLoS One)

    Quantifying the association between lifetime exposures and the risk of developing a chronic disease is a recurrent challenge in epidemiology. Individual exposure trajectories are often heterogeneous, and studying their associations with the risk of disease is not straightforward. We propose to use a latent class mixed model (LCMM) to identify profiles (latent classes) of exposure trajectories and estimate their association with the risk of disease. The methodology is applied to study the association between lifetime trajectories of smoking or occupational exposure to asbestos and the risk of lung cancer in males of the ICARE population-based case-control study. Asbestos exposure was assessed using a job-exposure matrix. The classes of exposure trajectories were identified using two separate LCMMs for smoking and asbestos, and the association between the identified classes and the risk of lung cancer was estimated in a second stage using weighted logistic regression on all subjects. A total of 2026/2610 cases/controls had complete information on both smoking and asbestos exposure, including 1938/1837 cases/controls who had ever smoked, and 1417/1520 cases/controls ever exposed to asbestos. The LCMM identified four latent classes of smoking trajectories, which had different risks of lung cancer, all much higher than that of never smokers. The most frequent class had a moderate, constant intensity over lifetime, while the three others had high intensity that was either long-term, distant, or recent; the latter had the highest risk of lung cancer. We identified five classes of asbestos exposure trajectories, which all had a higher risk of lung cancer compared with men never occupationally exposed to asbestos, whatever the dose and timing of exposure. The proposed approach opens new perspectives for analyzing dose-time-response relationships between protracted exposures and the risk of developing a chronic disease, by providing a complete picture of exposure history in terms of intensity, duration, and timing of exposure.
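    A simplified stand-in for the two-stage approach described above, on simulated data. The paper fits latent class mixed models (typically with the R lcmm package); here a Gaussian mixture on smoking-intensity trajectories plays the role of the class assignment, followed by logistic regression of case status on class membership. The data, class count, and library choices are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, n_ages = 400, 10

# Simulated intensity trajectories (e.g. cigarettes/day at 10 ages) and case status.
trajectories = rng.gamma(shape=2.0, scale=5.0, size=(n, n_ages))
case = rng.integers(0, 2, n)

# Stage 1: assign each subject to a latent trajectory class.
gmm = GaussianMixture(n_components=4, random_state=0).fit(trajectories)
classes = pd.Series(gmm.predict(trajectories).astype(str), name="traj_class")

# Stage 2: estimate the association between class membership and disease,
# with the first class as the reference category.
X = pd.get_dummies(classes, drop_first=True)
fit = LogisticRegression().fit(X, case)
print("log-odds ratios vs reference class:", fit.coef_.round(2))
```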

    Augmented backward elimination: a pragmatic and purposeful way to develop statistical models.

    Statistical models are simple mathematical rules derived from empirical data describing the association between an outcome and several explanatory variables. In a typical modeling situation, statistical analysis often involves a large number of potential explanatory variables, and frequently only partial subject-matter knowledge is available. Therefore, selecting the most suitable variables for a model in an objective and practical manner is usually a non-trivial task. We briefly revisit the purposeful variable selection procedure suggested by Hosmer and Lemeshow, which combines significance and change-in-estimate criteria for variable selection, and critically discuss the change-in-estimate criterion. We show that using a significance-based threshold for the change-in-estimate criterion reduces to a simple significance-based selection of variables, as if the change-in-estimate criterion were not considered at all. Various extensions to the purposeful variable selection procedure are suggested. For variable selection, we propose to use backward elimination augmented with a standardized change-in-estimate criterion on the quantity of interest usually reported and interpreted in a model. Augmented backward elimination has been implemented in a SAS macro for linear, logistic and Cox proportional hazards regression. The algorithm and its implementation were evaluated by means of a simulation study. Augmented backward elimination tends to select larger models than backward elimination and approximates the unselected model up to negligible differences in point estimates of the regression coefficients. On average, regression coefficients obtained after applying augmented backward elimination were less biased relative to the coefficients of correctly specified models than after backward elimination. In summary, we propose augmented backward elimination as a reproducible variable selection algorithm that gives the analyst more flexibility in adapting model selection to a specific statistical modeling situation.
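    A simplified sketch of the idea, on simulated data: a candidate variable is dropped only if it is non-significant and dropping it changes the coefficient of the exposure of interest by less than a standardized change-in-estimate threshold. The published algorithm and its SAS macro differ in details (e.g. how the change is standardized); the thresholds and variable names here are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=["exposure", "x1", "x2", "x3"])
df["y"] = 1.0 * df["exposure"] + 0.5 * df["x1"] + rng.normal(size=n)

def fit(cols):
    return sm.OLS(df["y"], sm.add_constant(df[cols])).fit()

alpha, tau = 0.157, 0.05          # selection and change-in-estimate thresholds
keep = ["exposure", "x1", "x2", "x3"]

changed = True
while changed:
    changed = False
    current = fit(keep)
    # Candidate "passive" variables, least significant first; the exposure of
    # interest itself is never dropped.
    passive = sorted((c for c in keep if c != "exposure"),
                     key=lambda c: current.pvalues[c], reverse=True)
    for cand in passive:
        if current.pvalues[cand] <= alpha:
            break                                   # all remaining are significant
        reduced = fit([c for c in keep if c != cand])
        delta = abs(reduced.params["exposure"] - current.params["exposure"])
        if delta / current.bse["exposure"] < tau:   # standardized change-in-estimate
            keep.remove(cand)
            changed = True
            break

print("selected variables:", keep)
```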

    Modeling smoking history: a comparison of different approaches.

    The impact of cigarette smoking on various diseases is studied frequently in epidemiology. However, there is no consensus on how to model different aspects of smoking history. The aim of this investigation was to elucidate the impact of several decisions that must be made when modeling smoking variables. The authors used data on lung cancer from a case-control study undertaken in Montreal, Quebec, Canada, in 1979-1985. The roles of smoking status, intensity, duration, cigarette-years, age at initiation, and time since cessation were investigated using time-dependent variables in an adaptation of Cox's model to case-control data. The authors reached four conclusions. 1) The estimated hazard ratios for current and ex-smokers depend strongly on how long subjects are required not to have smoked to be considered "ex-smokers." 2) When the aim is to estimate the effect of continuous smoking variables, a simple approach can be used (and is proposed) to separate the qualitative difference between never and ever smokers from the quantitative effect of smoking. 3) Using intensity and duration as separate variables may lead to a better model fit than using their product (cigarette-years). 4) When estimating the effects of time since cessation or age at initiation, it is still useful to use cigarette-years, because it reduces multicollinearity.
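    A sketch of the idea behind conclusions 2) and 3), on simulated case-control data: an ever-smoker indicator is entered together with continuous smoking variables (set to 0 for never smokers), so the never/ever contrast and the dose-response among smokers are estimated separately, and intensity and duration enter as separate terms rather than as cigarette-years. Ordinary logistic regression stands in for the authors' Cox-model adaptation; the data and coefficients are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 1000
ever = rng.integers(0, 2, n)
intensity = np.where(ever == 1, rng.gamma(2.0, 8.0, n), 0.0)   # cigarettes/day
duration = np.where(ever == 1, rng.uniform(5, 40, n), 0.0)     # years smoked
logit = -2 + 0.8 * ever + 0.02 * intensity + 0.03 * duration
case = rng.binomial(1, 1 / (1 + np.exp(-logit)))

df = pd.DataFrame(dict(case=case, ever=ever, intensity=intensity, duration=duration))

# Ever-smoker indicator plus intensity and duration as separate continuous terms.
fit = smf.logit("case ~ ever + intensity + duration", data=df).fit()
print(fit.summary())
```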

    Using Generalized Additive Models to Detect and Estimate Threshold Associations

    In a variety of research settings, investigators may wish to detect and estimate a threshold in the association between continuous variables. A threshold model implies a non-linear relationship, with the slope changing at an unknown location. Generalized additive models (GAMs) (Hastie and Tibshirani, 1990) estimate the shape of the non-linear relationship directly from the data and may therefore be useful in this endeavour. We propose a method based on GAMs to detect and estimate thresholds in the association between a continuous covariate and a continuous dependent variable. Using simulations, we compare it with the maximum likelihood estimation procedure proposed by Hudson (1966). We search for potential thresholds in a neighbourhood of points at which the mean numerical second derivative (a measure of local curvature) of the estimated GAM curve is more than one standard deviation away from 0, across the entire range of the predictor values. A threshold association is declared if an F-test indicates that the threshold model fits significantly better than the linear model. For each method, the type I error for testing the existence of a threshold against the null hypothesis of a linear association was estimated. We also investigated the impact of the position of the true threshold on power, and on the precision and bias of the estimated threshold. Finally, we illustrate the methods by considering whether a threshold exists in the association between systolic blood pressure (SBP) and body mass index (BMI) in two data sets.
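    A sketch of the screening-plus-testing idea on simulated data: smooth the association, flag points of high local curvature via a numerical second derivative, fit a broken-stick model at each candidate threshold, and compare the best one against the linear model with an F-test. A lowess smoother stands in for the GAM, the curvature rule and the F-test degrees of freedom are simplified, and the data are invented.

```python
import numpy as np
from scipy import stats
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(6)
n = 300
bmi = np.sort(rng.uniform(18, 40, n))
sbp = 115 + 0.2 * bmi + 1.5 * np.maximum(bmi - 30, 0) + rng.normal(0, 5, n)  # true threshold at BMI 30

# Smoothed curve and its numerical second derivative (local curvature).
smooth = lowess(sbp, bmi, frac=0.4, return_sorted=True)
xs, ys = smooth[:, 0], smooth[:, 1]
curv = np.gradient(np.gradient(ys, xs), xs)

# Candidate thresholds: interior points whose curvature is > 1 SD away from 0.
inner = (xs > np.quantile(bmi, 0.05)) & (xs < np.quantile(bmi, 0.95))
candidates = xs[inner & (np.abs(curv) > curv.std())]
if candidates.size == 0:
    candidates = xs[inner]          # fall back if no strong curvature is found

def rss(X):
    return np.linalg.lstsq(X, sbp, rcond=None)[1][0]   # residual sum of squares

rss_lin = rss(np.column_stack([np.ones(n), bmi]))

# Broken-stick model at each candidate threshold; keep the best-fitting one.
best = min(candidates,
           key=lambda t: rss(np.column_stack([np.ones(n), bmi, np.maximum(bmi - t, 0)])))
rss_thr = rss(np.column_stack([np.ones(n), bmi, np.maximum(bmi - best, 0)]))

# F-test: does the threshold model fit significantly better than the linear one?
F = (rss_lin - rss_thr) / (rss_thr / (n - 3))
p = 1 - stats.f.cdf(F, 1, n - 3)
print(f"estimated threshold ~ {best:.1f}, F = {F:.1f}, p = {p:.3g}")
```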

    Urine osmolarity example: selection path (left column) of standardized regression coefficients and model stability (inclusion frequencies) in bootstrap resamples (right column) for backward elimination (BE) and augmented backward elimination (ABE).

    First row: BE; second and third rows: ABE with two different threshold settings. Abbreviations: ABE, augmented backward elimination; BE, backward elimination; log2UOsm, log2 of urine osmolarity; log2CCL, log2 of creatinine clearance; log2Prot, log2 of proteinuria; BBlock, use of beta-blockers; PKD, presence of polycystic kidney disease; Diur, use of diuretics; Age, age in decades; ACEI, use of angiotensin-converting enzyme inhibitors and angiotensin II type 1 receptor blockers; MAP, mean arterial pressure.

    Representation of exposures in regression analysis and interpretation of regression coefficients: basic concepts and pitfalls

    Regression models are used to quantify the effect of an exposure on an outcome while adjusting for potential confounders. While the type of regression model to be used is determined by the nature of the outcome variable (e.g. linear regression applies to continuous outcomes), all regression models can handle any kind of exposure variable. However, some fundamentals of how the exposure is represented in a regression model, and some potential pitfalls, have to be kept in mind in order to obtain a meaningful interpretation of results. The objective of this educational paper is to illustrate these fundamentals and pitfalls, using various multiple regression models applied to data from a hypothetical cohort of 3000 patients with chronic kidney disease. In particular, we illustrate how to represent different types of exposure variables (binary, categorical with two or more categories, and continuous), and how to interpret the regression coefficients in linear, logistic and Cox models. We also discuss the linearity assumption in these models, and show how wrongly assuming linearity may produce biased results and how flexible modelling using spline functions may provide better estimates.
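    A brief illustration of the linearity point above, on simulated data with a clearly non-linear association: the same continuous exposure is entered linearly and via a flexible B-spline. The statsmodels/patsy formula interface and the variable names are assumptions, not the paper's hypothetical cohort.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 500
egfr = rng.uniform(15, 90, n)                                   # hypothetical continuous exposure
outcome = 2 + 0.001 * (egfr - 60) ** 2 + rng.normal(0, 0.5, n)  # U-shaped association
df = pd.DataFrame({"egfr": egfr, "outcome": outcome})

linear = smf.ols("outcome ~ egfr", data=df).fit()               # wrongly assumes linearity
spline = smf.ols("outcome ~ bs(egfr, df=4)", data=df).fit()     # flexible B-spline via patsy's bs()

print("linear R2:", round(linear.rsquared, 3))
print("spline R2:", round(spline.rsquared, 3))
```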