
    Variable Selection and Model Choice in Structured Survival Models

    In many medical applications, flexible survival models are needed that extend the classical Cox model through the inclusion of time-varying and nonparametric effects. These structured survival models are very flexible, but additional difficulties arise when model choice and variable selection are desired. In particular, it has to be decided which covariates should be assigned time-varying effects or whether parametric modeling is sufficient for a given covariate. Component-wise boosting provides a means of likelihood-based model fitting that enables simultaneous variable selection and model choice. We introduce a component-wise likelihood-based boosting algorithm for survival data that permits the inclusion of both parametric and nonparametric time-varying effects as well as nonparametric effects of continuous covariates, utilizing penalized splines as the main modeling technique. Its properties and performance are investigated in simulation studies. The new modeling approach is used to build a flexible survival model for intensive care patients suffering from severe sepsis. A software implementation is available to the interested reader.
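
    To make the idea concrete, here is a minimal sketch of component-wise boosting for a Cox-type model with penalized-spline base-learners, written with the mboost package. It illustrates the general technique of simultaneous variable selection and model choice by boosting, not the authors' specific likelihood-based algorithm or software; the data frame d and the covariates x1 and x2 are hypothetical.

        # Illustrative sketch only, not the authors' implementation.
        library(mboost)     # component-wise gradient boosting
        library(survival)   # Surv()

        # d: hypothetical data frame with survival time, event indicator,
        #    a continuous covariate x1 and a binary covariate x2
        fit <- gamboost(
          Surv(time, event) ~
            bols(x1) +          # linear base-learner for x1
            bbs(x1, df = 4) +   # P-spline base-learner: smooth deviation from linearity
            bols(x2),           # linear effect of the binary covariate
          family  = CoxPH(),                     # Cox partial-likelihood loss
          control = boost_control(mstop = 200),  # number of boosting iterations
          data    = d
        )

        # base-learners never selected across the mstop iterations are effectively
        # dropped, which couples variable selection with the linear-vs-smooth choice
        selected(fit)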

    Building Cox-Type Structured Hazard Regression Models with Time-Varying Effects

    In recent years, flexible hazard regression models based on penalised splines have been developed that allow us to extend the classical Cox model via the inclusion of time-varying and nonparametric effects. Despite their immediate appeal in terms of flexibility, these models introduce additional difficulties when a subset of covariates and the corresponding modelling alternatives have to be chosen. We present an analysis of data from a specific patient population with 90-day survival as the response variable. The aim is to determine a sensible prognostic model in which some variables have to be included because of subject-matter knowledge while other variables are subject to model selection. Motivated by this application, we propose a two-stage stepwise model-building strategy that simultaneously chooses the relevant covariates and the corresponding modelling alternatives from the set of candidate covariates. For categorical covariates, the competing modelling approaches are linear effects and time-varying effects, whereas nonparametric modelling provides a further alternative in the case of continuous covariates. In our data analysis, we identified a prognostic model containing both smooth and time-varying effects.
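
    As a simple related illustration (not the authors' penalised-spline hazard regression), the two modelling alternatives for a single covariate, a constant log hazard ratio versus a time-varying effect, can be contrasted with the survival package's tt() mechanism; the data frame d and the variables treatment and age are hypothetical.

        library(survival)

        # constant hazard ratio for treatment (classical Cox model)
        fit_lin <- coxph(Surv(time, status) ~ treatment + age, data = d)

        # effect of treatment allowed to vary with log(time)
        fit_tve <- coxph(
          Surv(time, status) ~ treatment + age + tt(treatment),
          tt   = function(x, t, ...) x * log(t),
          data = d
        )

        # a stepwise strategy could compare such nested alternatives covariate by
        # covariate, e.g. with likelihood-based criteria
        AIC(fit_lin, fit_tve)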

    Bayesian splines versus fractional polynomials in network meta-analysis

    BACKGROUND: Network meta-analysis (NMA) provides a powerful tool for the simultaneous evaluation of multiple treatments by combining evidence from different studies, allowing for direct and indirect comparisons between treatments. In recent years, NMA has become increasingly popular in the medical literature, and the underlying statistical methodologies continue to evolve in both the frequentist and Bayesian frameworks. Traditional NMA models are often based on the comparison of two treatment arms per study. These individual studies may measure outcomes at multiple time points that are not necessarily homogeneous across studies. METHODS: In this article we present a Bayesian model based on B-splines for the simultaneous analysis of outcomes across time points that allows for indirect comparison of treatments across different longitudinal studies. RESULTS: We illustrate the proposed approach in simulations as well as on real data examples available in the literature and compare it with a model based on P-splines and one based on fractional polynomials, showing that our approach is flexible and overcomes the limitations of the latter. CONCLUSIONS: The proposed approach is computationally efficient and able to accommodate a large class of temporal treatment effect patterns, allowing for direct and indirect comparisons of widely varying shapes of longitudinal profiles.
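
    The building block assumed by such models is a spline basis over the study time points; a minimal sketch follows, using splines::bs() in R. In a Bayesian NMA the treatment-specific coefficients on these basis columns would be given priors and estimated by MCMC, which is omitted here; the time grid and knot choice are purely illustrative.

        library(splines)

        times <- c(4, 8, 12, 24, 52)                  # hypothetical follow-up times (weeks) pooled across studies
        B     <- bs(times, df = 4, intercept = TRUE)  # B-spline basis evaluated at those times

        # a longitudinal treatment-effect curve is then modelled as B %*% beta_k for
        # treatment k, allowing indirect comparisons at any common time point
        dim(B)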

    A comparison of statistical methods for age-specific reference values of discrete scales

    Age-specific reference values are important in medical science to evaluate the normal ranges of subjects and to help physicians signal potential disorders as early as possible. They are applied to many types of measurements, including discrete measures obtained from questionnaires and clinical tests. These discrete measures are typically skewed to the left and bounded by a maximum score of one (or 100%). This article investigates the performance of various statistical methods for establishing age-specific reference values for discrete measures, including quantile regression, the Lambda-Mu-Sigma (LMS) method and its extensions, and generalized additive models for location, scale, and shape with zero- and one-inflated distributions, implemented with either fractional polynomials or splines. Their large-sample performance was investigated using Monte Carlo simulations, and the consistency of the spline and fractional polynomial age profiles with quantile regression was demonstrated as well. The advantages and disadvantages of these methods are illustrated with data on the Infant Motor Profile, a test score on motor behavior in children of 3–18 months. We conclude that quantile regression with fractional polynomials is a robust and computationally efficient method for setting age-specific reference values for discrete measures.
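
    A minimal sketch of the recommended approach, quantile regression with fractional polynomial terms for age, is given below using the quantreg package. The FP powers (0.5 and log) are fixed here only for illustration rather than selected from the usual FP power set, and the data frame d with variables score and age is hypothetical.

        library(quantreg)

        # d: hypothetical data frame with a bounded discrete score and age in months
        fit <- rq(score ~ I(age^0.5) + I(log(age)),
                  tau  = c(0.05, 0.50, 0.95),   # centiles used as reference curves
                  data = d)

        # predicted age-specific reference centiles over a grid of ages
        newd <- data.frame(age = seq(3, 18, by = 0.5))
        predict(fit, newdata = newd)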

    State of the art in selection of variables and functional forms in multivariable analysis-outstanding issues

    Background: How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc ‘traditional’ approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More recently, many alternative approaches to address these two challenges have been proposed, but knowledge of their properties and meaningful comparisons between them are scarce. Before a state of the art can be defined and evidence-supported guidance provided to researchers who have only a basic level of statistical knowledge, many outstanding issues in multivariable modelling remain to be addressed. Our main aims are to identify and illustrate such gaps in the literature and to present them at a moderate technical level to the wide community of practitioners, researchers and students of statistics. Methods: We briefly discuss general issues in building descriptive regression models, strategies for variable selection, different ways of choosing functional forms for continuous variables, and methods for combining the selection of variables and functions. We discuss two examples, taken from the medical literature, to illustrate problems in the practice of modelling. Results: Our overview revealed that there is not yet enough evidence on which to base recommendations for the selection of variables and functional forms in multivariable analysis. Such evidence may come from comparisons between alternative methods. In particular, we highlight seven important topics that require further investigation and make suggestions for the direction of further research. Conclusions: Selection of variables and of functional forms are important topics in multivariable analysis. To define a state of the art and to provide evidence-supported guidance to researchers who have only a basic level of statistical knowledge, further comparative research is required.

    Multiple imputation in Cox regression when there are time-varying effects of covariates.

    In Cox regression, it is important to test the proportional hazards assumption, and it is sometimes of interest in itself to study time-varying effects (TVEs) of covariates. TVEs can be investigated with log hazard ratios modelled as a function of time. Missing data on covariates are common, and multiple imputation is a popular approach for handling them that avoids the potential bias and efficiency loss of a "complete-case" analysis. Two multiple imputation methods have been proposed for when the substantive model is a Cox proportional hazards regression: an approximate method (White and Royston, "Imputing missing covariate values for the Cox model", Statistics in Medicine, 2009) and a substantive-model-compatible method (Bartlett et al., "Multiple imputation of covariates by fully conditional specification: accommodating the substantive model", Statistical Methods in Medical Research, 2015). At present, neither accommodates TVEs of covariates. We extend both to do so for a general form of TVE and give specific details for TVEs modelled using restricted cubic splines. Simulation studies assess the performance of the methods under several underlying shapes of TVE. Our proposed methods give approximately unbiased TVE estimates for binary covariates with missing data, but for continuous covariates the substantive-model-compatible method performs better. The methods also give approximately correct type I errors in the test for proportional hazards when there is no TVE and gain power to detect TVEs relative to complete-case analysis. Ignoring TVEs at the imputation stage results in biased TVE estimates, incorrect type I errors, and a substantial loss of power in detecting TVEs. We also propose a multivariable TVE model selection algorithm. The methods are illustrated using data from the Rotterdam Breast Cancer Study. R code is provided.
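
    For reference, here is a sketch of the standard workflow without the time-varying-effect extension described here, in the spirit of the approximate (White and Royston) method: include the Nelson-Aalen cumulative hazard and the event indicator in the imputation model, fit the Cox model on each completed data set, and pool with Rubin's rules. The data frame d and the covariates x1 and x2 are hypothetical.

        library(survival)
        library(mice)

        # marginal Nelson-Aalen cumulative hazard added as a predictor for imputation;
        # the event indicator status is already a column of d and thus also used
        d$H0 <- nelsonaalen(d, timevar = time, statusvar = status)

        imp  <- mice(d, m = 20, printFlag = FALSE)              # impute missing covariate values
        fits <- with(imp, coxph(Surv(time, status) ~ x1 + x2))  # substantive Cox model per imputation
        summary(pool(fits))                                     # pool with Rubin's rules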

    Generalized survival models as a tool for medical research

    In medical research, many studies with time-to-event outcomes investigate the effect of an exposure (or treatment) on patients’ survival. For the analysis of time-to-event or survival data, model-based approaches have commonly been applied. In this thesis, a class of regression models on the survival scale, termed generalized survival models (GSMs, previously described in Appendix A of [1]), and full likelihood-based estimation methods are presented along with four papers. The overall aim was to provide a rich and coherent framework for modelling either independent or correlated survival data. Our main contributions to GSMs and the related estimation approaches were as follows. First, we refined the mathematical and statistical background of the model components, including the link function, log-time, and smooth univariate functions. Second, we broadened the class to include generalized additive functional forms for representing covariate effects, such as non-linear forms, time-dependent effects, joint time-dependent and non-linear effects for age, and multivariate regression splines. Third, we introduced thin plate regression splines [2], which can use knot-free bases, into GSMs as an alternative to knot-based regression splines. Fourth, under a penalized likelihood framework, we integrated parametric estimation with model selection for the number of spline basis functions. These refinements, extensions, and related assessments were undertaken in the first three papers. The newly proposed features of GSMs and the estimation methods were implemented and integrated into the rstpm2 package in R. This thesis consists of four research papers on modeling either independent or correlated survival data, with either overall or net survival as the measure of interest. In Paper I, the outcomes under study were independent times to death from any cause (or times to any recurrence of disease). Parametric and penalized GSMs were introduced with extensions, simulation studies, and applications. In Paper II, the outcome of interest was correlated time-to-event data, such as data collected from patients in the same clinics. It is reasonable to assume that subjects within a cluster share some unmeasured environmental or genetic risk factors, which are commonly modeled by a random effect b (or frailty U) assumed to be independent of the baseline covariates. In this paper, GSMs with novel extensions were proposed to analyze correlated time-to-event data. In Paper III, we extended GSMs with novel features for relative survival analysis; here the outcome of interest was time to death due to the disease under study. In Paper IV, we analyzed repeated events within the same subject using the methods proposed in Paper II and described the time-dependent cumulative risks of subsequent outcomes for men in different states since study entry. In summary, the proposed methods performed well in extensive simulation studies under the investigated settings, with good point estimates and coverage probabilities. In the analysis of example data sets, the proposed methods and other well-established approaches gave similar results under proportional hazards or proportional odds model settings. The novel features were also illustrated in both simulations and applications.
Generally, the combination of GSMs and full-likelihood-based estimation methods provides alternative tools for the analysis of survival data in medical research.
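
    As a pointer, a minimal example with the rstpm2 package mentioned above fits a parametric GSM (stpm2) and a penalized GSM (pstpm2) to the package's example breast-cancer data; the choice of df and the single-covariate formula are illustrative only.

        library(rstpm2)
        data(brcancer)

        # parametric GSM: spline baseline in log time on the default link scale, df = 3
        fit_param <- stpm2(Surv(rectime, censrec == 1) ~ hormon, data = brcancer, df = 3)

        # penalized GSM: smoothness chosen within the penalized likelihood framework
        fit_pen   <- pstpm2(Surv(rectime, censrec == 1) ~ hormon, data = brcancer)

        summary(fit_param)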

    Effects of Influential Points and Sample Size on the Selection and Replicability of Multivariable Fractional Polynomial Models

    The multivariable fractional polynomial (MFP) procedure combines variable selection with a function selection procedure (FSP). For continuous variables, a closed test procedure is used to decide between no effect, a linear effect, an FP1 function, or an FP2 function. Influential points (IPs) and a small sample size can both affect the selected fractional polynomial model. In this paper, we use simulated data with six continuous and four categorical predictors to illustrate approaches that can help to identify IPs with an influence on function selection and on the MFP model. The approaches use leave-one-out and leave-two-out diagnostics together with two related techniques for a multivariable assessment. In seven subsamples we also investigated the effects of sample size and model replicability. For better illustration, a structured profile is used to provide an overview of all analyses conducted. The results show that one or more IPs can drive the functions and models selected. In addition, with a small sample size, MFP might not be able to detect non-linear functions, and the selected model might differ substantially from the true underlying model. However, if the sample size is sufficient and regression diagnostics are carefully conducted, MFP can be a suitable approach for selecting variables and functional forms for continuous variables.
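
    A brief sketch of an MFP fit with the mfp package in R is shown below; fp() terms invoke the function selection procedure (the closed test between exclusion, a linear effect, FP1, and FP2) for the continuous covariates. The data frame d, the variable names, and the significance levels are hypothetical.

        library(mfp)
        library(survival)

        fit <- mfp(Surv(time, status) ~ fp(x1, df = 4) + fp(x2, df = 4) + sex + stage,
                   family = cox, data = d,
                   select = 0.05,   # significance level for variable selection
                   alpha  = 0.05)   # significance level for the function selection procedure
        print(fit)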