95 research outputs found

    Implementation of complex interactions in a Cox regression framework

    The standard Cox proportional hazards model has been extended by functionally describable interaction terms. The first of these are related to neural networks, adopting the idea of transforming sums of weighted covariables by means of a logistic function. A class of reasonable weight combinations within the logistic transformation is described. Apart from the standard covariable product interaction, a product of logistically transformed covariables has also been included in the analysis of the performance of the new terms. An algorithm combining likelihood ratio tests and the AIC criterion has been defined for model choice. The critical values of the likelihood ratio test statistics had to be corrected in order to guarantee a maximum type I error of 5% for each interaction term. The new class of interaction terms allows functional relationships between covariables to be interpreted with more flexibility and can easily be implemented in standard software packages.
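    A minimal sketch of the basic idea, assuming the lifelines package and made-up column names; the weight search and the corrected likelihood ratio critical values described in the abstract are not reproduced here:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def logistic_interaction(x1, x2, w1=1.0, w2=1.0):
    """Logistic transform of a weighted sum of two covariables,
    mimicking a single neural-network-style hidden unit."""
    return 1.0 / (1.0 + np.exp(-(w1 * x1 + w2 * x2)))

# toy data; 'time', 'event', 'x1', 'x2' are illustrative column names
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["time"] = rng.exponential(scale=10, size=200)
df["event"] = rng.integers(0, 2, size=200)

# classical product interaction plus one logistically transformed term
df["x1_x2"] = df["x1"] * df["x2"]
df["logit_int"] = logistic_interaction(df["x1"], df["x2"], w1=1.0, w2=-1.0)

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "p"]])
```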

    Two Survival Tree Models for Myocardial Infarction Patients

    In the search for a better prognostic survival model for post-acute myocardial infarction patients, scientists at the Technical University of Munich's "Klinikum rechts der Isar" and the German Heart Center in Munich have developed new parameters based on 24-hour ECG (Schmidt et al 1999). A series of investigations was carried out using these parameters on different data sets and the Cox-PH model (Schmidt et al 1999, Ulm et al 2000). This paper is a response to the discussion paper by Ulm et al (2000), which suggests a Cox model for the risk stratification of the MPIP data set patients, including the predictors ejection fraction and heart rate turbulence. The current paper suggests using the classification and regression trees technique for survival data in order to derive a survival stratification model for the NIRVPIP data set. Two models are compared: one contains the variables suggested by Ulm et al (2000); the other has two additional variables, namely the presence of couplets and the number of extrasystolic beats in the longest salvo of the patient's 24-hour ECG. The second model is shown to be an improvement on the first.
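    A rough sketch of fitting a survival tree on predictors of this kind; scikit-survival's SurvivalTree is used here as a stand-in for the tree method in the paper, and all variable names and values are synthetic illustrations:

```python
import numpy as np
import pandas as pd
from sksurv.tree import SurvivalTree
from sksurv.util import Surv

# synthetic stand-in data; column names mirror the predictors discussed above
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "ejection_fraction": rng.uniform(20, 70, n),
    "hr_turbulence": rng.normal(0, 1, n),
    "couplets_present": rng.integers(0, 2, n),
    "max_salvo_beats": rng.poisson(2, n),
})
time = rng.exponential(scale=24, size=n)          # follow-up time
event = rng.integers(0, 2, size=n).astype(bool)   # death indicator

y = Surv.from_arrays(event=event, time=time)

tree = SurvivalTree(max_depth=3, min_samples_leaf=30)
tree.fit(X.values, y)

# each leaf of the tree defines a risk stratum; the predicted risk score
# is constant within a leaf, so its distinct values summarize the strata
risk = tree.predict(X.values)
print(pd.Series(risk).value_counts().sort_index())
```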

    Variable selection with Random Forests for missing data

    Variable selection has been suggested for Random Forests to improve their efficiency in data prediction and interpretation. However, its basic element, i.e. variable importance measures, cannot be computed in a straightforward manner when there are missing data. Therefore an extensive simulation study was conducted to explore possible solutions, i.e. multiple imputation, complete case analysis and a newly suggested importance measure, for several missing data generating processes. The ability to distinguish relevant from non-relevant variables was investigated for these procedures in combination with two popular variable selection methods. Findings and recommendations: Complete case analysis should not be applied, as it led to inaccurate variable selection and to models with the worst prediction accuracy. Multiple imputation is a good means to select variables that would be of relevance in fully observed data; it produced the best prediction accuracy. By contrast, the application of the new importance measure leads to a selection of variables that reflects the actual data situation, i.e. that takes the occurrence of missing values into account. Its error was only negligibly worse than that of imputation.
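    A sketch of one of the strategies compared above, selecting variables by permutation importance after imputing missing values; scikit-learn is assumed, a single chained-equations pass stands in for multiple imputation, and the selection threshold is an arbitrary illustration rather than one of the selection methods studied in the paper:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
n = 500
X = pd.DataFrame({f"x{i}": rng.normal(size=n) for i in range(6)})
y = (X["x0"] + 0.5 * X["x1"] + rng.normal(size=n) > 0).astype(int)

# introduce values missing at random in two of the predictors
for col in ["x0", "x2"]:
    X.loc[rng.random(n) < 0.3, col] = np.nan

# impute, then fit the forest and compute permutation importances
X_imp = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(X),
                     columns=X.columns)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_imp, y)
imp = permutation_importance(rf, X_imp, y, n_repeats=20, random_state=0)

# crude selection rule: keep variables whose importance clearly exceeds noise
selected = X.columns[imp.importances_mean > 2 * imp.importances_std]
print(list(selected))
```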

    Responder Identification in Clinical Trials with Censored Data

    We present a newly developed technique for the identification of positive and negative responders to a new treatment that was compared with a classical treatment (or placebo) in a randomized clinical trial. This bump-hunting-based method was developed for trials in which the two treatment arms do not differ in survival overall. It checks in a systematic manner whether certain subgroups, described by predictive factors, show a difference in survival due to the new treatment. Several versions of the method are discussed and compared in a simulation study. The best version of the responder identification method employs martingale residuals from a prognostic model as the response in a bump hunting procedure stabilized through bootstrapping. On average it recognizes the correct positive responder group 90% of the time and the correct negative responder group 99% of the time.
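    A compressed sketch of the core ingredients, assuming lifelines for the prognostic Cox model and synthetic variable names: martingale residuals are extracted and inspected along one candidate predictive factor. The bootstrap stabilization and the full bump hunting (peeling/pasting) search described above are omitted.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "biomarker": rng.normal(size=n),
    "treatment": rng.integers(0, 2, n),
})
df["time"] = rng.exponential(scale=12, size=n)
df["event"] = rng.integers(0, 2, size=n)

# prognostic model fitted without the treatment indicator
prog = df[["age", "biomarker", "time", "event"]]
cph = CoxPHFitter().fit(prog, duration_col="time", event_col="event")
mart = cph.compute_residuals(prog, kind="martingale")["martingale"]

# peel along one predictive factor: does a crude treatment contrast on the
# residuals grow when we restrict to patients with high biomarker values?
resp = mart * np.where(df["treatment"] == 1, 1, -1)
for q in [0.0, 0.25, 0.5, 0.75]:
    box = df["biomarker"] >= df["biomarker"].quantile(q)
    print(f"biomarker >= q{q:.2f}: mean contrast {resp[box].mean():.3f}")
```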

    Multidimensional isotonic regression and estimation of the threshold value

    No abstract available

    Random Forest variable importance with missing data

    Random Forests are commonly applied for data prediction and interpretation. The latter purpose is supported by variable importance measures that rate the relevance of predictors. Yet existing measures cannot be computed when the data contain missing values. Possible solutions are given by imputation methods, complete case analysis and a newly suggested importance measure. However, it is unknown to what extent these approaches are able to provide a reliable estimate of a variable's relevance. An extensive simulation study was performed to investigate this property for a variety of missing data generating processes. Findings and recommendations: Complete case analysis should not be applied, as it inappropriately penalized variables that were completely observed. The new importance measure is much better able to reflect the decreased information exclusively for variables with missing values and should therefore be used to evaluate actual data situations. By contrast, multiple imputation allows for an estimation of the importances one would potentially observe in complete data situations.
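    To make the complete case problem concrete, here is a small sketch (scikit-learn assumed; the newly suggested importance measure from the paper is not reimplemented) comparing impurity-based importances after list-wise deletion and after imputation:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
n = 1000
X = pd.DataFrame({"x1": rng.normal(size=n),      # relevant, fully observed
                  "x2": rng.normal(size=n),      # relevant, partly missing
                  "x3": rng.normal(size=n)})     # irrelevant
y = ((X["x1"] + X["x2"]) > 0).astype(int)
X.loc[rng.random(n) < 0.4, "x2"] = np.nan        # 40% missing in x2

def importances(X, y):
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
    return pd.Series(rf.feature_importances_, index=X.columns)

# complete case analysis: drop every row with a missing value
cc = X.dropna()
print("complete case:\n", importances(cc, y[cc.index]))

# imputation: keep all rows, fill the gaps first
X_imp = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(X),
                     columns=X.columns)
print("after imputation:\n", importances(X_imp, y))
```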

    Tests for Trends in Binary Response

    Tests for trend in binary response are especially important when analyzing animal experiments in which the response in various dose groups is of interest. Among the nonparametric tests, the approach of Cochran and Armitage is the one most commonly used. This test (CA test) is in fact a test for a linear trend, and its result depends strongly on the quantification of the dose: varying score assignments can lead to totally different results. As an alternative, isotonic regression is proposed. The result of this approach is invariant under any monotonic transformation of the dose. The p-value associated with the isotonic regression can be obtained either by considering all possible combinations of the total number of events over the dose groups or by analyzing a random sample of all permutations. Both tests are compared within a simulation study and on data from an experiment considering whether a certain type of fibre, para-aramid, is carcinogenic. The result of the commonly used CA test depends strongly on the event rate in the lowest and highest dose groups. Based on our analyses we recommend using isotonic regression instead of the test proposed by Cochran and Armitage.
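    A sketch of a permutation-based isotonic trend test along these lines; sklearn's IsotonicRegression provides the pooled-adjacent-violators fit, while the test statistic and the number of permutations are illustrative choices rather than the exact procedure of the paper:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(5)

# binary responses in four dose groups (only the ordering of doses matters)
doses = np.repeat([0, 1, 2, 3], 50)
p_true = np.array([0.05, 0.05, 0.10, 0.25])
y = (rng.random(doses.size) < p_true[doses]).astype(float)

def iso_stat(doses, y):
    """Reduction in squared error of the isotonic fit over the overall mean."""
    fit = IsotonicRegression(increasing=True).fit_transform(doses, y)
    return np.sum((y.mean() - fit) ** 2)

observed = iso_stat(doses, y)

# permutation null distribution: shuffle responses across dose groups
n_perm = 2000
null = np.array([iso_stat(doses, rng.permutation(y)) for _ in range(n_perm)])
p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
print(f"isotonic trend test: stat={observed:.3f}, p={p_value:.4f}")
```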

    Extension of CART using multiple splits under order restrictions

    CART was introduced by Breiman et al. (1984) as a classification tool. It divides the whole sample recursively into two subpopulations by finding the best possible split with respect to an optimisation criterion. This method, restricted to date to binary splits, is extended in this paper to allow multiple splits as well. The main problem with this extension concerns the optimal number of splits and the location of the corresponding cutpoints. To reduce the computational effort and enhance parsimony, reduced isotonic regression was used to solve this problem. The extended CART method was tested in a simulation study and was compared with the classical approach in an epidemiological study. In both studies the extended CART turned out to be a useful and reliable alternative.
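    A rough sketch of the cutpoint idea only (not the full tree algorithm): fit an isotonic regression of the binary outcome on an ordered covariate and read candidate multi-way split points off the level changes of the resulting step function. sklearn is assumed; the reduced isotonic regression of the paper, which merges levels further to keep the number of splits small, is not reproduced.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 500)                      # ordered covariate
p = np.where(x < 3, 0.1, np.where(x < 7, 0.4, 0.7))
y = (rng.random(x.size) < p).astype(float)       # binary outcome

# monotone step function fitted to the outcome
fitted = IsotonicRegression(increasing=True).fit_transform(x, y)

# candidate cutpoints for a multiple split: where the step function jumps
# (unreduced isotonic regression can produce many small jumps, which is
# exactly why the paper merges levels via reduced isotonic regression)
order = np.argsort(x)
xs, fs = x[order], fitted[order]
jumps = np.flatnonzero(np.diff(fs) > 1e-8)
cutpoints = (xs[jumps] + xs[jumps + 1]) / 2
print("candidate cutpoints:", np.round(cutpoints, 2))
```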

    Modelling time-varying effects in Cox model under order restrictions

    The violation of the proportional hazards assumption in the Cox model occurs quite often in studies concerning solid tumours or leukaemia. In such cases the time-varying coefficients model is its most popular extension. The function f(t), which measures the time variation of a covariate effect, can be assessed through several smoothing techniques, such as cubic splines. For practical purposes, however, it is more convenient to assess f(t) by a step function. The main drawback of this approach is its lack of stability, since there is no standard method for defining the cutpoints of the underlying step function. The variation in the effect of a predictor can often be assumed to be monotonic during the observational period. In these cases, we propose a method to estimate f(t) based on the isotonic regression framework. Applying the idea of Grambsch and Therneau, where smoothing the Schoenfeld residuals plotted against time reveals the shape of the underlying f(t), we use the Pooled Adjacent Violators Algorithm as the smoother. As a result, a set of cutpoints is returned without any a priori information about their location. Subsequently, the corresponding step function is introduced into the model and the standard likelihood-based method is applied to estimate it while adjusting for other covariates. This approach has the advantage that additional decisions that can affect the result, such as the number of knots in cubic splines, do not need to be taken. The performance of the proposed PH test and the stability of the method are explored in a simulation study.
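    A condensed sketch of this pipeline under stated assumptions: lifelines supplies the Cox fit, its bundled Rossi recidivism data and the Schoenfeld residuals, sklearn's IsotonicRegression serves as the PAVA smoother, and the chosen covariate and monotone direction are illustrative only.

```python
import numpy as np
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi
from sklearn.isotonic import IsotonicRegression

df = load_rossi()                                 # example data shipped with lifelines
cph = CoxPHFitter().fit(df, duration_col="week", event_col="arrest")

# scaled Schoenfeld residuals of one covariate, plotted against event time,
# approximate the shape of beta(t); PAVA replaces a spline smoother here
sch = cph.compute_residuals(df, kind="scaled_schoenfeld")
event_times = df.loc[sch.index, "week"].to_numpy()
resid = sch["prio"].to_numpy()                    # covariate assumed to have a monotone effect

iso = IsotonicRegression(increasing=False)        # assumed direction of the time variation
fitted = iso.fit_transform(event_times, resid)

# the jump points of the isotonic step function define the cutpoints of f(t),
# without any a priori choice of their number or location
order = np.argsort(event_times)
t, f = event_times[order], fitted[order]
cutpoints = t[1:][np.abs(np.diff(f)) > 1e-8]
print("data-driven cutpoints for the step function f(t):", np.unique(cutpoints))
```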