106 research outputs found

    Confidence sets for split points in decision trees

    We investigate the problem of finding confidence sets for split points in decision trees (CART). Our main results establish the asymptotic distribution of the least squares estimators and some associated residual sum of squares statistics in a binary decision tree approximation to a smooth regression curve. Cube-root asymptotics with nonnormal limit distributions are involved. We study various confidence sets for the split point, one calibrated using the subsampling bootstrap, and others calibrated using plug-in estimates of some nuisance parameters. The performance of the confidence sets is assessed in a simulation study. A motivation for developing such confidence sets comes from the problem of phosphorus pollution in the Everglades. Ecologists have suggested that split points provide a phosphorus threshold at which biological imbalance occurs, and the lower endpoint of the confidence set may be interpreted as a level that is protective of the ecosystem. This is illustrated using data from a Duke University Wetlands Center phosphorus dosing study in the Everglades. Comment: Published at http://dx.doi.org/10.1214/009053606000001415 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
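As a rough illustration of the least squares split point estimator mentioned in this abstract, the sketch below fits a one-split tree (a stump) to paired data by minimizing the residual sum of squares over candidate splits. The function name `best_split` and the convention of reporting the midpoint between adjacent observations are assumptions made for this example, not details taken from the paper, and the sketch does not construct the confidence sets themselves.

```python
def best_split(x, y):
    """Return (split, rss) minimizing the residual sum of squares of a
    two-piece constant fit. Hypothetical helper, for illustration only."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    total = sum(ys)
    total_sq = sum(v * v for v in ys)

    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += ys[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # cannot split between tied x values
        right_sum = total - left_sum
        # RSS of fitting the mean on each side of the split
        rss = total_sq - left_sum ** 2 / i - right_sum ** 2 / (n - i)
        if best is None or rss < best[1]:
            # assumed convention: report the midpoint of the gap
            best = ((xs[i - 1] + xs[i]) / 2, rss)
    if best is None:
        raise ValueError("all x values are identical")
    return best


split, rss = best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(split, rss)
```

Calibrating a confidence set around the estimated split would additionally require the subsampling or plug-in procedures the abstract describes.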

    On the capacity of channels with Gaussian and non-Gaussian noise

    We evaluate the information capacity of channels for which the noise process is a Gaussian measure on a quasi-complete locally convex space. The coding capacity is calculated in this setting and for time-continuous Gaussian channels using the information capacity result. The coding capacity of channels with non-Gaussian noise having finite entropy with respect to Gaussian noise of the same covariance is shown not to exceed the coding capacity of the Gaussian channel. The sensitivity of the information capacity to deviations from normality in the noise process is also investigated.

    Proportional hazards models with continuous marks

    For time-to-event data with finitely many competing risks, the proportional hazards model has been a popular tool for relating the cause-specific outcomes to covariates [Prentice et al. Biometrics 34 (1978) 541--554]. This article studies an extension of this approach to allow a continuum of competing risks, in which the cause of failure is replaced by a continuous mark only observed at the failure time. We develop inference for the proportional hazards model in which the regression parameters depend nonparametrically on the mark and the baseline hazard depends nonparametrically on both time and mark. This work is motivated by the need to assess HIV vaccine efficacy, while taking into account the genetic divergence of infecting HIV viruses in trial participants from the HIV strain that is contained in the vaccine, and adjusting for covariate effects. Mark-specific vaccine efficacy is expressed in terms of one of the regression functions in the mark-specific proportional hazards model. The new approach is evaluated in simulations and applied to the first HIV vaccine efficacy trial. Comment: Published at http://dx.doi.org/10.1214/07-AOS554 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Comparing Distribution Functions via Empirical Likelihood

    This paper develops empirical likelihood based simultaneous confidence bands for differences and ratios of two distribution functions from independent samples of right-censored survival data. The proposed confidence bands provide a flexible way of comparing treatments in biomedical settings, and bring empirical likelihood methods to bear on important target functions for which only Wald-type confidence bands have been available in the literature. The approach is illustrated with a real data example.

    The Two-sample Problem for Failure Rates Depending on a Continuous Mark: An Application to Vaccine Efficacy

    The efficacy of an HIV vaccine to prevent infection is likely to depend on the genetic variation of the exposing virus. This paper addresses the problem of using data on the HIV sequences that infect vaccine efficacy trial participants to 1) test for vaccine efficacy more powerfully than procedures that ignore the sequence data; and 2) evaluate the dependence of vaccine efficacy on the divergence of infecting HIV strains from the HIV strain that is contained in the vaccine. Because hundreds of amino acid sites in each HIV genome are sequenced, it is natural to treat the divergence (defined in terms of Hamming distance, say) as a continuous mark variable that accompanies each failure (infection) time. Problems 1) and 2) can then be approached by testing whether the ratio of the mark-specific hazard functions for the vaccine and placebo groups is unity or independent of the mark, respectively. We develop nonparametric and semiparametric tests for these null hypotheses, based on contrasts of Nelson–Aalen-type estimates of cumulative mark-specific hazard functions for the two groups. Techniques for nonparametric estimation of mark-specific vaccine efficacy based on the cumulative mark-specific incidence functions are also developed. Numerical studies show satisfactory performance of the procedures. The methods are illustrated with application to HIV genetic sequence data collected in the first HIV vaccine efficacy trial. The methodology applies generally to the study of relative risks of failure wherein a continuous mark variable accompanies each failure event.
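For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of the ordinary Nelson–Aalen estimator of a cumulative hazard from right-censored data. It is not the mark-specific estimator developed in the paper, which also depends on the mark; the function name `nelson_aalen` and the input layout (times plus 0/1 event indicators) are assumptions chosen for this example.

```python
def nelson_aalen(times, events):
    """Ordinary Nelson-Aalen cumulative hazard estimate.

    times  : observed follow-up times
    events : 1 if the time is a failure, 0 if it is censored
    Returns a list of (failure time, cumulative hazard) pairs.
    Illustrative helper only; no mark variable is used here.
    """
    data = sorted(zip(times, events))
    n = len(data)
    at_risk = n
    cum_hazard = 0.0
    steps = []
    i = 0
    while i < n:
        t = data[i][0]
        failures = 0
        leaving = 0
        # gather all subjects sharing this observed time
        while i < n and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            # increment: failures at t divided by number at risk just before t
            cum_hazard += failures / at_risk
            steps.append((t, cum_hazard))
        at_risk -= leaving
    return steps


for t, h in nelson_aalen([1, 2, 2, 3], [1, 1, 0, 1]):
    print(t, round(h, 4))
```

A two-group comparison of the kind described above would contrast such estimates computed separately for the vaccine and placebo groups, with the mark dimension added.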

    Tests for Comparing Mark-Specific Hazards and Cumulative Incidence Functions

    It is of interest in some applications to determine whether there is a relationship between a hazard rate function (or a cumulative incidence function) and a mark variable which is only observed at uncensored failure times. We develop nonparametric tests for this problem when the mark variable is continuous. Tests are developed for the null hypothesis that the mark-specific hazard rate is independent of the mark versus ordered and two-sided alternatives expressed in terms of mark-specific hazard functions and mark-specific cumulative incidence functions. The test statistics are based on functionals of a bivariate test process equal to a weighted average of differences between a Nelson–Aalen-type estimator of the mark-specific cumulative hazard function and a nonparametric estimator of this function under the null hypothesis. The weight function in the test process can be chosen so that the test statistics are asymptotically distribution-free. Asymptotically correct critical values are obtained through a simple simulation procedure. The testing procedures are shown to perform well in numerical studies, and are illustrated with an AIDS clinical trial example. Specifically, the tests are used to assess if the instantaneous or absolute risk of treatment failure depends on the amount of accumulation of drug resistance mutations in a subject's HIV virus. This assessment helps guide development of anti-HIV therapies that surmount the problem of drug resistance.