
    Maximally selected chi-square statistics and binary splits of nominal variables

    We address the problem of maximally selected chi-square statistics in the case of a binary Y variable and a nominal X variable with several categories. The distribution of the maximally selected chi-square statistic has already been derived when the best cutpoint is chosen from a continuous or an ordinal X, but not when the best split is chosen from a nominal X. In this paper, we derive the exact distribution of the maximally selected chi-square statistic in this case using a combinatorial approach. Applications of the derived distribution to variable selection and hypothesis testing are discussed based on simulations. As an illustration, our method is applied to a pregnancy and birth data set.
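    As a rough illustration of the statistic being maximised (not the paper's combinatorial derivation of its null distribution), the sketch below enumerates all 2^(K-1) - 1 binary splits of a nominal X with K categories and records the largest 2x2 chi-square value; the data and function name are hypothetical.

```python
# Sketch: maximally selected chi-square statistic for binary y and nominal x.
# This only computes the maximised statistic; it does not implement the exact
# null distribution derived in the paper.
from itertools import combinations
import numpy as np
from scipy.stats import chi2_contingency

def max_selected_chi2(x, y):
    """Largest 2x2 chi-square statistic over all binary splits of a nominal x."""
    categories = np.unique(x)
    k = len(categories)
    best = 0.0
    # Fixing the first category in the left group avoids counting each split twice.
    rest = categories[1:]
    for r in range(0, k - 1):
        for subset in combinations(rest, r):
            left = set(subset) | {categories[0]}
            in_left = np.isin(x, list(left))
            table = np.array([
                [np.sum(in_left & (y == 0)), np.sum(in_left & (y == 1))],
                [np.sum(~in_left & (y == 0)), np.sum(~in_left & (y == 1))],
            ])
            if table.sum(axis=1).min() == 0:  # skip splits with an empty group
                continue
            stat, _, _, _ = chi2_contingency(table, correction=False)
            best = max(best, stat)
    return best

# Hypothetical example: categories A and B carry a higher event probability.
rng = np.random.default_rng(0)
x = rng.choice(list("ABCD"), size=200)
y = (rng.random(200) < np.where(np.isin(x, ["A", "B"]), 0.6, 0.3)).astype(int)
print(max_selected_chi2(x, y))
```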

    Regression tree models for designed experiments

    Although regression trees were originally designed for large datasets, they can profitably be used on small datasets as well, including those from replicated or unreplicated complete factorial experiments. We show that in the latter situations, regression tree models can provide simpler and more intuitive interpretations of interaction effects as differences between conditional main effects. We present simulation results to verify that the models can yield lower prediction mean squared errors than the traditional techniques. The tree models span a wide range of sophistication, from piecewise constant to piecewise simple and multiple linear, and from least squares to Poisson and logistic regression. Comment: Published at http://dx.doi.org/10.1214/074921706000000464 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
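    The following minimal sketch (using scikit-learn's CART-style regressor, not necessarily the method of the paper) fits a piecewise-constant tree to a simulated unreplicated 2^3 factorial design, so that an A x B interaction can be read off as a difference between conditional main effects; the response and coefficients are hypothetical.

```python
# Sketch: a regression tree on a simulated 2^3 factorial experiment.
from itertools import product
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Full 2^3 design with factors A, B, C coded as -1/+1 (hypothetical response).
design = np.array(list(product([-1, 1], repeat=3)))
A, B, C = design.T
y = 10 + 2 * A + 3 * A * B + np.random.default_rng(1).normal(0, 0.1, 8)

tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=1)
tree.fit(design, y)
print(export_text(tree, feature_names=["A", "B", "C"]))
# The tree splits first on A and then on B within each A-branch; the B effect
# changes sign between the two branches, which is the A x B interaction read
# as a difference between conditional main effects.
```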

    Tree-structured smooth transition regression models based on the CART algorithm

    In the present work, a tree-based model that combines aspects of CART (Classification and Regression Trees) and STR (Smooth Transition Regression) is proposed. The main idea relies on specifying a parametric nonlinear model through a tree-growing procedure. The resulting model can be analysed either as a fuzzy regression or as a smooth transition regression with multiple regimes. Decisions about splits are entirely based on statistical tests of hypotheses, and confidence intervals are constructed for the parameters within the terminal nodes as well as for the final predictions. A Monte Carlo experiment shows the estimators' properties and the ability of the proposed algorithm to correctly identify several tree architectures. An application to the famous Boston Housing dataset shows that the proposed model provides better explanatory power with the same number of leaves as the tree obtained with the CART algorithm.
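    A minimal sketch of the smooth-transition idea at a single node follows, assuming a logistic transition function; it does not reproduce the paper's tree-growing and hypothesis-testing procedure, and all parameter values are hypothetical. Instead of a hard CART split (x_j <= c vs. x_j > c), each observation is weighted between two child regressions.

```python
# Sketch: blending two linear regimes with a logistic transition at one node.
import numpy as np

def logistic_transition(x, gamma, c):
    """G(x; gamma, c) in (0, 1); gamma -> infinity recovers a hard CART split."""
    return 1.0 / (1.0 + np.exp(-gamma * (x - c)))

def smooth_node_predict(X, split_var, gamma, c, beta_left, beta_right):
    """Weight two linear regimes by the transition on one splitting variable."""
    G = logistic_transition(X[:, split_var], gamma, c)
    Xd = np.column_stack([np.ones(len(X)), X])  # add intercept column
    return (1 - G) * (Xd @ beta_left) + G * (Xd @ beta_right)

# Hypothetical example: one predictor, two regimes with different slopes.
X = np.linspace(-2, 2, 9).reshape(-1, 1)
yhat = smooth_node_predict(X, split_var=0, gamma=5.0, c=0.0,
                           beta_left=np.array([1.0, 0.5]),
                           beta_right=np.array([3.0, -1.0]))
print(np.round(yhat, 2))
```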