Maximally selected chi-square statistics and binary splits of nominal variables
We address the problem of maximally selected chi-square statistics in the case of a binary Y variable and a nominal X variable with several categories. The distribution of the maximally selected chi-square statistic has already been derived when the best cutpoint is chosen from a continuous or an ordinal X, but not when the best split is chosen from a nominal X. In this paper, we derive the exact distribution of the maximally selected chi-square statistic in this case using a combinatorial approach. Applications of the derived distribution to variable selection and hypothesis testing are discussed based on simulations. As an illustration, our method is applied to a pregnancy and birth data set.
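To make the statistic concrete, here is a minimal Python sketch that computes the maximally selected chi-square statistic over all 2^(K-1) - 1 binary splits of a nominal X with K categories against a 0/1 Y. It assumes the Pearson chi-square without continuity correction. The p-value function is a swapped-in Monte Carlo permutation approximation, not the exact combinatorial distribution derived in the paper, and all function names and the toy data are hypothetical.

```python
from itertools import combinations
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a margin is empty, so the table carries no association
    return n * (a * d - b * c) ** 2 / denom


def binary_splits(categories):
    """Yield the left group of each distinct binary split {S, complement of S} exactly once."""
    cats = sorted(categories)
    first, rest = cats[0], cats[1:]
    # Pin the first category to the left group so S and its complement are not both counted;
    # r stops at len(rest) - 1 so the right group is never empty: 2**(K-1) - 1 splits in total.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            yield {first, *extra}


def max_chi2(x, y):
    """Maximally selected chi-square over all binary splits of nominal x against binary y (0/1)."""
    best, best_left = 0.0, None
    for left in binary_splits(set(x)):
        # 2x2 table: rows = left/right group, columns = y == 1 / y == 0
        a = sum(1 for xi, yi in zip(x, y) if xi in left and yi == 1)
        b = sum(1 for xi, yi in zip(x, y) if xi in left and yi == 0)
        c = sum(1 for xi, yi in zip(x, y) if xi not in left and yi == 1)
        d = sum(1 for xi, yi in zip(x, y) if xi not in left and yi == 0)
        stat = chi2_2x2(a, b, c, d)
        if best_left is None or stat > best:
            best, best_left = stat, left
    return best, best_left


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle y to break any x-y association, then recompute the maximum."""
    rng = random.Random(seed)
    observed, _ = max_chi2(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if max_chi2(x, y_perm)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Invented toy data: category "a" is mostly y = 1, while "b" and "c" are mostly y = 0.
x = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
y = [1] * 9 + [0] * 1 + [1] * 1 + [0] * 9 + [1] * 1 + [0] * 9
stat, left = max_chi2(x, y)
print(round(stat, 4), sorted(left))  # 18.3732 ['a']
print(permutation_pvalue(x, y, n_perm=199))
```

Maximizing over splits is exactly why the usual chi-square(1) reference distribution is too optimistic: the permutation step recomputes the maximum on each shuffled sample, so it accounts for the selection, which the exact distribution in the paper does analytically.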
Regression tree models for designed experiments
Although regression trees were originally designed for large datasets, they can profitably be used on small datasets as well, including those from replicated or unreplicated complete factorial experiments. We show that in these settings, regression tree models can provide simpler and more intuitive interpretations of interaction effects as differences between conditional main effects. We present simulation results showing that the models can yield lower prediction mean squared errors than the traditional techniques. The tree models span a wide range of sophistication, from piecewise constant to piecewise simple and multiple linear, and from least squares to Poisson and logistic regression.
Comment: Published at http://dx.doi.org/10.1214/074921706000000464 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
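A minimal Python sketch of the "interaction as a difference between conditional main effects" reading, using a hypothetical replicated 2x2 factorial (factors A and B coded -1/+1, responses invented for illustration) and a simple greedy least-squares, piecewise-constant tree of depth 2. This is not the authors' tree algorithm; it only shows that once the tree splits on A and then on B, the effect of B inside each A branch is a conditional main effect, and the classical AB interaction effect (difference-of-means convention) is half the difference between those two conditional effects.

```python
from statistics import mean

# Invented responses for a replicated 2x2 factorial (two runs per cell), factors coded -1/+1.
DATA = [
    ({"A": -1, "B": -1}, 10.0), ({"A": -1, "B": -1}, 12.0),
    ({"A": -1, "B": 1}, 11.0), ({"A": -1, "B": 1}, 13.0),
    ({"A": 1, "B": -1}, 20.0), ({"A": 1, "B": -1}, 22.0),
    ({"A": 1, "B": 1}, 30.0), ({"A": 1, "B": 1}, 32.0),
]


def sse(ys):
    """Residual sum of squares around the mean (the least-squares node impurity)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def grow(rows, factors, depth):
    """Greedy least-squares, piecewise-constant tree on two-level factors, as nested dicts."""
    ys = [y for _, y in rows]
    best_f, best_sse = None, sse(ys)
    if depth > 0:
        for f in factors:
            lo = [y for x, y in rows if x[f] == -1]
            hi = [y for x, y in rows if x[f] == 1]
            if lo and hi and sse(lo) + sse(hi) < best_sse:
                best_f, best_sse = f, sse(lo) + sse(hi)
    if best_f is None:
        return {"mean": mean(ys)}  # leaf: predict the node mean
    rest = [f for f in factors if f != best_f]
    return {
        "split": best_f,
        "lo": grow([r for r in rows if r[0][best_f] == -1], rest, depth - 1),
        "hi": grow([r for r in rows if r[0][best_f] == 1], rest, depth - 1),
    }


def conditional_effects(tree):
    """Effect of the inner factor (high minus low leaf mean) within each branch of the root split."""
    effects = {}
    for branch in ("lo", "hi"):
        node = tree[branch]
        effects[branch] = (node["split"], node["hi"]["mean"] - node["lo"]["mean"])
    return effects


tree = grow(DATA, ["A", "B"], depth=2)
eff = conditional_effects(tree)
print(tree["split"], eff)  # A {'lo': ('B', 1.0), 'hi': ('B', 10.0)}
# Classical AB interaction effect: half the difference between the conditional effects of B.
print((eff["hi"][1] - eff["lo"][1]) / 2)  # 4.5
```

Reading the tree directly ("when A is low, B adds about 1; when A is high, B adds about 10") conveys the same information as the interaction estimate of 4.5, but in terms of effects a practitioner can act on.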
Tree-structured smooth transition regression models based on CART algorithm
In the present work, a tree-based model that combines aspects of CART (Classification and Regression Trees) and STR (Smooth Transition Regression) is proposed. The main idea is to specify a parametric nonlinear model through a tree-growing procedure. The resulting model can be analysed either as a fuzzy regression or as a smooth transition regression with multiple regimes. Decisions about splits are based entirely on statistical hypothesis tests, and confidence intervals are constructed for the parameters within the terminal nodes as well as for the final predictions. A Monte Carlo experiment shows the properties of the estimators and the ability of the proposed algorithm to correctly identify several tree architectures. An application to the famous Boston Housing dataset shows that the proposed model provides a better explanation with the same number of leaves as the tree obtained with the CART algorithm.
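A Python sketch of how a tree can be read as a smooth transition regression, assuming a logistic transition function G(s; gamma, c) = 1 / (1 + exp(-gamma * (s - c))), the usual choice in STR models. Each split blends its two children as (1 - G) * left + G * right, so every leaf weight is a product of G and 1 - G terms along its path (a fuzzy membership), and the weights sum to one; as gamma grows, G approaches the hard indicator used by CART. The node encoding and function names are hypothetical, and the sketch covers prediction only, not the paper's test-based growing or estimation procedure.

```python
import math


def logistic(s, gamma, c):
    """Logistic transition G(s; gamma, c) in [0, 1]; tends to the step 1[s > c] as gamma grows."""
    z = gamma * (s - c)
    if z >= 0:  # split the cases so math.exp never overflows for large |z|
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(node, x):
    """Evaluate a tree with smooth splits: each split blends its children with weights 1 - G and G.

    Hypothetical node encoding:
      ("leaf", [b0, b1, ..., bp])                    -> linear model b0 + b1*x[0] + ... + bp*x[p-1]
      ("split", j, gamma, c, left_node, right_node)  -> transition in variable x[j] at location c
    """
    if node[0] == "leaf":
        beta = node[1]
        return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    _, j, gamma, c, left, right = node
    g = logistic(x[j], gamma, c)
    return (1.0 - g) * predict(left, x) + g * predict(right, x)


def one_split_tree(gamma):
    """Two regimes in x[0] around c = 0.5, with constant leaf models 1.0 and 3.0."""
    return ("split", 0, gamma, 0.5, ("leaf", [1.0]), ("leaf", [3.0]))


# At the transition location both regimes get weight 0.5, whatever gamma is.
print(predict(one_split_tree(2.0), [0.5]))  # 2.0
# Small gamma: a gradual, fuzzy blend of the two regimes.
print(predict(one_split_tree(2.0), [0.6]))
# Large gamma: essentially the hard CART split, so predictions are close to 1.0 and 3.0.
print(predict(one_split_tree(1000.0), [0.4]), predict(one_split_tree(1000.0), [0.6]))
```

Because gamma and c enter the prediction smoothly, they can be treated as parameters of a nonlinear regression, which is what makes standard errors and confidence intervals for the node parameters and predictions possible, unlike the discrete cutpoints of a plain CART tree.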
- …