    Component selection and smoothing in multivariate nonparametric regression

    We propose a new method for model selection and model fitting in multivariate nonparametric regression models, in the framework of smoothing spline ANOVA. The "COSSO" is a method of regularization in which the penalty functional is the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO does model selection by applying a novel soft-thresholding-type operation to the function components. We give an equivalent formulation of the COSSO estimator which leads naturally to an iterative algorithm. We compare the COSSO with MARS, a popular method that builds functional ANOVA models, in simulations and real examples. The COSSO method can be extended to classification problems, and we compare its performance with that of a number of machine learning algorithms on real datasets. The COSSO gives very competitive performance in these studies. Comment: Published at http://dx.doi.org/10.1214/009053606000000722 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
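
The contrast between the two penalties described in this abstract can be written out as follows. This is a sketch in assumed notation, since the abstract itself fixes none: $P^{\alpha} f$ stands for the projection of $f$ onto the $\alpha$-th component subspace, and $\tau_n$, $\lambda_0$, $\lambda$ and $\theta_\alpha$ are smoothing parameters introduced here for illustration.

```latex
% Traditional smoothing spline ANOVA: penalty built from squared component norms
\min_{f}\; \frac{1}{n}\sum_{i=1}^{n}\bigl\{y_i - f(x_i)\bigr\}^2
  + \lambda \sum_{\alpha=1}^{p} \theta_\alpha^{-1}\,\lVert P^{\alpha} f\rVert^{2}

% COSSO: penalty is the sum of (unsquared) component norms
\min_{f}\; \frac{1}{n}\sum_{i=1}^{n}\bigl\{y_i - f(x_i)\bigr\}^2
  + \tau_n^{2} \sum_{\alpha=1}^{p} \lVert P^{\alpha} f\rVert

% One equivalent form that suggests an alternating (iterative) scheme:
% update f with theta fixed, then theta with f fixed
\min_{f,\;\theta \ge 0}\; \frac{1}{n}\sum_{i=1}^{n}\bigl\{y_i - f(x_i)\bigr\}^2
  + \lambda_0 \sum_{\alpha=1}^{p} \theta_\alpha^{-1}\,\lVert P^{\alpha} f\rVert^{2}
  + \lambda \sum_{\alpha=1}^{p} \theta_\alpha
```

Because the component norms enter unsquared, the penalty can shrink an entire component to exactly zero, which is how a sum-of-norms penalty performs selection in the same way that the lasso does for linear coefficients.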

    Thinking outside the curve, part I: modeling birthweight distribution

    Background: Greater epidemiologic understanding of the relationships among fetal-infant mortality and its prognostic factors, including birthweight, could have vast public health implications. A key step toward that understanding is a realistic and tractable framework for analyzing birthweight distributions and fetal-infant mortality. The present paper is the first of a two-part series that introduces such a framework.

    Methods: We propose describing a birthweight distribution via a normal mixture model in which the number of components is determined from the data using a model selection criterion rather than fixed a priori.

    Results: We address a number of methodological issues, including how the number of components selected depends on the sample size, how the choice of model selection criterion influences the results, and how estimates of mixture model parameters based on multiple samples from the same population can be combined to produce confidence intervals. As an illustration, we find that a 4-component normal mixture model reasonably describes the birthweight distribution for a population of white singleton infants born to heavily smoking mothers. We also compare this 4-component normal mixture model to two competitors from the existing literature: a contaminated normal model and a 2-component normal mixture model. In a second illustration, we discover that a 6-component normal mixture model may be more appropriate than a 4-component normal mixture model for a general population of black singletons.

    Conclusions: The framework developed in this paper avoids assuming the existence of an interval of birthweights over which there are no compromised pregnancies and does not constrain birthweights within compromised pregnancies to be normally distributed. Thus, the present framework can reveal heterogeneity in birthweight that is undetectable via a contaminated normal model or a 2-component normal mixture model.
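
The idea in the Methods paragraph, letting the data choose the number of mixture components, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract: the criterion is BIC (the abstract does not name one), the fit uses a plain EM loop with a quantile-based start, and the data are synthetic rather than birthweights. The function names `fit_mixture`, `bic` and `select_components` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under a 1-D normal mixture."""
    total = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(dens, 1e-300))  # guard against underflow
    return total


def fit_mixture(data, k, n_iter=200):
    """Fit a k-component normal mixture by EM; returns (weights, means, variances)."""
    xs = sorted(data)
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    floor = max(1e-6 * grand_var, 1e-12)  # keeps a component from collapsing
    # Deterministic start (an assumption): means at evenly spaced quantiles.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(grand_var, floor)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = max(sum(dens), 1e-300)
            resp.append([d / s for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj > 0.0:
                means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
                var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
                variances[j] = max(var_j, floor)
    return weights, means, variances


def bic(data, k):
    """BIC of the fitted k-component mixture (lower is better)."""
    weights, means, variances = fit_mixture(data, k)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    loglik = log_likelihood(data, weights, means, variances)
    return -2.0 * loglik + n_params * math.log(len(data))


def select_components(data, max_k=4):
    """Return the k in 1..max_k with the lowest BIC, plus all scores."""
    scores = {k: bic(data, k) for k in range(1, max_k + 1)}
    best_k = min(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    # Synthetic two-cluster sample, purely illustrative (not birthweight data).
    random.seed(0)
    sample = [random.gauss(0.0, 1.0) for _ in range(200)]
    sample += [random.gauss(10.0, 1.0) for _ in range(200)]
    best_k, scores = select_components(sample, max_k=3)
    print(best_k, scores)
```

Swapping `bic` for another criterion changes only the penalty term, which is one way to explore how the choice of criterion affects the number of components selected.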