16 research outputs found

    On the estimation of functional random effects

    Functional regression modelling has become one of the most vibrant areas of research in recent years. This discussion provides some alternative approaches to one of the key issues of functional data analysis: the basis representation of curves and, in particular, of functional random effects. First, we propose the estimation of functional principal components by penalizing the norm, and as an alternative, we provide an efficient and unified approach based on B-spline bases and quadratic penalties.

    Out-of-sample prediction in multidimensional P-spline models

    Prediction of out-of-sample values is a problem of interest in any regression model. In the context of penalized smooth mixed model regression, Carballo et al. (2017) have proposed a general framework for prediction in additive models without interaction terms. The aim of this paper is to extend this work, based on the methodology proposed in Currie et al. (2004), to models that include interaction terms, i.e. prediction is needed in a multidimensional setting. Our approach fits the data and predicts the new observations simultaneously and uses constraints to ensure a coherent fit or to impose further restrictions on the predictions. We also develop this methodology for the so-called smooth-ANOVA models, which allow us to include interaction terms that can be decomposed as a sum of several smooth functions. To illustrate the methodology, two real data sets are used, one to predict log mortality rates in the Spanish population and another to predict aboveground biomass in Populus trees as a smooth function of height and diameter. Through a simulation study, we examine the performance of the interaction models in comparison to the smooth-ANOVA models, in both cases with and without the restriction that the fit has to be maintained.

    Out-of-sample prediction in multidimensional P-spline models

    The prediction of out-of-sample values is an interesting problem in any regression model. In the context of penalized smoothing using a mixed-model reparameterization, a general framework has been proposed for predicting in additive models but without interaction terms. The aim of this paper is to generalize this work, extending the methodology proposed in the multidimensional case, to models that include interaction terms, i.e., when prediction is carried out in a multidimensional setting. Our method fits the data, predicts new observations at the same time, and uses constraints to ensure a consistent fit or impose further restrictions on predictions. We also develop this methodology for the so-called smooth-ANOVA models, which allow us to include interaction terms that can be decomposed as a sum of several smooth functions. To illustrate the method, two real data sets were used, one for predicting the mortality of the U.S. population on a logarithmic scale, and the other for predicting the aboveground biomass of Populus trees as a smooth function of height and diameter. We examine the performance of the interaction and smooth-ANOVA models through simulation studies. This research was funded in part by Ministerio de Ciencia e Innovación grant number PID2019-104901RB-I00. The third author gratefully acknowledges support by the Department of Education, Language Policy and Culture from the Basque Government (BERC 2018-2021 program), the Spanish Ministry of Economy and Competitiveness MINECO and FEDER: PID2020-115882RB-I00/AEI/10.13039/501100011033 funded by Agencia Estatal de Investigación and acronym “S3M1P4R”, and BCAM Severo Ochoa excellence accreditation SEV-2017-0718.

    A general framework for prediction in penalized regression

    We present several methods for prediction of new observations in penalized regression using different methodologies, based on the methods proposed in: i) Currie et al. (2004), ii) Gilmour et al. (2004) and iii) Sacks et al. (1989). We extend the method introduced by Currie et al. (2004) to consider the prediction of new observations in the mixed model framework. In the context of penalties based on differences between adjacent coefficients (Eilers & Marx (1996)), the equivalence of the different methods is shown. We demonstrate several properties of the new coefficients in terms of the order of the penalty. We also introduce the concept of the memory of a P-spline, which gives us information on how much past information we are using to predict. The methodology and the concept of memory of a P-spline are illustrated with three real data sets, two of them on the yearly mortality rates of Spanish men and the other on rental prices. The first and the second authors acknowledge financial support from the Spanish Ministry of Economy and Competitiveness MTM2014-52184-P. The third author acknowledges financial support from the Basque Government through the BERC 2014-2017 program and by the Spanish Ministry of Economy and Competitiveness MINECO: BCAM Severo Ochoa excellence accreditation SEV-2013-0323.
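
As a rough illustration of prediction with difference penalties, the sketch below uses a Whittaker smoother, that is, a P-spline with an identity basis. This simpler technique is swapped in for brevity and is not the mixed-model method of the paper. The function name `whittaker_forecast` and the parameters `lam` and `order` are assumptions made for this example. Future points receive zero weight, so only the penalty determines their values.

```python
import numpy as np


def whittaker_forecast(y, n_ahead, lam=10.0, order=2):
    """Smooth y and extend it n_ahead steps with a difference penalty.

    Hypothetical helper for illustration only: an identity-basis
    (Whittaker) stand-in for a P-spline, not the paper's method.
    Requires len(y) >= order so the linear system is nonsingular.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    total = n + n_ahead
    # Weights: 1 for observed points, 0 for the points to be predicted.
    w = np.concatenate([np.ones(n), np.zeros(n_ahead)])
    # Placeholder values for the future; they are ignored because w = 0 there.
    y_ext = np.concatenate([y, np.zeros(n_ahead)])
    # Difference matrix of the chosen order, shape (total - order, total).
    D = np.diff(np.eye(total), n=order, axis=0)
    # Penalized least squares: minimize sum w_i (y_i - z_i)^2 + lam * ||D z||^2.
    A = np.diag(w) + lam * D.T @ D
    return np.linalg.solve(A, w * y_ext)


# Usage on simulated data (assumed example values).
rng = np.random.default_rng(0)
x = np.arange(20)
y = np.sin(x / 4.0) + 0.1 * rng.standard_normal(20)
z = whittaker_forecast(y, n_ahead=5)
# Second differences over the forecast region; near zero when order=2.
print(np.round(np.diff(z, 2)[-5:], 8))
```

With `order=2`, every second difference that involves a forecast point can be set to zero, so the forecast continues the last two fitted values along a straight line, and adding the zero-weight points leaves the in-sample fit unchanged.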

    On the estimation of variance parameters in non-standard generalised linear mixed models: application to penalised smoothing

    We present a novel method for the estimation of variance parameters in generalised linear mixed models. The method has its roots in the work of Harville (J Am Stat Assoc 72(358):320-338, 1977), but it is able to deal with models that have a precision matrix for the random effect vector that is linear in the inverse of the variance parameters (i.e., the precision parameters). We call the method SOP (separation of overlapping precision matrices). SOP is based on applying the method of successive approximations to easy-to-compute estimate updates of the variance parameters. These estimate updates have an appealing form: they are the ratio of a (weighted) sum of squares to a quantity related to effective degrees of freedom. We provide the sufficient and necessary conditions for these estimates to be strictly positive. An important application field of SOP is penalised regression estimation of models where multiple quadratic penalties act on the same regression coefficients. We discuss in detail two of those models: penalised splines for locally adaptive smoothness and for hierarchical curve data. Several data examples in these settings are presented. This research was supported by the Basque Government through the BERC 2018-2021 program and by Spanish Ministry of Economy and Competitiveness MINECO through BCAM Severo Ochoa excellence accreditation SEV-2013-0323 and through projects MTM2017-82379-R funded by (AEI/FEDER, UE) and acronym “AFTERAM”, MTM2014-52184-P and MTM2014-55966-P. The MRI/DTI data were collected at Johns Hopkins University and the Kennedy-Krieger Institute. We are grateful to Pedro Caro and Iain Currie for useful discussions, to Martin Boer and Cajo ter Braak for the detailed reading of the paper and their many suggestions, and to Bas Engel for sharing with us his knowledge. We are also grateful to the two peer referees for their constructive comments on the paper.

    Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm

    A new computational algorithm for estimating the smoothing parameters of a multidimensional penalized spline generalized linear model with anisotropic penalty is presented. This new proposal is based on the mixed model representation of a multidimensional P-spline, in which the smoothing parameter for each covariate is expressed in terms of variance components. On the basis of penalized quasi-likelihood methods, closed-form expressions for the estimates of the variance components are obtained. This formulation leads to an efficient implementation that considerably reduces the computational burden. The proposed algorithm can be seen as a generalization of the algorithm by Schall (1991), for variance components estimation, to deal with non-standard structures of the covariance matrix of the random effects. The practical performance of the proposed algorithm is evaluated by means of simulations, and comparisons with alternative methods are made on the basis of the mean square error criterion and the computing time. Finally, we illustrate our proposal with the analysis of two real datasets: a two-dimensional example of historical records of monthly precipitation data in the USA and a three-dimensional one of mortality data from respiratory disease according to the age at death, the year of death and the month of death. The authors would like to express their gratitude for the support received in the form of the Spanish Ministry of Economy and Competitiveness grants MTM2011-28285-C02-01 and MTM2011-28285-C02-02. The research of Dae-Jin Lee was funded by an NIH grant for the Superfund Metal Mixtures, Biomarkers and Neurodevelopment project 1PA2ES016454-01A2.

    Penalized composite link models for aggregated spatial count data: A mixed model approach

    Mortality data provide valuable information for the study of the spatial distribution of mortality risk, in disciplines such as spatial epidemiology and public health. However, they are frequently available in an aggregated form over irregular geographical units, hindering the visualization of the underlying mortality risk. Also, it can be of interest to obtain mortality risk estimates at a finer spatial resolution, such that they can be linked to potential risk factors that are usually measured at a different spatial resolution. In this paper, we propose the use of the penalized composite link model and its mixed model representation. This model considers the nature of mortality rates by incorporating the population size at the finest resolution, and allows the creation of mortality maps at a finer scale, thus reducing the visual bias resulting from the spatial aggregation within original units. We also extend the model by considering individual random effects at the aggregated scale, in order to take into account the overdispersion. We illustrate our novel proposal using two datasets: female deaths by lung cancer in Indiana, USA, and male lip cancer incidence in Scottish counties. We also compare the performance of our proposal with the area-to-point Poisson kriging approach. We would like to thank two reviewers and an associate editor for their constructive comments and suggestions on the original manuscript. We also thank Dr. Pierre Goovaerts, who provided the high resolution population estimates described in Section 3.1. This research was supported by the Spanish Ministry of Economy and Competitiveness grants MTM2011-28285-C02-02 and MTM2014-52184-P. The research of Dae-Jin Lee was also supported by the Basque Government through the BERC 2014-2017 and ELKARTEK programs and by the Spanish Ministry of Economy and Competitiveness MINECO: BCAM Severo Ochoa excellence accreditation SEV-2013-0323. The research of Paul H. C. Eilers was also supported by the Universidad Carlos III de Madrid-Banco Santander Chair of Excellence program.

    BioStatNet: an interdisciplinary biostatistics network

    Biostatistics has become a major scientific component of biomedical research with a strong interdisciplinary basis. This endeavour is essentially interdisciplinary; therefore, the training of future biostatisticians must focus on developing successful mechanisms of communication and cooperation between researchers from different disciplines. The Biostatistics National Network, BioStatNet, has been created with the aim of linking Spanish and foreign researchers in Biostatistics with an integrative and open attitude. It also intends to serve as a platform for the adequate training of biostatisticians as a means towards achieving effective interdisciplinarity. Peer reviewed. Postprint (published version).

    On constrained smoothing and out-of-range prediction using P-splines: A conic optimization approach

    Decision-making is often based on the analysis of complex and evolving data. Thus, having systems that allow human knowledge to be incorporated and that provide valuable support to the decision-maker becomes crucial. In this work, statistical modelling and mathematical optimization paradigms merge to address the problem of estimating smooth curves which verify structural properties, both in the observed domain in which data have been gathered and outwards. We assume that the curve to be estimated is defined through a reduced-rank basis (B-splines) and fitted via a penalized splines approach (P-splines). To incorporate requirements about the sign, monotonicity and curvature in the fitting procedure, a conic programming approach is developed which, for the first time, successfully handles out-of-range constrained prediction. In summary, the contributions of this paper are fourfold: first, a mathematical optimization formulation for the estimation of non-negative P-splines is proposed; second, previous results are generalized to the out-of-range prediction framework; third, these approaches are extended to other shape constraints and to multiple-curve fitting; and fourth, an open source Python library is developed: cpsplines. We use simulated instances, data of the evolution of the COVID-19 pandemic and of mortality rates for different age groups to test our approaches. This research is supported by projects PID2019-104901RB-I00, PID2019-110886RB-I00 and IJC2020-045220-I (funded by MCIN/AEI/10.13039/501100011033 and the last also supported by European Union “NextGenerationEU/PRTR” funds), FQM-329, P18-FR-2369 and US-13811 (Junta de Andalucía), IND2020/TIC-17526 (Comunidad de Madrid) and Universidad Carlos III de Madrid (Read & Publish Agreement CRUE-CSIC 2022).