
    Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models

    The interpretation of complex high-dimensional data typically requires the use of dimensionality reduction techniques to extract explanatory low-dimensional representations. However, in many real-world problems these representations may not be sufficient to aid interpretation on their own, and it would be desirable to interpret the model in terms of the original features themselves. Our goal is to characterise how feature-level variation depends on latent low-dimensional representations, external covariates, and non-linear interactions between the two. In this paper, we propose to achieve this through a structured kernel decomposition in a hybrid Gaussian Process model which we call the Covariate Gaussian Process Latent Variable Model (c-GPLVM). We demonstrate the utility of our model on simulated examples and applications in disease progression modelling from high-dimensional gene expression data in the presence of additional phenotypes. In each setting we show how the c-GPLVM can extract low-dimensional structures from high-dimensional data sets whilst allowing a breakdown of feature-level variability that is not present in other commonly used dimensionality reduction approaches.
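    As a rough sketch of the kind of structured kernel decomposition named above (not the authors' c-GPLVM implementation), the snippet below combines a kernel over a latent coordinate z, a kernel over an observed covariate x, and a multiplicative interaction term, k((z, x), (z', x')) = k_z + k_x + k_z * k_x, so that purely latent, purely covariate-driven, and interaction-driven variation enter as separate terms. The RBF kernels, length-scales, and toy data are assumptions made for illustration.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def cgplvm_style_kernel(z1, x1, z2, x2):
    """Additive + interaction decomposition over latent z and covariate x.

    k((z, x), (z', x')) = k_z(z, z') + k_x(x, x') + k_z(z, z') * k_x(x, x')
    The three terms separate purely latent variation, purely covariate-driven
    variation, and their non-linear interaction (illustrative choice only).
    """
    Kz = rbf(z1, z2, lengthscale=1.0)
    Kx = rbf(x1, x2, lengthscale=2.0)
    return Kz + Kx + Kz * Kx

# Toy usage: 5 samples with a 1-D latent coordinate and a scalar covariate.
rng = np.random.default_rng(0)
z = rng.normal(size=5)          # latent coordinates (would be learned)
x = np.linspace(0.0, 1.0, 5)    # observed covariate, e.g. disease stage
K = cgplvm_style_kernel(z, x, z, x)
print(K.shape)  # (5, 5) covariance used inside the GP likelihood
```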

    Multivariate small sample tests for two-way designs with applications to industrial statistics

    In this paper, we present a novel nonparametric approach for multivariate analysis of two-way crossed factorial designs, based on NonParametric Combination applied to Synchronized Permutation tests. This nonparametric hypothesis testing procedure not only overcomes shortcomings of the MANOVA test, such as its reliance on assumptions of multivariate normality and covariance homogeneity, but also proves, in an extensive simulation study, to be a powerful instrument both when sample sizes are small and when there are many response variables. We contextualize its application in the field of industrial experiments and assume a linear additive model for the analysis; this linear additive interpretation suits the industrial production environment well because of the way the control of production machinery is implemented. The small-sample case reflects the frequent needs of practitioners in industrial settings, where constraints or limited resources restrict the experimental design. Furthermore, an increase in rejection rate can be observed under the alternative hypothesis when the number of response variables increases for a fixed number of observed units. This can be a strategic benefit, considering that in many real problems it is easier to collect more information on a single experimental unit than to add a new unit to the experimental design. An application to industrial thermoforming processes illustrates and highlights the benefits of adopting the nonparametric approach presented here.
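    The sketch below illustrates the NonParametric Combination idea in a deliberately simplified setting: a one-factor, two-group multivariate comparison rather than the synchronized permutations required for a two-way crossed design. Per-variable statistics are recomputed under permutations of unit labels, turned into partial p-values, combined with Fisher's combining function, and the combined statistic is compared against its own permutation distribution. The sample sizes, the choice of partial statistic, and the combining function are illustrative assumptions.

```python
import numpy as np

def npc_fisher_test(Y, groups, n_perm=2000, seed=0):
    """Global two-group test for a multivariate response Y (n x p) via
    NonParametric Combination with Fisher's combining function.

    Partial statistics: absolute difference in group means per variable.
    Partial p-values are estimated from the permutation distribution,
    combined as -2 * sum(log p), and the combined statistic is compared
    across permutations to obtain a global p-value.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    labels = np.asarray(groups)

    def partial_stats(lab):
        return np.abs(Y[lab == 1].mean(axis=0) - Y[lab == 0].mean(axis=0))

    obs = partial_stats(labels)
    perm_stats = np.empty((n_perm, p))
    for b in range(n_perm):
        perm_stats[b] = partial_stats(rng.permutation(labels))

    # Permutation p-values for each partial test (observed value included).
    all_stats = np.vstack([obs, perm_stats])
    pvals = (all_stats >= obs).mean(axis=0)                          # observed
    perm_p = (all_stats[None, :, :] >= perm_stats[:, None, :]).mean(axis=1)

    fisher = lambda pv: -2.0 * np.log(pv).sum(axis=-1)
    T_obs, T_perm = fisher(pvals), fisher(perm_p)
    return (np.sum(T_perm >= T_obs) + 1) / (n_perm + 1)

# Toy usage: 2 groups of 6 units, 4 response variables, small shift in group 1.
rng = np.random.default_rng(1)
Y = rng.normal(size=(12, 4))
Y[6:] += 0.8
print(npc_fisher_test(Y, groups=np.repeat([0, 1], 6)))
```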

    A mixed model approach for structured hazard regression

    The classical Cox proportional hazards model is a benchmark approach to analyze continuous survival times in the presence of covariate information. In a number of applications, there is a need to relax one or more of its inherent assumptions, such as linearity of the predictor or the proportional hazards property. Also, one is often interested in jointly estimating the baseline hazard together with covariate effects, or one may wish to add a spatial component for spatially correlated survival data. We propose an extended Cox model, where the (log-)baseline hazard is weakly parameterized using penalized splines and the usual linear predictor is replaced by a structured additive predictor incorporating nonlinear effects of continuous covariates and further time scales, spatial effects, frailty components, and more complex interactions. Inclusion of time-varying coefficients leads to models that relax the proportional hazards assumption. Nonlinear and time-varying effects are modelled through penalized splines, and spatial components are treated as correlated random effects following either a Markov random field or a stationary Gaussian random field. All model components, including smoothing parameters, are specified within a unified framework and are estimated simultaneously based on mixed model methodology. The estimation procedure for such general mixed hazard regression models is derived using penalized likelihood for regression coefficients and (approximate) marginal likelihood for smoothing parameters. Performance of the proposed method is studied through simulation and an application to leukemia survival data in Northwest England.
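    In a simplified form, the structured additive predictor described above can be written as eta_i(t) = g0(t) + f1(x_i) + v_i' gamma, where g0(t) is the log-baseline hazard and g0 and f1 are penalized splines. The sketch below only assembles the building blocks of such a predictor: a spline basis (a truncated-power basis is used here as a simple stand-in for B-splines) and a block-diagonal penalty that shrinks the spline coefficients; the penalized-likelihood and marginal-likelihood estimation steps from the abstract are not reproduced, and all dimensions and smoothing parameters are placeholder assumptions.

```python
import numpy as np

def tp_spline_basis(x, n_knots=10, degree=2):
    """Truncated-power spline basis for a smooth effect f(x).

    Columns: 1, x, ..., x^degree, then (x - kappa_j)_+^degree for interior
    knots kappa_j. A simple stand-in for the B-spline bases used in
    penalized-spline (P-spline) hazard regression.
    """
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    poly = np.vander(x, degree + 1, increasing=True)
    trunc = np.clip(x[:, None] - knots[None, :], 0, None) ** degree
    return np.hstack([poly, trunc])

def ridge_penalty(n_poly, n_trunc, lam):
    """Penalty matrix that shrinks only the truncated-power coefficients,
    mimicking the roughness penalty on spline coefficients."""
    return lam * np.diag([0.0] * n_poly + [1.0] * n_trunc)

# Toy structured additive predictor: eta_i = g0(t_i) + f1(age_i) + gamma * sex_i
rng = np.random.default_rng(2)
t, age, sex = rng.uniform(0, 5, 200), rng.uniform(20, 80, 200), rng.integers(0, 2, 200)
B0, B1 = tp_spline_basis(t), tp_spline_basis(age)
X = np.hstack([B0, B1, sex[:, None]])          # full design matrix
P = np.zeros((X.shape[1], X.shape[1]))         # block-diagonal penalty
P[:B0.shape[1], :B0.shape[1]] = ridge_penalty(3, 10, lam=1.0)
P[B0.shape[1]:B0.shape[1] + B1.shape[1],
  B0.shape[1]:B0.shape[1] + B1.shape[1]] = ridge_penalty(3, 10, lam=1.0)
# In the mixed-model formulation, lam corresponds to a ratio of variance
# components for random-effect spline coefficients and would be estimated,
# not fixed by hand as it is here.
print(X.shape, P.shape)
```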

    Absorptive capacity and the growth and investment effects of regional transfers: a regression discontinuity design with heterogeneous treatment effects

    Researchers often estimate average treatment effects of programs without investigating heterogeneity across units. Yet individuals, firms, regions, or countries vary in their ability, e.g., to utilize transfers. We analyze Objective 1 Structural Funds transfers of the European Commission to regions of EU member states below a certain income level by way of a regression discontinuity design with systematically heterogeneous treatment effects. Only about 30% and 21% of the regions (those with sufficient human capital and good-enough institutions) are able to turn transfers into faster per-capita income growth and higher per-capita investment, respectively. In general, the variance of the treatment effect is much larger than its mean.
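    The specification below is a stripped-down illustration of a regression discontinuity design with a systematically heterogeneous treatment effect, not the paper's estimator: within a bandwidth around the eligibility cutoff, growth is regressed on the treatment indicator, the centred running variable on each side of the cutoff, and an interaction of treatment with an absorptive-capacity proxy such as human capital. The simulated data, the bandwidth, and the variable names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Running variable: regional income relative to the eligibility cutoff
# (centred so that x < 0 means "eligible for transfers").
x = rng.uniform(-1.0, 1.0, n)
treated = (x < 0).astype(float)
human_capital = rng.uniform(0.0, 1.0, n)   # standardised absorptive-capacity proxy

# Simulated growth: the treatment effect operates only through human capital.
growth = 0.02 + 0.01 * x + 0.03 * treated * human_capital + rng.normal(0, 0.01, n)

# Local linear RDD with a treatment x human-capital interaction,
# restricted to a bandwidth h around the cutoff (h chosen ad hoc here).
h = 0.5
w = np.abs(x) < h
X = np.column_stack([
    np.ones(w.sum()),                 # intercept
    treated[w],                       # mean jump at the cutoff
    x[w], x[w] * treated[w],          # separate slopes on each side
    human_capital[w],                 # absorptive-capacity main effect
    treated[w] * human_capital[w],    # heterogeneity of the treatment effect
])
beta, *_ = np.linalg.lstsq(X, growth[w], rcond=None)
print("mean jump at cutoff:", round(beta[1], 4))
print("extra effect per unit of human capital:", round(beta[5], 4))
```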

    Semiparametric analysis to estimate the deal effect curve

    The marketing literature suggests several phenomena that may contribute to the shape of the relationship between sales and price discounts. These phenomena can produce severe nonlinearities and interactions in the curves, and we argue that those are best captured with a flexible approach. Since a fully nonparametric regression model suffers from the curse of dimensionality, we propose a semiparametric regression model. Store-level sales over time are modeled as a nonparametric function of own- and cross-item price discounts, and a parametric function of other predictors (all indicator variables). We compare the predictive validity of the semiparametric model with that of two parametric benchmark models and obtain better performance on average. The results for three product categories indicate, among other things, threshold and saturation effects for both own- and cross-item temporary price cuts. We also show how the own-item curve depends on other items’ price discounts (flexible interaction effects). In a separate analysis, we show how the shape of the deal effect curve depends on own-item promotion signals. Our results indicate that prevailing methods for the estimation of deal effects on sales are inadequate.
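    A minimal sketch of a partially linear specification in the spirit of the abstract (an assumed form, not the authors' estimator): log sales are modeled as a flexible function of the own-item price discount, represented here by a simple spline basis, plus a parametric part for indicator variables such as feature and display. The data-generating process, variable names, and basis choice are illustrative assumptions; a threshold/saturation-shaped curve is built into the simulated data so the flexible term has something to recover.

```python
import numpy as np

def spline_basis(x, n_knots=8, degree=3):
    """Truncated-power spline basis used as the nonparametric component
    for the deal (price-discount) effect curve."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    poly = np.vander(x, degree + 1, increasing=True)[:, 1:]   # drop intercept
    trunc = np.clip(x[:, None] - knots[None, :], 0, None) ** degree
    return np.hstack([poly, trunc])

# Toy store-week data: own-item discount plus feature/display indicators.
rng = np.random.default_rng(4)
n = 400
discount = rng.uniform(0.0, 0.5, n)             # own-item price discount (0-50%)
feature = rng.integers(0, 2, n).astype(float)   # feature-ad indicator
display = rng.integers(0, 2, n).astype(float)   # in-store display indicator

# Simulated log sales with a threshold/saturation-shaped deal effect.
true_curve = 1.5 / (1.0 + np.exp(-25 * (discount - 0.2)))
log_sales = 2.0 + true_curve + 0.3 * feature + 0.2 * display + rng.normal(0, 0.1, n)

# Semiparametric fit: nonparametric in the discount, parametric in indicators.
X = np.column_stack([np.ones(n), feature, display, spline_basis(discount)])
beta, *_ = np.linalg.lstsq(X, log_sales, rcond=None)
fitted_curve = spline_basis(discount) @ beta[3:]   # estimated deal effect curve
print("feature effect:", round(beta[1], 3), " display effect:", round(beta[2], 3))
```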