3,311 research outputs found

    Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models

    The interpretation of complex high-dimensional data typically requires the use of dimensionality reduction techniques to extract explanatory low-dimensional representations. However, in many real-world problems these representations may not be sufficient to aid interpretation on their own, and it would be desirable to interpret the model in terms of the original features themselves. Our goal is to characterise how feature-level variation depends on latent low-dimensional representations, external covariates, and non-linear interactions between the two. In this paper, we propose to achieve this through a structured kernel decomposition in a hybrid Gaussian Process model which we call the Covariate Gaussian Process Latent Variable Model (c-GPLVM). We demonstrate the utility of our model on simulated examples and on applications in disease progression modelling from high-dimensional gene expression data in the presence of additional phenotypes. In each setting we show how the c-GPLVM can extract low-dimensional structure from high-dimensional data sets whilst allowing a breakdown of feature-level variability that is not present in other commonly used dimensionality reduction approaches.
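
    A minimal numpy sketch of the kind of additive-plus-interaction kernel decomposition the abstract describes (latent effect + covariate effect + interaction). The kernel forms, lengthscales, and the fixed latent coordinate below are illustrative assumptions, not the paper's actual c-GPLVM, in which the latent coordinates are inferred rather than given.

```python
import numpy as np

def rbf(x, y, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D coordinate arrays x and y."""
    d2 = (x[:, None] - y[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

rng = np.random.default_rng(0)
n = 50
z = rng.normal(size=n)         # latent coordinate (learned in a real GPLVM)
c = rng.uniform(0, 1, size=n)  # observed covariate, e.g. a phenotype

# Structured decomposition: latent effect + covariate effect + interaction.
K_z = rbf(z, z, lengthscale=1.0)
K_c = rbf(c, c, lengthscale=0.3)
K = K_z + K_c + K_z * K_c      # elementwise product is itself a valid kernel

# Draw one feature's variation under this prior; in the full model each
# gene/feature gets its own decomposition of variance into the three terms.
f = rng.multivariate_normal(np.zeros(n), K + 1e-6 * np.eye(n))
```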

    A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops

    The environmental costs of intensive farming activities are often under-estimated or not traded by the market, even though they play an important role in addressing future society's needs. The estimation of nitrogen (N) dynamics is thus an important issue which demands detailed simulation-based methods, and their integrated use, to correctly represent the complex and non-linear interactions within cropping systems. To calculate the N2O flux and N leaching from European arable lands, a modeling framework has been developed by linking the CAPRI agro-economic dataset with the DNDC-EUROPE bio-geo-chemical model. However, despite the power of modern computers, running such a detailed model at continental scale is often too computationally costly. By comparing several statistical methods, this paper aims to design a metamodel able to approximate the expensive code of the detailed modeling approach, devising the best compromise between estimation performance and simulation speed. We describe the use of two parametric (linear) models and six non-parametric approaches: two methods based on splines (ACOSSO and SDR), one method based on kriging (DACE), a neural network method (multilayer perceptron, MLP), support vector machines (SVM) and a bagging method (random forest, RF). This analysis shows that when few data are available to train the model, spline approaches lead to the best results, while as the size of the training dataset increases, SVM and RF provide faster and more accurate solutions.
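
    For orientation, a rough sketch of the generic metamodelling workflow the abstract compares: replace an expensive simulator with fast regressors trained on its input-output pairs and check their accuracy on held-out runs. The toy simulator, features, and hyper-parameters below are invented for illustration (using scikit-learn's SVR and random forest), not the CAPRI/DNDC-EUROPE setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Stand-in for the expensive simulator (e.g. a bio-geo-chemical model):
# here just a cheap nonlinear function of two inputs.
def expensive_simulator(X):
    return np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(500, 2))   # hypothetical input factors
y = expensive_simulator(X)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Two of the candidate metamodel families mentioned in the abstract.
candidates = [
    ("SVR", SVR(C=10.0)),
    ("RF", RandomForestRegressor(n_estimators=200, random_state=0)),
]
for name, model in candidates:
    model.fit(X_tr, y_tr)
    print(name, "R2 =", round(r2_score(y_te, model.predict(X_te)), 3))
```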

    Does Farm Size and Specialization Matter for Productive Efficiency? Results from Kansas

    In this article, we used bootstrap data envelopment analysis techniques to examine technical and scale efficiency scores for a balanced panel of 564 farms in Kansas for the period 1993–2007. The production technology is estimated under three different assumptions of returns to scale and the results are compared. Technical and scale efficiency are disaggregated by farm size and specialization. Our results suggest that farms are both scale and technically inefficient. On average, technical efficiency has deteriorated over the sample period. Technical efficiency varies directly with farm size and the differences are significant. Differences across farm specializations are not significant.
    Keywords: bootstrap, data envelopment analysis, efficiency, farms, Farm Management, Production Economics, D24, Q12
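
    As background, a minimal sketch of the plain input-oriented, constant-returns-to-scale DEA linear program that underlies such technical-efficiency scores, written with scipy. The toy farm data are invented, and the bootstrap correction used in the article (as well as the alternative returns-to-scale assumptions) is not included here.

```python
import numpy as np
from scipy.optimize import linprog

def dea_crs_efficiency(X, Y, o):
    """Input-oriented CRS (CCR) technical efficiency of unit o.
    X: (n_units, n_inputs) input matrix, Y: (n_units, n_outputs) output matrix."""
    n, m = X.shape
    s = Y.shape[1]
    # Decision variables: [theta, lambda_1, ..., lambda_n]; minimise theta.
    c = np.zeros(1 + n)
    c[0] = 1.0
    # Input constraints: sum_j lambda_j * x_ij <= theta * x_io
    A_in = np.hstack([-X[[o]].T, X.T])           # (m, 1+n)
    b_in = np.zeros(m)
    # Output constraints: sum_j lambda_j * y_rj >= y_ro
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])  # (s, 1+n)
    b_out = -Y[o]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([b_in, b_out]),
                  bounds=[(0, None)] * (1 + n), method="highs")
    return res.x[0]  # theta in (0, 1]; 1 means technically efficient

# Toy panel: 5 farms, 2 inputs (land, labour), 1 output (revenue).
X = np.array([[10., 5.], [12., 6.], [8., 4.], [15., 9.], [9., 7.]])
Y = np.array([[20.], [22.], [18.], [24.], [15.]])
scores = [dea_crs_efficiency(X, Y, o) for o in range(len(X))]
print(scores)
```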

    Strongly Hierarchical Factorization Machines and ANOVA Kernel Regression

    High-order parametric models that include terms for feature interactions are applied to various data mining tasks where the ground truth depends on interactions of features. However, with sparse data, the high-dimensional parameters for feature interactions often face three issues: expensive computation, difficulty in parameter estimation and lack of structure. Previous work has proposed approaches which can partially resolve the three issues. In particular, models with factorized parameters (e.g. Factorization Machines) and sparse learning algorithms (e.g. FTRL-Proximal) can tackle the first two issues but fail to address the third. To deal with unstructured parameters, constraints or complicated regularization terms are applied so that hierarchical structures can be imposed. However, these methods make the optimization problem more challenging. In this work, we propose Strongly Hierarchical Factorization Machines and ANOVA kernel regression, where all three issues can be addressed without making the optimization problem more difficult. Experimental results show the proposed models significantly outperform the state-of-the-art in two data mining tasks: cold-start user response time prediction and stock volatility prediction.
    Comment: 9 pages, to appear in SDM'1
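
    For context, a small numpy sketch of the standard second-order factorization machine scoring function that such models build on, using the usual O(nk) reformulation of the pairwise interaction term. The strong-hierarchy constraints proposed in the paper are not implemented here, and all parameter values are made up.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector x,
    with pairwise interaction weights factorized as <v_i, v_j>."""
    linear = w0 + w @ x
    # O(n*k) identity: sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
    s1 = (V.T @ x) ** 2           # (k,)
    s2 = (V.T ** 2) @ (x ** 2)    # (k,)
    return linear + 0.5 * np.sum(s1 - s2)

rng = np.random.default_rng(2)
n_features, k = 8, 3
x = rng.binomial(1, 0.4, size=n_features).astype(float)  # sparse binary features
w0, w = 0.1, rng.normal(scale=0.1, size=n_features)       # bias and linear weights
V = rng.normal(scale=0.1, size=(n_features, k))            # factorized interaction weights
print(fm_predict(x, w0, w, V))
```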

    Factor Mapping and Metamodelling

    In this work we present some techniques, within the realm of Global Sensitivity Analysis, which make it possible to address fundamental questions of model understanding. In particular, we are interested in developing tools which allow us to determine which factors (or groups of factors) are most responsible for producing model outputs Y within or outside specified bounds, ranking the importance of the various input factors in terms of their influence on the variation of Y. On the other hand, we look for ways to represent directly (graphically, analytically, etc.) the relationship between the input factors X_1, ..., X_k and the output Y, in order to gain a better understanding of the model itself.
    JRC.G.9 - Econometrics and statistical support to antifraud
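
    One common way to carry out the factor mapping described here is Monte Carlo filtering (regional sensitivity analysis): sample the input factors, split the model runs by whether Y falls inside the target bounds, and compare each factor's distribution across the two groups. The toy model, the bound of 0.5, and the use of a Kolmogorov-Smirnov test below are illustrative assumptions, not the specific tools of this report.

```python
import numpy as np
from scipy.stats import ks_2samp

# Toy model standing in for the simulator under study.
def model(X):
    return X[:, 0] ** 2 + 0.2 * X[:, 1] + 0.01 * X[:, 2]

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(5000, 3))   # input factors X_1, X_2, X_3
Y = model(X)

# Monte Carlo filtering: split runs into "behavioural" (Y inside the target
# bound) and "non-behavioural", then test whether each factor's distribution
# differs between the two groups. A large KS statistic flags a factor that
# drives Y across the bound.
behavioural = Y < 0.5
for i in range(X.shape[1]):
    stat, _ = ks_2samp(X[behavioural, i], X[~behavioural, i])
    print(f"X_{i + 1}: KS statistic = {stat:.2f}")
```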