
    Kernel-based Information Criterion

    Full text link
    This paper introduces the Kernel-based Information Criterion (KIC) for model selection in regression analysis. The novel kernel-based complexity measure in KIC efficiently computes the interdependency between model parameters using a variable-wise variance and yields the selection of better, more robust regressors. Experimental results show superior performance on both simulated and real data sets compared to Leave-One-Out Cross-Validation (LOOCV), kernel-based Information Complexity (ICOMP), and the maximum log marginal likelihood in Gaussian Process Regression (GPR).

    Sensitivity analysis for sets: application to pollutant concentration maps

    Full text link
    In the context of air quality control, our objective is to quantify the impact of uncertain inputs, such as meteorological conditions and traffic parameters, on pollutant dispersion maps. It is worth noting that the majority of sensitivity analysis methods are designed to deal with scalar or vector outputs and are ill-suited to a map-valued output space. To address this, we propose two classes of methods. The first technique focuses on pointwise indices: Sobol indices are calculated for each position on the map to obtain Sobol index maps. Additionally, aggregated Sobol indices are calculated. The second approach treats the maps as sets and proposes a sensitivity analysis of a set-valued output with three different types of sensitivity indices. The first are inspired by Sobol indices but are adapted to sets based on the theory of random sets. The second adapt universal indices defined for a general metric output space. The last use kernel-based sensitivity indices adapted to sets. The proposed methodologies are implemented to carry out an uncertainty analysis for time-averaged concentration maps of pollutants in an urban environment in the Greater Paris area. This entails taking into account uncertain meteorological aspects, such as incoming wind speed and direction, and uncertain traffic factors, such as injected traffic volume, percentage of diesel vehicles, and speed limits on the road network.
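    To make the pointwise (pick-freeze) Sobol index maps concrete, here is a minimal sketch on a hypothetical toy map-valued model; `concentration_map`, the grid, and both inputs are invented stand-ins for illustration, not the paper's pollutant dispersion simulator.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy "map-valued" model: a concentration field on an 8x8
    # grid driven by two scalar inputs (think wind speed x1, traffic volume x2).
    def concentration_map(x1, x2, grid=8):
        gx, gy = np.meshgrid(np.linspace(0.2, 1, grid), np.linspace(0.2, 1, grid))
        return x1[:, None, None] * gx + (x2[:, None, None] ** 2) * gy

    n = 5000
    x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
    x2_new = rng.uniform(size=n)               # redraw every input except x1

    y = concentration_map(x1, x2)              # shape (n, 8, 8)
    y_frozen = concentration_map(x1, x2_new)   # x1 "frozen", x2 resampled

    # Pick-freeze estimate of the first-order Sobol index of x1, per pixel:
    # S1(p) = Cov(y(p), y_frozen(p)) / Var(y(p))
    num = (y * y_frozen).mean(axis=0) - y.mean(axis=0) * y_frozen.mean(axis=0)
    sobol_map_x1 = num / y.var(axis=0)

    # One possible aggregated index: variance-weighted average over the map
    aggregated_x1 = num.sum() / y.var(axis=0).sum()
    ```

    The resulting `sobol_map_x1` is itself a map, which is the point of the pointwise approach: each pixel tells you how much of the local output variance is explained by `x1` alone.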

    Kernel Methods and their derivatives: Concept and perspectives for the Earth system sciences

    Full text link
    Kernel methods are powerful machine learning techniques which implement generic non-linear functions to solve complex tasks in a simple way. They have a solid mathematical background and exhibit excellent performance in practice. However, kernel machines are still considered black-box models, as the feature mapping is not directly accessible and is difficult to interpret. The aim of this work is to show that the functions learned by various kernel methods can be interpreted intuitively despite their complexity. Specifically, we show that derivatives of these functions have a simple mathematical formulation, are easy to compute, and can be applied to many different problems. We note that model function derivatives in kernel machines are proportional to the kernel function derivative. We provide the explicit analytic form of the first and second derivatives of the most common kernel functions with respect to the inputs, as well as generic formulas to compute higher-order derivatives. We use them to analyze the most widely used supervised and unsupervised kernel learning methods: Gaussian Processes for regression, Support Vector Machines for classification, Kernel Entropy Component Analysis for density estimation, and the Hilbert-Schmidt Independence Criterion for estimating the dependency between random variables. In all cases we express the derivative of the learned function as a linear combination of the kernel function derivative. Moreover, we provide intuitive explanations through illustrative toy examples and show how to improve the interpretation of real applications in the context of spatiotemporal Earth system data cubes. This work reflects the observation that function derivatives may play a crucial role in the analysis and understanding of kernel methods. Comment: 21 pages, 10 figures, PLOS One Journal
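    The central observation, that the derivative of a kernel model is a linear combination of kernel derivatives, can be sketched for the Gaussian (RBF) kernel. The toy kernel ridge fit below is invented for illustration; the analytic gradient is checked against finite differences.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def rbf(x, X, sigma):
        # Gaussian kernel k(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))
        return np.exp(-((x - X) ** 2).sum(axis=1) / (2 * sigma ** 2))

    # Toy kernel ridge regression fit: f(x) = sum_i alpha_i k(x, x_i)
    X = rng.normal(size=(30, 2))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
    sigma, lam = 1.0, 1e-3
    K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

    def f(x):
        return alpha @ rbf(x, X, sigma)

    def grad_f(x):
        # The model derivative is a linear combination of kernel derivatives:
        # df/dx = sum_i alpha_i * (x_i - x) / sigma^2 * k(x, x_i)
        k = rbf(x, X, sigma)
        return ((X - x) / sigma ** 2 * (alpha * k)[:, None]).sum(axis=0)
    ```

    Because the same expansion `f(x) = sum_i alpha_i k(x, x_i)` underlies GP regression and SVMs, the same closed-form gradient applies there with the corresponding coefficients.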

    Sensitivity analysis with dependence and variance-based measures for spatio-temporal numerical simulators

    Get PDF
    In the case of a radioactive release into the environment, modeling the radionuclide atmospheric dispersion is particularly useful for emergency response procedures and risk assessment. For this, the CEA has developed a numerical simulator, called Ceres-Mithra (CM), to predict spatial maps of radionuclide concentrations at different instants. This computer code depends on many uncertain scalar and temporal parameters describing the radionuclide, release, or weather characteristics. The purpose is to detect the input parameters whose uncertainties most affect the predicted concentrations and to quantify their influences. To this end, we present various measures for the sensitivity analysis of a spatial model. Some of them lead to as many analyses as spatial locations (site sensitivity indices), while others consider a single one with respect to the whole spatial domain (block sensitivity indices). For both categories, variance-based and dependence measures are considered, based on recent literature. All of these sensitivity measures are applied to the CM computer code and compared to each other, showing the complementarity of block and site sensitivity analyses. Finally, a sensitivity analysis summarizing the input uncertainty contribution over the entirety of the spatio-temporal domain is proposed.

    Two-Stage Fuzzy Multiple Kernel Learning Based on Hilbert-Schmidt Independence Criterion

    Full text link
    Multiple kernel learning (MKL) is a principled approach to kernel combination and selection for a variety of learning tasks, such as classification, clustering, and dimensionality reduction. In this paper, we develop a novel fuzzy multiple kernel learning model based on the Hilbert-Schmidt independence criterion (HSIC) for classification, which we call HSIC-FMKL. In this model, we first propose an HSIC Lasso-based MKL formulation, which not only has a clear statistical interpretation, that minimally redundant kernels with maximum dependence on the output labels are found and combined, but also enables the global optimal solution to be computed efficiently by solving a Lasso optimization problem. Since the traditional support vector machine (SVM) is sensitive to outliers or noise in the dataset, a fuzzy SVM (FSVM) is used to select the prediction hypothesis once the optimal kernel has been obtained. The main advantage of FSVM is that we can associate a fuzzy membership with each data point, so that different data points can have different effects on the training of the learning machine. We propose a new fuzzy membership function using a heuristic strategy based on the HSIC. The proposed HSIC-FMKL is a two-stage kernel learning approach, and the HSIC is applied in both stages. We perform extensive experiments on real-world datasets from the UCI benchmark repository and the application domain of computational biology, which validate the superiority of the proposed model in terms of prediction accuracy.
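    The HSIC at the core of both stages can be sketched with its standard biased empirical estimator. The toy variables below are invented for illustration, and the sketch deliberately omits the Lasso-based kernel combination and the FSVM stage.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    def hsic(x, y, sigma=1.0):
        # Biased empirical HSIC with Gaussian kernels on both variables:
        # HSIC = (1/n^2) * trace(K H L H), with centering H = I - (1/n) 1 1^T
        n = len(x)
        K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
        L = np.exp(-(y[:, None] - y[None, :]) ** 2 / (2 * sigma ** 2))
        H = np.eye(n) - np.ones((n, n)) / n
        return np.trace(K @ H @ L @ H) / n ** 2

    n = 200
    x = rng.normal(size=n)
    y_dep = x ** 2 + 0.1 * rng.normal(size=n)  # nonlinearly dependent on x
    y_ind = rng.normal(size=n)                 # independent of x
    ```

    A squared (nonlinear, zero-correlation) dependence yields a clearly larger HSIC than an independent pair, which is why HSIC works both as a kernel-selection score and as the basis of the fuzzy membership heuristic.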