Kernel-based Information Criterion
This paper introduces the Kernel-based Information Criterion (KIC) for model
selection in regression analysis. The novel kernel-based complexity measure in
KIC efficiently computes the interdependence between model parameters
using a variable-wise variance and yields selection of better, more robust
regressors. Experimental results show superior performance on both simulated
and real data sets compared to Leave-One-Out Cross-Validation (LOOCV),
kernel-based Information Complexity (ICOMP), and maximum log of marginal
likelihood in Gaussian Process Regression (GPR).
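One of the baselines above, maximum log marginal likelihood in GPR, can be sketched as follows. This is a minimal illustration, not the paper's KIC implementation; the RBF kernel and the hyperparameter names are assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel; hyperparameter names are illustrative.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def log_marginal_likelihood(X, y, lengthscale=1.0, variance=1.0, noise=1e-2):
    # log p(y | X) for a zero-mean GP with Gaussian observation noise:
    #   -0.5 * y^T K^{-1} y - 0.5 * log|K| - (n/2) * log(2*pi)
    n = len(y)
    K = rbf_kernel(X, X, lengthscale, variance) + noise * np.eye(n)
    L = np.linalg.cholesky(K)                       # stable inversion via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()              # 0.5 * log|K|
            - 0.5 * n * np.log(2 * np.pi))
```

Model selection then amounts to comparing this quantity across candidate kernels or hyperparameter settings and keeping the maximizer.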
Sensitivity analysis for sets : application to pollutant concentration maps
In the context of air quality control, our objective is to quantify the
impact of uncertain inputs such as meteorological conditions and traffic
parameters on pollutant dispersion maps. It is worth noting that the majority
of sensitivity analysis methods are designed to deal with scalar or vector
outputs and are ill suited to a map-valued output space. To address this, we
propose two classes of methods. The first technique focuses on pointwise
indices. Sobol indices are calculated for each position on the map to obtain
Sobol index maps. Additionally, aggregated Sobol indices are calculated.
The second treats the maps as sets and performs a sensitivity analysis of
a set-valued output with three types of sensitivity indices. The
first are inspired by Sobol indices but adapted to sets via the
theory of random sets. The second adapt universal indices defined for a
general metric output space. The last rely on kernel-based sensitivity
indices adapted to sets. The proposed methodologies are implemented to carry
out an uncertainty analysis for time-averaged concentration maps of pollutants
in an urban environment in the Greater Paris area. This entails taking into
account uncertain meteorological aspects, such as incoming wind speed and
direction, and uncertain traffic factors, such as injected traffic volume,
percentage of diesel vehicles, and speed limits on the road network.
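The pointwise Sobol index maps described above can be sketched with a pick-freeze estimator applied independently at each map position. This is an illustrative sketch, not the paper's implementation; the `model` interface (an (n, d) input sample mapped to an (n, p) array, one column per map position) and all names are assumptions.

```python
import numpy as np

def pointwise_sobol(model, d, i, n=100000, rng=None):
    # First-order Sobol index map for input i via the pick-freeze estimator:
    #   S_i = Cov(f(A), f(B with column i frozen to A)) / Var(f(A)),
    # computed separately for every map position (output column).
    rng = np.random.default_rng(rng)
    A = rng.random((n, d))
    B = rng.random((n, d))
    B_i = B.copy()
    B_i[:, i] = A[:, i]                 # "freeze" input i at A's values
    YA, YB = model(A), model(B_i)       # shape (n, p): one column per position
    cov = (YA * YB).mean(0) - YA.mean(0) * YB.mean(0)
    return cov / YA.var(0)              # one Sobol index per map position
```

Aggregated indices can then be formed by averaging these maps weighted by the local output variance.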
Kernel Methods and their derivatives: Concept and perspectives for the Earth system sciences
Kernel methods are powerful machine learning techniques which implement
generic non-linear functions to solve complex tasks in a simple way. They have
a solid mathematical background and exhibit excellent performance in practice.
However, kernel machines are still considered black-box models, as the feature
mapping is not directly accessible and difficult to interpret. The aim of this
work is to show that the functions learned by various kernel methods can be
interpreted intuitively despite their complexity. Specifically,
we show that derivatives of these functions have a simple mathematical
formulation, are easy to compute, and can be applied to many different
problems. We note that model function derivatives in kernel machines are
proportional to the kernel function derivative. We provide the explicit
analytic form of the first and second derivatives of the most common kernel
functions with regard to the inputs as well as generic formulas to compute
higher order derivatives. We use them to analyze the most used supervised and
unsupervised kernel learning methods: Gaussian Processes for regression,
Support Vector Machines for classification, Kernel Entropy Component Analysis
for density estimation, and the Hilbert-Schmidt Independence Criterion for
estimating the dependency between random variables. For all cases, we express
the derivative of the learned function as a linear combination of the kernel
function derivative. Moreover, we provide intuitive explanations through
illustrative toy examples and show how to improve the interpretation of real
applications in the context of spatiotemporal Earth system data cubes. This
work reflects on the observation that function derivatives may play a crucial
role in kernel method analysis and understanding. Comment: 21 pages, 10 figures, PLOS One Journal
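The central observation, that the derivative of a learned kernel expansion is a linear combination of kernel derivatives, can be sketched for the RBF kernel. This is a minimal sketch with assumed names, not the paper's code: for f(x) = sum_i alpha_i k(x, x_i), the gradient is sum_i alpha_i dk/dx.

```python
import numpy as np

def rbf(x, Xtr, ls=1.0):
    # RBF kernel vector k(x, x_i) over training points; `ls` is the lengthscale.
    return np.exp(-0.5 * ((x - Xtr) ** 2).sum(1) / ls**2)

def f(x, Xtr, alpha, ls=1.0):
    # Kernel expansion f(x) = sum_i alpha_i * k(x, x_i).
    return alpha @ rbf(x, Xtr, ls)

def grad_f(x, Xtr, alpha, ls=1.0):
    # Analytic gradient: df/dx = sum_i alpha_i * dk/dx, where for the RBF
    # kernel dk/dx = -(x - x_i) / ls^2 * k(x, x_i).
    k = rbf(x, Xtr, ls)
    return -((x - Xtr) * (alpha * k)[:, None]).sum(0) / ls**2
```

The gradient is exact and costs no more than an extra kernel evaluation, which is what makes derivative-based analysis of kernel machines cheap in practice.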
Sensitivity analysis with dependence and variance-based measures for spatio-temporal numerical simulators
In the case of a radioactive release into the environment, modeling the atmospheric dispersion of radionuclides is particularly useful for emergency response procedures and risk assessment. For this, the CEA has developed a numerical simulator, called Ceres-Mithra (CM), to predict spatial maps of radionuclide concentrations at different instants. This computer code depends on many uncertain scalar and temporal parameters describing the radionuclide, release, or weather characteristics. The purpose is to detect the input parameters whose uncertainties most affect the predicted concentrations and to quantify their influence. To this end, we present various measures for the sensitivity analysis of a spatial model. Some of them lead to as many analyses as spatial locations (site sensitivity indices), while others consider a single one with respect to the whole spatial domain (block sensitivity indices). For both categories, variance-based and dependence measures are considered, based on recent literature. All of these sensitivity measures are applied to the CM computer code and compared to each other, showing the complementarity of block and site sensitivity analyses. Finally, a sensitivity analysis summarizing the input uncertainty contribution over the entire spatio-temporal domain is proposed.
Two-Stage Fuzzy Multiple Kernel Learning Based on Hilbert-Schmidt Independence Criterion
Multiple kernel learning (MKL) is a principled approach to kernel combination and selection for a variety of learning tasks, such as classification, clustering, and dimensionality reduction. In this paper, we develop a novel fuzzy multiple kernel learning model based on the Hilbert-Schmidt independence criterion (HSIC) for classification, which we call HSIC-FMKL. In this model, we first propose an HSIC Lasso-based MKL formulation, which not only has a clear statistical interpretation (minimally redundant kernels with maximum dependence on the output labels are found and combined), but also enables the globally optimal solution to be computed efficiently by solving a Lasso optimization problem. Since the traditional support vector machine (SVM) is sensitive to outliers or noise in the dataset, a fuzzy SVM (FSVM) is used to select the prediction hypothesis once the optimal kernel has been obtained. The main advantage of FSVM is that we can associate a fuzzy membership with each data point, so that different data points can have different effects on the training of the learning machine. We propose a new fuzzy membership function using a heuristic strategy based on the HSIC. The proposed HSIC-FMKL is a two-stage kernel learning approach in which the HSIC is applied in both stages. We perform extensive experiments on real-world datasets from the UCI benchmark repository and the application domain of computational biology, which validate the superiority of the proposed model in terms of prediction accuracy.
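The empirical HSIC that underlies both stages can be sketched as follows. This is the standard biased estimator, a minimal sketch only; the Gram-matrix construction and the bandwidth are assumptions, not the paper's HSIC-Lasso formulation.

```python
import numpy as np

def hsic(K, L):
    # Biased empirical HSIC: trace(K H L H) / (n - 1)^2,
    # where H = I - (1/n) * 1 1^T is the centering matrix.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def rbf_gram(x, sigma=1.0):
    # Gram matrix of a 1-D sample under an RBF kernel with bandwidth sigma.
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * d2 / sigma**2)
```

A larger HSIC value indicates stronger statistical dependence between the two samples, which is what the HSIC-Lasso formulation exploits to score candidate kernels against the output labels.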