
    Kernel-based Information Criterion

    This paper introduces the Kernel-based Information Criterion (KIC) for model selection in regression analysis. The novel kernel-based complexity measure in KIC efficiently computes the interdependency between the model's parameters using a variable-wise variance, and it leads to the selection of better, more robust regressors. Experimental results on both simulated and real data sets show superior performance compared to Leave-One-Out Cross-Validation (LOOCV), kernel-based Information Complexity (ICOMP), and the maximum log marginal likelihood in Gaussian Process Regression (GPR).
    Comment: We modified reference 17 and the subcaptions of Figure
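    KIC's own complexity measure is not reproduced here. The sketch below shows two of the baselines the abstract names, for a zero-mean GP with a squared-exponential kernel (an assumed kernel choice): the GPR log marginal likelihood, log p(y|X) = -½ yᵀK⁻¹y - ½ log|K| - (n/2) log 2π with K = K_f + σ²I, and the closed-form LOOCV residuals [K⁻¹y]_i / [K⁻¹]_ii. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel k(a, b) = s^2 exp(-|a - b|^2 / (2 l^2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gpr_log_marginal_likelihood(X, y, length_scale, noise_var, signal_var=1.0):
    """log p(y | X) for a zero-mean GP with Gaussian observation noise."""
    n = len(y)
    K = rbf_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                   # equals -0.5 log|K|
            - 0.5 * n * np.log(2 * np.pi))

def gpr_loo_residuals(X, y, length_scale, noise_var, signal_var=1.0):
    """Closed-form LOO residuals y_i - mu_{-i}(x_i) = [K^{-1} y]_i / [K^{-1}]_ii."""
    n = len(y)
    K = rbf_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(n)
    K_inv = np.linalg.inv(K)
    return (K_inv @ y) / np.diag(K_inv)
```

    In use, one would evaluate both scores over a grid of length scales and keep the maximiser of the marginal likelihood, or the minimiser of the mean squared LOO residual. The closed form avoids refitting the GP n times.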

    Kernel density construction using orthogonal forward regression

    An automatic algorithm is derived for constructing kernel density estimates based on a regression approach that directly optimizes generalization capability. Computational efficiency of the density construction is ensured by using orthogonal forward regression, and the algorithm incrementally minimizes the leave-one-out test score. Local regularization is incorporated into the density construction process to further enforce sparsity. Examples are included to demonstrate that the proposed algorithm can effectively construct a very sparse kernel density estimate with accuracy comparable to that of the full-sample Parzen window density estimate.
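    A minimal sketch of greedy orthogonal forward regression driven by the leave-one-out score, under stated assumptions. The candidate regressors are 1-D Gaussian kernels centred on the samples. The regression target is assumed to be the Parzen window estimate evaluated at the samples. Local regularization is omitted entirely. Each step orthogonalises every remaining candidate against the chosen terms and updates the LOO residual e(i)/η(i), with η(i) = 1 - Σ w_k(i)²/(w_kᵀw_k). It adds the candidate with the lowest LOO mean squared error and stops once that score no longer decreases. Function names are hypothetical.

```python
import numpy as np

def gaussian_kernels(x, centers, h):
    """K[i, j] = N(x_i; centers_j, h^2) for 1-D data (hypothetical helper)."""
    d = (x[:, None] - centers[None, :]) / h
    return np.exp(-0.5 * d**2) / (np.sqrt(2 * np.pi) * h)

def ofr_loo_select(P, y, max_terms=30):
    """Greedy OFR without local regularization: add the column of P whose
    orthogonalised version gives the smallest LOO MSE; stop when it stalls."""
    n, m = P.shape
    selected, basis, history = [], [], []
    e = y.astype(float).copy()        # current training residual
    eta = np.ones(n)                  # 1 - leverage of each sample so far
    best_loo = np.inf
    for _ in range(min(max_terms, m)):
        best = None
        for j in range(m):
            if j in selected:
                continue
            w = P[:, j].astype(float).copy()
            for b in basis:           # modified Gram-Schmidt step
                w -= (b @ w) / (b @ b) * b
            ww = w @ w
            if ww < 1e-12:
                continue              # numerically dependent on chosen terms
            eta_new = eta - w**2 / ww
            if np.any(eta_new <= 1e-10):
                continue              # LOO residual undefined for this choice
            e_new = e - (w @ e) / ww * w
            loo = np.mean((e_new / eta_new) ** 2)
            if best is None or loo < best[0]:
                best = (loo, j, w, e_new, eta_new)
        if best is None or best[0] >= best_loo:
            break                     # LOO score stopped improving
        best_loo, j, w, e, eta = best
        selected.append(j)
        basis.append(w)
        history.append(best_loo)
    return selected, history

rng = np.random.default_rng(0)
x = rng.normal(size=100)
h = 0.5
P = gaussian_kernels(x, x, h)         # candidate kernels centred on samples
y = P.mean(axis=1)                    # assumed target: Parzen estimate at samples
sel, hist = ofr_loo_select(P, y)
theta, *_ = np.linalg.lstsq(P[:, sel], y, rcond=None)  # weights of kept kernels
```

    Here the weights are not constrained to be non-negative or to sum to one, and a proper density needs both. How the original algorithm enforces this is not reproduced.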

    Classification of Rural Village and Urban Village Areas in Semarang Regency Using Support Vector Machine (SVM)

    This research classifies regions by their rural or urban status, which reflects the differences in characteristics and conditions between regions in Indonesia, using the Support Vector Machine (SVM) method. The classification works by building a separation function that uses a kernel function to map the input data into a higher-dimensional space. The Sequential Minimal Optimization (SMO) algorithm is used during training to obtain the optimal separation function (hyperplane) for the rural and urban regions. To select the kernel function and its parameters for the data, a grid search combined with leave-one-out cross-validation is used. The best classification accuracy obtained with SVM is 90%, using the Radial Basis Function (RBF) kernel with parameters C = 100 and γ = 2⁻⁵.
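    The parameter-selection step described above (grid search scored by leave-one-out cross-validation for an RBF-kernel SVM) can be sketched with scikit-learn. This is an assumed toolkit, not necessarily the one used in the study. Its `SVC` wraps LIBSVM, whose solver is an SMO-type method. The grid values, the helper name `select_rbf_svm`, and the two-cluster toy data are hypothetical stand-ins for the study's data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

def select_rbf_svm(X, y):
    """Grid search over (C, gamma) for an RBF-kernel SVM, scored by LOOCV."""
    param_grid = {
        "C": [1.0, 10.0, 100.0, 1000.0],
        "gamma": [2.0**k for k in range(-9, 2, 2)],  # 2^-9, 2^-7, ..., 2^1
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                          cv=LeaveOneOut(), scoring="accuracy")
    search.fit(X, y)  # one fit per (C, gamma) pair per left-out sample
    return search

# Toy data: two Gaussian clusters standing in for rural/urban feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
search = select_rbf_svm(X, y)
print(search.best_params_, search.best_score_)
```

    LOOCV costs one fit per sample per grid point, so it suits small data sets like a regency-level region list. For larger data, k-fold cross-validation is the usual substitute.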

    Concentration inequalities for leave-one-out cross validation

    In this article we prove that estimator stability is enough to show that leave-one-out cross validation is a sound procedure, by providing concentration bounds in a general framework. In particular, we provide concentration bounds beyond Lipschitz continuity assumptions on the loss or on the estimator. To obtain our results, we rely on random variables whose distribution satisfies the logarithmic Sobolev inequality, which provides a relatively rich class of distributions. We illustrate our method with several interesting examples, including linear regression, kernel density estimation, and stabilized/truncated estimators such as stabilized kernel regression.
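    To make the object of study concrete, here is the leave-one-out estimator in generic notation. The notation is ours and may differ from the paper's. D = {z_1, ..., z_n} is the sample, f_D the estimator trained on D, D^{\i} the sample with z_i removed, and ℓ a loss. A concentration bound then controls the probability that the LOO estimate deviates from a target risk by more than t. The risk of f_D is shown as one natural target; the exact quantity compared in the paper may differ.

```latex
\hat{R}_{\mathrm{LOO}}(D) = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f_{D^{\setminus i}},\, z_i\bigr),
\qquad
R(f_D) = \mathbb{E}_{z \sim P}\bigl[\ell(f_D, z)\bigr],
\qquad
\Pr\Bigl(\bigl\lvert \hat{R}_{\mathrm{LOO}}(D) - R(f_D) \bigr\rvert \ge t\Bigr) \quad (t > 0).
```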

    Approximate inference of the bandwidth in multivariate kernel density estimation

    Kernel density estimation is a popular and widely used non-parametric method for data-driven density estimation. Its appeal lies in its simplicity and ease of implementation, as well as its strong asymptotic results regarding its convergence to the true data distribution. However, a major difficulty is setting the bandwidth, particularly in high dimensions and with a limited amount of data. An approximate Bayesian method is proposed, based on the Expectation–Propagation algorithm with a likelihood obtained from a leave-one-out cross validation approach. The proposed method yields an iterative procedure to approximate the posterior distribution of the inverse bandwidth. The approximate posterior can be used to estimate the model evidence for selecting the structure of the bandwidth, and to approach online learning. Extensive experimental validation shows that the proposed method is competitive in performance with state-of-the-art plug-in methods.
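    The leave-one-out likelihood that the method builds on can be illustrated with its classical point-estimate use. The sketch picks a single isotropic bandwidth h for a multivariate Gaussian KDE by maximising Σ_i log p_{-i}(x_i) over a grid. This is not the Expectation–Propagation posterior described above. It also assumes a scalar bandwidth rather than a richer bandwidth structure. The function names and the grid are hypothetical.

```python
import numpy as np

def loo_log_likelihood(X, h):
    """Sum over i of log p_{-i}(x_i) for a Gaussian KDE with isotropic
    bandwidth h, where p_{-i} leaves sample i out of the estimate."""
    n, d = X.shape
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # |x_i - x_j|^2
    log_k = -0.5 * sq / h**2 - 0.5 * d * np.log(2 * np.pi * h**2)
    np.fill_diagonal(log_k, -np.inf)      # drop the j == i kernel
    # Stable log-sum-exp over j, then average over the n - 1 remaining kernels.
    m = np.max(log_k, axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.sum(np.exp(log_k - m), axis=1)) - np.log(n - 1)
    return np.sum(log_p)

def select_bandwidth(X, grid):
    """Return the grid bandwidth with the highest LOO log-likelihood."""
    scores = [loo_log_likelihood(X, h) for h in grid]
    return grid[int(np.argmax(scores))], scores

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
best_h, scores = select_bandwidth(X, list(np.geomspace(0.05, 2.0, 30)))
```

    Leaving x_i out is what keeps the criterion from rewarding h → 0. With the i-th kernel included, the likelihood would grow without bound as the bandwidth shrinks.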