    Peak Criterion for Choosing Gaussian Kernel Bandwidth in Support Vector Data Description

    Support Vector Data Description (SVDD) is a machine-learning technique used for single-class classification and outlier detection. The SVDD formulation with a kernel function provides a flexible boundary around the data. The values of the kernel function parameters affect the nature of the data boundary. For example, with a Gaussian kernel, the data boundary changes from spherical to wiggly as the kernel bandwidth is lowered. A spherical data boundary leads to underfitting, and an extremely wiggly data boundary leads to overfitting. In this paper, we propose an empirical criterion for obtaining good values of the Gaussian kernel bandwidth parameter. This criterion provides a smooth boundary that captures the essential geometric features of the data.
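
The bandwidth effect described in this abstract can be illustrated without fitting an SVDD model at all. The sketch below is not the paper's peak criterion; it only assumes the common parametrization exp(-||x - y||^2 / (2 s^2)) for the Gaussian kernel and uses hypothetical toy data. For a very small bandwidth `s`, distinct points look almost unrelated (kernel values near 0), which is consistent with a wiggly boundary that hugs each point; for a very large `s`, all points look alike (kernel values near 1), which is consistent with a nearly spherical boundary.

```python
import math

def gaussian_kernel(x, y, s):
    """Gaussian kernel exp(-||x - y||^2 / (2 s^2)) with bandwidth s (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * s * s))

def mean_off_diagonal(points, s):
    """Average kernel value over all ordered pairs of distinct points."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += gaussian_kernel(points[i], points[j], s)
    return total / (n * (n - 1))

# Hypothetical toy data: eight points evenly spaced on the unit circle.
points = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
          for k in range(8)]

# Hypothetical bandwidth grid, from very small to very large.
bandwidths = [0.05, 0.5, 5.0, 50.0]
similarities = [mean_off_diagonal(points, s) for s in bandwidths]

for s, m in zip(bandwidths, similarities):
    print(f"s = {s:>5}: mean off-diagonal kernel value = {m:.4f}")
```

A bandwidth selection criterion such as the one proposed has to pick a value between these two extremes.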

    A new bandwidth selection criterion for using SVDD to analyze hyperspectral data

    This paper presents a method for hyperspectral image classification that uses support vector data description (SVDD) with the Gaussian kernel function. SVDD has been a popular machine learning technique for single-class classification, but selecting a Gaussian kernel bandwidth that achieves the best classification performance remains a challenging problem. This paper proposes a new automatic, unsupervised Gaussian kernel bandwidth selection approach, which is used with a multiclass SVDD classification scheme. The performance of the multiclass SVDD classification scheme is evaluated on three frequently used hyperspectral data sets, and preliminary results show that the proposed method can achieve better performance than published results on these data sets.

    A New SVDD-Based Multivariate Non-parametric Process Capability Index

    The process capability index (PCI) is a commonly used statistic for measuring the ability of a process to operate within given specifications or to produce products that meet required quality specifications. A PCI can be univariate or multivariate, depending on the number of process specifications or quality characteristics of interest. Most PCIs make distributional assumptions that are often unrealistic in practice. This paper proposes a new multivariate non-parametric process capability index. This index can be used when the distribution of the process or quality parameters is either unknown or does not follow commonly used distributions such as the multivariate normal.

    A Bayesian approach to bandwidth selection for multivariate kernel regression with an application to state-price density estimation.

    Multivariate kernel regression is an important tool for investigating the relationship between a response and a set of explanatory variables. It is generally accepted that the performance of a kernel regression estimator largely depends on the choice of bandwidth rather than the kernel function. This nonparametric technique has been employed in a number of empirical studies including the state-price density estimation pioneered by Aït-Sahalia and Lo (1998). However, the widespread usefulness of multivariate kernel regression has been limited by the difficulty in computing a data-driven bandwidth. In this paper, we present a Bayesian approach to bandwidth selection for multivariate kernel regression. A Markov chain Monte Carlo algorithm is presented to sample the bandwidth vector and other parameters in a multivariate kernel regression model. A Monte Carlo study shows that the proposed bandwidth selector is more accurate than the rule-of-thumb bandwidth selector known as the normal reference rule according to Scott (1992) and Bowman and Azzalini (1997). The proposed bandwidth selection algorithm is applied to a multivariate kernel regression model that is often used to estimate the state-price density of Arrow-Debreu securities. When applying the proposed method to the S&P 500 index options and the DAX index options, we find that for short-maturity options, the proposed Bayesian bandwidth selector produces an obviously different state-price density from the one produced by using a subjective bandwidth selector discussed in Aït-Sahalia and Lo (1998).
    Keywords: Black-Scholes formula, Likelihood, Markov chain Monte Carlo, Posterior density.
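
For orientation, the rule-of-thumb baseline mentioned in this abstract can be sketched in one dimension. This is not the paper's Bayesian selector or its MCMC sampler. It is a minimal Nadaraya-Watson estimator with a Gaussian kernel, and it assumes the familiar univariate normal reference form h = 1.06 · sd · n^(-1/5) borrowed from density estimation; the multivariate rule compared in the paper may differ in its constants. The data are hypothetical.

```python
import math
import statistics

def normal_reference_bandwidth(xs):
    """Univariate rule of thumb h = 1.06 * sd * n^(-1/5) (assumed form)."""
    return 1.06 * statistics.stdev(xs) * len(xs) ** (-1.0 / 5.0)

def nadaraya_watson(x0, xs, ys, h):
    """Gaussian-kernel weighted average of ys around the query point x0."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical toy data: a noiseless quadratic on an even grid 0.0, 0.1, ..., 2.0.
xs = [i / 10.0 for i in range(21)]
ys = [x * x for x in xs]

h = normal_reference_bandwidth(xs)
estimate = nadaraya_watson(1.0, xs, ys, h)
print(f"bandwidth h = {h:.4f}, estimate at x = 1.0: {estimate:.4f}")
```

Because the estimate is a weighted average of the observed responses, a larger `h` smooths more and a smaller `h` follows the data more closely, which is why the choice of bandwidth matters more than the choice of kernel.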

    Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel

    Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around data. When dealing with online or large data, existing batch SVDD methods have to be rerun in each iteration. We propose an incremental learning algorithm for SVDD that uses the Gaussian kernel. This algorithm builds on the observation that all support vectors on the boundary have the same distance to the center of the sphere in a higher-dimensional feature space as mapped by the Gaussian kernel function. Each iteration involves only the existing support vectors and the new data point. Moreover, the algorithm is based solely on matrix manipulations; the support vectors and their corresponding Lagrange multipliers $\alpha_i$ are automatically selected and determined in each iteration. It can be seen that the complexity of our algorithm in each iteration is only $O(k^2)$, where $k$ is the number of support vectors. Experimental results on some real data sets indicate that FISVDD demonstrates significant gains in efficiency with almost no loss in either outlier detection accuracy or objective function value.
    Comment: 18 pages, 1 table, 4 figures
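
The equal-distance observation in this abstract can be checked numerically. The sketch below is not the FISVDD update. It assumes a center of the form a = Σ α_i φ(x_i) with Σ α_i = 1 and a Gaussian kernel, so K(x, x) = 1 and the squared distance from φ(x_i) to the center is 1 − 2 (Kα)_i + αᵀKα. Requiring that distance to be the same for every support vector forces Kα to be a constant vector, so α is proportional to K⁻¹1. The points, the bandwidth, and the helper names are hypothetical.

```python
import math

def gaussian_kernel(x, y, s):
    """Gaussian kernel exp(-||x - y||^2 / (2 s^2)) (assumed parametrization)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * s * s))

def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

# Hypothetical support vectors and bandwidth.
support_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
s = 1.0
k = len(support_vectors)
K = [[gaussian_kernel(a, b, s) for b in support_vectors] for a in support_vectors]

# Equal distances require K @ alpha to be constant, so alpha is K^{-1} 1, rescaled to sum to 1.
z = solve(K, [1.0] * k)
alpha = [v / sum(z) for v in z]

def squared_distance_to_center(i):
    """||phi(x_i) - a||^2 = K_ii - 2 (K alpha)_i + alpha^T K alpha."""
    cross = sum(alpha[j] * K[i][j] for j in range(k))
    quad = sum(alpha[a] * alpha[b] * K[a][b] for a in range(k) for b in range(k))
    return K[i][i] - 2.0 * cross + quad

distances = [squared_distance_to_center(i) for i in range(k)]
print("alpha:", [round(a, 4) for a in alpha])
print("squared distances:", [round(d, 6) for d in distances])
```

The dense solve here costs O(k^3); the abstract's O(k^2) per-iteration claim refers to the paper's own incremental matrix manipulations, which this sketch does not reproduce.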

    The non-Gaussianity of the cosmic shear likelihood - or: How odd is the Chandra Deep Field South?

    (abridged) We study the validity of the approximation of a Gaussian cosmic shear likelihood. We estimate the true likelihood for a fiducial cosmological model from a large set of ray-tracing simulations and investigate the impact of non-Gaussianity on cosmological parameter estimation. We investigate how odd the recently reported very low value of $\sigma_8$ really is as derived from the Chandra Deep Field South (CDFS) using cosmic shear by taking the non-Gaussianity of the likelihood into account as well as the possibility of biases coming from the way the CDFS was selected. We find that the cosmic shear likelihood is significantly non-Gaussian. This leads to both a shift of the maximum of the posterior distribution and a significantly smaller credible region compared to the Gaussian case. We re-analyse the CDFS cosmic shear data using the non-Gaussian likelihood. Assuming that the CDFS is a random pointing, we find $\sigma_8 = 0.68_{-0.16}^{+0.09}$ for fixed $\Omega_{\rm m} = 0.25$. In a WMAP5-like cosmology, a value equal to or lower than this would be expected in $\approx 5\%$ of cases. Taking biases into account arising from the way the CDFS was selected, which we model as being dependent on the number of haloes in the CDFS, we obtain $\sigma_8 = 0.71^{+0.10}_{-0.15}$. Combining the CDFS data with the parameter constraints from WMAP5 yields $\Omega_{\rm m} = 0.26^{+0.03}_{-0.02}$ and $\sigma_8 = 0.79^{+0.04}_{-0.03}$ for a flat universe.
    Comment: 18 pages, 16 figures, accepted for publication in A&A; New Bayesian treatment of field selection bias

    Local Multiplicative Bias Correction for Asymmetric Kernel Density Estimators

    We consider semiparametric asymmetric kernel density estimators when the unknown density has support on [0, ∞). We provide a unifying framework which contains asymmetric kernel versions of several semiparametric density estimators considered previously in the literature. This framework allows us to use popular parametric models in a nonparametric fashion and yields estimators which are robust to misspecification. We further develop a specification test to determine if a density belongs to a particular parametric family. The proposed estimators outperform rival non- and semiparametric estimators in finite samples and are simple to implement. We provide applications to loss data from a large Swiss health insurer and Brazilian income data.
    Keywords: semiparametric density estimation; asymmetric kernel; income distribution; loss distribution; health insurance; specification testing