Peak Criterion for Choosing Gaussian Kernel Bandwidth in Support Vector Data Description
Support Vector Data Description (SVDD) is a machine-learning technique used
for single-class classification and outlier detection. The SVDD formulation
with a kernel function provides a flexible boundary around the data. The values
of the kernel function parameters affect the nature of the data boundary. For
example, with a Gaussian kernel, as the value of the kernel bandwidth is
lowered, the data boundary changes from spherical to wiggly. The spherical data
boundary leads to underfitting, and an extremely wiggly data boundary leads to
overfitting. In this paper, we propose an empirical criterion to obtain good
values of the Gaussian kernel bandwidth parameter. This criterion provides a
smooth boundary that captures the essential geometric features of the data
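
The bandwidth effect described in this abstract can be illustrated with a minimal sketch. It assumes the parametrization K(x, y) = exp(-||x - y||^2 / (2 s^2)); the helper names and the toy points are hypothetical, and this is not the peak criterion itself. With a very small bandwidth the kernel matrix approaches the identity, so every point looks isolated (the wiggly, overfitting regime). With a very large bandwidth all entries approach 1 and the description approaches a sphere.

```python
import math

def gaussian_kernel(x, y, s):
    """Gaussian kernel exp(-||x - y||^2 / (2 s^2)) with bandwidth s (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * s * s))

def mean_off_diagonal(points, s):
    """Average kernel value over all ordered pairs of distinct points."""
    n = len(points)
    total = sum(
        gaussian_kernel(points[i], points[j], s)
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))

# Hypothetical toy data: corners and center of the unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]

for s in (0.05, 0.5, 5.0):
    value = mean_off_diagonal(points, s)
    print(f"s={s}: mean off-diagonal kernel value = {value:.4f}")
```

A bandwidth criterion has to pick a value between these two extremes, where the kernel matrix is neither close to the identity nor close to a matrix of ones.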
A new bandwidth selection criterion for using SVDD to analyze hyperspectral data
This paper presents a method for hyperspectral image classification that uses
support vector data description (SVDD) with the Gaussian kernel function. SVDD
has been a popular machine learning technique for single-class classification,
but selecting the proper Gaussian kernel bandwidth to achieve the best
classification performance is always a challenging problem. This paper proposes
a new automatic, unsupervised Gaussian kernel bandwidth selection approach
which is used with a multiclass SVDD classification scheme. The performance of
the multiclass SVDD classification scheme is evaluated on three frequently used
hyperspectral data sets, and preliminary results show that the proposed method
can achieve better performance than published results on these data sets
A New SVDD-Based Multivariate Non-parametric Process Capability Index
A process capability index (PCI) is a commonly used statistic that measures the
ability of a process to operate within the given specifications or to produce
products which meet the required quality specifications. PCI can be univariate
or multivariate depending upon the number of process specifications or quality
characteristics of interest. Most PCIs make distributional assumptions which
are often unrealistic in practice.
This paper proposes a new multivariate non-parametric process capability
index. This index can be used when the distribution of the process or quality
parameters is either unknown or does not follow commonly used distributions
such as multivariate normal
A Bayesian approach to bandwidth selection for multivariate kernel regression with an application to state-price density estimation.
Multivariate kernel regression is an important tool for investigating the relationship between a response and a set of explanatory variables. It is generally accepted that the performance of a kernel regression estimator largely depends on the choice of bandwidth rather than the kernel function. This nonparametric technique has been employed in a number of empirical studies including the state-price density estimation pioneered by Aït-Sahalia and Lo (1998). However, the widespread usefulness of multivariate kernel regression has been limited by the difficulty in computing a data-driven bandwidth. In this paper, we present a Bayesian approach to bandwidth selection for multivariate kernel regression. A Markov chain Monte Carlo algorithm is presented to sample the bandwidth vector and other parameters in a multivariate kernel regression model. A Monte Carlo study shows that the proposed bandwidth selector is more accurate than the rule-of-thumb bandwidth selector known as the normal reference rule according to Scott (1992) and Bowman and Azzalini (1997). The proposed bandwidth selection algorithm is applied to a multivariate kernel regression model that is often used to estimate the state-price density of Arrow-Debreu securities. When applying the proposed method to the S&P 500 index options and the DAX index options, we find that for short-maturity options, the proposed Bayesian bandwidth selector produces an obviously different state-price density from the one produced by using a subjective bandwidth selector discussed in Aït-Sahalia and Lo (1998).
Keywords: Black-Scholes formula, Likelihood, Markov chain Monte Carlo, Posterior density.
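
For orientation, the rule-of-thumb baseline mentioned in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's Bayesian selector: it assumes a Nadaraya-Watson estimator with a product Gaussian kernel, and it assumes the normal reference rule in its common density-estimation form, h_j = sigma_j * (4 / ((d + 2) n))^(1 / (d + 4)). The function names and the toy data are hypothetical.

```python
import math
import statistics

def normal_reference_bandwidths(X):
    """Rule-of-thumb bandwidth per regressor (assumed normal reference form)."""
    n, d = len(X), len(X[0])
    factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    return [factor * statistics.stdev([row[j] for row in X]) for j in range(d)]

def nadaraya_watson(x0, X, y, h):
    """Kernel-weighted average of y at x0 using a product Gaussian kernel."""
    weights = []
    for row in X:
        z2 = sum(((a - b) / hj) ** 2 for a, b, hj in zip(x0, row, h))
        weights.append(math.exp(-0.5 * z2))
    total = sum(weights)  # assumed nonzero: x0 lies near the data
    return sum(w * yi for w, yi in zip(weights, y)) / total

# Hypothetical toy data: a regular grid and a noise-free linear response.
X = [(i / 10.0, j / 10.0) for i in range(11) for j in range(11)]
y = [x1 + 2.0 * x2 for x1, x2 in X]

h = normal_reference_bandwidths(X)
estimate = nadaraya_watson((0.5, 0.5), X, y, h)
print("bandwidths:", h)
print("estimate at (0.5, 0.5):", estimate)
```

A data-driven selector replaces `normal_reference_bandwidths` with a procedure that adapts the bandwidth vector to the response as well as to the regressors.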
Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel
Support vector data description (SVDD) is a machine learning technique that
is used for single-class classification and outlier detection. The idea of SVDD
is to find a set of support vectors that defines a boundary around data. When
dealing with online or large data, existing batch SVDD methods have to be rerun
in each iteration. We propose an incremental learning algorithm for SVDD that
uses the Gaussian kernel. This algorithm builds on the observation that all
support vectors on the boundary have the same distance to the center of the sphere
in a higher-dimensional feature space as mapped by the Gaussian kernel
function. Each iteration involves only the existing support vectors and the new
data point. Moreover, the algorithm is based solely on matrix manipulations;
the support vectors and their corresponding Lagrange multipliers $\alpha_i$
are automatically selected and determined in each iteration. It can be seen
that the complexity of our algorithm in each iteration is only $O(k^2)$, where
$k$ is the number of support vectors. Experimental results on some real data
sets indicate that FISVDD demonstrates significant gains in efficiency with
almost no loss in either outlier detection accuracy or objective function
value. Comment: 18 pages, 1 table, 4 figures
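
The observation this abstract builds on (boundary support vectors are equidistant from the sphere center in feature space) can be checked numerically on a toy case. The sketch below is not the FISVDD algorithm. It assumes the kernel form exp(-||x - y||^2 / (2 s^2)), the standard expansion of the center as a weighted sum of mapped support vectors, and a hypothetical symmetric data set for which equal multipliers are assumed by symmetry.

```python
import math

def gaussian_kernel(x, y, s):
    """Gaussian kernel exp(-||x - y||^2 / (2 s^2)) with bandwidth s (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * s * s))

def squared_distance_to_center(z, support_vectors, alphas, s):
    """Squared feature-space distance from phi(z) to a = sum_i alpha_i phi(x_i)."""
    cross = sum(
        a * gaussian_kernel(x, z, s) for x, a in zip(support_vectors, alphas)
    )
    const = sum(
        ai * aj * gaussian_kernel(xi, xj, s)
        for xi, ai in zip(support_vectors, alphas)
        for xj, aj in zip(support_vectors, alphas)
    )
    # K(z, z) = 1 for the Gaussian kernel.
    return 1.0 - 2.0 * cross + const

# Hypothetical example: vertices of an equilateral triangle, equal multipliers.
support_vectors = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]
alphas = [1.0 / 3.0] * 3
s = 1.0

boundary = [
    squared_distance_to_center(x, support_vectors, alphas, s)
    for x in support_vectors
]
inside = squared_distance_to_center(
    (0.5, math.sqrt(3.0) / 6.0), support_vectors, alphas, s
)
outside = squared_distance_to_center((5.0, 5.0), support_vectors, alphas, s)
print("boundary:", boundary)
print("centroid:", inside, "far point:", outside)
```

The three boundary values coincide, the centroid falls inside the sphere, and the far point falls outside it, which is the scoring rule SVDD uses for outlier detection.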
The non-Gaussianity of the cosmic shear likelihood - or: How odd is the Chandra Deep Field South?
(abridged) We study the validity of the approximation of a Gaussian cosmic
shear likelihood. We estimate the true likelihood for a fiducial cosmological
model from a large set of ray-tracing simulations and investigate the impact of
non-Gaussianity on cosmological parameter estimation. We investigate how odd
the recently reported very low value of $\sigma_8$ really is as derived from
the Chandra Deep Field South (CDFS) using cosmic shear by taking the
non-Gaussianity of the likelihood into account as well as the possibility of
biases coming from the way the CDFS was selected.
We find that the cosmic shear likelihood is significantly non-Gaussian. This
leads to both a shift of the maximum of the posterior distribution and a
significantly smaller credible region compared to the Gaussian case. We
re-analyse the CDFS cosmic shear data using the non-Gaussian likelihood.
Assuming that the CDFS is a random pointing, we find
for fixed . In a
WMAP5-like cosmology, a value equal to or lower than this would be expected in
of the times. Taking biases into account arising from the way the
CDFS was selected, which we model as being dependent on the number of haloes in
the CDFS, we obtain . Combining the CDFS data
with the parameter constraints from WMAP5 yields and for a flat
universe. Comment: 18 pages, 16 figures, accepted for publication in A&A; New Bayesian
treatment of field selection bias
Local Multiplicative Bias Correction for Asymmetric Kernel Density Estimators
We consider semiparametric asymmetric kernel density estimators when the unknown density has support on [0, ∞). We provide a unifying framework which contains asymmetric kernel versions of several semiparametric density estimators considered previously in the literature. This framework allows us to use popular parametric models in a nonparametric fashion and yields estimators which are robust to misspecification. We further develop a specification test to determine if a density belongs to a particular parametric family. The proposed estimators outperform rival non- and semiparametric estimators in finite samples and are simple to implement. We provide applications to loss data from a large Swiss health insurer and Brazilian income data.
Keywords: semiparametric density estimation; asymmetric kernel; income distribution; loss distribution; health insurance; specification testing
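
As background for this abstract, a plain asymmetric kernel density estimator on [0, ∞) can be sketched with a gamma kernel, where the kernel at evaluation point x is a gamma density with shape x / b + 1 and scale b. This is a minimal sketch of one well-known asymmetric kernel, assumed here for illustration; it is not the semiparametric, bias-corrected estimator the paper proposes. The function names, the sample, and the smoothing parameter b are hypothetical.

```python
import math

def gamma_pdf(t, shape, scale):
    """Gamma density at t, computed in log space for numerical stability."""
    if t <= 0.0:
        return 0.0  # observations are assumed strictly positive
    log_pdf = (
        (shape - 1.0) * math.log(t)
        - t / scale
        - shape * math.log(scale)
        - math.lgamma(shape)
    )
    return math.exp(log_pdf)

def gamma_kernel_density(x, data, b):
    """Gamma-kernel density estimate at x >= 0 with smoothing parameter b."""
    shape = x / b + 1.0
    return sum(gamma_pdf(t, shape, b) for t in data) / len(data)

# Hypothetical positive-valued sample and smoothing parameter.
data = [0.5, 1.0, 1.5, 2.0, 3.0]
b = 0.1

for x in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"x={x}: density estimate = {gamma_kernel_density(x, data, b):.4f}")
```

Because the kernel's support matches the support of the data, no probability mass is assigned to negative values, which is the boundary problem that symmetric kernels have near zero.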