A New SVDD-Based Multivariate Non-parametric Process Capability Index
The process capability index (PCI) is a commonly used statistic to measure the
ability of a process to operate within given specifications, or to produce
products that meet the required quality specifications. A PCI can be univariate
or multivariate, depending on the number of process specifications or quality
characteristics of interest. Most PCIs make distributional assumptions that
are often unrealistic in practice.
This paper proposes a new multivariate non-parametric process capability
index. This index can be used when the distribution of the process or quality
parameters is either unknown or does not follow commonly used distributions
such as the multivariate normal.
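To make the non-parametric idea concrete, the sketch below computes a crude distribution-free conformance fraction for a toy bivariate, non-normal process. This is only an illustration of judging capability directly from data, not the paper's SVDD-based index; the data and specification limits are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy bivariate process data; deliberately non-normal (exponential margins)
# to mirror the setting the paper targets.
X = rng.exponential(scale=1.0, size=(1000, 2))

# Hypothetical rectangular specification limits for the two characteristics.
lower = np.array([0.0, 0.0])
upper = np.array([4.0, 4.0])

# Empirical fraction of observations that meet every specification at once:
# a distribution-free capability measure needing no normality assumption.
inside = np.all((X >= lower) & (X <= upper), axis=1)
conformance = inside.mean()
```

Unlike a parametric PCI, nothing here depends on an assumed distribution; the trade-off is that a simple conformance fraction ignores how the data are positioned within the specification region, which is the gap a boundary-based index such as the paper's aims to fill.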
Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection
In recent years, there have been many practical applications of anomaly
detection such as in predictive maintenance, detection of credit fraud, network
intrusion, and system failure. The goal of anomaly detection is to identify in
the test data anomalous behaviors that are either rare or unseen in the
training data. This is a common goal in predictive maintenance, which aims to
forecast the imminent faults of an appliance given abundant samples of normal
behaviors. Local outlier factor (LOF) is one of the state-of-the-art models
used for anomaly detection, but the predictive performance of LOF depends
greatly on the selection of hyperparameters. In this paper, we propose a novel,
heuristic methodology to tune the hyperparameters in LOF. A tuned LOF model
that uses the proposed method shows good predictive performance in both
simulations and real data sets.
Comment: 15 pages, 5 figures
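A minimal LOF setup in the predictive-maintenance spirit described above can be sketched with scikit-learn. The `n_neighbors` value below is an arbitrary placeholder, not a choice produced by the paper's tuning method, and the data are synthetic:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Abundant samples of normal behavior, as in predictive maintenance.
X_train = rng.normal(size=(200, 2))
# Test data: a few normal points plus one clearly anomalous observation.
X_test = np.vstack([rng.normal(size=(5, 2)), [[8.0, 8.0]]])

# novelty=True lets a fitted LOF model score unseen points. n_neighbors is
# the key hyperparameter whose selection the paper's method addresses.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_train)
labels = lof.predict(X_test)  # +1 for inliers, -1 for outliers
```

Because LOF scores are relative local densities, a poor `n_neighbors` can mask or manufacture outliers, which is exactly why an automatic tuning rule matters.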
Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel
Support vector data description (SVDD) is a machine learning technique that
is used for single-class classification and outlier detection. The idea of SVDD
is to find a set of support vectors that defines a boundary around data. When
dealing with online or large data, existing batch SVDD methods have to be rerun
in each iteration. We propose an incremental learning algorithm for SVDD that
uses the Gaussian kernel. This algorithm builds on the observation that all
support vectors on the boundary have the same distance to the center of the sphere
in a higher-dimensional feature space as mapped by the Gaussian kernel
function. Each iteration involves only the existing support vectors and the new
data point. Moreover, the algorithm is based solely on matrix manipulations;
the support vectors and their corresponding Lagrange multipliers
are automatically selected and determined in each iteration. It can be seen
that the complexity of our algorithm in each iteration is only O(k^2), where
k is the number of support vectors. Experimental results on some real data
sets indicate that the proposed fast incremental SVDD (FISVDD) algorithm
demonstrates significant gains in efficiency with almost no loss in either
outlier detection accuracy or objective function value.
Comment: 18 pages, 1 table, 4 figures
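The equal-distance observation underlying the algorithm can be illustrated directly: with the Gaussian kernel, K(z, z) = 1, so the squared feature-space distance to the center depends on z only through its kernel values against the support vectors. The function names and the example multipliers below are illustrative; in the actual algorithm the multipliers come from the incremental updates.

```python
import numpy as np

def gaussian_kernel(X, Y, s=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 s^2)); note that K(x, x) = 1.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * s**2))

def dist2_to_center(Z, SV, alpha, s=1.0):
    # Squared distance from phi(z) to the center a = sum_i alpha_i phi(x_i):
    #   ||phi(z) - a||^2 = K(z, z) - 2 sum_i alpha_i K(z, x_i)
    #                      + sum_{i,j} alpha_i alpha_j K(x_i, x_j),
    # where K(z, z) = 1 for the Gaussian kernel.
    const = alpha @ gaussian_kernel(SV, SV, s) @ alpha
    return 1.0 - 2.0 * gaussian_kernel(Z, SV, s) @ alpha + const

# Two symmetric "support vectors" with equal multipliers: by symmetry both
# lie at the same feature-space distance from the center.
SV = np.array([[-1.0, 0.0], [1.0, 0.0]])
alpha = np.array([0.5, 0.5])
d2 = dist2_to_center(SV, SV, alpha)
```

Evaluating each iteration thus needs only kernel values among the current support vectors and the new point, which is why the per-iteration cost scales with the number of support vectors rather than the full data size.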
Peak Criterion for Choosing Gaussian Kernel Bandwidth in Support Vector Data Description
Support Vector Data Description (SVDD) is a machine-learning technique used
for single-class classification and outlier detection. The SVDD formulation
with a kernel function provides a flexible boundary around data. The values of
the kernel function parameters affect the nature of the data boundary. For
example, it is
observed that with a Gaussian kernel, as the value of kernel bandwidth is
lowered, the data boundary changes from spherical to wiggly. The spherical data
boundary leads to underfitting, and an extremely wiggly data boundary leads to
overfitting. In this paper, we propose an empirical criterion for choosing
good values of the Gaussian kernel bandwidth parameter. This criterion
provides a smooth boundary that captures the essential geometric features of
the data.
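The spherical-to-wiggly transition described above is easy to observe empirically. The sketch below uses scikit-learn's OneClassSVM with an RBF kernel as a stand-in for Gaussian-kernel SVDD (the two are closely related) and counts support vectors at two bandwidth extremes; it only illustrates the bandwidth effect, not the paper's criterion itself.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))

# For the RBF kernel, gamma = 1 / (2 s^2), so a smaller bandwidth s
# corresponds to a larger gamma.
n_sv = {}
for gamma in (0.05, 100.0):
    model = OneClassSVM(kernel="rbf", gamma=gamma, nu=0.05).fit(X)
    n_sv[gamma] = len(model.support_)

# With a tiny bandwidth (large gamma) the boundary hugs individual points,
# so far more observations end up as support vectors: the overfitting
# regime described in the abstract.
```

A bandwidth-selection rule such as the one proposed in the paper aims to land between these two regimes, where the boundary is smooth yet still follows the shape of the data.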