102 research outputs found
Distribution-free cumulative sum control charts using bootstrap-based control limits
This paper deals with phase II, univariate, statistical process control when
a set of in-control data is available, and when both the in-control and
out-of-control distributions of the process are unknown. Existing process
control techniques typically require substantial knowledge about the in-control
and out-of-control distributions of the process, which is often difficult to
obtain in practice. We propose (a) using a sequence of control limits for the
cumulative sum (CUSUM) control charts, where the control limits are determined
by the conditional distribution of the CUSUM statistic given the last time it
was zero, and (b) estimating the control limits by bootstrap. Traditionally,
the CUSUM control chart uses a single control limit, which is obtained under
the assumption that the in-control and out-of-control distributions of the
process are Normal. When the normality assumption is not valid, which is often
true in applications, the actual in-control average run length, defined to be
the expected time duration before the control chart signals a process change,
is quite different from the nominal in-control average run length. This
limitation is mostly eliminated in the proposed procedure, which is
distribution-free and robust against different choices of the in-control and
out-of-control distributions. Comment: Published at http://dx.doi.org/10.1214/08-AOAS197 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
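The idea in the abstract above (control limits indexed by the time since the CUSUM statistic was last zero, estimated by bootstrap from in-control data) can be illustrated with a minimal Python sketch. It is a simplified illustration, not the authors' exact procedure: the reference value `ref`, the per-step false-alarm rate `alpha`, the pooling of long sprints at `j_max`, and all function names are assumptions made here, and the bootstrap path is simulated as one long run without restarting after a signal.

```python
import random

def cusum_path(xs, ref):
    """Upper CUSUM C_n = max(0, C_{n-1} + x_n - ref), paired with the
    sprint length T_n (steps since the statistic was last zero)."""
    c, t = 0.0, 0
    path = []
    for x in xs:
        c = max(0.0, c + x - ref)
        t = t + 1 if c > 0 else 0
        path.append((c, t))
    return path

def empirical_quantile(values, q):
    """Order-statistic quantile without interpolation."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]

def bootstrap_limits(in_control, ref, alpha=0.005, j_max=20,
                     n_boot=20000, seed=0):
    """Estimate one control limit per sprint length 1..j_max.
    Sprint lengths above j_max are pooled into the last bin (assumption)."""
    rng = random.Random(seed)
    draws = rng.choices(in_control, k=n_boot)  # resample with replacement
    by_sprint = [[] for _ in range(j_max)]
    for c, t in cusum_path(draws, ref):
        if t > 0:
            by_sprint[min(t, j_max) - 1].append(c)
    limits = []
    for vals in by_sprint:
        if vals:
            limits.append(empirical_quantile(vals, 1 - alpha))
        else:
            # No bootstrap data for this sprint length: reuse previous limit.
            limits.append(limits[-1] if limits else float("inf"))
    return limits

def first_signal(xs, ref, limits):
    """Index of the first observation whose CUSUM exceeds the limit
    for its current sprint length, or None if the chart never signals."""
    for n, (c, t) in enumerate(cusum_path(xs, ref)):
        if t > 0 and c > limits[min(t, len(limits)) - 1]:
            return n
    return None
```

With skewed in-control data (for example centred exponential draws), `bootstrap_limits` needs no normality assumption; the limits come entirely from the resampled data.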
Nonparametric (distribution-free) control charts : an updated overview and some results
Control charts that are based on assumption(s) of a specific form for the underlying process distribution are referred to as parametric control charts. There are many applications where there is insufficient information to justify such assumption(s) and, consequently, control charting techniques with a minimal set of distributional assumption requirements are in high demand. To this end, nonparametric or distribution-free control charts have been proposed in recent years. The charts have stable in-control properties, are robust against outliers and can be surprisingly efficient in comparison with their parametric counterparts. Chakraborti and some of his colleagues provided review papers on nonparametric control charts in 2001, 2007 and 2011, respectively. These papers have been received with considerable interest and attention by the community. However, the literature on nonparametric statistical process/quality control/monitoring has grown exponentially and because of this rapid growth, an update is deemed necessary. In this article, we bring these reviews forward to 2017, discussing some of the latest developments in the area. Moreover, unlike the past reviews, which did not include the multivariate charts, here we review both univariate and multivariate nonparametric control charts. We end with some concluding remarks. https://www.tandfonline.com/loi/lqen20
Nonparametric monitoring of sunspot number observations: a case study
Solar activity is an important driver of long-term climate trends and must be
accounted for in climate models. Unfortunately, direct measurements of this
quantity over long periods do not exist. The only observation related to solar
activity whose records reach back to the seventeenth century are sunspots.
Surprisingly, determining the number of sunspots consistently over time has
remained until today a challenging statistical problem. It arises from the need
to consolidate data from multiple observing stations around the world in a
context of low signal-to-noise ratios, non-stationarity, missing data,
non-standard distributions and many kinds of errors. The data from some
stations therefore exhibit severe and varied deviations over time. In this
paper, we propose the first systematic and thorough statistical approach for
monitoring these complex and important series. It consists of three steps
essential for successful treatment of the data: smoothing on multiple
timescales, monitoring using block bootstrap calibrated CUSUM charts and
classification of out-of-control situations by support vector techniques. This
approach allows us to detect a wide range of anomalies (such as sudden jumps or
more progressive drifts), unseen in previous analyses. It helps us to identify
the causes of major deviations, which are often observer or equipment related.
Their detection and identification will help to improve future
observations. Their elimination or correction in past data will lead to a more
precise reconstruction of the world reference index for solar activity: the
International Sunspot Number. Comment: 27 pages (without appendices), 6 figures
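The abstract above calibrates its CUSUM charts with a block bootstrap. As a minimal sketch of that resampling idea only (a generic moving block bootstrap, not the paper's calibration procedure), contiguous blocks are drawn so that short-range autocorrelation in the series is preserved within each block. The function name, the block length and the output length are hypothetical choices made for illustration.

```python
import random

def moving_block_bootstrap(series, block_len, n_out, seed=0):
    """Resample a series by concatenating randomly chosen contiguous
    blocks of length block_len, then truncating to n_out values.
    Assumes block_len <= len(series)."""
    rng = random.Random(seed)
    n_starts = len(series) - block_len + 1  # number of valid block starts
    out = []
    while len(out) < n_out:
        start = rng.randrange(n_starts)
        out.extend(series[start:start + block_len])
    return out[:n_out]
```

Resampled series of this kind could then be fed to a CUSUM recursion to estimate control limits for autocorrelated data.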
Nonparametric signed-rank control charts with variable sampling intervals
Variable sampling interval (VSI) charts have been proposed in the literature for normal theory (parametric) control charts and are known to provide performance enhancements. In the VSI setting, the time between monitored samples is allowed to vary depending on what is observed in the current sample. Nonparametric (distribution-free) control charts have recently come to play an important role in statistical process control and monitoring. In this paper a nonparametric Shewhart-type VSI control chart is considered for detecting changes in a specified location parameter. The proposed chart is based on the Wilcoxon signed-rank statistic and is called the VSI signed-rank chart. The VSI signed-rank chart is compared with an existing fixed sampling interval signed-rank chart, the parametric VSI X-chart, and the nonparametric VSI sign chart. Results show that the VSI signed-rank chart often performs favourably and should be used. The South African Research Chairs Initiative at the University of Pretoria and by the Department of Information Systems, Statistics and Management Science, University of Alabama. Marien Graham's research was also supported by the National Research Foundation (Thuthuka programme: TTK14061168807; grant number: 94102), SARCHI Award to the third author from the National Research Foundation. http://wileyonlinelibrary.com/journal/qre 2018-12-21
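The mechanics described in this abstract can be sketched in a few lines of Python. The sketch computes the Wilcoxon signed-rank statistic for one sample against a specified location `theta0`, then picks the next sampling interval from the statistic's position relative to a warning limit and a control limit. The limit values, the interval lengths, the use of `>=` at the limits, and the function names are all assumptions for illustration and are not taken from the paper.

```python
def signed_rank(sample, theta0):
    """Wilcoxon signed-rank statistic: sum of sign(x - theta0) times the
    rank of |x - theta0|. Assumes continuous data, i.e. no ties and no
    zero differences."""
    diffs = [x - theta0 for x in sample]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    sr = 0
    for rank, i in enumerate(order, start=1):
        sr += rank if diffs[i] > 0 else -rank
    return sr

def vsi_step(sr, warning_limit, control_limit, short_interval, long_interval):
    """Shewhart-type decision with a variable sampling interval:
    signal beyond the control limit, sample again soon inside the
    warning region, and wait longer when the statistic is near zero."""
    if abs(sr) >= control_limit:
        return ("signal", None)
    if abs(sr) >= warning_limit:
        return ("continue", short_interval)
    return ("continue", long_interval)
```

For a sample of size 5 the statistic ranges from -15 to 15, so a hypothetical call such as `vsi_step(signed_rank(sample, 0.0), 9, 15, 0.1, 1.9)` signals only when all five observations fall on the same side of `theta0`.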
Real-Time Machine Learning for Quickest Detection
Safety-critical Cyber-Physical Systems (CPS) require real-time machine learning for control and decision making. One promising solution is to use deep learning to discover useful patterns for event detection from heterogeneous data. However, deep learning algorithms encounter challenges in CPS with assurability requirements: 1) decision explainability, 2) real-time and quickest event detection, and 3) time-efficient incremental learning.
To address these obstacles, I developed a real-time Machine Learning Framework for Quickest Detection (MLQD). Specifically, I first propose the zero-bias neural network, which removes decision bias and preferences from regular neural networks and provides an interpretable decision process. Second, I discover the latent space characteristic of the zero-bias neural network and a method to mathematically convert a Deep Neural Network (DNN) classifier into a performance-assured binary abnormality detector. In this way, I can seamlessly integrate the deep neural networks' data processing capability with Quickest Detection (QD) and provide a real-time sequential event detection paradigm. Third, I find that a critical factor impeding the incremental learning of neural networks is concept interference (confusion) in latent space, and I prove that, to minimize interference, the concept representation vectors (class fingerprints) within the latent space need to be organized orthogonally. Using these findings, I devise a new incremental learning strategy that allows deep neural networks in CPS to evolve efficiently without retraining. All my algorithms are evaluated on real-world applications: ADS-B (Automatic Dependent Surveillance Broadcasting) signal identification and spoofing detection in the aviation communication system.
Finally, I discuss the current trends in MLQD and conclude this dissertation by presenting future research directions and applications.
In summary, the innovations of this dissertation are as follows: i) I propose the zero-bias neural network, which provides transparent latent space characteristics, and apply it to solve the wireless device identification problem. ii) I discover and prove the orthogonal memory organization mechanism in artificial neural networks and apply this mechanism in time-efficient incremental learning. iii) I discover and mathematically prove the converging point theorem, with which we can predict the latent space topological characteristics and estimate the topological maturity of neural networks. iv) I bridge the gap between machine learning and quickest detection with assurable performance.
Robust Statistical Inference Through the Lens of Optimization
Statistical signal processing and hypothesis testing are fundamental problems in modern data science and engineering applications. This thesis mainly focuses on developing new theories and algorithms for three research problems in the area of robust statistical inference. The first problem we study is sequential change detection. We consider the subspace change for the covariance matrix of high-dimensional data sequences, which is a fundamental problem since subspace structure is commonly used for modeling high-dimensional data. We also consider a non-parametric setting that can be useful when the data distributions cannot be easily represented by simple parametric families, and the weighted L2 divergence is proposed to detect the change. The second problem we study is data-driven robust hypothesis testing when the true data-generating distributions are all unknown and we only have access to a limited number of training samples. A strong duality result is proved and used to find the robust optimal test by convex optimization. The third problem is parameter recovery for spatio-temporal models by solving variational inequalities, with an application example in modeling crime events. Ph.D
A performance analysis of multivariate nonparametric control charts
Robust and efficient multivariate control charts are not common in the literature. This
report explores the versatility of the few distribution-free, nonparametric multivariate
Statistical Process Control (MSPC) charts suitable for average run length (ARL)
analysis. Current datasets are becoming increasingly complex, large, and less likely
to follow distributional properties required for traditional parametric statistics, a fact
especially true for a multivariate setting. The purpose of our study is to compare
the newest available methods, which have not previously been compared with one another, in cases and
data structures not yet explored. Due to the versatility and robustness of the types of
data these methods can accommodate, finding real-world applications is straightforward. The
five methods applied here are able to exploit different types of changes to the structure
of a distribution, rather than simply detect a mean shift. These methods share similar
features: they avoid lengthy data-gathering steps and are applicable in short-run and
start-up situations. By establishing cut-off values simultaneously based on input observations,
rather than beforehand, the methods apply data-dependent control
limits, which demonstrates their truly distribution-free property. Creating more computationally
efficient algorithms for these methods remains one of the current areas
of improvement.
On Rank Energy Statistics via Optimal Transport: Continuity, Convergence, and Change Point Detection
This paper considers the use of recently proposed optimal transport-based
multivariate test statistics, namely rank energy and its variant the soft rank
energy derived from entropically regularized optimal transport, for the
unsupervised nonparametric change point detection (CPD) problem. We show that
the soft rank energy enjoys both fast rates of statistical convergence and
robust continuity properties which lead to strong performance on real datasets.
Our theoretical analyses remove the need for resampling and out-of-sample
extensions previously required to obtain such rates. In contrast, the rank
energy suffers from the curse of dimensionality in statistical estimation and
moreover can signal a change point from arbitrarily small perturbations, which
leads to a high rate of false alarms in CPD. Additionally, under mild
regularity conditions, we quantify the discrepancy between soft rank energy and
rank energy in terms of the regularization parameter. Finally, we show our
approach performs favorably in numerical experiments compared to several other
optimal transport-based methods as well as maximum mean discrepancy. Comment: 36 pages, 5 figures