17 research outputs found

    Nonlinear profile monitoring using spline functions

    In this study, two new integrated control charts, named the T2-MAE chart and the MS-MAE chart, are introduced for monitoring process quality when the mathematical form of the nonlinear profile model for the quality measure is complicated and cannot be specified. The T2-MAE chart combines two memoryless-type control charts, while the MS-MAE chart combines one memory-type and one memoryless-type control chart. For both proposed charts, the normality assumption on the error terms of the nonlinear profile model is extended to a generalized model. An intensive simulation study is conducted to evaluate the performance of the T2-MAE and MS-MAE charts. The results show that the MS-MAE chart outperforms the T2-MAE chart, producing fewer false alarms during Phase I monitoring. Moreover, the MS-MAE chart is sensitive to different shifts in the model parameters and profile shape during Phase II monitoring. An example based on the vertical density profile is used for illustration.
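
    The abstract does not reproduce the paper's exact chart statistics, so the following is a minimal Phase I sketch, under stated assumptions, of the general idea: each profile is smoothed with a cubic spline, a Hotelling T2 statistic monitors the fitted spline coefficients, and a memoryless mean-absolute-error (MAE) statistic monitors the residuals. The knot placement, empirical control limits, and names such as spline_basis and t2_mae_phase1 are illustrative assumptions, not the authors' design.

        # Minimal Phase I sketch of a T2 + MAE profile-monitoring scheme,
        # assuming cubic truncated-power-basis splines; illustrative only.
        import numpy as np

        def spline_basis(x, knots):
            """Cubic truncated-power spline basis: 1, x, x^2, x^3, (x - k)^3_+."""
            cols = [np.ones_like(x), x, x**2, x**3]
            cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
            return np.column_stack(cols)

        def t2_mae_phase1(profiles, x, knots, alpha=0.05):
            B = spline_basis(x, knots)                              # (n_points, p)
            beta = np.linalg.lstsq(B, profiles.T, rcond=None)[0].T  # (m, p) coefficients
            resid = profiles - beta @ B.T
            mae = np.abs(resid).mean(axis=1)                        # memoryless MAE statistic
            mu, S = beta.mean(axis=0), np.cov(beta, rowvar=False)
            d = beta - mu
            t2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)   # Hotelling T2
            ucl_t2 = np.quantile(t2, 1 - alpha / 2)                 # empirical control limits
            ucl_mae = np.quantile(mae, 1 - alpha / 2)
            return t2, mae, (t2 > ucl_t2) | (mae > ucl_mae)

        # Toy usage: 50 in-control nonlinear profiles on a common grid.
        rng = np.random.default_rng(0)
        x = np.linspace(0, 1, 40)
        profiles = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, (50, 40))
        t2, mae, flagged = t2_mae_phase1(profiles, x, knots=[0.25, 0.5, 0.75])
        print(flagged.sum(), "profiles flagged")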

    Adaptive estimation and change detection of correlation and quantiles for evolving data streams

    Streaming data processing is increasingly playing a central role in enterprise data architectures due to an abundance of available measurement data from a wide variety of sources and advances in data capture and infrastructure technology. Data streams arrive, with high frequency, as never-ending sequences of events, where the underlying data generating process always has the potential to evolve. Business operations often demand real-time processing of data streams for keeping models up-to-date and timely decision-making. For example, in cybersecurity contexts, analysing streams of network data can aid the detection of potentially malicious behaviour. Many tools for statistical inference cannot meet the challenging demands of streaming data, where the computational cost of updates to models must be constant to ensure continuous processing as data scales. Moreover, these tools are often not capable of adapting to changes, or drift, in the data. Thus, new tools for modelling data streams with efficient data processing and model updating capabilities, referred to as streaming analytics, are required. Regular intervention to configure control parameters is incompatible with the truly continuous processing constraints of streaming data, and there is a notable absence of tools designed with both the temporal adaptivity to accommodate drift and the autonomy to operate without control-parameter tuning. Streaming analytics with these properties can be developed using an Adaptive Forgetting (AF) framework, with roots in adaptive filtering. The fundamental contributions of this thesis are to extend the streaming toolkit by using the AF framework to develop autonomous and temporally-adaptive streaming analytics. The first contribution uses the AF framework to demonstrate the development of a model, and a validation procedure, for estimating time-varying parameters of bivariate data streams from cyber-physical systems. This is accompanied by a novel continuous monitoring change detection system that compares adaptive and non-adaptive estimates. The second contribution is the development of a streaming analytic for the correlation coefficient and an associated change detector to monitor changes to correlation structures across streams. This is demonstrated on cybersecurity network data. The third contribution is a procedure for estimating time-varying binomial data, with a thorough exploration of this estimator's nuanced behaviour. The final contribution is a framework to enhance extant streaming quantile estimators with autonomous, temporally-adaptive properties. In addition, a novel streaming quantile procedure is developed and shown, in an extensive simulation study, to have appealing performance.
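
    As an illustration of the forgetting-factor machinery underlying the AF framework, the sketch below maintains an exponentially weighted correlation estimate for a bivariate stream with constant-cost updates. A fixed forgetting factor lam is assumed for brevity; the thesis's contribution is precisely to adapt this factor online, which is not reproduced here.

        # Sketch: exponentially weighted streaming correlation, O(1) per update.
        # The forgetting factor is fixed here; the AF framework tunes it adaptively.
        import numpy as np

        class ForgettingCorrelation:
            def __init__(self, lam=0.99):
                self.lam, self.w = lam, 0.0           # forgetting factor, effective weight
                self.mx = self.my = 0.0               # weighted means
                self.sxx = self.syy = self.sxy = 0.0  # weighted (co)variances

            def update(self, x, y):
                self.w = self.lam * self.w + 1.0
                a = 1.0 / self.w                      # gain of the newest observation
                dx, dy = x - self.mx, y - self.my
                self.mx += a * dx
                self.my += a * dy
                self.sxx = (1 - a) * (self.sxx + a * dx * dx)
                self.syy = (1 - a) * (self.syy + a * dy * dy)
                self.sxy = (1 - a) * (self.sxy + a * dx * dy)
                return self.sxy / np.sqrt(self.sxx * self.syy + 1e-12)

        # Toy stream whose correlation flips sign halfway through.
        rng = np.random.default_rng(1)
        est = ForgettingCorrelation(lam=0.98)
        for t in range(2000):
            rho = 0.8 if t < 1000 else -0.8
            x = rng.normal()
            y = rho * x + np.sqrt(1 - rho**2) * rng.normal()
            r = est.update(x, y)
        print(f"estimated correlation near the end: {r:+.2f}")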

    Advanced Data Analysis - Lecture Notes

    Lecture notes for Advanced Data Analysis (ADA1 Stat 427/527 and ADA2 Stat 428/528), Department of Mathematics and Statistics, University of New Mexico, Fall 2016-Spring 2017. Additional material, including RMarkdown templates for in-class and homework exercises, datasets, R code, and video lectures, is available on the course websites: https://statacumen.com/teaching/ada1 and https://statacumen.com/teaching/ada2 .
    Contents:
    I ADA1: Software
        0 Introduction to R, Rstudio, and ggplot
    II ADA1: Summaries and displays, and one-, two-, and many-way tests of means
        1 Summarizing and Displaying Data
        2 Estimation in One-Sample Problems
        3 Two-Sample Inferences
        4 Checking Assumptions
        5 One-Way Analysis of Variance
    III ADA1: Nonparametric, categorical, and regression methods
        6 Nonparametric Methods
        7 Categorical Data Analysis
        8 Correlation and Regression
    IV ADA1: Additional topics
        9 Introduction to the Bootstrap
        10 Power and Sample size
        11 Data Cleaning
    V ADA2: Review of ADA1
        1 R statistical software and review
    VI ADA2: Introduction to multiple regression and model selection
        2 Introduction to Multiple Linear Regression
        3 A Taste of Model Selection for Multiple Regression
    VII ADA2: Experimental design and observational studies
        4 One Factor Designs and Extensions
        5 Paired Experiments and Randomized Block Experiments
        6 A Short Discussion of Observational Studies
    VIII ADA2: ANCOVA and logistic regression
        7 Analysis of Covariance: Comparing Regression Lines
        8 Polynomial Regression
        9 Discussion of Response Models with Factors and Predictors
        10 Automated Model Selection for Multiple Regression
        11 Logistic Regression
    IX ADA2: Multivariate Methods
        12 An Introduction to Multivariate Methods
        13 Principal Component Analysis
        14 Cluster Analysis
        15 Multivariate Analysis of Variance
        16 Discriminant Analysis
        17 Classification

    Vol. 16, No. 1 (Full Issue)


    Hybrid Bootstrap for Mapping Quantitative Trait Loci and Change Point Problems.

    The hybrid bootstrap uses resampling ideas to extend the duality approach to interval estimation for a parameter of interest when there are nuisance parameters. The confidence region constructed by the hybrid bootstrap may perform much better than the ordinary bootstrap region in situations where the data provide substantial information about the nuisance parameter but limited information about the parameter of interest. After the approach is described, three applications are considered. The first concerns estimating the location of a quantitative trait locus on a strand of DNA with data from a back-cross experiment. The results of large simulation studies demonstrating the performance of the hybrid bootstrap are reported, followed by the analysis of a real data set of rice tiller number. The second application concerns change point problems. The hybrid confidence region for a post-change mean is considered after a change is detected by a Shewhart control chart in a sequence of independent normal variables. The hybrid regions are constructed using both likelihood-ratio and Bayesian statistics, and their performance is compared in a simulation study. The last application concerns a signal-plus-Poisson model of interest in high energy physics. Surprisingly, for this example the method is inconsistent: coverage probabilities do not converge to the nominal value as information about the background rate increases.
    Ph.D. Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/61577/1/hksun_1.pd
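
    A minimal caricature of the duality (test-inversion) idea may help fix intuition: below, a confidence set for a normal mean is formed by accepting each candidate value whose simulated test-statistic distribution, with the nuisance variance plugged in at its estimate, is consistent with the observed statistic. This illustrates the mechanics only; the dissertation's hybrid construction and its applications are considerably more involved.

        # Caricature of test inversion with a plugged-in nuisance parameter:
        # a confidence set for a normal mean, variance treated as nuisance.
        import numpy as np

        def inversion_ci(x, grid, level=0.95, n_boot=2000, seed=0):
            rng = np.random.default_rng(seed)
            n, sigma_hat = len(x), x.std(ddof=1)   # nuisance plugged in once
            accepted = []
            for theta0 in grid:                    # test H0: theta = theta0
                t_obs = abs(x.mean() - theta0)
                sims = rng.normal(theta0, sigma_hat, (n_boot, n)).mean(axis=1)
                if t_obs <= np.quantile(np.abs(sims - theta0), level):
                    accepted.append(theta0)
            return min(accepted), max(accepted)

        rng = np.random.default_rng(42)
        x = rng.normal(2.0, 1.5, size=30)
        lo, hi = inversion_ci(x, grid=np.linspace(0.5, 3.5, 121))
        print(f"95% confidence interval for the mean: ({lo:.2f}, {hi:.2f})")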

    Information Geometry

    This Special Issue of the journal Entropy, titled “Information Geometry I”, contains a collection of 17 papers concerning the foundations and applications of information geometry. Based on a geometrical interpretation of probability, information geometry has become a rich mathematical field employing the methods of differential geometry. It has numerous applications to data science, physics, and neuroscience. Presenting original research, yet written in an accessible, tutorial style, this collection of papers will be useful for scientists who are new to the field, while providing an excellent reference for the more experienced researcher. Several papers are written by authorities in the field, and topics cover the foundations of information geometry as well as applications to statistics, Bayesian inference, machine learning, complex systems, physics, and neuroscience.

    Investigation, Improvement, and Extension of Techniques for Measurement System Analysis

    Critical manufacturing decisions are made based on data obtained by randomly sampling production units. The effectiveness of these decisions depends on the accuracy of the data, which in turn depends on the accuracy and precision of the measurement system. A measurement system comprises two primary components: equipment (measuring devices) and appraisers (the personnel who use these devices). Each component adds to the variability of the data. The process of analyzing this variation and estimating the contribution of the various components of the measurement system toward it is known as measurement system analysis (MSA). The Automotive Industry Action Group (AIAG) has been at the forefront of MSA research. Through this dissertation, (i) improvements have been made to some of the existing estimates of variance components (part variation); (ii) new variance components have been identified and quantified (within-appraiser variation); (iii) new approaches have been suggested (using multiple measuring devices); (iv) the estimates of variance components when measurement is destructive have been improved and adapted for chemical and process industries; and (v) various measurement system acceptability criteria have been identified and their relative merits and demerits evaluated under varying process capability conditions. Monte Carlo simulation is used to verify the techniques developed as part of this research. Within-appraiser variation is found to be confounded with equipment variation; a lower bound for the former and an upper bound for the latter are estimated using two different approaches. The use of multiple measuring devices in the MSA study allows the estimation of variation among devices, which can be a significant component of the observed variation and yet is ignored in the current state of MSA. The traditional estimate of part-to-part variation is corrected. Concurrent evaluation of various measurement system acceptability criteria revealed that some of them conflict with each other, and guidelines for their use are developed. The differences between two such criteria, the discrimination ratio and the number of distinct categories, are outlined. The current MSA approach for destructive testing scenarios is significantly improved, and new estimates for measurement variation are derived. All simulation tests were run on MATLAB-based software developed as part of this research, which allows the simulation and analysis of various measurement systems under widely varying conditions.
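
    For orientation, the sketch below implements the classical balanced two-way random-effects (parts x appraisers x replicates) gauge R&R decomposition into repeatability (equipment variation), reproducibility (appraiser variation), and part-to-part variation. It reflects the standard textbook estimators, not the improved estimators this dissertation derives.

        # Classical ANOVA-based gauge R&R variance components for a balanced
        # parts x appraisers x replicates study; standard formulas, illustrative.
        import numpy as np

        def gauge_rr(y):
            """y has shape (p parts, a appraisers, r replicates)."""
            p, a, r = y.shape
            grand = y.mean()
            part_m, app_m = y.mean(axis=(1, 2)), y.mean(axis=(0, 2))
            cell_m = y.mean(axis=2)
            ms_part = a * r * ((part_m - grand) ** 2).sum() / (p - 1)
            ms_app = p * r * ((app_m - grand) ** 2).sum() / (a - 1)
            ms_int = (r * ((cell_m - part_m[:, None] - app_m[None, :] + grand) ** 2).sum()
                      / ((p - 1) * (a - 1)))
            ms_rep = ((y - cell_m[:, :, None]) ** 2).sum() / (p * a * (r - 1))
            repeatability = ms_rep                           # equipment variation
            s2_int = max((ms_int - ms_rep) / r, 0.0)
            s2_app = max((ms_app - ms_int) / (p * r), 0.0)
            reproducibility = s2_app + s2_int                # appraiser variation
            s2_part = max((ms_part - ms_int) / (a * r), 0.0) # part-to-part variation
            return repeatability, reproducibility, s2_part

        rng = np.random.default_rng(7)
        parts = rng.normal(0, 2.0, (10, 1, 1))               # 10 parts
        apprs = rng.normal(0, 0.5, (1, 3, 1))                # 3 appraisers
        y = parts + apprs + rng.normal(0, 0.3, (10, 3, 2))   # 2 replicates each
        print(gauge_rr(y))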

    Some applications of higher-order hidden Markov models in the exotic commodity markets

    The liberalisation of regional and global commodity markets over the last several decades has resulted in commodity price behaviours that require new modelling and estimation approaches. Such approaches have important implications for the valuation and utilisation of commodity derivatives. Derivatives are becoming increasingly crucial for market participants in hedging their exposure to volatile price swings and in managing the risks associated with derivative trading. The modelling of commodity-based variables is an integral part of risk management and optimal-investment strategies for commodity-linked portfolios. The characteristics of commodity price evolution cannot be captured sufficiently by one-state-driven models, even with the inclusion of multiple factors. This inspires the adoption of regime-switching methods to rectify the inadequacies of one-state multi-factor modelling. In this research, we employ higher-order hidden Markov models (HOHMMs) to take advantage of the latent information in the past of the observed process. This greatly enhances and complements the regime-switching features of our approach in describing variables that virtually determine the value of certain commodity derivatives, such as contracts dependent on temperature, electricity spot prices, and fish-price dynamics. Our use of the change-of-probability-measure technique facilitates the derivation of recursive filtering algorithms, which in turn establishes a self-tuning dynamic estimation procedure. Both the data-fitting and forecasting performance of various model settings are investigated. This research emerged from four related projects, as follows. (i) We start with an HMM to model the behaviour of daily average temperatures (DATs), geared towards the analysis of weather derivatives. (ii) The model in (i) is extended naturally by showcasing the capacity of an HOHMM-based approach to simultaneously describe the DATs' salient properties of mean reversion, seasonality, memory, and stochasticity. (iii) An HOHMM-driven jump process augments the HOHMM-based de-seasonalised temperature process to capture price spikes, and the ensuing filtering algorithms under this modelling framework are constructed to provide optimal parameter estimates. (iv) Finally, a multi-dimensional HOHMM-modulated setup is built for futures price-curve dynamics pertinent to financial product valuation and risk management in the aquaculture sector. We examine the performance of this new modelling setup by considering goodness-of-fit and out-of-sample forecasting metrics, with a detailed numerical demonstration using a multivariate dataset compiled by Fish Pool ASA. This research offers a collection of more flexible stochastic modelling approaches for pricing and risk analysis of certain commodity derivatives on weather, electricity, and fish prices. The novelty of our techniques lies in their capability to automate parameter estimation. Consequently, we contribute to the development of financial tools that aid in selecting the appropriate and optimal model on the basis of information criteria, within current technological advancements in which continuous flows of observed data are readily accessible in real time.
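
    To illustrate the embedding that makes higher-order chains tractable, the sketch below reduces a second-order chain to a first-order chain on state pairs and runs the standard forward filtering recursion with Gaussian emissions. All parameters are fixed and illustrative; the thesis instead derives self-tuning estimates via change-of-measure recursive filters, which are not reproduced here.

        # Sketch: second-order HMM embedded as a first-order chain on state
        # pairs, filtered with the standard forward recursion. Illustrative only.
        import numpy as np
        from scipy.stats import norm

        N = 2                                        # underlying regimes
        # Second-order transitions: A2[i, j, k] = P(next = k | prev = i, cur = j).
        A2 = np.zeros((N, N, N))
        A2[0, 0] = [0.95, 0.05]; A2[1, 0] = [0.80, 0.20]
        A2[0, 1] = [0.20, 0.80]; A2[1, 1] = [0.05, 0.95]
        means, sds = np.array([0.0, 2.0]), np.array([1.0, 1.0])

        # Embed into a first-order chain on augmented states s = prev * N + cur.
        P = np.zeros((N * N, N * N))
        for i in range(N):
            for j in range(N):
                for k in range(N):
                    P[i * N + j, j * N + k] = A2[i, j, k]

        def forward_filter(obs):
            alpha = np.full(N * N, 1.0 / (N * N))    # uniform prior on pairs
            path = []
            for y in obs:
                # Emission depends only on the current regime (s % N).
                like = np.array([norm.pdf(y, means[s % N], sds[s % N])
                                 for s in range(N * N)])
                alpha = like * (alpha @ P)           # predict, then correct
                alpha /= alpha.sum()
                cur = alpha.reshape(N, N).sum(axis=0)  # marginalise previous state
                path.append(cur.argmax())
            return np.array(path)

        rng = np.random.default_rng(3)
        obs = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
        print(forward_filter(obs)[-5:])              # filtered regimes at the end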