1,016 research outputs found
Parametric, Nonparametric, and Semiparametric Linear Regression in Classical and Bayesian Statistical Quality Control
Statistical process control (SPC) is used in many fields, such as manufacturing, public health, and network traffic, to understand and monitor processes. SPC is divided into two phases: in Phase I, historical data are used to inform parameter estimates for a statistical model, and in Phase II this model is used to monitor a live, ongoing process. Within both phases, profile monitoring is a method to understand the functional relationship between response and explanatory variables by estimating and tracking its parameters. In profile monitoring, control charts are often used as graphical tools to visually observe process behavior. We construct a practitioner's guide that provides a step-by-step application of parametric, nonparametric, and semiparametric methods in profile monitoring, creating an in-depth guideline for novice practitioners. We then consider the commonly used cumulative sum (CUSUM), multivariate CUSUM (mCUSUM), exponentially weighted moving average (EWMA), and multivariate EWMA (mEWMA) charts under a Bayesian framework for monitoring respiratory disease related hospitalizations and global suicide rates with parametric, nonparametric, and semiparametric linear models
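As a rough illustration of the two-phase idea described in this abstract, the sketch below builds a classical (non-Bayesian) univariate EWMA chart: Phase I data give the in-control mean and standard deviation, and Phase II observations are smoothed and compared with control limits. The function name `ewma_chart`, the smoothing weight `lam`, and the limit multiplier `L` are illustrative assumptions, not taken from the thesis.

```python
import math
from statistics import mean, stdev


def ewma_chart(phase1, phase2, lam=0.2, L=3.0):
    """Return a list of (z, lcl, ucl, signal) tuples, one per Phase II point.

    Hypothetical helper: lam is the EWMA smoothing weight and L the
    control-limit multiplier; both defaults are common textbook choices.
    """
    mu0 = mean(phase1)      # Phase I estimate of the in-control mean
    sigma0 = stdev(phase1)  # Phase I estimate of the in-control std. dev.
    z = mu0                 # the EWMA statistic starts at the in-control mean
    results = []
    for t, x in enumerate(phase2, start=1):
        # Exponentially weighted update of the monitoring statistic
        z = lam * x + (1 - lam) * z
        # Time-varying half-width from the exact variance of the EWMA statistic
        width = L * sigma0 * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        lcl, ucl = mu0 - width, mu0 + width
        results.append((z, lcl, ucl, not (lcl <= z <= ucl)))
    return results


if __name__ == "__main__":
    historical = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
    live = [10.0, 10.1, 9.9, 10.8, 10.9, 11.0]  # mean shifts upward at t = 4
    for t, (z, lcl, ucl, signal) in enumerate(ewma_chart(historical, live), 1):
        print(t, round(z, 3), round(lcl, 3), round(ucl, 3), signal)
```

A smaller `lam` gives more weight to past observations, which makes the chart more sensitive to small sustained shifts and slower to react to large sudden ones.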
Statistical Methodologies of Functional Data Analysis for Industrial Applications
This thesis stands as one of the first attempts to connect statistical object-oriented data analysis (OODA) methodologies with industry. Indeed, the aim of this thesis is to develop statistical methods that tackle industrial problems through the paradigm of OODA.
The new framework of Industry 4.0 requires factories equipped with sensors and advanced acquisition systems that acquire data with a high degree of complexity.
OODA can be particularly suitable for dealing with this increasing complexity, as it considers each statistical unit as an atom or a data object assumed to be a point in a well-defined mathematical space.
This idea allows one to deal with complex data structures by changing the resolution of the analysis. Indeed, from standard methods where the atom is represented by a vector of numbers, the focus now is on methodologies where the objects of the analysis are whole complex objects.
In particular, this thesis focuses on functional data analysis (FDA), a branch of OODA that treats functions defined on compact domains as the atoms of the analysis.
The cross-fertilization of FDA methods to industrial applications is developed into three parts in this dissertation.
The first part presents methodologies developed to solve specific applicative problems.
In particular, a first substantial portion of this part is focused on profile monitoring methods applied to ship CO2 emissions.
A second portion deals with the problem of predicting the mechanical properties of an additively manufactured artifact given the particle size distribution of the powder used for its production. A third portion addresses cluster analysis for the quality assessment of metal sheet spot welds in the automotive industry, based on observations of dynamic resistance curves.
Stimulated by these challenges, the second part of this dissertation turns towards a more methodological line that addresses the notion of interpretability for functional data.
In particular, two new interpretable estimators of the coefficient function of the function-on-function linear regression model are proposed, which are named S-LASSO and AdaSS, respectively.
Moreover, a new method, referred to as SaS-Funclust, is presented for sparse clustering of functional data; it aims to classify a sample of curves into homogeneous groups while jointly detecting the most informative portions of the domain.
In the last part, two ongoing research projects on FDA methods for industrial applications are presented.
In particular, the first one regards the definition of a new robust nonparametric functional ANOVA method (Ro-FANOVA) to test differences among group functional means by being robust against the presence of outliers with an application to additive manufacturing. The second one sketches a new methodological framework for the real-time profile monitoring
Variable selection in single index varying coefficient models with LASSO
The single index varying coefficient model is a very attractive statistical model due to its ability to reduce dimensions and its ease of interpretation. There are many theoretical studies and practical applications of it, but typically without variable selection, and no public software is available for fitting it. Here we propose a new algorithm to fit the single index varying coefficient model and to carry out variable selection in the index part with LASSO. The core idea is a two-step scheme which alternates between estimating coefficient functions and selecting-and-estimating the single index. Both in simulation and in application to a Geoscience dataset, we showed that it works very well. We also presented our R package sivcm with the algorithm implemented and with ideas that can be extended beyond
An OLS-Based Method for Causal Inference in Observational Studies
Indiana University-Purdue University Indianapolis (IUPUI)
Observational data are frequently used for causal inference of treatment effects
on prespecified outcomes. Several widely used causal inference methods have adopted
the method of inverse propensity score weighting (IPW) to alleviate the influence of
confounding. However, the IPW-type methods, including the doubly robust methods,
are prone to large variation in the estimation of causal effects due to possible extreme
weights. In this research, we developed an ordinary least-squares (OLS)-based causal
inference method, which does not involve the inverse weighting of the individual
propensity scores.
We first considered the scenario of homogeneous treatment effect. We proposed
a two-stage estimation procedure, which leads to a model-free estimator of
average treatment effect (ATE). At the first stage, two summary scores, the propensity
and mean scores, are estimated nonparametrically using regression splines. The
targeted ATE is obtained as a plug-in estimator that has a closed form expression.
Our simulation studies showed that this model-free estimator of ATE is consistent,
asymptotically normal and has superior operational characteristics in comparison to
the widely used IPW-type methods. We then extended our method to the scenario
of heterogeneous treatment effects, by adding in an additional stage of modeling
the covariate-specific treatment effect function nonparametrically while maintaining
the model-free feature, and the simplicity of OLS-based estimation. The estimated covariate-specific function serves as an intermediate step in the estimation of ATE
and thus can be utilized to study the treatment effect heterogeneity.
We discussed ways of using advanced machine learning techniques in the proposed
method to accommodate high dimensional covariates. We applied the proposed
method to a case study evaluating the effect of early combination of biologic &
non-biologic disease-modifying antirheumatic drugs (DMARDs) compared to step-up
treatment plan in children with new-onset juvenile idiopathic arthritis
(JIA). The proposed method gives strong evidence of a significant effect of early combination
at the 0.05 level. On average, early aggressive use of biologic DMARDs leads to
around 1.2 to 1.7 more reduction in clinical juvenile disease activity score at 6 months
than the step-up plan for treating JIA
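To make the concern about extreme weights concrete, the sketch below implements the standard Horvitz-Thompson form of the IPW estimator of the ATE, which is the baseline the abstract contrasts against and not the OLS-based estimator proposed in the thesis. The function names and the toy data are illustrative assumptions.

```python
def ipw_ate(y, t, e):
    """Horvitz-Thompson IPW estimate of the average treatment effect.

    y: outcomes, t: treatment indicators (0 or 1), e: propensity scores
    P(T = 1 | X). All three names are hypothetical, for illustration only.
    """
    n = len(y)
    treated = sum(yi * ti / ei for yi, ti, ei in zip(y, t, e))
    control = sum(yi * (1 - ti) / (1 - ei) for yi, ti, ei in zip(y, t, e))
    return (treated - control) / n


def max_weight(t, e):
    """Largest inverse-propensity weight carried by any single unit."""
    return max(1 / ei if ti == 1 else 1 / (1 - ei) for ti, ei in zip(t, e))


if __name__ == "__main__":
    y = [3.0, 1.0, 4.0, 2.0]
    t = [1, 0, 1, 0]

    balanced = [0.5, 0.5, 0.5, 0.5]
    print(ipw_ate(y, t, balanced), max_weight(t, balanced))

    # Same data, but the first treated unit has a propensity score near zero,
    # so its outcome is multiplied by a weight of about 100.
    extreme = [0.01, 0.5, 0.5, 0.5]
    print(ipw_ate(y, t, extreme), max_weight(t, extreme))
```

With balanced propensity scores, every unit carries the same weight. A single score near zero lets one observation dominate the estimate, which is the source of the large variation the abstract describes.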
Methods for non-proportional hazards in clinical trials: A systematic review
For the analysis of time-to-event data, frequently used methods such as the
log-rank test or the Cox proportional hazards model are based on the
proportional hazards assumption, which is often debatable. Although a wide
range of parametric and non-parametric methods for non-proportional hazards
(NPH) has been proposed, there is no consensus on the best approaches. To close
this gap, we conducted a systematic literature search to identify statistical
methods and software appropriate under NPH. Our literature search identified
907 abstracts, out of which we included 211 articles, mostly methodological
ones. Review articles and applications were less frequently identified. The
articles discuss effect measures, effect estimation and regression approaches,
hypothesis tests, and sample size calculation approaches, which are often
tailored to specific NPH situations. Using a unified notation, we provide an
overview of methods available. Furthermore, we derive some guidance from the
identified articles. We summarized the contents from the literature review in a
concise way in the main text and provide more detailed explanations in the
supplement (page 29)
Studies on semiparametric spatial regression models
In this thesis, I study estimations and inferences for semiparametric spatial regression models and generalized geoadditive models (GgAMs). I use the bivariate penalized spline over triangulation (BPST) method in these models to incorporate the spatial information when it is available. There are three topics in the thesis.
In the first topic, we try to develop a sparse-partially linear spatial regression model (-PLSM) using a doubly penalized estimator to select and estimate the most significant linear covariates. We apply BPST to approximate a bivariate function over a spatial domain. A standard error formula is constructed to estimate the standard deviation of the estimators, which is tested by simulation studies. We show the consistency of our sparse estimator with asymptotic normality. An application to United States mortality illustrates improvements in estimation and prediction from the use of our estimator relative to other methods.
In the second topic, a generalized version of PLSM (GPLSM) is developed to allow a nonlinear link function relating the covariates to the mean of the response variable. This extension allows our method to deal with non-continuous response variables, such as count and binary variables. The iteratively reweighted least squares (IRLS) algorithm helps to achieve the computational efficiency of our estimator. The consistency of the proposed estimator is proved with a convergence rate. A standard error formula is developed to construct confidence intervals for the linear estimator. A real-data analysis of crash frequency demonstrates the accuracy in estimation and prediction for GPLSM.
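For readers unfamiliar with IRLS, the sketch below shows the basic iteration for a plain Poisson regression with a log link and a single covariate. It is a simplified stand-in under stated assumptions: it has no spatial spline component and no penalty, so it is not the GPLSM estimator of the thesis. The function name `irls_poisson` and the toy data are hypothetical.

```python
import math


def irls_poisson(x, y, iters=25, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        eta = [b0 + b1 * xi for xi in x]     # linear predictor
        mu = [math.exp(e) for e in eta]      # fitted means under the log link
        w = mu                               # IRLS weights for Poisson/log link
        # Working response: linearisation of the link around the current fit
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]

        # Weighted least squares of z on (1, x) via the 2x2 normal equations
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det

        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [1, 2, 4, 7, 12]  # toy count data
    print(irls_poisson(x, y))
```

Each pass solves an ordinary weighted least squares problem, which is why IRLS is computationally convenient: a penalized or spline-based variant only changes the least squares step, not the outer loop.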
In the last topic, I build an R package, GgAM, which integrates the model structure identification process, estimation methods, and statistical inference tools for GgAMs. We develop a semiparametric version of GgAM by adding a linear part to nonparametric GgAMs. This model shares the benefits of univariate splines, bivariate splines, and local polynomials. A penalized quasi-likelihood estimator is first derived through the IRLS algorithm, and then a spline-backfitted local polynomial estimator is obtained.
We propose a standard error formula for the parametric estimator in the model as well. Simultaneous confidence bands are developed to measure the accuracy of the univariate spline estimators.
A model structure identification process is included before model fitting to better identify the functional form (linear or nonlinear) of the continuous covariates.
Simulation studies are conducted to show the estimation accuracy and predictive power of our GgAM. The datasets of Georgia education attainment, Sydney housing prices, and Florida crash frequency are included to show the convenient and flexible use of functions in the GgAM package.
In this thesis, I aim to develop computational algorithms to get accurate estimators and propose efficient inference tools to better interpret the results for GgAMs. These tools can be widely used in social, economic, and geographic applications with spatial data to draw perceptive conclusions
Proceedings of the 35th International Workshop on Statistical Modelling : July 20- 24, 2020 Bilbao, Basque Country, Spain
466 p. The International Workshop on Statistical Modelling (IWSM) is a reference workshop in promoting statistical modelling and applications of Statistics for researchers, academics, and industrialists in a broad sense. Unfortunately, the global COVID-19 pandemic has not allowed holding the 35th edition of the IWSM in Bilbao in July 2020. Despite the situation and following the spirit of the Workshop and the Statistical Modelling Society, we are delighted to bring you the proceedings book of extended abstracts
MODELING OF QUALITY PROFILE DATA WITH APPLICATION IN MANUFACTURING AND BIOMEDICAL ENGINEERING
The quality of the output of a complex system is often recorded as multidimensional profile data with panel structure. In such structure, the quality of each individual in the output is measured repeatedly based on time or other variables. In this dissertation, the quality profile data are modeled to address two types of problems: (a) to explore the underlying relationship between the parameter of interest in the complex system and the resulting quality under the condition that the principal mechanism is not fully known and (b) to quantify the uncertainties among the output. For the first type of problem, we consider a constrained semiparametric varying coefficient model. The system parameter of interest is treated as a covariate whose effect upon the resulting quality is modeled nonparametrically as a function of time. Any existing physicochemical knowledge related to other factors in the system that affect the resulting output quality is modeled parametrically as an additive term in the model. In the situation that expert knowledge about the effect of the parameter is available, some constraints can be incorporated in the model such that the estimated effect aligns with the given knowledge.
For the second type of problem, a mixed-effects model is developed to quantify the uncertainties among the output using random effects. Depending on the context of the data, these random effects can be used for anomaly detection or for variation quantification, where deviation among individuals is of interest. Three case studies from the manufacturing and biomedical engineering domains are presented in the dissertation, where the above two types of problems are discussed