
    Nonparametric predictive inference for future order statistics

    Nonparametric predictive inference (NPI) has been developed for a range of data types and for a variety of applications and problems in statistics. In this thesis, further theory is developed on NPI for multiple future observations, with attention to order statistics. The thesis consists of three main, related contributions. First, new probabilistic theory is presented on NPI for future order statistics; additionally, a range of novel statistical inferences using this new theory is discussed. Secondly, NPI for reproducibility is developed by considering two statistical tests based on order statistics. Thirdly, robustness of NPI is introduced, which involves a first adaptation of some robustness theory concepts, namely the sensitivity curve and the breakdown point, to the NPI setting. In this thesis, we present NPI for future order statistics. Given data consisting of n real-valued observations, m future observations are considered and predictive probabilities are presented for the r-th ordered future observation. In addition, joint and conditional probabilities for events involving multiple future order statistics are presented. We further present the use of such predictive probabilities for order statistics in statistical inference, in particular considering pairwise and multiple comparisons based on future order statistics of two or more independent groups of data. This new theory enables us to develop NPI for the reproducibility of statistical hypothesis tests based on order statistics. Reproducibility of statistical hypothesis tests is an important issue in applied statistics: if the test were repeated, would the same conclusion be reached, that is, rejection or non-rejection of the null hypothesis? NPI provides a natural framework for such inferences, as its explicitly predictive nature fits well with the core problem formulation of a repeat of the test in the future. For inference on reproducibility of statistical tests, NPI provides lower and upper reproducibility probabilities (RP). The NPI-RP method is presented for two basic tests using order statistics, namely a test for a specific value of a population quantile and a precedence test for comparison of data from two populations, as typically used for experiments involving lifetime data when one wishes to reach a conclusion before all observations are available. As every statistical inference has underlying assumptions about the models and specific methods used, one important field in statistics is the study of robustness of inferences. The concept of robust inference is usually aimed at developing inference methods which are not too sensitive to data contamination or to deviations from model assumptions. In this thesis we use it in a slightly narrower sense. For our aims, robustness indicates insensitivity to small changes in the data, as our predictive probabilities for order statistics and statistical inferences involving future observations depend upon the given observations. We introduce some concepts for assessing the robustness of statistical procedures in the NPI framework, namely the sensitivity curve and the breakdown point. The classical breakdown point does not apply in our context, as the predictive inferences are bounded, so we adapt the definition to suit our setting. Most of our nonparametric inferences have reasonably good robustness with regard to small changes in the data.
    Traditionally, the robustness literature has placed considerable emphasis on the robustness properties of estimators of location parameters. Thus, in our investigation of NPI robustness we also focus on differences in robustness between the mean and the median of the m future observations, and examine how these relate to the classical concepts of robustness of the median and the mean.
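    As a rough illustration of the kind of predictive probabilities described above, the sketch below computes the probability that the r-th of m future observations falls in the interval I_j between consecutive data values, using the standard NPI argument that all orderings of the n data values and the m future values are equally likely. The combinatorial form used here is reconstructed from the general NPI literature for illustration only and is not necessarily the exact expression derived in the thesis.

```python
from math import comb

def npi_order_stat_prob(n, m, r, j):
    """Probability that the r-th of m future observations falls in interval
    I_j = (x_(j-1), x_(j)), where the n ordered data values split the real
    line into n + 1 intervals (j = 1, ..., n + 1).  Assumes all orderings of
    the n data values and m future values are equally likely."""
    if not (1 <= r <= m and 1 <= j <= n + 1):
        raise ValueError("require 1 <= r <= m and 1 <= j <= n + 1")
    # r - 1 future values interleave with the first j - 1 data values,
    # m - r future values interleave with the remaining n - j + 1 data values
    return comb(j + r - 2, r - 1) * comb(n - j + 1 + m - r, m - r) / comb(n + m, m)

# sanity check: the probabilities over the n + 1 intervals sum to 1
n, m, r = 10, 5, 3
print(sum(npi_order_stat_prob(n, m, r, j) for j in range(1, n + 2)))  # -> 1.0
```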
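    The sensitivity curve mentioned in the abstract also has a classical counterpart, SC(x) = (n + 1)(T(x_1, ..., x_n, x) - T(x_1, ..., x_n)), measuring the effect of one additional observation x on a statistic T. The minimal sketch below evaluates it for the mean and the median on synthetic data; it illustrates the classical concept only, not the NPI-specific adaptation developed in the thesis.

```python
import numpy as np

def sensitivity_curve(statistic, data, x_grid):
    """Scaled change in a statistic T when one extra observation x is appended."""
    base = statistic(data)
    n = len(data)
    return np.array([(n + 1) * (statistic(np.append(data, x)) - base) for x in x_grid])

rng = np.random.default_rng(0)
data = rng.normal(size=20)
xs = np.linspace(-10, 10, 201)

sc_mean = sensitivity_curve(np.mean, data, xs)      # unbounded: grows linearly in x
sc_median = sensitivity_curve(np.median, data, xs)  # bounded: flat for extreme x
print(sc_mean[-1], sc_median[-1])
```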

    Reproducibility of Statistical Inference Based on Randomised Response Data

    Reproducibility of an experiment’s conclusion is an important topic in a variety of fields, including social studies. This thesis presents a theory of reproducibility of statistical inference based on randomised response data. First, reproducibility of statistical hypothesis tests based on randomised response data is studied. This thesis presents statistical inference for reproducibility of the outcome of a hypothesis test based on data resulting from different randomised response techniques (RRT). Secondly, a new method for quantifying reproducibility of statistical estimates is introduced. Finally, this method is applied to derive reproducibility of estimates of population characteristics based on randomised response data. The quantification of reproducibility uses nonparametric predictive inference (NPI), which is well suited to reproducibility when this is considered as a prediction problem. NPI uses only a few model assumptions and results in lower and upper reproducibility probabilities. We compare different randomised response methods. The results of this thesis open up the possibility of pre-selecting a randomised response method with higher reproducibility, and also indicate the relationship between variance and reproducibility at the same privacy level. We find that less variability in the reported responses of RRT methods leads to higher reproducibility of statistical hypothesis tests based on RRT data at the same privacy level. Therefore, for RRT methods using binary responses, reproducibility of hypothesis tests based on the forced method is greater than reproducibility of hypothesis tests based on the Greenberg method. For RRT methods using real-valued responses, reproducibility of estimates is greater for data collected from the Greenberg method than for data collected from the optional multiplicative method and the Eichhorn and Hayre method.
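    To make the role of response variability concrete, the following Monte Carlo sketch compares the sampling variability of two textbook binary randomised response estimators, the forced-response design and the Greenberg (unrelated-question) design. The design parameters, prevalence, and estimator forms below are illustrative assumptions, not the specific settings or reproducibility computations of the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
n, pi = 1000, 0.3          # sample size and true prevalence of the sensitive trait
n_rep = 5000               # Monte Carlo repetitions

def forced_estimates(p=0.7, p_yes=0.15):
    """Forced design: with prob p answer truthfully, with prob p_yes give a forced
    'yes', otherwise a forced 'no'.  P(yes) = p*pi + p_yes."""
    theta = p * pi + p_yes
    ybar = rng.binomial(n, theta, size=n_rep) / n
    return (ybar - p_yes) / p                  # unbiased estimator of pi

def greenberg_estimates(p=0.7, pi_u=0.5):
    """Greenberg unrelated-question design: with prob p answer the sensitive
    question, otherwise an innocuous question with known prevalence pi_u.
    P(yes) = p*pi + (1-p)*pi_u."""
    theta = p * pi + (1 - p) * pi_u
    ybar = rng.binomial(n, theta, size=n_rep) / n
    return (ybar - (1 - p) * pi_u) / p         # unbiased estimator of pi

print("forced    variance:", forced_estimates().var())
print("greenberg variance:", greenberg_estimates().var())
```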

    Nonparametric Predictive Methods for Bootstrap and Test Reproducibility

    This thesis investigates a new bootstrap method, called Nonparametric Predictive Inference Bootstrap (NPI-B). Nonparametric predictive inference (NPI) is a frequentist statistical approach that makes few assumptions, enabled by using lower and upper probabilities to quantify uncertainty, and explicitly focuses on future observations. In the NPI-B method, we use a sample of n observations to create n + 1 intervals and draw one future value uniformly from one interval. This value is then added to the data and the process is repeated, now with n + 1 observations. Repetition of this process leads to the NPI-B sample, which therefore is not drawn from the actual sample but consists of values in the whole range of possible observations, also going beyond the range of the actual sample. We explore NPI-B for data on finite intervals, on the real line, and for non-negative observations, and compare it to other bootstrap methods via simulation studies, which show that NPI-B works well as a prediction method. The NPI method is presented for the reproducibility probability (RP) of some nonparametric tests. Recently there has been substantial interest in the reproducibility probability, for which not only the estimation but also the actual definition and interpretation are not uniquely determined in the classical frequentist framework. The explicitly predictive nature of NPI provides a natural formulation of inferences on RP. It is used to derive lower and upper bounds for RP values (the NPI-RP method), but for large sample sizes the computation of these bounds is difficult. We explore the NPI-B method to predict RP values (called NPI-B-RP values) for some nonparametric tests. Reproducibility of tests is an important characteristic of the practical relevance of test outcomes.
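    The resampling step described above can be sketched directly. The version below assumes data on a known finite interval [a, b], so that all n + 1 intervals are bounded; it is a simplified reconstruction of the description in the abstract rather than the thesis's exact algorithm.

```python
import numpy as np

def npi_bootstrap_sample(data, m, a, b, rng):
    """Draw one NPI-B sample of size m for data on the finite interval [a, b].
    At each step the current n values split [a, b] into n + 1 intervals; one
    interval is chosen at random, a future value is drawn uniformly within it,
    and that value is added to the data before the next step."""
    current = np.sort(np.asarray(data, dtype=float))
    new_values = []
    for _ in range(m):
        edges = np.concatenate(([a], current, [b]))      # n + 1 intervals
        j = rng.integers(len(edges) - 1)                  # pick an interval
        value = rng.uniform(edges[j], edges[j + 1])       # draw uniformly within it
        new_values.append(value)
        current = np.sort(np.append(current, value))      # n grows by one
    return np.array(new_values)

rng = np.random.default_rng(42)
data = rng.uniform(0, 1, size=10)
print(npi_bootstrap_sample(data, m=10, a=0.0, b=1.0, rng=rng))
```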

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, located in Portugal, is established using Stochastic Frontier Analysis (SFA). This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are made for each hotel studied.
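    For readers unfamiliar with the method, the sketch below fits a textbook normal/half-normal stochastic frontier by maximum likelihood on synthetic data, separating the symmetric noise term from the one-sided inefficiency term. The production-function specification and parameter values are purely illustrative assumptions, not the model estimated in the paper.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 5, n)
v = rng.normal(0, 0.2, n)                 # symmetric noise (measurement error)
u = np.abs(rng.normal(0, 0.4, n))         # one-sided inefficiency term
y = 1.0 + 0.6 * x + v - u                 # output = frontier + noise - inefficiency

def negloglik(theta):
    """Negative log-likelihood of the normal/half-normal stochastic frontier."""
    b0, b1, log_sv, log_su = theta
    sv, su = np.exp(log_sv), np.exp(log_su)
    sigma = np.hypot(sv, su)
    lam = su / sv
    eps = y - b0 - b1 * x
    ll = (np.log(2) - np.log(sigma)
          + stats.norm.logpdf(eps / sigma)
          + stats.norm.logcdf(-eps * lam / sigma))
    return -ll.sum()

res = optimize.minimize(negloglik, x0=[0.0, 0.0, 0.0, 0.0], method="Nelder-Mead")
b0, b1, log_sv, log_su = res.x
print("frontier:", b0, b1, " sigma_v:", np.exp(log_sv), " sigma_u:", np.exp(log_su))
```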

    Statistical and Computational Aspects of Learning with Complex Structure

    The recent explosion of data that is routinely collected has led scientists to contemplate more and more sophisticated structural assumptions. Understanding how to harness and exploit such structure is key to improving the prediction accuracy of various statistical procedures. The ultimate goal of this line of research is to develop a set of tools that leverage underlying complex structures to pool information across observations and ultimately improve statistical accuracy as well as the computational efficiency of the deployed methods. The workshop focused on recent developments in regression and matrix estimation under various complex constraints, such as physical, computational, privacy, sparsity, or robustness constraints. Optimal-transport-based techniques for geometric data analysis were also a main topic of the workshop.

    Bayesian inference with optimal maps

    We present a new approach to Bayesian inference that entirely avoids Markov chain simulation, by constructing a map that pushes forward the prior measure to the posterior measure. Existence and uniqueness of a suitable measure-preserving map are established by formulating the problem in the context of optimal transport theory. We discuss various means of explicitly parameterizing the map and computing it efficiently through solution of an optimization problem, exploiting gradient information from the forward model when possible. The resulting algorithm overcomes many of the computational bottlenecks associated with Markov chain Monte Carlo. Advantages of a map-based representation of the posterior include analytical expressions for posterior moments and the ability to generate arbitrary numbers of independent posterior samples without additional likelihood evaluations or forward solves. The optimization approach also provides clear convergence criteria for posterior approximation and facilitates model selection through automatic evaluation of the marginal likelihood. We demonstrate the accuracy and efficiency of the approach on nonlinear inverse problems of varying dimension, involving the inference of parameters appearing in ordinary and partial differential equations. (United States Department of Energy, Office of Advanced Scientific Computing Research, Grants DE-SC0002517 and DE-SC0003908.)
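    A minimal one-dimensional sketch of the map-based idea: parameterise a monotone map T applied to prior samples and minimise a Monte Carlo estimate of the Kullback-Leibler divergence between the pushforward of the prior and the posterior, which up to a constant equals E_prior[-log p~(T(x)) - log T'(x)], where p~ is the unnormalised posterior density. The affine map and the conjugate Gaussian toy problem below are assumptions chosen so the result can be checked against the exact posterior; they are far simpler than the transport maps used in the paper.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)

# Toy linear-Gaussian problem: prior x ~ N(0, 1), observations y_i | x ~ N(x, 0.5^2),
# so the exact posterior is Gaussian and the optimal map is affine.
sigma2 = 0.25
y = rng.normal(1.2, 0.5, size=20)
post_var = 1.0 / (1.0 + len(y) / sigma2)
post_mean = post_var * y.sum() / sigma2

def log_unnorm_post(x):
    """log prior + log likelihood, up to a constant, for an array of x values."""
    return -0.5 * x**2 - 0.5 * np.sum((y[:, None] - x[None, :])**2, axis=0) / sigma2

x_prior = rng.normal(size=5000)          # samples from the prior (reference measure)

def kl_objective(params):
    """Monte Carlo KL(T#prior || posterior), up to a constant, for the
    monotone affine map T(x) = a + exp(c) * x, so log T'(x) = c."""
    a, c = params
    t = a + np.exp(c) * x_prior
    return np.mean(-log_unnorm_post(t) - c)

res = optimize.minimize(kl_objective, x0=[0.0, 0.0])
a, c = res.x
print("map:   mean %.3f  sd %.3f" % (a, np.exp(c)))
print("exact: mean %.3f  sd %.3f" % (post_mean, np.sqrt(post_var)))
```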

    Potential Alzheimer's Disease Plasma Biomarkers

    In this series of studies, we examined the potential of a variety of blood-based plasma biomarkers for the identification of Alzheimer's disease (AD) progression and cognitive decline. With the end goal of studying these biomarkers via mixture modeling, we began with a literature review of the methodology. An examination of the biomarkers together with demographics and other health factors found evidence of minimal risk of confounding along the causal pathway from biomarkers to cognitive performance. A further study examined the usefulness of linear combinations of biomarkers, obtained via partial least squares (PLS) analysis, as predictors of various cognitive assessment scores and of clinical cognitive diagnosis. The identified biomarker linear combinations were not effective at predicting cognitive outcomes. The final study of our biomarkers utilized mixture modeling through an extension of group-based trajectory modeling (GBTM). We modeled five biomarkers, covering a range of functions within the body, to identify distinct trajectories over time. Final models showed statistically significant differences in baseline risk factors and cognitive assessments between developmental trajectories of the biomarker outcomes. This course of study has added valuable information to the field of plasma biomarker research in relation to Alzheimer's disease and cognitive decline.
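    As a concrete illustration of the partial least squares step mentioned above, the snippet below builds PLS components from a synthetic biomarker matrix and uses them to predict a continuous outcome standing in for a cognitive score. The data, the number of components, and the variable names are illustrative assumptions only.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 200, 8                                  # subjects and plasma biomarkers (synthetic)
X = rng.normal(size=(n, p))
score = X @ rng.normal(size=p) * 0.3 + rng.normal(size=n)   # stand-in cognitive score

X_tr, X_te, y_tr, y_te = train_test_split(X, score, random_state=0)
pls = PLSRegression(n_components=2)            # two linear combinations of the biomarkers
pls.fit(X_tr, y_tr)

print("biomarker weights per component:\n", pls.x_weights_)
print("held-out R^2:", pls.score(X_te, y_te))
```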

    Mixture Modeling and Outlier Detection in Microarray Data Analysis

    Microarray technology has become a dynamic tool in gene expression analysis because it allows for the simultaneous measurement of thousands of gene expressions. Uniqueness in experimental units and microarray data platforms, coupled with how gene expressions are obtained, makes the field open to interesting research questions. In this dissertation, we present our investigations of two independent studies related to microarray data analysis. First, we study a recent platform in biology and bioinformatics that compares the quality of genetic information from exfoliated colonocytes in fecal matter with genetic material from mucosa cells within the colon. Using the intraclass correlation coefficient (ICC) as a measure of reproducibility, we assess the reliability of density estimation obtained from preliminary analysis of fecal and mucosa data sets. Numerical findings clearly show that the distribution comprises two components. For measurements between 0 and 1, it is natural to assume that the data points are from a beta-mixture distribution. We explore whether ICC values should be modeled with a beta mixture or transformed first and fitted with a normal mixture. We find that the use of a mixture of normals on the inverse-probit transformed scale is less sensitive to model mis-specification; otherwise a biased conclusion could be reached. By using the normal mixture approach to compare the ICC distributions of fecal and mucosa samples, we observe the quality of reproducible genes in fecal array data to be comparable with that in mucosa arrays. For microarray data, within-gene variance estimation is often challenging due to the high frequency of low-replication studies. Several methodologies have been developed to strengthen variance terms by borrowing information across genes. However, even with such accommodations, variance estimates may be inflated by the presence of outliers. For our second study, we propose a robust modification of optimal shrinkage variance estimation to improve outlier detection. In order to increase power, we suggest grouping standardized data so that information shared across genes is similar in distribution. Simulation studies and analysis of real colon cancer microarray data reveal that our methodology provides a technique which is insensitive to outliers, free of distributional assumptions, effective for small sample sizes, and data adaptive.
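    To illustrate the transform-then-fit-a-normal-mixture route described above, the sketch below maps values in (0, 1), standing in for ICC estimates, to the real line with the normal quantile (probit) function and fits a two-component Gaussian mixture. The simulated ICC values and the choice of two components are assumptions for illustration, not the fecal and mucosa data analysed in the dissertation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic stand-in for ICC values in (0, 1): a two-component beta mixture
icc = np.concatenate([rng.beta(2, 8, size=300), rng.beta(8, 2, size=200)])

z = norm.ppf(icc)                      # transform (0, 1) values to the real line
gm = GaussianMixture(n_components=2, random_state=0).fit(z.reshape(-1, 1))

print("mixing weights:", gm.weights_)
print("component means (transformed scale):", gm.means_.ravel())
print("component means (ICC scale):        ", norm.cdf(gm.means_.ravel()))
```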

    Vol. 13, No. 2 (Full Issue)
