
    Bayesian Confidence Intervals for Coefficients of Variation of PM10 Dispersion

    Herein, we propose the Bayesian approach for constructing confidence intervals for both the coefficient of variation of a log-normal distribution and the difference between the coefficients of variation of two log-normal distributions. For the first case, the Bayesian approach was compared with the large-sample, Chi-squared, and approximate fiducial approaches via Monte Carlo simulation. For the second case, it was compared with the method of variance estimates recovery (MOVER), modified MOVER, and approximate fiducial approaches, again using Monte Carlo simulation. The results show that the Bayesian approach performed best for constructing the confidence intervals in both cases. To illustrate the performance of the confidence limit construction approaches with real data, they were applied to PM10 datasets from the Nan and Chiang Mai provinces in Thailand, and the results agree with those of the simulations. Doi: 10.28991/esj-2021-01264 Full Text: PDF
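
    As a minimal illustration of the Bayesian idea, the sketch below draws from the posterior of the log-scale variance under the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2 and converts the draws into an equal-tailed credible interval for the CV, sqrt(exp(sigma^2) - 1). The prior, sample size, and simulated "PM10-like" data are assumptions for illustration, not necessarily the exact setup of the paper.

```python
import numpy as np

def bayes_ci_cv_lognormal(x, n_draws=100_000, level=0.95, seed=None):
    """Equal-tailed Bayesian credible interval for the CV of a lognormal sample.

    Under the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2,
    the posterior of sigma^2 is (n - 1) * s^2 / chi^2_{n-1}, where s^2 is the
    sample variance of log(x).  The CV of a lognormal is sqrt(exp(sigma^2) - 1).
    """
    rng = np.random.default_rng(seed)
    logx = np.log(np.asarray(x, dtype=float))
    n = logx.size
    s2 = logx.var(ddof=1)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)
    cv = np.sqrt(np.exp(sigma2) - 1.0)
    lo, hi = np.quantile(cv, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

# toy usage with simulated lognormal data standing in for a PM10 series
x = np.random.default_rng(1).lognormal(mean=3.0, sigma=0.4, size=60)
print(bayes_ci_cv_lognormal(x, seed=2))
```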

    Estimating Simultaneous Confidence Intervals for Multiple Contrasts of Means of Normal Distribution with Known Coefficients of Variation

    This study investigated the performance of simultaneous confidence intervals (SCIs) for differentiating the means of multiple normal distributions with known coefficients of variation (CVs). The authors constructed SCIs for the differences between multiple normal means with known CVs, extending the intervals SCI_MOVER, SCI_s, and SCI_k to k populations. Three approaches were considered: the method of variance estimates recovery (MOVER) approach and two central limit theorem (CLT) approaches. A Monte Carlo simulation was used to evaluate the coverage probabilities and expected lengths of the methods. The simulation results indicate that the MOVER approach is more desirable than the CLT approaches in terms of coverage probability. The performance of the proposed approaches is also compared using an example with real data. Moreover, the coverage probabilities for SCI_MOVER were above the nominal level of 0.95, indicating that it is more stable than SCI_s and SCI_k and thus more appropriate for use in this scenario. Finally, the researchers recommend the MOVER approach for constructing SCIs for mean differences in related fields. Doi: 10.28991/ESJ-2022-06-04-04 Full Text: PDF
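
    For context, the MOVER construction recovers variance estimates from the individual confidence limits of each mean. The sketch below applies it to a single difference of two normal means with known CVs, using sigma_i = CV_i * xbar_i as a plug-in standard deviation; applying it to each of the k(k-1)/2 pairs at a Bonferroni-adjusted level (an assumption here, not necessarily the paper's adjustment) gives a simple simultaneous version.

```python
import numpy as np
from scipy import stats

def mover_ci_diff_means(x1, x2, cv1, cv2, level=0.95):
    """MOVER confidence interval for mu1 - mu2 with known coefficients of variation.

    Each mean gets an individual z-interval with standard error
    cv_i * xbar_i / sqrt(n_i); MOVER then recovers the variance estimates from
    those limits to form the interval for the difference of means.
    """
    z = stats.norm.ppf(0.5 + level / 2)
    xbar1, xbar2 = np.mean(x1), np.mean(x2)
    se1 = cv1 * xbar1 / np.sqrt(len(x1))
    se2 = cv2 * xbar2 / np.sqrt(len(x2))
    l1, u1 = xbar1 - z * se1, xbar1 + z * se1
    l2, u2 = xbar2 - z * se2, xbar2 + z * se2
    diff = xbar1 - xbar2
    lower = diff - np.sqrt((xbar1 - l1) ** 2 + (u2 - xbar2) ** 2)
    upper = diff + np.sqrt((u1 - xbar1) ** 2 + (xbar2 - l2) ** 2)
    return lower, upper
```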

    Simultaneous confidence intervals for all pairwise differences between the coefficients of variation of rainfall series in Thailand

    The delta-lognormal distribution is a combination of binomial and lognormal distributions, and so rainfall series that include zero and positive values conform to this distribution. The coefficient of variation is a good tool for measuring the dispersion of rainfall. Statistical estimation can be used not only to illustrate the dispersion of rainfall but also to describe the differences between rainfall dispersions from several areas simultaneously. Therefore, the purpose of this study is to construct simultaneous confidence intervals for all pairwise differences between the coefficients of variation of delta-lognormal distributions using three methods: fiducial generalized confidence interval, Bayesian, and the method of variance estimates recovery. Their performances were gauged by measuring their coverage probabilities together with their expected lengths via Monte Carlo simulation. The results indicate that the Bayesian credible interval using the Jeffreys' rule prior outperformed the others in virtually all cases. Rainfall series from five regions in Thailand were used to demonstrate the efficacies of the proposed methods.
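
    A minimal sketch of the Bayesian variant is given below: with a Jeffreys Beta(1/2, 1/2) prior on the zero proportion and the usual noninformative prior on the lognormal part, the CV of a delta-lognormal is sqrt(exp(sigma^2)/(1 - delta) - 1). The Bonferroni adjustment used here is an assumption standing in for the paper's simultaneous construction.

```python
import numpy as np
from itertools import combinations

def cv_draws_delta_lognormal(x, n_draws, rng):
    """Posterior draws of the CV of a delta-lognormal sample.

    Zeros get a Jeffreys Beta(1/2, 1/2) prior on their proportion delta; the
    positive part is lognormal with sigma^2 | data = (n1 - 1) * s^2 / chi^2_{n1-1}.
    The CV of a delta-lognormal is sqrt(exp(sigma^2) / (1 - delta) - 1).
    """
    x = np.asarray(x, dtype=float)
    n0 = np.sum(x == 0)
    x_pos = x[x > 0]
    n1 = x_pos.size
    delta = rng.beta(n0 + 0.5, n1 + 0.5, size=n_draws)
    s2 = np.log(x_pos).var(ddof=1)
    sigma2 = (n1 - 1) * s2 / rng.chisquare(n1 - 1, size=n_draws)
    return np.sqrt(np.exp(sigma2) / (1.0 - delta) - 1.0)

def simultaneous_cv_difference_cis(samples, level=0.95, n_draws=50_000, seed=0):
    """Bonferroni-adjusted credible intervals for all pairwise CV differences."""
    rng = np.random.default_rng(seed)
    draws = [cv_draws_delta_lognormal(x, n_draws, rng) for x in samples]
    pairs = list(combinations(range(len(samples)), 2))
    alpha = (1 - level) / len(pairs)          # simple simultaneity adjustment
    cis = {}
    for i, j in pairs:
        diff = draws[i] - draws[j]
        cis[(i, j)] = tuple(np.quantile(diff, [alpha / 2, 1 - alpha / 2]))
    return cis
```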

    Statistical integration of information

    Modern data analysis frequently involves multiple large and diverse data sets generated from current high-throughput technologies. An integrative analysis of these sources of information is very promising for improving knowledge discovery in various fields. This dissertation focuses on three distinct challenges in the integration of information. The variables obtained from diverse and novel platforms often have highly non-Gaussian marginal distributions and are therefore challenging to analyze by commonly used methods. The first part introduces an automatic transformation for improving data quality before integrating multiple data sources. For each variable, a new family of parametrizations of the shifted logarithm transformation is proposed, which allows transformation for both left and right skewness within a single family and an automatic selection of the parameter value. The second part discusses an integrative analysis of disparate data blocks measured on a common set of experimental subjects. This data integration naturally motivates the simultaneous exploration of the joint and individual variation within each data block, resulting in new insights. We introduce Non-iterative Joint and Individual Variation Explained (Non-iterative JIVE), capturing both joint and individual variation within each data block. This is a major improvement over earlier approaches to this challenge in terms of both a new conceptual understanding and a fast linear algebra computation. An important mathematical contribution is the use of score subspaces as the principal descriptors of variation structure and the use of perturbation theory as the guide for variation segmentation. Furthermore, this makes our method robust against heterogeneity among data blocks, without a need for normalization. The last part proposes a Generalized Fiducial Inference inspired method for finding a robust consensus among several independently derived confidence distributions (CDs) for a quantity of interest. The resulting fused CD is robust to the existence of potentially discrepant CDs in the collection. The method uses computationally efficient fiducial model averaging to obtain a robust consensus distribution without the need to eliminate discrepant CDs from the analysis. Doctor of Philosophy
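
    The shifted-logarithm idea in the first part can be illustrated with a generic sketch: one shift parameter, chosen automatically to minimize the absolute skewness of the transformed variable, with the direction of the transform flipped for left-skewed data. The parametrization and selection rule below are assumptions for illustration, not the dissertation's own family.

```python
import numpy as np
from scipy import optimize, stats

def auto_shifted_log(x):
    """Shifted-log transform with an automatically chosen shift parameter.

    For right-skewed data use log(x - min(x) + c); for left-skewed data use
    -log(max(x) - x + c).  The shift c > 0 is picked to minimize the absolute
    sample skewness of the transformed values.
    """
    x = np.asarray(x, dtype=float)
    right_skewed = stats.skew(x) >= 0
    span = x.max() - x.min()

    def transform(c):
        if right_skewed:
            return np.log(x - x.min() + c)
        return -np.log(x.max() - x + c)

    res = optimize.minimize_scalar(
        lambda c: abs(stats.skew(transform(c))),
        bounds=(1e-6 * span, 10.0 * span),
        method="bounded",
    )
    return transform(res.x)
```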

    Coherent frequentism

    By representing the range of fair betting odds according to a pair of confidence set estimators, dual probability measures on parameter space called frequentist posteriors secure the coherence of subjective inference without any prior distribution. The closure of the set of expected losses corresponding to the dual frequentist posteriors constrains decisions without arbitrarily forcing optimization under all circumstances. This decision theory reduces to those that maximize expected utility when the pair of frequentist posteriors is induced by an exact or approximate confidence set estimator or when an automatic reduction rule is applied to the pair. In such cases, the resulting frequentist posterior is coherent in the sense that, as a probability distribution of the parameter of interest, it satisfies the axioms of the decision-theoretic and logic-theoretic systems typically cited in support of the Bayesian posterior. Unlike the p-value, the confidence level of an interval hypothesis derived from such a measure is suitable as an estimator of the indicator of hypothesis truth since it converges in sample-space probability to 1 if the hypothesis is true or to 0 otherwise under general conditions. Comment: The confidence-measure theory of inference and decision is explicitly extended to vector parameters of interest. The derivation of upper and lower confidence levels from valid and nonconservative set estimators is formalized.
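
    The closing claim about confidence levels as estimators of hypothesis truth can be seen in a small sketch: using the standard t confidence distribution for a normal mean (an assumption for illustration, not the paper's general construction), the confidence assigned to an interval hypothesis tends to 1 or 0 with growing sample size according to whether the hypothesis is true.

```python
import numpy as np
from scipy import stats

def interval_confidence(x, a, b):
    """Confidence assigned to the interval hypothesis a <= mu <= b.

    Uses the t_{n-1} confidence distribution for a normal mean, centred at
    xbar with scale s / sqrt(n); the returned level acts as an estimator of
    the indicator that the hypothesis is true.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
    cd = stats.t(df=n - 1, loc=xbar, scale=se)
    return cd.cdf(b) - cd.cdf(a)

# as n grows, the confidence tends to 1 when mu lies in [a, b], else to 0
rng = np.random.default_rng(0)
for n in (20, 200, 2000):
    x = rng.normal(0.3, 1.0, size=n)
    print(n, round(interval_confidence(x, 0.0, 1.0), 3))
```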

    Maximum Fidelity

    The most fundamental problem in statistics is the inference of an unknown probability distribution from a finite number of samples. For a specific observed data set, answers to the following questions would be desirable: (1) Estimation: Which candidate distribution provides the best fit to the observed data?, (2) Goodness-of-fit: How concordant is this distribution with the observed data?, and (3) Uncertainty: How concordant are other candidate distributions with the observed data? A simple unified approach for univariate data that addresses these traditionally distinct statistical notions is presented, called "maximum fidelity". Maximum fidelity is a strict frequentist approach that is fundamentally based on model concordance with the observed data. The fidelity statistic is a general information measure based on the coordinate-independent cumulative distribution and critical yet previously neglected symmetry considerations. An approximation for the null distribution of the fidelity allows its direct conversion to absolute model concordance (p value). Fidelity maximization allows identification of the most concordant model distribution, generating a method for parameter estimation, with neighboring, less concordant distributions providing the "uncertainty" in this estimate. Maximum fidelity provides an optimal approach for parameter estimation (superior to maximum likelihood) and a generally optimal approach for goodness-of-fit assessment of arbitrary models applied to univariate data. Extensions to binary data, binned data, multidimensional data, and classical parametric and nonparametric statistical tests are described. Maximum fidelity provides a philosophically consistent, robust, and seemingly optimal foundation for statistical inference. All findings are presented in an elementary way to be immediately accessible to all researchers utilizing statistical analysis. Comment: 66 pages, 32 figures, 7 tables, submitted.
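
    The estimation / goodness-of-fit / uncertainty workflow can be sketched as below, profiling a CDF-based concordance statistic over candidate models. The Cramer-von Mises statistic is used purely as a placeholder concordance measure; it is not the fidelity statistic defined in the paper, and the normal-location model is an assumption for illustration.

```python
import numpy as np
from scipy import stats

def concordance_profile(x, mus, sigma=1.0):
    """Profile a CDF-based concordance statistic over candidate normal models.

    For each candidate mean mu, the data are mapped through the candidate CDF
    (probability integral transform) and compared with uniform plotting
    positions via the Cramer-von Mises statistic W^2 (smaller = more
    concordant).  The most concordant candidate plays the role of the
    estimate; the shape of the profile conveys the uncertainty.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    plotting = (2 * np.arange(1, n + 1) - 1) / (2 * n)
    profile = []
    for mu in mus:
        u = stats.norm.cdf(x, loc=mu, scale=sigma)
        w2 = 1 / (12 * n) + np.sum((u - plotting) ** 2)
        profile.append(w2)
    profile = np.asarray(profile)
    best = mus[np.argmin(profile)]
    return best, profile

# usage: scan candidate means for a normal model with known sigma
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=50)
mus = np.linspace(1.0, 3.0, 201)
best, profile = concordance_profile(x, mus)
```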

    Statistical Inference on Optimal Points to Evaluate Multi-State Classification Systems

    In decision making, an optimal point represents the settings at which a classification system should be operated to achieve maximum performance. Clearly, these optimal points are of great importance in classification theory. Not only is the selection of the optimal point of interest, but quantifying the uncertainty in the optimal point and its performance is also important. The Youden index is a metric currently employed for selection and performance quantification of optimal points for classification system families. The Youden index quantifies the correct classification rates of a classification system, and its confidence interval quantifies the uncertainty in this measurement. This metric currently focuses on two or three classes, and only allows for the utility of correct classifications and the cost of total misclassifications to be considered. An alternative to this metric for three or more classes is a cost function that considers the sum of incorrect classification rates. This new metric is preferable as it can include class prevalences and costs associated with every classification. In multi-class settings this informs better decisions and inferences on optimal points. The work in this dissertation develops theory and methods for confidence intervals on a metric based on misclassification rates, Bayes Cost, and, where possible, on the thresholds defining an optimal point under Bayes Cost. Hypothesis tests for Bayes Cost are also developed to test a classification system's performance or compare systems, with an emphasis on classification systems involving three or more classes. Performance of the newly proposed methods is demonstrated with simulations.
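
    A minimal sketch of the Bayes Cost idea for a three-class system with ordered scores is shown below: two thresholds partition the score axis, and the optimal point minimizes the prevalence- and cost-weighted sum of misclassification rates. The threshold grid, score model, and cost matrix are illustrative assumptions; the dissertation's confidence intervals and hypothesis tests for Bayes Cost are not sketched here.

```python
import numpy as np

def bayes_cost(thresholds, scores_by_class, prevalence, cost):
    """Bayes Cost for a three-class system with two ordered thresholds.

    Class k is predicted when the score falls in the k-th interval defined by
    the thresholds.  cost[i, j] is the cost of calling class i as class j
    (zero on the diagonal); prevalence[i] is the prior probability of class i.
    """
    t1, t2 = thresholds
    total = 0.0
    for i, scores in enumerate(scores_by_class):
        pred = np.digitize(scores, [t1, t2])            # 0, 1, 2 by interval
        rates = np.bincount(pred, minlength=3) / scores.size
        total += prevalence[i] * np.dot(cost[i], rates)
    return total

def optimal_point(scores_by_class, prevalence, cost, grid):
    """Grid search for the threshold pair minimizing Bayes Cost (a sketch)."""
    best = None
    for t1 in grid:
        for t2 in grid:
            if t1 >= t2:
                continue
            c = bayes_cost((t1, t2), scores_by_class, prevalence, cost)
            if best is None or c < best[0]:
                best = (c, (t1, t2))
    return best   # (minimum Bayes Cost, optimal thresholds)

# toy usage: three normal score distributions and unit misclassification costs
rng = np.random.default_rng(0)
scores = [rng.normal(m, 1.0, size=500) for m in (0.0, 1.5, 3.0)]
prev = np.array([0.5, 0.3, 0.2])
cost = 1.0 - np.eye(3)
grid = np.linspace(-1.0, 4.0, 51)
print(optimal_point(scores, prev, cost, grid))
```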

    Vol. 2, No. 2 (Full Issue)


    Bayesian Viral Substitution Analysis and Covariance Estimation via Generalized Fiducial Inference

    With the advances in biology and computing technologies, there has been an increasing amount of big biological data awaiting analysis. Aiming to develop statistical tools for omics data, we focus on the problem of modeling viral sequencing data as well as a fundamental statistical question with applications in biology and many other fields. This dissertation comprises three major parts. Motivated by a multi-time sampled, case-control influenza viral population study, in the first part we model the sequencing data of a viral population under a Bayesian Dirichlet mixture distribution. We have developed an efficient clustering scheme that enables us to distinguish changes caused by treatment from variation within viral populations. As a proof of concept, we applied our method to a well-studied HIV dataset, and successfully identified known drug-resistant regions and additional potential sites. For the influenza data, our algorithm revealed two genome sites with strong evidence of a treatment effect. The second part of the thesis concerns covariance matrix estimation in high-dimensional multivariate linear models with sparse covariates using fiducial inference. The sparsity imposed on the covariate matrix allows us to estimate relationships between a list of gene expressions and several metabolic levels in a high-dimension, low-sample-size setting. Aiming to quantify the uncertainty of the estimators without having to choose a prior, we have developed a fiducial approach to the estimation of the covariance matrix. Building upon the fiducial Bernstein-von Mises theorem, we show that the fiducial distribution of the covariance matrix is consistent under our framework. Furthermore, we propose an adaptive, efficient reversible jump Markov chain Monte Carlo algorithm for sampling from the fiducial distribution, which enables us to define a meaningful confidence region for the covariance matrix. In the last part of the thesis, we examine stochastic models for capturing the evolutionary processes of gene expression levels. Generalizing a microarray Brownian motion (BM) model, we have developed a BM model for high-throughput sequencing data that takes sampling variance into account. To allow conservation in the evolution process, we also investigate Ornstein-Uhlenbeck (OU) models. Applying these models to a multiple-tissue mammalian dataset, we show that the OU model is more appropriate for the top 10 highly expressed genes in the dataset, and we performed hypothesis testing for significant changes in gene expression levels along specific lineages. Doctor of Philosophy
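
    The Brownian-motion and Ornstein-Uhlenbeck comparison in the last part can be illustrated with a short simulation sketch (parameter values and sampling times are assumptions for illustration; the dissertation's fitting and model-selection machinery is not reproduced):

```python
import numpy as np

def simulate_ou(x0, theta, alpha, sigma, times, rng):
    """Exact simulation of an Ornstein-Uhlenbeck path at the given times.

    dX_t = alpha * (theta - X_t) dt + sigma dW_t.  The case alpha = 0 is
    handled separately and reduces to Brownian motion, so the one routine
    covers both evolutionary models.
    """
    x = np.empty(len(times))
    x[0] = x0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        if alpha == 0.0:                       # Brownian-motion limit
            mean, var = x[k - 1], sigma ** 2 * dt
        else:                                  # mean reversion toward theta
            mean = theta + (x[k - 1] - theta) * np.exp(-alpha * dt)
            var = sigma ** 2 / (2 * alpha) * (1 - np.exp(-2 * alpha * dt))
        x[k] = rng.normal(mean, np.sqrt(var))
    return x

# usage: one OU path (conserved expression) and one BM path for comparison
times = np.linspace(0.0, 5.0, 51)
rng = np.random.default_rng(0)
path_ou = simulate_ou(0.0, 1.0, alpha=2.0, sigma=0.5, times=times, rng=rng)
path_bm = simulate_ou(0.0, 1.0, alpha=0.0, sigma=0.5, times=times, rng=rng)
```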