
    Fast Power Curve Approximation for Posterior Analyses

    Bayesian hypothesis testing leverages posterior probabilities, Bayes factors, or credible intervals to assess characteristics that summarize data. We propose a framework for power curve approximation with such hypothesis tests that, for the purposes of sample size determination, assumes data are generated from statistical models with fixed parameters. We present a fast approach to explore the sampling distribution of posterior probabilities when the conditions for the Bernstein-von Mises theorem are satisfied, and we extend that approach to facilitate targeted sampling from the approximate sampling distribution of posterior probabilities at each sample size explored. These sampling distributions are used to construct power curves for various types of posterior analyses. The resulting method for power curve approximation is orders of magnitude faster than conventional power curve estimation for Bayesian hypothesis tests. We also prove the consistency of the corresponding power estimates and sample size recommendations under certain conditions. (arXiv admin note: text overlap with arXiv:2306.0947.)
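    The core computational idea can be illustrated with a minimal sketch: under Bernstein-von Mises conditions the posterior is approximately normal, so the posterior probability of a hypothesis has a closed form given the point estimate, and the sampling distribution of that probability can be explored by drawing estimates directly rather than refitting a model per simulated dataset. All parameter values and the simple normal model below are hypothetical illustrations, not the paper's exact algorithm.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2024)

def approx_power(n, theta_true=0.5, theta0=0.4, sigma=1.0,
                 gamma=0.95, n_sim=100_000):
    """Approximate power of the test that rejects when
    P(theta > theta0 | data) > gamma.

    Under a Bernstein-von Mises normal approximation the posterior is
    roughly Normal(theta_hat, sigma^2 / n), so the posterior probability
    is a normal tail area and no MCMC is needed per simulated dataset.
    """
    se = sigma / np.sqrt(n)
    # Sampling distribution of the point estimate at the design value.
    theta_hat = rng.normal(theta_true, se, size=n_sim)
    # Closed-form approximate posterior probability P(theta > theta0 | data).
    post_prob = norm.sf(theta0, loc=theta_hat, scale=se)
    return np.mean(post_prob > gamma)

# A power curve across candidate sample sizes, built in milliseconds.
power_curve = {n: approx_power(n) for n in (25, 50, 100, 200)}
```

    Because each sample size requires only vectorized normal draws and CDF evaluations, sweeping a fine grid of sample sizes is cheap, which is what makes curve-level (rather than single-point) power analysis practical.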

    Fast Sample Size Determination for Bayesian Equivalence Tests

    Equivalence testing allows one to conclude that two characteristics are practically equivalent. We propose a framework for fast sample size determination with Bayesian equivalence tests facilitated via posterior probabilities. For the purposes of sample size determination, we assume that data are generated from statistical models with fixed parameters. Our framework defines a distribution for the sample size that controls the length of posterior highest density intervals, where targets for the interval length are calibrated to yield the desired power for the equivalence test. We prove the normality of the limiting distribution for the sample size and introduce a two-stage approach for estimating this distribution in the nonlimiting case. This approach is much faster than traditional power calculations for Bayesian equivalence tests, and it requires users to make fewer choices than traditional simulation-based methods for Bayesian sample size determination.
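    The link between interval length and power can be seen in a toy version of the problem: declare equivalence when the posterior interval for a mean difference lies inside the margin, and search for the smallest sample size whose interval is short enough to achieve the target power. The flat-prior normal model and all numbers below are illustrative assumptions, not the paper's two-stage procedure.

```python
import numpy as np
from scipy.stats import norm

def equiv_power(n, delta=0.3, mu_true=0.0, sigma=1.0,
                level=0.95, n_sim=50_000):
    """Power of a Bayesian equivalence test that declares equivalence
    when the central 95% posterior interval for the mean difference lies
    inside the margin (-delta, delta).

    With a flat prior on a normal mean, the posterior interval is
    x_bar +/- z * sigma / sqrt(n), so power depends on the sample size
    only through the interval half-width (hypothetical setup).
    """
    rng = np.random.default_rng(7)
    se = sigma / np.sqrt(n)
    z = norm.ppf(0.5 + level / 2)
    x_bar = rng.normal(mu_true, se, size=n_sim)
    inside = (x_bar - z * se > -delta) & (x_bar + z * se < delta)
    return inside.mean()

# Smallest n (on a coarse grid) whose interval length yields 80% power.
for n in range(10, 400, 10):
    if equiv_power(n) >= 0.8:
        break
```

    Note that no sample size can succeed until the interval half-width `z * se` drops below `delta`; beyond that point, power rises quickly, which is why calibrating a target interval length is an effective proxy for calibrating power.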

    Discussion of Bridging the Gap between Theory and Practice in Basic Statistical Process Monitoring


    Quantifying Similarity in Reliability Surfaces Using the Probability of Agreement

    When separate populations exhibit similar reliability as a function of multiple explanatory variables, combining them into a single population is tempting. This can simplify future predictions and reduce the uncertainty associated with estimation. However, combining these populations may introduce bias if the underlying relationships are in fact different. The probability of agreement formally and intuitively quantifies the similarity of estimated reliability surfaces across a two-factor input space. An example from the reliability literature demonstrates the utility of the approach when deciding whether to combine two populations or to keep them distinct. New graphical summaries provide strategies for visualizing the results.
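    A minimal sketch of the idea: at each point of the two-factor input space, propagate the estimation uncertainty in each fitted reliability surface and compute the probability that the two reliabilities differ by less than a practically relevant margin. The logistic surfaces, standard errors, and margin below are all hypothetical stand-ins for quantities that would come from fitted models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted logistic reliability surfaces for two populations,
# expressed through their linear predictors (illustration only).
def eta1(x1, x2): return 2.0 - 0.8 * x1 - 0.5 * x2
def eta2(x1, x2): return 1.8 - 0.7 * x1 - 0.6 * x2

def prob_agreement(x1, x2, delta=0.05, se=0.15, n_draw=20_000):
    """P(|R1 - R2| < delta) at one point of the two-factor input space,
    propagating rough normal uncertainty in each linear predictor."""
    r1 = 1.0 / (1.0 + np.exp(-rng.normal(eta1(x1, x2), se, n_draw)))
    r2 = 1.0 / (1.0 + np.exp(-rng.normal(eta2(x1, x2), se, n_draw)))
    return np.mean(np.abs(r1 - r2) < delta)

# Evaluate agreement over a coarse grid; a contour plot of these values
# is the kind of graphical summary the approach lends itself to.
grid = [(x1, x2) for x1 in (0.0, 1.0, 2.0) for x2 in (0.0, 1.0)]
pa_surface = {pt: prob_agreement(*pt) for pt in grid}
```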

    Assessment and Comparison of Continuous Measurement Systems

    In this thesis we critically examine the assessment and comparison of continuous measurement systems. Measurement systems, defined to be the devices, people, and protocol used to make a measurement, are an important tool in a variety of contexts. In manufacturing contexts a measurement system may be used to monitor a manufacturing process; in healthcare contexts a measurement system may be used to evaluate the status of a patient. In all contexts it is desirable for the measurement system to be accurate and precise, so as to provide high-quality and reliable measurements. A measurement system assessment (MSA) study is performed to assess the adequacy, and in particular the variability (precision), of the measurement system. The Automotive Industry Action Group (AIAG) recommends a standard design for such a study in which 10 subjects are measured multiple times by each individual who operates the measurement system. In this thesis we propose alternate study designs which, with little extra effort, provide more precise evaluations of the measurement system’s performance. Specifically, we propose the use of unbalanced augmented plans which, by strategically using more subjects and fewer replicate measurements, are substantially more efficient and more informative than the AIAG recommendation. We consider cases in which the measurement system is operated by a single individual (or is automated) and cases in which it is operated by multiple individuals; in all cases, augmented plans are superior to the typical designs recommended by the AIAG. In situations where the measurement system is used routinely, and records of these single measurements on many subjects are kept, we propose incorporating this additional ‘baseline’ information into the planning and analysis of an MSA study. Once again, we consider the scenarios in which the measurement system is operated by a single individual or by multiple individuals.
In all cases, incorporating baseline information in the planning and analysis of an MSA study substantially increases the amount of information about subject-to-subject variation. This in turn allows for a much more precise assessment of the measurement system than is possible with the designs recommended by the AIAG. New measurement systems that are less expensive, less labour-intensive, and perhaps less time-consuming are often developed. In these cases, potential customers may wish to compare the new measurement system with their existing one, to ensure that the measurements by the new system agree suitably with those of the old. This comparison is typically done with a measurement system comparison (MSC) study, in which a number of randomly selected subjects are measured one or more times by each system. A variety of statistical techniques exist for analyzing MSC study data and quantifying the agreement between the two systems, but none are without challenges. We propose the probability of agreement, a new method for analyzing MSC data, which more effectively and transparently quantifies the agreement between two measurement systems. The chief advantage of the probability of agreement is that it is intuitive and simple to interpret, and its interpretation is the same no matter how complicated the setting. We illustrate its applicability, and its superiority to existing techniques, in a variety of settings, and we also make recommendations for a study design that facilitates precise estimation of this probability.

    Assessing Agreement between Two Measurement Systems: An Alternative to the Limits of Agreement Approach

    The comparison of two measurement systems is important in medical and other contexts. A common goal is to decide if a new measurement system agrees suitably with an existing one, and hence whether the two can be used interchangeably. Various methods for assessing interchangeability are available, the most popular being the limits of agreement approach due to Bland and Altman. In this article, we review the challenges of this technique and propose a model-based framework for comparing measurement systems that overcomes those challenges. The proposal is based on a simple metric, the probability of agreement, and a corresponding plot which can be used to summarize the agreement between two measurement systems. We also make recommendations for a study design that facilitates accurate and precise estimation of the probability of agreement.
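    Under a simple measurement-error model, the probability of agreement has a closed form, which is what makes the corresponding plot easy to produce. The sketch below assumes hypothetical parameter values: system 1 reads the true value plus noise, while system 2 has a small bias and scale distortion; none of these numbers come from the article.

```python
import numpy as np
from scipy.stats import norm

def prob_agreement(x, alpha=0.1, beta=0.98,
                   sigma1=0.2, sigma2=0.25, delta=0.5):
    """P(|Y1 - Y2| <= delta) at true value x, when system 1 reads
    Y1 = x + e1 and system 2 reads Y2 = alpha + beta * x + e2 with
    independent normal errors (hypothetical parameter values).

    The difference D = Y1 - Y2 is normal, so the probability is a
    difference of two normal CDF values.
    """
    mu_d = x - (alpha + beta * x)       # mean difference at true value x
    sd_d = np.hypot(sigma1, sigma2)     # sd of the difference
    return norm.cdf(delta, mu_d, sd_d) - norm.cdf(-delta, mu_d, sd_d)

# Values for a probability-of-agreement plot across the measurement range.
pa = {x: prob_agreement(x) for x in (0, 5, 10, 20)}
```

    Plotting these probabilities against `x`, with a reference line at an acceptably high value, gives a direct answer to "over what range do the systems agree?", which is harder to read off a Bland-Altman plot.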

    Augmented Measurement System Assessment

    The standard plan for the assessment of the variation due to a measurement system involves a number of operators repeatedly measuring a number of parts in a balanced design. In this article, we consider the performance of two types of (unbalanced) assessment plans. In each type, we use a standard plan augmented with a second component. In type A augmentation, each operator measures a different set of parts once each. In type B augmentation, each operator measures the same set of parts once each. The goal of the paper is to identify good augmented plans for estimating the gauge repeatability and reproducibility (GR&R), a ratio that compares the contribution of the measurement system to the overall process variation. We show that, if there are three or more operators, or if we include the possibility of part-by-operator interaction, then use of an appropriate augmented plan can produce substantial gains in efficiency for estimating GR&R compared with the best standard plan with the same total number of measurements.
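    For readers unfamiliar with the GR&R ratio, here is a minimal sketch of how it is estimated from a standard balanced plan via method-of-moments (ANOVA) variance components; the simulated variance components and study dimensions are hypothetical, and the article's augmented plans would change the design, not this target quantity.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate a standard balanced gauge study: p parts, o operators,
# r repeat measurements (hypothetical variance components, no interaction).
p, o, r = 10, 3, 2
sigma_part, sigma_oper, sigma_rep = 2.0, 0.4, 0.3
part = rng.normal(0, sigma_part, p)[:, None, None]
oper = rng.normal(0, sigma_oper, o)[None, :, None]
y = 50 + part + oper + rng.normal(0, sigma_rep, (p, o, r))

# Method-of-moments (ANOVA) estimates of the variance components.
ms_rep = ((y - y.mean(axis=2, keepdims=True)) ** 2).sum() / (p * o * (r - 1))
ms_oper = r * p * ((y.mean(axis=(0, 2)) - y.mean()) ** 2).sum() / (o - 1)
ms_part = r * o * ((y.mean(axis=(1, 2)) - y.mean()) ** 2).sum() / (p - 1)
var_rep = ms_rep                                   # repeatability
var_oper = max((ms_oper - ms_rep) / (r * p), 0.0)  # reproducibility
var_part = max((ms_part - ms_rep) / (r * o), 0.0)  # part-to-part

# GR&R: the measurement system's share of the total variation.
grr = (var_rep + var_oper) / (var_part + var_oper + var_rep)
```

    With only a handful of operators, the reproducibility component is estimated on very few degrees of freedom, which is precisely why reallocating measurements through augmentation can pay off.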

    Monitoring Radiation Use in Cardiac Fluoroscopy Imaging Procedures

    Objective: Timely identification of systematic changes in the radiation delivery of an imaging system can lead to a reduction in risk for the patients involved. However, existing quality assurance programs involving the routine testing of equipment performance using phantoms are limited in their ability to effectively carry out this task. To address this issue, we propose the implementation of an ongoing monitoring process that utilizes procedural data to identify unexpectedly large or small radiation exposures for individual patients, as well as to detect persistent changes in the radiation output of imaging platforms. Methods: Data used in this study were obtained from records routinely collected during procedures performed in the cardiac catheterization imaging facility at St Andrew's War Memorial Hospital, Brisbane, Australia, over the period January 2008 to March 2010. A two-stage monitoring process employing individuals and exponentially weighted moving average (EWMA) control charts was developed and used to identify unexpectedly high or low radiation exposure levels for individual patients, as well as to detect persistent changes in the radiation output delivered by the imaging systems. To increase the sensitivity of the charts, we account for variation in dose area product (DAP) values due to other measured factors (patient weight, fluoroscopy time, digital acquisition frame count) using multiple linear regression. Control charts are then constructed using the residual values from this linear regression. The proposed monitoring process was evaluated using simulation to model performance of the process under known conditions. Results: Retrospective application of this technique to actual clinical data identified a number of cases in which the DAP result could be considered unexpected. Most of these, upon review, were attributed to data entry errors.
The charts monitoring overall system radiation output trends demonstrated changes in equipment performance associated with relocation of the equipment to a new department. When tested under simulated conditions, the EWMA chart was capable of detecting a sustained 15% increase in average radiation output within 60 cases (< 1 month of operation), while a 33% increase would be signalled within 20 cases. Conclusion: This technique offers a valuable enhancement to existing quality assurance programs in radiology that rely upon testing equipment radiation output at discrete times to verify performance.
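    The two-stage scheme of regression adjustment followed by an EWMA chart on the residuals can be sketched in a few lines. The data below are simulated stand-ins (with assumed coefficients and a simulated sustained output shift of roughly 33% on the DAP scale), not the hospital's records; the EWMA recursion and asymptotic control limits are the standard textbook forms.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical stand-in for DAP monitoring: log(DAP) explained by
# fluoroscopy time and patient weight, with residual process noise.
n = 300
time = rng.uniform(2, 15, n)
weight = rng.normal(80, 12, n)
log_dap = 1.0 + 0.12 * time + 0.01 * weight + rng.normal(0, 0.25, n)
log_dap[200:] += 0.285          # simulated sustained ~33% increase in DAP

# Stage 1: fit the regression on the in-control portion,
# then compute residuals for every case.
X = np.column_stack([np.ones(n), time, weight])
beta = np.linalg.lstsq(X[:200], log_dap[:200], rcond=None)[0]
resid = log_dap - X @ beta
sigma = resid[:200].std(ddof=3)

# Stage 2: EWMA of the residuals with asymptotic control limits
# L * sigma * sqrt(lam / (2 - lam)).
lam, L = 0.2, 3.0
ewma = np.zeros(n)
z = 0.0
for i, r in enumerate(resid):
    z = lam * r + (1 - lam) * z
    ewma[i] = z
limit = L * sigma * np.sqrt(lam / (2 - lam))
signals = np.where(np.abs(ewma) > limit)[0]
```

    An individuals chart on the same residuals (flagging single cases beyond fixed multiples of `sigma`) would catch the gross outliers such as data entry errors, while the EWMA accumulates the small persistent shift and signals it within a handful of cases.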