6,289 research outputs found
An Alternative Choice for the Critical Value of Limits of Agreement and Simulation-Based Sample Size Calculation in Bland Altman Analysis
Bland Altman analysis is a statistical method for assessing the degree of agreement between two methods of measurement. In medical and health sciences, it is a popular method because of its simple calculation and visualization. Under the normality assumption, the calculation is based on two sufficient statistics, d̄ and s, where d̄ is the sample mean of the differences and s is the sample standard deviation of the differences. The interval d̄ ± 1.96s is referred to as the 95% limits of agreement (LOA) in the literature. In a seminal paper, Bland and Altman [2] interpreted the LOA as "If the differences are normally distributed, 95% of differences will lie between these limits". This interpretation seems to be widely accepted, but there is a caveat because the coverage probability of the LOA is a random variable. In this article, we demonstrate the sampling distribution of its coverage probability by simulation, and we discuss an alternative choice for the critical value. In addition, using simulation, we perform a sample size calculation that satisfies a specified condition on the sampling distribution of the coverage probability
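The caveat above can be seen directly by simulation. Below is a minimal sketch, not the article's code: it assumes normally distributed differences and illustrative settings (n = 30 per study, N(0, 1) differences), computes d̄ ± 1.96s for each simulated study, and evaluates the true coverage of that interval.

```python
# Sketch of the simulation idea: the coverage probability of the 95% limits
# of agreement (d_bar +/- 1.96*s) is itself a random variable across studies.
# All settings (n, n_sim, the N(0, 1) model) are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_sim = 30, 10_000                       # subjects per study, simulated studies

d = rng.normal(0.0, 1.0, size=(n_sim, n))   # differences under normality
d_bar = d.mean(axis=1)
s = d.std(axis=1, ddof=1)

# True coverage of each study's LOA under the N(0, 1) data-generating model
lower, upper = d_bar - 1.96 * s, d_bar + 1.96 * s
coverage = stats.norm.cdf(upper) - stats.norm.cdf(lower)

print(coverage.mean())                           # close to 0.95 on average
print(np.quantile(coverage, [0.05, 0.5, 0.95]))  # but with noticeable spread
```

The spread of the coverage distribution, visible in the quantiles, is what motivates an alternative to the critical value 1.96.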
Bayesian Model Averaging and Compromising in Dose-Response Studies
Dose-response models are applied to animal-based cancer risk assessments and human-based clinical trials, usually with small samples. For sparse data, we rely on a parametric model for efficiency, but posterior inference can be sensitive to the assumed model. In addition, when we utilize prior information, multiple experts may have different prior knowledge about the parameter of interest. When we make sequential decisions to allocate experimental units in an experiment, the outcome may depend on the decision rules, and each decision rule has its own perspective. In this chapter, we address three practical issues in small-sample dose-response studies: (i) model sensitivity, (ii) disagreement in prior knowledge, and (iii) conflicting perspectives in decision rules
Incorporating Statistical Strategy into Image Analysis to Estimate Effects of Steam and Allyl Isocyanate on Weed Control
Weeds are the major limitation to efficient crop production, and effective weed management is necessary to prevent yield losses due to crop-weed competition. Assessment of the relative efficacy of weed control treatments by traditional counting methods is labor-intensive and expensive, so more efficient methods are needed for weed control assessments. There is extensive literature on advanced techniques of image analysis for weed recognition, identification, classification, and leaf area, but there is limited information on statistical methods for hypothesis testing when data are obtained by image analysis (RGB decimal code). A traditional multiple comparison test, such as the Dunnett-Tukey-Kramer (DTK) test, is not an optimal statistical strategy for image analysis because it does not fully utilize the information contained in the RGB decimal code. In this article, a bootstrap method and a Poisson model are considered to incorporate RGB decimal codes and pixels for comparing multiple treatments for weed control. These statistical methods can also estimate interpretable parameters such as the relative proportion of weed coverage and weed densities. Simulation studies showed that the bootstrap method and the Poisson model are more powerful than the DTK test at a fixed significance level. Using these statistical methods, three soil disinfestation treatments, steam, allyl-isothiocyanate (AITC), and control, were compared. Steam was found to be significantly more effective than AITC, a difference which could not be detected by the DTK test. Our study demonstrates that an appropriate statistical method can improve statistical power even with a simple RGB index
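As a toy illustration of the bootstrap idea only: the article bootstraps RGB decimal codes and pixels, whereas the sketch below simplifies this to plot-level weed-coverage proportions, with fabricated numbers standing in for two treatment groups.

```python
# Hypothetical sketch: per-plot weed-pixel proportions (fabricated data) for
# two treatments, compared by a pooled bootstrap test of the difference in
# mean weed coverage. This is a simplification of the pixel-level method.
import numpy as np

rng = np.random.default_rng(1)
steam = np.array([0.04, 0.06, 0.03, 0.05, 0.07])   # weed coverage per plot
aitc  = np.array([0.10, 0.12, 0.08, 0.15, 0.11])

obs_diff = aitc.mean() - steam.mean()

# Bootstrap under H0: resample from the pooled sample to approximate the
# null distribution of the difference in group means.
pooled = np.concatenate([steam, aitc])
n_boot = 10_000
diffs = np.empty(n_boot)
for b in range(n_boot):
    resample = rng.choice(pooled, size=pooled.size, replace=True)
    diffs[b] = resample[5:].mean() - resample[:5].mean()

p_value = np.mean(np.abs(diffs) >= abs(obs_diff))
print(round(p_value, 3))
```

With these fabricated numbers the bootstrap detects a difference that a pairwise mean-comparison test on the same five plots per group may miss.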
Model Averaging with AIC Weights for Hypothesis Testing of Hormesis at Low Doses
For many dose-response studies, large samples are not available. In particular, when the outcome of interest is binary rather than continuous, a large sample size is required to provide evidence for hormesis at low doses. In a small or moderate sample, we can gain statistical power by the use of a parametric model. It is an efficient approach when the model is correctly specified, but it can be misleading otherwise. This research is motivated by the fact that data points at high experimental doses contribute too heavily to the hypothesis testing when a parametric model is misspecified. In dose-response analyses, averaging multiple models has been widely discussed in the literature as a way to account for model uncertainty and to reduce the impact of model misspecification. In this article, we propose to average semiparametric models when we test for hormesis at low doses. We show the different characteristics of averaging parametric models and averaging semiparametric models by simulation. We apply the proposed method to real data, and we show that P values from averaged semiparametric models are more credible than P values from averaged parametric models. When the true dose-response relationship does not follow a parametric assumption, the proposed method can be a robust alternative approach
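The AIC-weight averaging named in the title can be sketched in a few lines. The candidate model names and AIC values below are invented for illustration; in the article, the averaged candidates are semiparametric models rather than these named parametric families.

```python
# Minimal sketch of Akaike weights: each candidate model's weight is
# proportional to exp(-delta_AIC / 2), where delta_AIC is the difference
# from the smallest AIC. Model names and AIC values are invented.
import numpy as np

aic = {"logistic": 102.4, "probit": 103.1, "weibull": 106.8}

values = np.array(list(aic.values()))
delta = values - values.min()
w = np.exp(-delta / 2.0)
weights = w / w.sum()

for name, weight in zip(aic, weights):
    print(f"{name}: {weight:.3f}")

# A model-averaged quantity is then a weighted sum of per-model estimates,
# e.g. p_avg = sum(weights[i] * p_hat[i]) for a response probability p.
```

The weights sum to one, so poorly fitting models are down-weighted rather than discarded outright.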
Using Simple Alternative Hypothesis to Increase Statistical Power in Sparse Categorical Data
There are numerous statistical hypothesis tests for categorical data, including Pearson's chi-square goodness-of-fit test and other discrete versions of goodness-of-fit tests. For these hypothesis tests, the null hypothesis is simple, and the alternative hypothesis is the composite hypothesis that negates the simple null hypothesis. For a power calculation, a researcher specifies a significance level, a sample size, a simple null hypothesis, and a simple alternative hypothesis. In practice, an experienced researcher may have deep and broad scientific knowledge yet suffer from a lack of statistical power because only a small sample size is available. In such a case, we may formulate the hypothesis test with a simple alternative hypothesis instead of the composite alternative hypothesis. In this article, we investigate how much statistical power can be gained via a correctly specified simple alternative hypothesis and how much statistical power can be lost under a misspecified alternative hypothesis, particularly when the available sample size is small
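A small simulation in the spirit of the abstract: against one fixed, correctly specified simple alternative, the Neyman-Pearson likelihood-ratio test is the most powerful test at its level, and with a small sample it can beat Pearson's chi-square test, which must guard against the whole composite alternative. All probabilities and the sample size below are illustrative choices.

```python
# Power comparison at small n: Pearson's chi-square test (composite
# alternative) vs the Neyman-Pearson likelihood-ratio test for one fixed
# simple alternative. Cell probabilities and n = 20 are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p0 = np.array([1 / 3, 1 / 3, 1 / 3])   # simple null
p1 = np.array([0.50, 0.30, 0.20])      # correctly specified simple alternative
n, n_sim, alpha = 20, 5000, 0.05

# The LR statistic is linear in the counts: sum_k X_k * log(p1_k / p0_k)
w = np.log(p1) - np.log(p0)
null_stats = rng.multinomial(n, p0, size=n_sim) @ w
crit = np.quantile(null_stats, 1 - alpha)   # simulated critical value

alt_counts = rng.multinomial(n, p1, size=n_sim)
power_lr = np.mean(alt_counts @ w > crit)

chi2_stats = ((alt_counts - n * p0) ** 2 / (n * p0)).sum(axis=1)
power_chi2 = np.mean(chi2_stats > stats.chi2.ppf(1 - alpha, df=2))

print(power_lr, power_chi2)   # the LR test is noticeably more powerful here
```

The gain comes from spending all the test's power in one direction; the same construction loses power if the specified alternative is far from the truth.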
An Image Segmentation Technique with Statistical Strategies for Pesticide Efficacy Assessment
Image analysis is a useful technique for evaluating the efficacy of a treatment for weed control. In this study, we address two practical challenges in image analysis. First, it is challenging to accurately quantify the efficacy of a treatment when an entire experimental unit is not affected by the treatment. Second, the RGB codes, which can be used to identify weed growth in image analysis, may not be stable due to various surrounding factors, human errors, and unknown causes. To address the former challenge, the technique of image segmentation is considered. To address the latter challenge, the proportion of weed area is adjusted under a beta regression model. Beta regression is a useful statistical method when the outcome variable (a proportion) ranges between zero and one. In this study, we attempt to accurately evaluate the efficacy of 35% hydrogen peroxide (HP). Image segmentation was applied to separate two zones: the zone where the HP was directly applied (gray zone) and its surroundings (nongray zone). Weed growth was monitored for five days after the treatment, and beta regression was implemented to compare weed growth between the gray zone and the control group and between the nongray zone and the control group. The estimated treatment effect was substantially different after the implementation of image segmentation and the adjustment for green area
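Beta regression, as used above, can be sketched with the standard mean-precision parameterization Beta(μφ, (1−μ)φ) and a logit link for the mean μ. The sketch below is a bare-bones maximum-likelihood fit on simulated data, not the authors' implementation; the covariate, true coefficients, and precision φ = 20 are all invented.

```python
# Minimal beta-regression sketch: logit link for the mean proportion,
# Beta(mu*phi, (1-mu)*phi) likelihood, fit by numerical maximum likelihood.
# All data and parameter values are simulated for illustration.
import numpy as np
from scipy import optimize, special, stats

rng = np.random.default_rng(5)
n = 100
x = rng.uniform(0, 1, n)                          # hypothetical covariate
mu_true = special.expit(-1.0 + 2.0 * x)           # true mean proportion
y = rng.beta(mu_true * 20, (1 - mu_true) * 20)    # true precision phi = 20

def negloglik(theta):
    b0, b1, log_phi = theta
    mu = special.expit(b0 + b1 * x)
    phi = np.exp(log_phi)
    return -np.sum(stats.beta.logpdf(y, mu * phi, (1 - mu) * phi))

fit = optimize.minimize(negloglik, x0=[0.0, 0.0, 1.0], method="Nelder-Mead")
b0_hat, b1_hat, phi_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(np.round([b0_hat, b1_hat, phi_hat], 2))     # near (-1, 2, 20)
```

Parameterizing the precision on the log scale keeps the optimization unconstrained, which is a common convenience rather than a requirement of the model.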
A Paradox in Bland-Altman Analysis and a Bernoulli Approach
A reliable method of measurement is important in various scientific areas. When a new method of measurement is developed, it should be tested against a standard method that is currently in use. Bland and Altman proposed limits of agreement (LOA) to compare two methods of measurement under the normality assumption. Recently, a sample size formula has been proposed for hypothesis testing to compare two methods of measurement. In this hypothesis testing, the null hypothesis states that the two methods do not satisfy a pre-specified acceptable degree of agreement. Carefully considering the interpretation of the LOA, we argue that there are parameter values corresponding to an acceptable degree of agreement inside the null parameter space. We refer to this subset as the paradoxical parameter space in this article. To address this paradox, we apply a Bernoulli approach to modify the null parameter space and to relax the normality assumption on the data. Using simulations, we demonstrate that the change in statistical power is not negligible when the true parameter values are inside or near the paradoxical parameter space. In addition, we demonstrate an application of the sequential probability ratio test that allows researchers to draw a conclusion with a smaller sample size and to reduce the study time
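The Bernoulli approach and the sequential probability ratio test (SPRT) mentioned above can be sketched together: each subject contributes an indicator of whether the paired difference falls inside a pre-specified acceptable range, and Wald's SPRT is run on that 0/1 stream. The hypothesized proportions, error rates, and data below are illustrative, not the article's values.

```python
# Sketch of Wald's SPRT for a Bernoulli proportion p = P(difference falls
# within the acceptable range). H0: p = p0 (inadequate agreement) vs
# H1: p = p1 (adequate agreement); all constants are illustrative.
import numpy as np

def sprt_bernoulli(x, p0=0.80, p1=0.95, alpha=0.05, beta=0.20):
    """Return ('H0' | 'H1' | 'continue', n_used) for a stream of 0/1 outcomes."""
    a = np.log((1 - beta) / alpha)   # upper boundary -> accept H1
    b = np.log(beta / (1 - alpha))   # lower boundary -> accept H0
    llr = 0.0
    for n, xi in enumerate(x, start=1):
        llr += xi * np.log(p1 / p0) + (1 - xi) * np.log((1 - p1) / (1 - p0))
        if llr >= a:
            return "H1", n
        if llr <= b:
            return "H0", n
    return "continue", len(x)

rng = np.random.default_rng(3)
indicators = rng.random(200) < 0.95    # simulated stream with true p = 0.95
decision, n_used = sprt_bernoulli(indicators)
print(decision, n_used)
```

The sequential boundaries let the test stop as soon as the accumulated evidence is decisive, which is the source of the sample-size savings.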
An Alternative Perspective on Consensus Priors with Applications to Phase I Clinical Trials
We occasionally need to make a decision or a series of decisions based on a small sample. In some cases, an investigator is knowledgeable to some degree about a parameter of interest or has access to various sources of prior information. Yet two or more experts seldom have an identical prior distribution for the parameter. In this manuscript, we discuss the use of a consensus prior and compare two classes of Bayes estimators. In the first class of Bayes estimators, the contribution of each prior opinion is determined by observing data. In the second class, the contribution of each prior opinion is determined after observing data. Bayesian designs for Phase I clinical trials allocate trial participants to new experimental doses based on accumulated information, while the typical sample sizes are fairly small. Using simulations, we illustrate the usefulness of a combined estimate in early-phase clinical trials
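One standard way to let data determine each expert's contribution is a mixture (consensus) prior: the posterior weight of each expert's prior is proportional to its marginal likelihood. The sketch below assumes beta priors for a toxicity probability p in a Phase I setting; the expert priors, data, and equal starting weights are all invented for illustration.

```python
# Sketch of a consensus (mixture) prior for a binomial probability p:
# prior = 0.5*Beta(a1, b1) + 0.5*Beta(a2, b2). After y successes in n
# trials, each component's posterior weight is proportional to its
# marginal likelihood. All numbers are illustrative.
import numpy as np
from scipy import special

def log_marglik(a, b, y, n):
    # log marginal likelihood of y successes in n trials under Beta(a, b)
    # (the binomial coefficient cancels between components)
    return special.betaln(a + y, b + n - y) - special.betaln(a, b)

a1, b1 = 1.0, 9.0    # expert 1: p is likely small
a2, b2 = 5.0, 5.0    # expert 2: p is around 0.5
y, n = 1, 12         # observed toxicities

m1, m2 = log_marglik(a1, b1, y, n), log_marglik(a2, b2, y, n)
w1 = 1.0 / (1.0 + np.exp(m2 - m1))   # posterior weight of expert 1's prior

post_mean = (w1 * (a1 + y) / (a1 + b1 + n)
             + (1 - w1) * (a2 + y) / (a2 + b2 + n))
print(round(w1, 3), round(post_mean, 3))
```

With one toxicity in twelve patients, the data shift most of the weight toward the expert who expected a small toxicity probability.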
Sequential Testing in Reliability and Validity Studies With Repeated Measurements per Subject
In medical, health, and sports sciences, researchers desire a device with high reliability and validity. This article focuses on reliability and validity studies with n subjects and m ≥ 2 repeated measurements per subject. High statistical power can be achieved by increasing n or m, and increasing m is often easier than increasing n in practice, unless m is so large that it introduces systematic bias. The sequential probability ratio test (SPRT) is a useful statistical method which can conclude in favor of a null hypothesis H0 or an alternative hypothesis H1 with, on average, about 50% of the sample size required by a non-sequential test. The traditional SPRT requires the likelihood function for each observed random variable, and evaluating the likelihood ratio after each observation of a subject can be a practical burden. Instead, the m observed random variables per subject can be transformed into a test statistic which has a known sampling distribution under H0 and under H1. This allows us to formulate an SPRT based on a sequence of test statistics. In this article, three types of study are considered: reliability of a device, reliability of a device relative to a criterion device, and validity of a device relative to a criterion device. Using the SPRT to test the reliability of a device results, for small m, in an average sample size of about 50% of the fixed sample size of a non-sequential test. For comparing a device to a criterion, the average sample size approaches approximately 60% as m increases. The SPRT tolerates violations of the normality assumption in the validity study, but it does not in the reliability study
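The idea of running an SPRT on per-subject summary statistics can be sketched as follows: with m normal repeated measurements per subject, (m−1)s²/σ² follows a chi-square distribution with m−1 degrees of freedom, so the likelihood ratio can be evaluated one subject at a time from the sample variance alone. The hypotheses (H0: σ = 1.0 vs H1: σ = 0.5, a small within-subject variance indicating a reliable device), error rates, and data below are illustrative choices, not the article's.

```python
# Sketch: SPRT on per-subject sample variances. Under normality,
# (m-1)*s^2 / sigma^2 ~ chi-square(m-1), so each subject contributes one
# likelihood-ratio term. All constants are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
m, sigma_true = 5, 0.5
a = np.log((1 - 0.2) / 0.05)   # upper boundary -> accept H1 (alpha=0.05, beta=0.2)
b = np.log(0.2 / (1 - 0.05))   # lower boundary -> accept H0

llr, decision, n_used = 0.0, "continue", 0
for subject in range(1, 201):
    y = rng.normal(0.0, sigma_true, size=m)   # m repeated measurements
    s2 = y.var(ddof=1)
    # log-density of s^2 under each sigma via the scaled chi-square law;
    # the Jacobian factor (m-1)/sigma^2 contributes -log(sigma^2), and the
    # common (m-1) term cancels in the ratio
    ll1 = stats.chi2.logpdf((m - 1) * s2 / 0.5**2, df=m - 1) - np.log(0.5**2)
    ll0 = stats.chi2.logpdf((m - 1) * s2 / 1.0**2, df=m - 1) - np.log(1.0**2)
    llr += ll1 - ll0
    if llr >= a or llr <= b:
        decision, n_used = ("H1" if llr >= a else "H0"), subject
        break

print(decision, n_used)
```

Because the per-subject statistic has a known distribution under both hypotheses, the sequential boundaries are exactly Wald's, and no measurement-level likelihood evaluation is needed.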
Applications of Statistical Experimental Designs to Improve Statistical Inference in Weed Management
In a balanced design, researchers allocate the same number of units to each treatment group. Balanced allocation has been treated as a rule of thumb by some researchers in agriculture, yet an unbalanced design sometimes outperforms a balanced design. Given a specific parameter of interest, researchers can design an experiment by unevenly distributing experimental units to increase the statistical information about that parameter. An additional way of improving an experiment is an adaptive design (e.g., spending the total sample size in multiple steps). It is helpful to have some knowledge about the parameter of interest when designing an experiment. In the initial phase of an experiment, a researcher may spend a portion of the total sample size to learn about the parameter of interest. In the later phase, the remaining portion of the sample size can be allocated to gain more information about the parameter of interest. Though such ideas have existed in the statistical literature, they have not been applied broadly in agricultural studies. In this article, we used simulations to demonstrate the superiority of objective-specific experimental designs over balanced designs in three practical situations: comparing two groups, studying a dose-response relationship with right-censored data, and studying a synergistic effect of two treatments. The simulations showed that an objective-specific design provides smaller error in parameter estimation and higher statistical power in hypothesis testing when compared to a balanced design. We also conducted an adaptive experimental design applied to a dose-response study with right-censored data to quantify the effect of ethanol on weed control. Retrospective simulations supported the benefit of this adaptive design as well. All researchers face different practical situations, and appropriate experimental designs will help utilize available resources efficiently
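A classic instance of the point that balance is not always best: when comparing two group means with unequal variances, Neyman allocation (units proportional to the group standard deviations) gives a smaller standard error for the estimated difference than an even split. The standard deviations and total sample size below are illustrative, not values from the article.

```python
# Balanced vs Neyman allocation for estimating a difference of two means.
# Group SDs and total sample size N are illustrative choices.
import numpy as np

sigma1, sigma2, N = 1.0, 3.0, 40

# Balanced design: n1 = n2 = N/2
se_balanced = np.sqrt(sigma1**2 / (N / 2) + sigma2**2 / (N / 2))

# Neyman allocation: n_i proportional to sigma_i
n1 = round(N * sigma1 / (sigma1 + sigma2))   # 10 units to the low-variance group
n2 = N - n1                                  # 30 units to the high-variance group
se_neyman = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

print(round(se_balanced, 3), round(se_neyman, 3))   # -> 0.707 0.632
```

The gain grows with the variance ratio, which is why knowing something about the parameter of interest (here, the group variances) before allocating units pays off, and why an initial learning phase can inform the allocation in the later phase.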