Editor's Preface and Table of Contents
These proceedings contain papers presented at the twentieth annual Kansas State University Conference on Applied Statistics in Agriculture, held in Manhattan, Kansas, April 27 - April 29, 2008
Editor's Preface and Table of Contents
These proceedings contain papers presented in the twenty-first annual Kansas State University Conference on Applied Statistics in Agriculture, held in Manhattan, Kansas, April 19 - April 21, 2009
TREATMENT HETEROGENEITY AND POTENTIAL OUTCOMES IN LINEAR MIXED EFFECTS MODELS
Studies commonly focus on estimating a mean treatment effect in a population. However, in some applications the variability of treatment effects across individual units may help to characterize the overall effect of a treatment across the population. Consider a set of treatments, {T,C}, where T denotes some treatment that might be applied to an experimental unit and C denotes a control. For each of N experimental units, the duplet {rTᵢ,rCᵢ}, i = 1,2, … , N, represents the potential response of the ith experimental unit if treatment were applied and the response of the experimental unit if control were applied, respectively. The causal effect of T compared to C is the difference between the two potential responses, rTᵢ - rCᵢ. Much work has been done to elucidate the statistical properties of a causal effect, given a set of particular assumptions. Gadbury and others have reported on this for some simple designs and primarily focused on finite population randomization-based inference. When designs become more complicated, the randomization-based approach becomes increasingly difficult.
Since linear mixed effects models are particularly useful for modeling data from complex designs, their role in modeling treatment heterogeneity is investigated. It is shown that an individual treatment effect can be conceptualized as a linear combination of fixed treatment effects and random effects. The random effects are assumed to have variance components specified in a mixed effects “potential outcomes” model when both potential outcomes, rT, rC, are variables in the model. The variance of the individual causal effect is used to quantify treatment heterogeneity. After treatment assignment, however, only one of the two potential outcomes is observable for a unit. It is then shown that the variance component for treatment heterogeneity becomes non-estimable in an analysis of observed data. Furthermore, estimable variance components in the observed data model are demonstrated to arise from linear combinations of the non-estimable variance components in the potential outcomes model. Mixed effects models are considered in the context of a particular design in an effort to illuminate the loss of information incurred when moving from a potential outcomes framework to an observed data analysis
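The loss of information described above can be illustrated with a small simulation. This is a minimal sketch, not the mixed effects model of the paper: it assumes bivariate normal potential outcomes with unit variances, a mean treatment effect of 1, and a correlation `rho` between rT and rC, all of which are illustrative choices. Under these assumptions the variance of the individual effect rT - rC is 2(1 - rho), yet the observed treated and control responses have the same marginal variances whatever `rho` is.

```python
import random
import statistics


def simulate(rho, n=20000, seed=0):
    """Simulate potential outcomes and one randomized assignment.

    Assumed setup (illustrative only): rC ~ N(0, 1), rT ~ N(1, 1),
    corr(rT, rC) = rho. Each unit reveals only one potential outcome.
    Returns (variance of rT - rC, variance of observed treated
    responses, variance of observed control responses).
    """
    rng = random.Random(seed)
    effects, observed_t, observed_c = [], [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        r_c = z1
        r_t = 1.0 + rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2
        effects.append(r_t - r_c)  # known only because this is a simulation
        if rng.random() < 0.5:
            observed_t.append(r_t)  # unit assigned to treatment
        else:
            observed_c.append(r_c)  # unit assigned to control
    return (
        statistics.variance(effects),
        statistics.variance(observed_t),
        statistics.variance(observed_c),
    )


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        v_effect, v_t, v_c = simulate(rho)
        print(f"rho={rho}: var(effect)={v_effect:.3f}, "
              f"var(obs T)={v_t:.3f}, var(obs C)={v_c:.3f}")
```

The two settings of `rho` give very different treatment heterogeneity but nearly identical observed-data variances, which is one way to see why the heterogeneity component cannot be estimated from the observed data alone.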
Cause-Effect Relationships in Analytical Surveys: An Illustration of Statistical Issues
Establishing cause-effect is critical in the field of natural resources where one may want to know the impact of management practices, wildfires, drought, etc. on water quality and quantity, wildlife, growth and survival of desirable trees for timber production, etc. Yet, key obstacles exist when trying to establish cause-effect in such contexts. Issues involved with identifying a causal hypothesis, and conditions needed to estimate a causal effect or to establish cause-effect are considered. Ideally one conducts an experiment and follows with a survey, or vice versa. In an experiment, the population of inference may be quite limited and in surveys, the probability distribution of treatment assignments is generally unknown and, if not accounted for, can cause serious errors when estimating causal effects. The latter is illustrated in simulation experiments of artificially generated forest populations using annual plot mortality as the response, drought as the cause, and age as a covariate that is correlated with mortality. We also consider the role of a vague unobservable covariate such as 'drought susceptibility'. Recommendations are made that are designed to maximize the possibility of identifying cause-effect relationships in large-scale natural resources surveys
Modern Statistical Methods for Handling Missing Repeated Measurements in Obesity Trial Data: Beyond LOCF
This paper brings together some modern statistical methods to address the problem of missing data in obesity trials with repeated measurements. Such missing data occur when subjects miss one or more follow-up visits or drop out early from an obesity trial. A common approach to dealing with missing data because of dropout is 'last observation carried forward' (LOCF). This method, although intuitively appealing, requires restrictive assumptions to produce valid statistical conclusions. We review the need for obesity trials, the assumptions that must be made regarding missing data in such trials, and some modern statistical methods for analyzing data containing missing repeated measurements. These modern methods have fewer limitations and less restrictive assumptions than required for LOCF. Moreover, their recent introduction into current releases of statistical software and textbooks makes them more readily available to applied data analysts
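For readers unfamiliar with LOCF, the following minimal sketch shows what the imputation does to one subject's repeated measurements. The function name `locf` and the use of `None` for a missed visit are illustrative assumptions, not taken from the paper.

```python
from typing import List, Optional


def locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing visit with the most recent observed value.

    Visits before the first observation stay missing, because there is
    nothing to carry forward.
    """
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


if __name__ == "__main__":
    # Hypothetical body weights (kg) at four visits; the subject drops
    # out after the second visit.
    print(locf([100.0, 98.5, None, None]))
```

The example makes the restrictive assumption visible: after dropout the subject is treated as if the weight never changed again.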
EXPLORATION OF REACTANT-PRODUCT LIPID PAIRS IN MUTANT-WILD TYPE LIPIDOMICS EXPERIMENTS
High-throughput metabolite analysis is very important for biologists to identify the functions of genes. A mutation in a gene encoding an enzyme is expected to alter the level of the metabolites which serve as the enzyme’s reactant(s) (also known as substrate) and product(s). To find the function of a mutated gene, metabolite data from a wild-type organism and a mutant are compared and candidate reactants and products are identified. The screening principle is that the concentration of reactants will be higher and the concentration of products will be lower in the mutant than in wild type. This is because the mutation reduces the reaction between the reactant and the product in the mutant organism. Based upon this principle, we suggest a method to screen metabolite pairs for candidate reactant-product pairs. Metrics are defined that quantify the effect of a mutation on each potential reaction, represented by a metabolite pair. For reactions catalyzed by well-characterized enzymes, one or more biologically functioning reactant-product pairs are known. Knowledge of the functional reactant-product pairs informs the development of the metrics. The goal is for ranking of the metrics for all possible pairs to reflect the likelihood that a particular metabolite pair is a functional reactant-product pair
Randomization Tests for Small Samples: An Application for Genetic Expression Data
An advantage of randomization tests for small samples is that an exact P-value can be computed under an additive model. A disadvantage with very small sample sizes is that the resulting discrete distribution for P-values can make it mathematically impossible for a P-value to attain a particular degree of significance. We investigate a distribution of P-values that arises when several thousand randomization tests are conducted simultaneously using small samples, a situation that arises with microarray gene expression data. We show that the distribution yields valuable information regarding groups of genes that are differentially expressed between two groups: a treatment group and a control group. This distribution helps to categorize genes with varying degrees of overlap of genetic expression values between the two groups, and it helps to quantify the degree of overlap by using the P-value from a randomization test. Moreover, a statistical test is available that compares the actual distribution of P-values with an expected distribution if there are no genes that are differentially expressed. We demonstrate the method and illustrate the results by using a microarray data set involving a cell line for rheumatoid arthritis. A small simulation study evaluates the effect that correlated gene expression levels could have on results from the analysis
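The discreteness issue can be seen in a minimal sketch of an exact two-sided randomization test. The absolute difference in group means is used as the test statistic here as an assumption for illustration; the paper may use a different statistic. With 3 units per group there are only 20 possible assignments, so the smallest attainable two-sided P-value is 2/20 = 0.1.

```python
from itertools import combinations


def randomization_p_value(treatment, control):
    """Exact two-sided randomization P-value for a difference in means.

    Enumerates every way to choose which pooled units receive the
    treatment label and counts the assignments whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    n_c = len(pooled) - n_t
    total = sum(pooled)
    observed = abs(sum(treatment) / n_t - sum(control) / n_c)
    extreme = 0
    assignments = 0
    for idx in combinations(range(len(pooled)), n_t):
        sum_t = sum(pooled[i] for i in idx)
        diff = abs(sum_t / n_t - (total - sum_t) / n_c)
        assignments += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / assignments


if __name__ == "__main__":
    # Hypothetical expression values with complete separation of groups.
    print(randomization_p_value([5, 6, 7], [1, 2, 3]))
```

Even with complete separation between the two groups, the test cannot produce a P-value below 0.1 at this sample size, which is the situation the abstract describes.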
The PowerAtlas: a power and sample size atlas for microarray experimental design and research
BACKGROUND: Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips needed to address the multiple hypotheses with acceptable accuracy? Statistical methods exist for calculating power based upon a single hypothesis, using estimates of the variability in data from pilot studies. There is, however, a need for methods to estimate power and/or required sample sizes in situations where multiple hypotheses are being tested, such as in microarray experiments. In addition, investigators frequently do not have pilot data to estimate the sample sizes required for microarray studies. RESULTS: To address this challenge, we have developed a Microarray PowerAtlas [1]. The atlas enables estimation of statistical power by allowing investigators to appropriately plan studies by building upon previous studies that have similar experimental characteristics. Currently, there are sample sizes and power estimates based on 632 experiments from Gene Expression Omnibus (GEO). The PowerAtlas also permits investigators to upload their own pilot data and derive power and sample size estimates from these data. This resource will be updated regularly with new datasets from GEO and other databases such as The Nottingham Arabidopsis Stock Center (NASC). CONCLUSION: This resource provides a valuable tool for investigators who are planning efficient microarray studies and estimating required sample sizes
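As a point of reference for the single-hypothesis case mentioned above, the sketch below uses the standard normal-approximation formula for a two-sided, two-sample comparison of means. It is not the PowerAtlas method. The function name, the parameter names, and the Bonferroni adjustment shown in the usage lines are assumptions chosen for illustration; `sigma` stands for a standard deviation estimated from pilot data.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2,
    rounded up. delta is the mean difference to detect and sigma the
    common standard deviation (for example, from a pilot study).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Single hypothesis: detect a one-standard-deviation difference.
    print(n_per_group(delta=1.0, sigma=1.0))
    # Crude multiple-testing illustration (an assumption, not the
    # PowerAtlas approach): Bonferroni-adjust alpha for 10,000 genes.
    print(n_per_group(delta=1.0, sigma=1.0, alpha=0.05 / 10000))
```

Comparing the two printed values shows how quickly the required number of replicates grows once the significance level is adjusted for thousands of simultaneous tests.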
HDBStat!: A platform-independent software suite for statistical analysis of high dimensional biology data
BACKGROUND: Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for analysis of high dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists a flexible and easy-to-use interface to analyze complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing. RESULTS: Results generated from data preprocessing methods, quality control analysis and hypothesis testing methods are output in the form of Excel CSV tables, graphs and an HTML report summarizing data analysis. CONCLUSION: HDBStat! is platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website