A Bayes Linear Analysis of Multilevel Models
In this thesis, Bayes Linear methods for modeling multilevel data are presented
and discussed. Second-order exchangeability judgements are exploited to formulate
subjectivist versions of multilevel models. Bayes linear methods are applied to
estimate model parameters and for diagnostic checks. Closed-form expressions of
estimators are derived, allowing insight into relationships between the quantities
thereof. The canonical analysis and resolution transforms are used to guide sample
design and sample size determination under cost constraints. A finite version of a
multilevel model is formulated, analysed and compared to infinite versions, giving
further insight into sample design issues via the finite resolution transform.
A new Bayes Linear Minimum Variance Estimation (BLIMVE) approach is developed
to estimate variances. Estimated variances are used to perform two-stage
Bayes linear analysis of more complex multilevel models. The methods developed
are shown to be applicable in cases of small level-2 samples. The Bayes linear analyses
of multilevel models are applied to an educational data set using special-purpose
codes written in the R Statistical Language.
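As a rough illustration of the kind of closed-form adjustment the abstract refers to, the sketch below adjusts beliefs about a single group mean by the sample mean of its observations, assuming the second-order exchangeable representation X_i = M + e_i with uncorrelated residuals. The function name, argument names, and the numbers are hypothetical and are not taken from the thesis (which used R); the formula is the standard Bayes linear adjusted expectation and variance specialised to this scalar case.

```python
def bayes_linear_adjust_mean(prior_mean, var_mean, var_resid, observations):
    """Adjust beliefs about a group mean M by the sample mean of X_i = M + e_i.

    Assumes E(M) = prior_mean, Var(M) = var_mean, Var(e_i) = var_resid,
    with the e_i uncorrelated with each other and with M.
    """
    n = len(observations)
    xbar = sum(observations) / n
    # Var(xbar) = Var(M) + Var(e)/n, and Cov(M, xbar) = Var(M).
    var_xbar = var_mean + var_resid / n
    # Resolution: proportion of prior variance in M removed by the data.
    resolution = var_mean / var_xbar
    adjusted_expectation = prior_mean + resolution * (xbar - prior_mean)
    adjusted_variance = var_mean * (1.0 - resolution)
    return adjusted_expectation, adjusted_variance, resolution


# Hypothetical numbers: four observations, prior variance 1, residual variance 4.
expectation, variance, resolution = bayes_linear_adjust_mean(
    prior_mean=0.0, var_mean=1.0, var_resid=4.0, observations=[2.0, 2.0, 2.0, 2.0]
)
```

In this toy case the resolution grows towards 1 as the number of observations increases, which is the quantity a sample-size calculation of this kind would trade off against cost.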
Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
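To make the role of basis functions in scalar-on-function regression concrete, the sketch below shows one common way the problem is reduced to ordinary multiple regression: if the coefficient function is written as a sum of basis functions, each observed curve contributes one design-matrix row whose entries are the integrals of the curve against each basis function. The three-term Fourier basis, the unit interval, the trapezoid rule, and all names are assumptions chosen for illustration, not details from the article.

```python
import math


def fourier_basis(t):
    """Evaluate an assumed three-term Fourier basis at a point t in [0, 1]."""
    return [1.0, math.sin(2 * math.pi * t), math.cos(2 * math.pi * t)]


def trapezoid(values, grid):
    """Approximate an integral from values sampled on a discrete grid."""
    total = 0.0
    for j in range(1, len(grid)):
        total += 0.5 * (values[j] + values[j - 1]) * (grid[j] - grid[j - 1])
    return total


def design_row(curve, grid):
    """Return the integrals of curve(t) * phi_k(t) for each basis function phi_k.

    If beta(t) = sum_k b_k * phi_k(t), then the functional term
    integral of curve(t) * beta(t) dt equals sum_k b_k * design_row[k].
    """
    basis_values = [fourier_basis(t) for t in grid]
    n_basis = len(basis_values[0])
    return [
        trapezoid([curve[j] * basis_values[j][k] for j in range(len(grid))], grid)
        for k in range(n_basis)
    ]


# A hypothetical curve observed on a grid of 101 points.
grid = [j / 100 for j in range(101)]
curve = [math.sin(2 * math.pi * t) for t in grid]
row = design_row(curve, grid)
```

Stacking one such row per curve gives a design matrix to which least squares, or a penalised variant for regularization, can be applied.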
Statistical Models to Assess Associations between the Built Environment and Health: Examining Food Environment Contributions to the Childhood Obesity Epidemic.
Models are developed and applied to examine the associations between built environment features and health. These developments are motivated by studies examining the contribution of features of the built food environment near schools, such as availability of fast food restaurants and convenience stores, to children’s body weight. The data used in this dissertation come from a surveillance database that captures body weight and other characteristics for all children in 5th, 7th, and 9th grades enrolled in public schools in California during 2001-2010 and a commercial data source that contains the locations of all food establishments in California for the same time period. First, we develop a hierarchical multiple informants model (HMIM) for clustered data that estimates the marginal association of multiple built environment features and formally tests if the strength of their association differs with the outcome. Using this new model, we establish that the contribution of the availability of convenience stores to children’s body mass index z-scores (BMIz) is stronger than that of fast food restaurants. Second, we propose to use a distributed lag model (DLM) to examine whether and how the association between the number of convenience stores and children’s BMIz decays with longer distance from schools. In this model, distributed lag (DL) covariates are the number of convenience stores within several contiguous “ring”-shaped areas from schools rather than circular buffers, and their coefficients are modeled as a function of distance, using smoothing splines. We find that associations are stronger with closer proximity to schools and vanish by about 2 miles from school locations. 
Third, we develop a hierarchical distributed lag model (HDLM) to systematically examine the variability of the built environment association across regions to help address a yet unanswered question in the built environment literature: whether and how activity spaces relevant to health vary across regions. We find DL coefficients vary across regions, implying that variation in activity spaces also exists. We also identify areas where children’s BMIz is more vulnerable to built environment factors. This dissertation provides novel methods with which to study how built environment factors affect health. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/110362/1/jongguri_1.pd
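The ring-shaped distributed lag covariates described in this abstract can be sketched as follows: stores are counted within contiguous distance rings around a school, and each ring count is multiplied by a coefficient that varies with distance. The ring edges, distances, and coefficient values below are hypothetical, and the coefficients are simply supplied by hand here, whereas the dissertation models them as a smooth function of distance using smoothing splines.

```python
from bisect import bisect_right


def ring_counts(distances, ring_edges):
    """Count stores in contiguous rings [edge_k, edge_{k+1}) around a school.

    distances: store-to-school distances (e.g. in miles).
    ring_edges: increasing ring boundaries, starting at 0.
    Stores beyond the outermost edge are ignored.
    """
    counts = [0] * (len(ring_edges) - 1)
    for d in distances:
        k = bisect_right(ring_edges, d) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
    return counts


def dl_linear_predictor(counts, coefficients):
    """Sum of ring counts times their distance-specific coefficients."""
    return sum(c * b for c, b in zip(counts, coefficients))


# Hypothetical school with five nearby stores and four half-mile rings.
counts = ring_counts([0.1, 0.3, 0.6, 1.2, 2.5], [0.0, 0.5, 1.0, 1.5, 2.0])
# Hypothetical coefficients that decay to zero by 2 miles.
contribution = dl_linear_predictor(counts, [0.4, 0.2, 0.1, 0.0])
```

Using rings rather than nested circular buffers keeps the covariates non-overlapping, so each coefficient describes the association at one distance band only.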
Meta-analysis strategies for heterogeneous studies in genome-wide association studies
Meta-analysis is a statistical technique that combines results from multiple independent studies to make inferences about parameters of interest. Although it is popular for parameter estimation and hypothesis testing, meta-analytic approaches that incorporate heterogeneous studies have not been fully developed. For heterogeneous studies, we do not expect all of the studies to have the same true underlying effect and the use of the fixed-effects model in a meta-analysis in this situation violates the assumption of homogeneity of effect size. Heterogeneity among studies can arise from multiple sources such as differences in populations by ancestry, differences in study designs, and different impacts of environmental exposures on the effect of the variable of interest. In this thesis, we introduce an analytic strategy and statistical models for meta-analysis of potentially heterogeneous studies. First, we propose a two-stage clustering approach to account for heterogeneity in trans-ethnic meta-analysis of genome-wide association studies (GWAS). Specifically, we cluster studies in the two-stage approach using cohort-specific genetic information prior to meta-analysis to account for between-cluster heterogeneity as well as to bolster within-cluster homogeneity. An extensive simulation study shows that this approach improves power and diminishes computational intensity compared to existing methods for trans-ethnic meta-analysis. Next, under a meta-regression framework, we develop a likelihood ratio test (LRT) statistic to accommodate multiple random effects. We allow multiple sources of heterogeneity in terms of study characteristics and model the heterogeneities as random effects. We show that the proposed LRT maintains a similar or higher power than other existing methods in a simulation study especially when heterogeneity exists. We apply this new approach to meta-analyze genome-wide association data. 
Lastly, we derive a score test in the same context as our proposed new LRT and show the substantial advantage of the score test in computational efficiency compared to the new LRT. The introduced strategy and methodologies can effectively and efficiently aggregate the evidence from potentially heterogeneous studies in statistical genetics and other research areas.
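For readers unfamiliar with the homogeneity assumption mentioned in this abstract, the sketch below contrasts a fixed-effects inverse-variance estimate with a simple random-effects estimate. It uses Cochran's Q and the DerSimonian-Laird moment estimator of the between-study variance, which are textbook tools swapped in for illustration; they are not the clustering approach, LRT, or score test proposed in the thesis. All names and numbers are hypothetical.

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted estimate assuming one common true effect."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect estimate."""
    theta = fixed_effect(effects, variances)
    return sum((y - theta) ** 2 / v for y, v in zip(effects, variances))


def dersimonian_laird_tau2(effects, variances):
    """Moment estimate of the between-study variance, truncated at zero."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    scale = sum_w - sum(w * w for w in weights) / sum_w
    q = cochran_q(effects, variances)
    return max(0.0, (q - (len(effects) - 1)) / scale)


def random_effect(effects, variances):
    """Random-effects estimate: each study variance is inflated by tau^2."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    return fixed_effect(effects, [v + tau2 for v in variances])


# Two hypothetical studies with very different effect estimates.
tau2 = dersimonian_laird_tau2([0.0, 4.0], [1.0, 1.0])
```

When the estimated between-study variance is zero the two estimates coincide; when it is large, the random-effects weights become more nearly equal across studies.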
Population-averaged models with GEE analysis in CRTs, focusing on power calculation under incomplete design, count outcomes, and computing tools with finite sample adjustments.
Cluster randomized trials (CRTs) are studies designed to test interventions that operate at a group level, differing from randomized controlled trials which test intervention effects on the individual level. Outcomes are more similar between individuals in the same group than in different groups, which is referred to as clustering effects and described by intracluster correlation coefficients (ICCs). There are often a limited number of groups in CRTs. Population-averaged models with generalized estimating equations (GEE) analysis describe how the average response changes with the flexible specification of correlation structures and explanations of intervention effects and ICCs at the population level. When there is a special scientific interest in ICCs with a small number of clusters, the GEE/MAEE (matrix-adjusted estimating equations) approach is suggested to reduce the bias of ICC estimates. In the first project, we propose a fast and computationally efficient, non-simulation-based, power calculation method for GEE analysis of complete and incomplete multi-period CRTs. Simulations suggest the fast GEE power method reliably predicts empirical power with GEE/MAEE for incomplete stepped wedge CRTs with a small number of clusters. Moreover, we implement the fast GEE power method for multi-period CRTs into an open-access SAS macro. In the second project, we extend the method of GEE/MAEE analysis by adding a third estimating equation for the scale parameter and evaluate the performance of the three estimating equations (3EE) approach for correlated over-dispersed count outcomes in CRTs with a small number of clusters. Simulations demonstrate the superior performance of the 3EE/MAEE approach in reducing the bias of ICC estimates and maintaining the nominal coverage of confidence intervals compared to its uncorrected counterpart. The performance of different bias-corrected sandwich variance estimators is also evaluated. 
The methodology is used to analyze over-dispersed correlated count outcomes in a real-world stepped wedge CRT. In the third project, we develop a SAS macro GEEMAEE for the analysis of clustered binary, count, and continuous data based on the GEE/MAEE approach. Deletion diagnostics are available to estimate the influence of observations, cluster-periods, and clusters on regression parameter estimates. The macro also provides bias-corrected covariance estimators for both marginal mean and correlation parameters. Doctor of Philosophy
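To show why the ICC matters for power in a CRT, the sketch below computes the textbook design effect 1 + (m - 1) * ICC for equal cluster sizes m under an exchangeable correlation, and the resulting effective sample size. This is a deliberately simplified illustration and is not the fast GEE power method or the SAS macro described in the abstract, which handle multi-period and incomplete designs; the function names and numbers are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes
    and a single exchangeable intracluster correlation."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


# Hypothetical trial: 10 clusters of 20 individuals with ICC = 0.05.
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(10, 20, 0.05)
```

Even a small ICC can reduce the effective sample size substantially when clusters are large, which is why biased ICC estimates from a small number of clusters are a practical concern.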
Bayesian unit-level modeling of non-Gaussian survey data under informative sampling with application to small area estimation
Unit-level models are an alternative to the traditional area-level models used in small area estimation, characterized by the direct modeling of survey responses rather than aggregated direct estimates. These unit-level approaches offer many benefits over area-level modeling, such as potential for more precise estimates, construction of estimates at multiple spatial resolutions through a single model, and elimination of the need for benchmarking techniques, among others. Furthermore, many recent surveys collect interesting and complex data types at the unit level, such as text and functional data. Yet, unit-level models present two primary challenges that have limited their widespread use. First, when surveys have been sampled in an informative manner, it is critical to account for the design in some fashion when utilizing a model at the unit level. Second, unit-level datasets are inherently much larger than area-level ones, with responses that are typically non-Gaussian, leading to computational constraints. After providing a comprehensive review on the problem of informative sampling, this dissertation provides four computationally efficient methodologies for non-Gaussian survey data under informative sampling. This methodology relies on the Bayesian pseudo-likelihood to adjust for the survey design, as well as Bayesian hierarchical modeling to characterize various dependence structures. First, a count data model is developed and applied to small area estimation of housing vacancies. Second, modeling approaches for both binary and categorical data are developed, along with a variational Bayes procedure that may be used in extremely high-dimensional settings. This approach is applied to the problem of small area estimation of health insurance rates using the American Community Survey. Third, a nonlinear model is developed to allow for complex covariates, with application to text data contained within the American National Election Studies. 
Finally, a model is developed for functional covariates and applied to physical activity monitor data from the National Health and Nutrition Examination Survey. Includes bibliographical references.
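The pseudo-likelihood idea mentioned in this abstract is commonly described as raising each sampled unit's likelihood contribution to the power of its survey weight, so that the weighted sample likelihood mimics the population likelihood. The sketch below illustrates this for a single Poisson rate without any prior or hierarchical structure; it is a minimal sketch under that assumption, not the dissertation's models, and the names, counts, and weights are hypothetical.

```python
import math


def poisson_pseudo_loglik(rate, counts, weights):
    """Survey-weighted Poisson log-likelihood: sum_i w_i * log f(y_i | rate)."""
    return sum(
        w * (y * math.log(rate) - rate - math.lgamma(y + 1))
        for y, w in zip(counts, weights)
    )


def weighted_rate_estimate(counts, weights):
    """Maximiser of the pseudo-log-likelihood above: the weighted mean count."""
    return sum(w * y for y, w in zip(counts, weights)) / sum(weights)


# Hypothetical sample: the third unit represents twice as many population units.
counts = [0, 2, 4]
weights = [1.0, 1.0, 2.0]
rate_hat = weighted_rate_estimate(counts, weights)
```

In a Bayesian treatment, the exponentiated pseudo-likelihood would be multiplied by a prior to give a pseudo-posterior; that step is omitted here.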