772 research outputs found

    Vol. 15, No. 1 (Full Issue)

    A Bayes Linear Analysis of Multilevel Models

    In this thesis, Bayes linear methods for modeling multilevel data are presented and discussed. Second-order exchangeability judgements are exploited to formulate subjectivist versions of multilevel models. Bayes linear methods are applied to estimate model parameters and to perform diagnostic checks. Closed-form expressions for the estimators are derived, giving insight into the relationships among the quantities involved. The canonical analysis and resolution transforms are used to guide sample design and sample size determination under cost constraints. A finite version of a multilevel model is formulated, analysed and compared to infinite versions, giving further insight into sample design issues via the finite resolution transform. A new Bayes Linear Minimum Variance Estimation (BLIMVE) approach is developed to estimate variances. Estimated variances are used to perform two-stage Bayes linear analysis of more complex multilevel models. The methods developed are shown to be applicable in cases of small level-2 samples. The Bayes linear analyses of multilevel models are applied to an educational data set using special-purpose code written in the R statistical language.
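
For orientation, the general Bayes linear adjustment on which analyses of this kind rest can be written as below. These are the standard textbook expressions, not the thesis's own closed-form estimators; the symbols X (quantities of interest) and D (observed data) are notation assumed here, and a generalized inverse would replace the inverse if Var(D) were singular.

```latex
\begin{align}
\mathrm{E}_D(X) &= \mathrm{E}(X) + \mathrm{Cov}(X, D)\,\mathrm{Var}(D)^{-1}\,\bigl(D - \mathrm{E}(D)\bigr), \\
\mathrm{Var}_D(X) &= \mathrm{Var}(X) - \mathrm{Cov}(X, D)\,\mathrm{Var}(D)^{-1}\,\mathrm{Cov}(D, X).
\end{align}
```

Only means, variances and covariances are required, which is why second-order exchangeability judgements are enough to specify the model.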

    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article focuses on functional regression, the area of FDA that has received the most attention in applications and methodological development. It begins with an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization is discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. The article closes with a brief discussion of potential areas of future development in this field.
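
As a generic illustration of how basis functions enter the first of these three types, a scalar-on-function model with a basis-expanded coefficient function can be written as below. The notation (α, β(t), basis functions φ_k, coefficients b_k, truncation level K, domain 𝒯) is assumed here for the sketch and is not taken from the article.

```latex
y_i = \alpha + \int_{\mathcal{T}} x_i(t)\,\beta(t)\,dt + \varepsilon_i,
\qquad
\beta(t) \approx \sum_{k=1}^{K} b_k\,\phi_k(t)
\;\Longrightarrow\;
y_i \approx \alpha + \sum_{k=1}^{K} b_k \int_{\mathcal{T}} x_i(t)\,\phi_k(t)\,dt + \varepsilon_i .
```

The expansion reduces the problem to an ordinary regression on K derived covariates; regularization then comes from the choice of K or from a penalty on the b_k.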

    Statistical Models to Assess Associations between the Built Environment and Health: Examining Food Environment Contributions to the Childhood Obesity Epidemic.

    Models are developed and applied to examine the associations between built environment features and health. These developments are motivated by studies examining the contribution of features of the built food environment near schools, such as availability of fast food restaurants and convenience stores, to children’s body weight. The data used in this dissertation come from a surveillance database that captures body weight and other characteristics for all children in 5th, 7th, and 9th grades enrolled in public schools in California during 2001-2010 and a commercial data source that contains the locations of all food establishments in California for the same time period. First, we develop a hierarchical multiple informants model (HMIM) for clustered data that estimates the marginal association of multiple built environment features and formally tests if the strength of their association differs with the outcome. Using this new model, we establish that the contribution of the availability of convenience stores to children’s body mass index z-scores (BMIz) is stronger than that of fast food restaurants. Second, we propose to use a distributed lag model (DLM) to examine whether and how the association between the number of convenience stores and children’s BMIz decays with longer distance from schools. In this model, distributed lag (DL) covariates are the number of convenience stores within several contiguous “ring”-shaped areas from schools rather than circular buffers, and their coefficients are modeled as a function of distance, using smoothing splines. We find that associations are stronger with closer proximity to schools and vanish by about 2 miles from school locations. 
Third, we develop a hierarchical distributed lag model (HDLM) to systematically examine the variability of the built environment association across regions to help address a yet unanswered question in the built environment literature: whether and how activity spaces relevant to health vary across regions. We find DL coefficients vary across regions, implying that variation in activity spaces also exists. We also identify areas where children’s BMIz is more vulnerable to built environment factors. This dissertation provides novel methods with which to study how built environment factors affect health. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/110362/1/jongguri_1.pd
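
The ring-shaped distributed lag covariates described in the abstract above can be illustrated with a small sketch. The distances and ring edges are made-up values, the function name `ring_counts` is hypothetical, and this is not the dissertation's code; it only shows how ring counts differ from cumulative circular-buffer counts.

```python
import numpy as np

def ring_counts(distances_miles, ring_edges_miles):
    """Count locations in each contiguous ring between consecutive edges.

    np.histogram uses half-open bins [a, b), except the last bin,
    which also includes its right edge.
    """
    counts, _ = np.histogram(distances_miles, bins=ring_edges_miles)
    return counts

# Hypothetical distances (miles) from one school to nearby stores.
distances = [0.1, 0.3, 0.6, 0.9, 1.2, 1.7, 2.5]
edges = [0.0, 0.5, 1.0, 1.5, 2.0]  # four rings out to 2 miles

rings = ring_counts(distances, edges)  # one covariate per ring
buffers = np.cumsum(rings)             # equivalent circular-buffer counts
print(rings, buffers)
```

Ring counts are disjoint, so each store contributes to exactly one covariate, whereas nested circular buffers count the same store repeatedly.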

    Vol. 16, No. 2 (Full Issue)

    Meta-analysis strategies for heterogeneous studies in genome-wide association studies

    Meta-analysis is a statistical technique that combines results from multiple independent studies to make inferences about parameters of interest. Although it is popular for parameter estimation and hypothesis testing, meta-analytic approaches that incorporate heterogeneous studies have not been fully developed. For heterogeneous studies, we do not expect all of the studies to have the same true underlying effect and the use of the fixed-effects model in a meta-analysis in this situation violates the assumption of homogeneity of effect size. Heterogeneity among studies can arise from multiple sources such as differences in populations by ancestry, differences in study designs, and different impacts of environmental exposures on the effect of the variable of interest. In this thesis, we introduce an analytic strategy and statistical models for meta-analysis of potentially heterogeneous studies. First, we propose a two-stage clustering approach to account for heterogeneity in trans-ethnic meta-analysis of genome-wide association studies (GWAS). Specifically, we cluster studies in the two-stage approach using cohort-specific genetic information prior to meta-analysis to account for between-cluster heterogeneity as well as to bolster within-cluster homogeneity. An extensive simulation study shows that this approach improves power and diminishes computational intensity compared to existing methods for trans-ethnic meta-analysis. Next, under a meta-regression framework, we develop a likelihood ratio test (LRT) statistic to accommodate multiple random effects. We allow multiple sources of heterogeneity in terms of study characteristics and model the heterogeneities as random effects. We show that the proposed LRT maintains a similar or higher power than other existing methods in a simulation study especially when heterogeneity exists. We apply this new approach to meta-analyze genome-wide association data. 
Lastly, we derive a score test in the same context as our proposed new LRT and show the substantial advantage of the score test in computational efficiency compared to the new LRT. The introduced strategy and methodologies can effectively and efficiently aggregate the evidence from potentially heterogeneous studies in statistical genetics and other research areas.
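
To make the homogeneity issue raised in the abstract above concrete, here is a minimal sketch contrasting the inverse-variance fixed-effect estimate with a DerSimonian-Laird random-effects estimate. This is a standard textbook technique swapped in for illustration, not the clustering approach, LRT or score test proposed in the thesis; the function name and the effect sizes are hypothetical.

```python
import numpy as np

def fixed_and_random_effects(beta, se):
    """Fixed-effect estimate, Cochran's Q, and DerSimonian-Laird random effects."""
    beta = np.asarray(beta, dtype=float)
    w = 1.0 / np.asarray(se, dtype=float) ** 2        # inverse-variance weights
    fixed = np.sum(w * beta) / np.sum(w)
    q = np.sum(w * (beta - fixed) ** 2)               # Cochran's Q statistic
    k = beta.size
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                # between-study variance
    w_re = 1.0 / (1.0 / w + tau2)                     # random-effects weights
    random = np.sum(w_re * beta) / np.sum(w_re)
    return {
        "fixed": fixed, "se_fixed": np.sqrt(1.0 / np.sum(w)),
        "Q": q, "tau2": tau2,
        "random": random, "se_random": np.sqrt(1.0 / np.sum(w_re)),
    }

# Hypothetical per-study effect estimates with equal standard errors.
res = fixed_and_random_effects([0.1, 0.5, -0.2, 0.4], [0.05, 0.05, 0.05, 0.05])
print(res)
```

When the estimated between-study variance is positive, the random-effects standard error exceeds the fixed-effect one, which is the penalty the fixed-effects model ignores under heterogeneity.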

    Vol. 16, No. 1 (Full Issue)

    Population-averaged models with GEE analysis in CRTs, focusing on power calculation under incomplete design, count outcomes, and computing tools with finite sample adjustments.

    Cluster randomized trials (CRTs) are studies designed to test interventions that operate at a group level, differing from randomized controlled trials which test intervention effects on the individual level. Outcomes are more similar between individuals in the same group than in different groups, which is referred to as clustering effects and described by intracluster correlation coefficients (ICCs). There are often a limited number of groups in CRTs. Population-averaged models with generalized estimating equations (GEE) analysis describe how the average response changes with the flexible specification of correlation structures and explanations of intervention effects and ICCs at the population level. When there is a special scientific interest in ICCs with a small number of clusters, the GEE/MAEE (matrix-adjusted estimating equations) approach is suggested to reduce the bias of ICC estimates. In the first project, we propose a fast and computationally efficient, non-simulation-based, power calculation method for GEE analysis of complete and incomplete multi-period CRTs. Simulations suggest the fast GEE power method reliably predicts empirical power with GEE/MAEE for incomplete stepped wedge CRTs with a small number of clusters. Moreover, we implement the fast GEE power method for multi-period CRTs into an open-access SAS macro. In the second project, we extend the method of GEE/MAEE analysis by adding a third estimating equation for the scale parameter and evaluate the performance of the three estimating equations (3EE) approach for correlated over-dispersed count outcomes in CRTs with a small number of clusters. Simulations demonstrate the superior performance of the 3EE/MAEE approach in reducing the bias of ICC estimates and maintaining the nominal coverage of confidence intervals compared to its uncorrected counterpart. The performance of different bias-corrected sandwich variance estimators is also evaluated. 
The methodology is used to analyze over-dispersed correlated count outcomes in a real-world stepped wedge CRT. In the third project, we develop a SAS macro GEEMAEE for the analysis of clustered binary, count, and continuous data based on the GEE/MAEE approach. Deletion diagnostics are available to estimate the influence of observations, cluster-periods, and clusters on regression parameter estimates. The macro also provides bias-corrected covariance estimators for both marginal mean and correlation parameters. Doctor of Philosoph
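
The role of the ICC in power for clustered designs can be illustrated with the classical design effect for a parallel CRT with equal cluster sizes and exchangeable correlation. This is a textbook approximation shown only for intuition; it is not the fast GEE power method for multi-period CRTs described above, and the function names and inputs are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)

# Hypothetical trial: 10 clusters of 20 individuals, ICC = 0.05.
print(design_effect(20, 0.05))
print(effective_sample_size(10, 20, 0.05))
```

Even a small ICC roughly halves the information in this example, which is why accurate ICC estimation matters when the number of clusters is small.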

    Bayesian unit-level modeling of non-Gaussian survey data under informative sampling with application to small area estimation

    Unit-level models are an alternative to the traditional area-level models used in small area estimation, characterized by the direct modeling of survey responses rather than aggregated direct estimates. These unit-level approaches offer many benefits over area-level modeling, such as potential for more precise estimates, construction of estimates at multiple spatial resolutions through a single model, and elimination of the need for benchmarking techniques, among others. Furthermore, many recent surveys collect interesting and complex data types at the unit level, such as text and functional data. Yet, unit-level models present two primary challenges that have limited their widespread use. First, when surveys have been sampled in an informative manner, it is critical to account for the design in some fashion when utilizing a model at the unit level. Second, unit-level datasets are inherently much larger than area-level ones, with responses that are typically non-Gaussian, leading to computational constraints. After providing a comprehensive review on the problem of informative sampling, this dissertation provides four computationally efficient methodologies for non-Gaussian survey data under informative sampling. These methodologies rely on the Bayesian pseudo-likelihood to adjust for the survey design, as well as Bayesian hierarchical modeling to characterize various dependence structures. First, a count data model is developed and applied to small area estimation of housing vacancies. Second, modeling approaches for both binary and categorical data are developed, along with a variational Bayes procedure that may be used in extremely high-dimensional settings. This approach is applied to the problem of small area estimation of health insurance rates using the American Community Survey. Third, a nonlinear model is developed to allow for complex covariates, with application to text data contained within the American National Election Studies. 
Finally, a model is developed for functional covariates and applied to physical activity monitor data from the National Health and Nutrition Examination Survey. Includes bibliographical references.
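
In its commonly used general form, a Bayesian pseudo-likelihood adjustment raises each sampled unit's likelihood contribution to the power of its survey weight, as sketched below. The notation (sample S, scaled weights w̃_i, parameter θ, prior π) is assumed here, and the dissertation's exact formulation and weight scaling may differ.

```latex
\hat{\pi}(\theta \mid y, \tilde{w})
\;\propto\;
\Bigl\{ \prod_{i \in S} f(y_i \mid \theta)^{\tilde{w}_i} \Bigr\}\, \pi(\theta)
```

Units with larger weights, which represent more of the population, contribute more to the pseudo-posterior, so an informative design is accounted for without modeling the selection mechanism explicitly.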