
    GENERALIZED LINEAR MIXED MODELS - AN OVERVIEW

    Generalized linear models provide a methodology for regression and ANOVA-type analysis of data whose errors are not necessarily normally distributed. Common applications in agriculture include categorical data, survival analysis, and bioassay. Most of the literature and most of the available computing software for generalized linear models apply to cases in which all model effects are fixed. However, many agricultural research applications lead to mixed or random effects models: split-plot experiments, animal- and plant-breeding studies, multi-location studies, etc. Recently, through a variety of efforts in a number of contexts, a general framework for generalized linear models with random effects, the generalized linear mixed model, has been developed. The purpose of this presentation is to give an overview of the methodology for generalized linear mixed models. Relevant background, estimating equations, and general approaches to interval estimation and hypothesis testing will be presented. Methods will be illustrated via a small data set involving binary data.
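
For orientation, the model class this abstract describes is commonly written as below. This is the standard textbook form, not notation taken from the presentation itself; the symbols g, X, Z, β, u and G are assumptions introduced here for illustration.

```latex
% Generalized linear mixed model: link function g applied to the conditional mean
\[
  g\bigl(\mathrm{E}[\,y \mid u\,]\bigr) = X\beta + Zu,
  \qquad u \sim N(0, G)
\]
% Binary data with a logit link and one random intercept u_j per group j
\[
  \operatorname{logit}\Pr(y_{ij} = 1 \mid u_j) = x_{ij}^{\top}\beta + u_j,
  \qquad u_j \sim N(0, \sigma_u^2)
\]
```

Here β collects the fixed effects and u the random effects (for example, whole plots in a split-plot experiment or locations in a multi-location study).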

    Analysis of oligonucleotide array experiments with repeated measures using mixed models

    BACKGROUND: Mixed factorial experiments with two or more factors are becoming increasingly common in microarray data analysis. In this case study, the two factors are presence (patients with Alzheimer's disease) or absence (control) of the disease, and brain region, either olfactory bulb (OB) or cerebellum (CER). In the design considered in this manuscript, OB and CER are repeated measurements from the same subject and, hence, are correlated. It is critical to identify sources of variability in the analysis of oligonucleotide array experiments with repeated measures, and correlations among data points have to be considered. In addition, multiple testing problems are more complicated in experiments with multi-level treatments or treatment combinations. RESULTS: In this study we adopted a linear mixed model to analyze oligonucleotide array experiments with repeated measures. We first constructed a generalized F test to select differentially expressed genes. The Benjamini and Hochberg (BH) procedure for controlling the false discovery rate (FDR) at 5% was applied to the P values of the generalized F test. Genes with a significant generalized F test were then categorized by whether their interaction terms were significant at the α-level (α(new) = 0.0033) determined by the FDR procedure. Since simple effects may be examined for genes with a significant interaction effect, we adopted the protected Fisher's least significant difference (LSD) test at the level α(new) to control the family-wise error rate (FWER) for each gene examined. CONCLUSIONS: A linear mixed model is appropriate for the analysis of oligonucleotide array experiments with repeated measures. We constructed a generalized F test to select differentially expressed genes, and then applied a specific sequence of tests to identify factorial effects. This sequence of tests was designed to control the gene-based FWER.
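
The Benjamini and Hochberg step-up procedure mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `benjamini_hochberg` and the example P values are hypothetical, and the returned cutoff is only analogous to the α(new) reported in the abstract.

```python
def benjamini_hochberg(p_values, q=0.05):
    """BH step-up procedure.

    Returns (threshold, rejected_indices): the largest P value that is
    still rejected at FDR level q, and the indices of all rejected tests.
    """
    m = len(p_values)
    if m == 0:
        return 0.0, []
    # Indices of the P values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    if k_max == 0:
        return 0.0, []
    # Every test ranked at or below k_max is rejected (step-up rule).
    threshold = p_values[order[k_max - 1]]
    return threshold, sorted(order[:k_max])


# Hypothetical P values, one per gene.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
threshold, rejected = benjamini_hochberg(p, q=0.05)
```

The returned threshold is data dependent, which is why a per-test cutoff derived from an FDR procedure generally differs from the nominal 5% level.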

    MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package

    Generalized linear mixed models provide a flexible framework for modeling a range of data, although with non-Gaussian response variables the likelihood cannot be obtained in closed form. Markov chain Monte Carlo methods solve this problem by sampling from a series of simpler conditional distributions that can be evaluated. The R package MCMCglmm implements such an algorithm for a range of model fitting problems. More than one response variable can be analyzed simultaneously, and these variables are allowed to follow Gaussian, Poisson, multi(bi)nomial, exponential, zero-inflated and censored distributions. A range of variance structures are permitted for the random effects, including interactions with categorical or continuous variables (i.e., random regression), and more complicated variance structures that arise through shared ancestry, either through a pedigree or through a phylogeny. Missing values are permitted in the response variable(s) and data can be known up to some level of measurement error as in meta-analysis. All simulation is done in C/C++ using the CSparse library for sparse linear systems.

    Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models

    BACKGROUND: Growing interest in biological pathways has called for new statistical methods for modeling and testing a genetic pathway effect on a health outcome. The fact that genes within a pathway tend to interact with each other and relate to the outcome in a complicated way makes nonparametric methods more desirable. The kernel machine method provides a convenient, powerful and unified method for multi-dimensional parametric and nonparametric modeling of the pathway effect. RESULTS: In this paper we propose a logistic kernel machine regression model for binary outcomes. This model relates the disease risk to covariates parametrically, and to genes within a genetic pathway parametrically or nonparametrically using kernel machines. The nonparametric genetic pathway effect allows for possible interactions among the genes within the same pathway and a complicated relationship between the genetic pathway and the outcome. We show that kernel machine estimation of the model components can be formulated using a logistic mixed model. Estimation can hence proceed within a mixed model framework using standard statistical software. A score test based on a Gaussian process approximation is developed to test for the genetic pathway effect. The methods are illustrated using a prostate cancer data set and evaluated using simulations. An extension to continuous and discrete outcomes using generalized kernel machine models and its connection with generalized linear mixed models is discussed. CONCLUSION: Logistic kernel machine regression and its extension, generalized kernel machine regression, provide a novel and flexible statistical tool for modeling pathway effects on discrete and continuous outcomes. Their close connection to mixed models and attractive performance give them promising, wide applications in bioinformatics and other biomedical areas.

    Identification of genomic factors using family-based association studies

    Genome-wide association studies (GWAS) have become increasingly popular and important for detecting genetic associations of complex traits. However, it is well known that spurious associations can arise from statistical analysis that does not properly account for the genetic relatedness of samples. Many methods have been proposed to guard against these spurious associations. Here we focus on multi-locus association studies of quantitative traits and of case-control status, and propose algorithms that take genetically related samples into consideration to address possible confounding. As supervised dimension reduction methods, these algorithms perform well in association studies with a large number of biomarkers but a relatively small number of samples. Recently, linear mixed models have demonstrated their efficiency in GWAS of quantitative traits with multiple levels of sample structure. Most of the current mixed-model-based methods, such as EMMA, EMMAX, and GEMMA, can be viewed as single-locus methods because they test each SNP separately. Complex traits, however, are known to be controlled by multiple loci, so including multiple loci in the statistical model seems more appropriate. In the first part of my dissertation, we propose an algorithm that extends penalized orthogonal-components regression to family-based association studies (fPOCRE) of continuous traits. While multiple loci can be investigated at the same time, the sample relatedness is modeled through the kinship matrix and the shared confounding effects are included as random effects in the linear mixed model. Our proposed algorithm simultaneously selects biomarkers and constructs their linear combinations as components that optimally account for variation in traits. We compare fPOCRE with EMMAX, one of the most frequently used single-locus approaches, and with MLMM, a recently developed multi-locus approach. Our simulation study demonstrates that fPOCRE performs well relative to both EMMAX and MLMM, with higher power and fewer false positives, when causal effects are from clusters of correlated SNPs. Real data are analyzed to illustrate the proposed approach and provide further comparisons. The case-control association study is a widely used design in genetic epidemiology and pharmacology, and it is also susceptible to potential confounding by sample structure. In the second part of my dissertation, we employ a multi-locus generalized estimating equation (GEE) model to study genetic associations of binary traits, capturing multiple levels of the sample structure with a working correlation matrix. The kinship matrix is used to model the working correlation matrix, and the penalized orthogonal-components regression method is developed to build such a multi-locus GEE model (GEE-POCRE). GEE-POCRE is compared with gPOCRE, a multi-locus method that does not consider pedigree information, and with TDT, FBAT, and ROADTRIPS, which are single-locus methods that consider sample structure. In our simulation studies, GEE-POCRE demonstrates good performance in protecting against spurious associations caused by the sample structure, as well as increased power.
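
As background for the kinship-based random effects mentioned above, single-locus mixed-model methods in the style of EMMA are usually written in the following generic form. This is a sketch of the common formulation, not the fPOCRE model itself; the symbols y, X, β, u, K, σ_g² and σ_e² are assumptions introduced here.

```latex
% Linear mixed model with a polygenic random effect structured by the kinship matrix K
\[
  y = X\beta + u + \varepsilon,
  \qquad u \sim N(0, \sigma_g^2 K),
  \qquad \varepsilon \sim N(0, \sigma_e^2 I)
\]
% Implied covariance of the trait vector
\[
  \operatorname{Var}(y) = \sigma_g^2 K + \sigma_e^2 I
\]
```

Because K enters the covariance of y, related individuals are allowed to share trait variation that would otherwise be attributed to the tested markers.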

    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article focuses on functional regression, the area of FDA that has received the most attention in applications and methodological development. It begins with an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization is discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. The article ends with a brief discussion of potential areas of future development in this field.

    Seasonal changes in dry matter yield from Karst pastures as influenced by morphoclimatic features

    Pastures are strongly affected by local environmental variables in terms of their species richness, plant composition and herbage production. A multi-site monitoring study was conducted over three years to investigate the influence of morphoclimatic factors on the seasonal variations in dry matter (DM) yield from Karst pastures. Seven sites in the Italian and Slovenian Karst regions were investigated; they differed in their geological and geomorphological features, as well as in their soil types. At each site, the daily DM yield (kg ha^-1 d^-1) was determined using the Corral-Fenlon method, which simulates herbage utilization by grazing herds. The morphoclimatic features were also analysed, with the aim of evaluating the link between seasonal DM yield and geomorphological and environmental factors. Generalized non-linear mixed models were built to study the observed seasonal variations in DM yield, using day of the year (DOY), growing degree days (GDD), and cumulative rainfall. Furthermore, environmental descriptors were included in the model in order to evaluate their effects on DM yield. The seasonal variations in yield showed two growing periods (spring and late summer), which were described by Gaussian curves. For the spring growing period, the model improved when the interaction between soil granulometry and growing degree days corresponding to the curve peak was taken into account. This confirms the influence of soil type and air temperature on pasture yield. For the late summer growing period, the interaction between the sand classes and the number of rainy days from the beginning of the period to the peak of the curve improved the model. The curve parameters of our models are correlated with environmental descriptors depending on the lithology and particle size of soils. The results are essential for optimizing pasture management and avoiding degradation due to over- or under-grazing.
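
Two ingredients named in this abstract, growing degree days and Gaussian-shaped growth curves, can be sketched as follows. This is an illustration under stated assumptions rather than the study's model: the simple daily-mean GDD formula, the base temperature `t_base`, the function names, and all parameter values are hypothetical, and the random effects of the mixed model are omitted.

```python
import math


def growing_degree_days(t_min, t_max, t_base=0.0):
    """Cumulative GDD from daily min/max air temperatures (deg C).

    Uses the simple daily-mean method: each day contributes
    max(0, (t_min + t_max) / 2 - t_base). The base temperature is an
    assumption, not a value from the study.
    """
    total = 0.0
    cumulative = []
    for lo, hi in zip(t_min, t_max):
        total += max(0.0, (lo + hi) / 2.0 - t_base)
        cumulative.append(total)
    return cumulative


def gaussian_curve(x, peak, center, width):
    """Gaussian-shaped curve: height `peak` at x = `center`, spread `width`."""
    return peak * math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def seasonal_yield(x, spring, late_summer):
    """Sum of two Gaussian curves, one per growing period.

    `spring` and `late_summer` are (peak, center, width) tuples; x may be
    day of the year or cumulative GDD, depending on the chosen time scale.
    """
    return gaussian_curve(x, *spring) + gaussian_curve(x, *late_summer)


# Hypothetical parameters: daily DM yield against day of the year.
spring_params = (40.0, 130.0, 20.0)
late_summer_params = (15.0, 260.0, 15.0)
yield_at_spring_peak = seasonal_yield(130, spring_params, late_summer_params)
```

In a fitted model the three parameters of each curve would be estimated from the data, and the abstract indicates that they were related to soil and climate descriptors.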