1,706,184 research outputs found

    Sparse Probit Linear Mixed Model

    Full text link
    Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they allow to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), where we generalize the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that in the setup of binary labels, our algorithm leads to better prediction accuracies and also selects features which show less correlation with the confounding factors.Comment: Published version, 21 pages, 6 figure

    A vine copula mixed effect model for trivariate meta-analysis of diagnostic test accuracy studies accounting for disease prevalence

    Get PDF
    A bivariate copula mixed model has been recently proposed to synthesize diagnostic test accuracy studies and it has been shown that it is superior to the standard generalized linear mixed model in this context. Here, we call trivariate vine copulas to extend the bivariate meta-analysis of diagnostic test accuracy studies by accounting for disease prevalence. Our vine copula mixed model includes the trivariate generalized linear mixed model as a special case and can also operate on the original scale of sensitivity, specificity, and disease prevalence. Our general methodology is illustrated by re-analyzing the data of two published meta-analyses. Our study suggests that there can be an improvement on trivariate generalized linear mixed model in fit to data and makes the argument for moving to vine copula random effects models especially because of their richness, including reflection asymmetric tail dependence, and computational feasibility despite their three dimensionality

    Model Selection in Linear Mixed Models

    Full text link
    Linear mixed effects models are highly flexible in handling a broad range of data types and are therefore widely used in applications. A key part in the analysis of data is model selection, which often aims to choose a parsimonious model with other desirable properties from a possibly very large set of candidate statistical models. Over the last 5-10 years the literature on model selection in linear mixed models has grown extremely rapidly. The problem is much more complicated than in linear regression because selection on the covariance structure is not straightforward due to computational issues and boundary problems arising from positive semidefinite constraints on covariance matrices. To obtain a better understanding of the available methods, their properties and the relationships between them, we review a large body of literature on linear mixed model selection. We arrange, implement, discuss and compare model selection methods based on four major approaches: information criteria such as AIC or BIC, shrinkage methods based on penalized loss functions such as LASSO, the Fence procedure and Bayesian techniques.Comment: Published in at http://dx.doi.org/10.1214/12-STS410 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Functional Linear Mixed Models for Irregularly or Sparsely Sampled Data

    Get PDF
    We propose an estimation approach to analyse correlated functional data which are observed on unequal grids or even sparsely. The model we use is a functional linear mixed model, a functional analogue of the linear mixed model. Estimation is based on dimension reduction via functional principal component analysis and on mixed model methodology. Our procedure allows the decomposition of the variability in the data as well as the estimation of mean effects of interest and borrows strength across curves. Confidence bands for mean effects can be constructed conditional on estimated principal components. We provide R-code implementing our approach. The method is motivated by and applied to data from speech production research

    Latitude: A Model for Mixed Linear-Tropical Matrix Factorization

    Full text link
    Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason to the popularity of NMF is its interpretability and the `parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with equally interpretable `winner takes it all' interpretation. In this paper we propose a new mixed linear--tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using the latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.Comment: 14 pages, 6 figures. To appear in 2018 SIAM International Conference on Data Mining (SDM '18). For the source code, see https://people.mpi-inf.mpg.de/~pmiettin/linear-tropical

    Linear mixed models with endogenous covariates: modeling sequential treatment effects with application to a mobile health study

    Full text link
    Mobile health is a rapidly developing field in which behavioral treatments are delivered to individuals via wearables or smartphones to facilitate health-related behavior change. Micro-randomized trials (MRT) are an experimental design for developing mobile health interventions. In an MRT the treatments are randomized numerous times for each individual over course of the trial. Along with assessing treatment effects, behavioral scientists aim to understand between-person heterogeneity in the treatment effect. A natural approach is the familiar linear mixed model. However, directly applying linear mixed models is problematic because potential moderators of the treatment effect are frequently endogenous---that is, may depend on prior treatment. We discuss model interpretation and biases that arise in the absence of additional assumptions when endogenous covariates are included in a linear mixed model. In particular, when there are endogenous covariates, the coefficients no longer have the customary marginal interpretation. However, these coefficients still have a conditional-on-the-random-effect interpretation. We provide an additional assumption that, if true, allows scientists to use standard software to fit linear mixed model with endogenous covariates, and person-specific predictions of effects can be provided. As an illustration, we assess the effect of activity suggestion in the HeartSteps MRT and analyze the between-person treatment effect heterogeneity

    A Note on the Identifiability of Generalized Linear Mixed Models

    Full text link
    I present here a simple proof that, under general regularity conditions, the standard parametrization of generalized linear mixed model is identifiable. The proof is based on the assumptions of generalized linear mixed models on the first and second order moments and some general mild regularity conditions, and, therefore, is extensible to quasi-likelihood based generalized linear models. In particular, binomial and Poisson mixed models with dispersion parameter are identifiable when equipped with the standard parametrization.Comment: 9 pages, no figure
    corecore