8,642 research outputs found

    Modeling Binary Time Series Using Gaussian Processes with Application to Predicting Sleep States

    Full text link
    Motivated by the problem of predicting sleep states, we develop a mixed effects model for binary time series with a stochastic component represented by a Gaussian process. The fixed component captures the effects of covariates on the binary-valued response. The Gaussian process captures the residual variations in the binary response that are not explained by covariates and past realizations. We develop a frequentist modeling framework that provides efficient inference and more accurate predictions. Results demonstrate the advantages of improved prediction rates over existing approaches such as logistic regression, generalized additive mixed model, models for ordinal data, gradient boosting, decision tree and random forest. Using our proposed model, we show that previous sleep state and heart rates are significant predictors for future sleep states. Simulation studies also show that our proposed method is promising and robust. To handle computational complexity, we utilize Laplace approximation, golden section search and successive parabolic interpolation. With this paper, we also submit an R-package (HIBITS) that implements the proposed procedure.Comment: Journal of Classification (2018

    Used-habitat calibration plots: a new procedure for validating species distribution, resource selection, and step-selection models

    Get PDF
    “Species distribution modeling” was recently ranked as one of the top five “research fronts” in ecology and the environmental sciences by ISI's Essential Science Indicators (Renner and Warton 2013), reflecting the importance of predicting how species distributions will respond to anthropogenic change. Unfortunately, species distribution models (SDMs) often perform poorly when applied to novel environments. Compounding on this problem is the shortage of methods for evaluating SDMs (hence, we may be getting our predictions wrong and not even know it). Traditional methods for validating SDMs quantify a model's ability to classify locations as used or unused. Instead, we propose to focus on how well SDMs can predict the characteristics of used locations. This subtle shift in viewpoint leads to a more natural and informative evaluation and validation of models across the entire spectrum of SDMs. Through a series of examples, we show how simple graphical methods can help with three fundamental challenges of habitat modeling: identifying missing covariates, non-linearity, and multicollinearity. Identifying habitat characteristics that are not well-predicted by the model can provide insights into variables affecting the distribution of species, suggest appropriate model modifications, and ultimately improve the reliability and generality of conservation and management recommendations

    Bayesian semiparametric analysis for two-phase studies of gene-environment interaction

    Full text link
    The two-phase sampling design is a cost-efficient way of collecting expensive covariate information on a judiciously selected subsample. It is natural to apply such a strategy for collecting genetic data in a subsample enriched for exposure to environmental factors for gene-environment interaction (G x E) analysis. In this paper, we consider two-phase studies of G x E interaction where phase I data are available on exposure, covariates and disease status. Stratified sampling is done to prioritize individuals for genotyping at phase II conditional on disease and exposure. We consider a Bayesian analysis based on the joint retrospective likelihood of phases I and II data. We address several important statistical issues: (i) we consider a model with multiple genes, environmental factors and their pairwise interactions. We employ a Bayesian variable selection algorithm to reduce the dimensionality of this potentially high-dimensional model; (ii) we use the assumption of gene-gene and gene-environment independence to trade off between bias and efficiency for estimating the interaction parameters through use of hierarchical priors reflecting this assumption; (iii) we posit a flexible model for the joint distribution of the phase I categorical variables using the nonparametric Bayes construction of Dunson and Xing [J. Amer. Statist. Assoc. 104 (2009) 1042-1051].Comment: Published in at http://dx.doi.org/10.1214/12-AOAS599 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
    corecore