97 research outputs found

    Variable Selection in Additive Models by Nonnegative Garrote

    We adapt Breiman's (1995) nonnegative garrote method to perform variable selection in nonparametric additive models. The technique avoids methods of testing for which no reliable distributional theory is available. In addition, it removes the need for a full search of all possible models, which is computationally intensive, especially when the number of variables is moderate to high. The method has the advantages of being conceptually simple and computationally fast. It provides accurate predictions and is effective at identifying the variables generating the model. For illustration, we consider both a study of Boston housing prices and two simulation settings. In all cases our methods perform as well as or better than available alternatives such as the Component Selection and Smoothing Operator (COSSO).
    Keywords: cross-validation, nonnegative garrote, nonparametric regression, shrinkage methods, variable selection
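As a rough illustration of the shrinkage idea behind the nonnegative garrote, the sketch below covers only the linear model with an orthonormal design, where the garrote factors have a closed form. It is not the additive-model procedure described in the abstract. The function names and the penalty parameterization (residual sum of squares plus 2·lam times the sum of the factors) are assumptions made for illustration.

```python
def garrote_factors(ols_coefs, lam):
    """Nonnegative garrote shrinkage factors, orthonormal-design special case.

    Assumed objective (illustrative parameterization):
        sum_j (b_j - c_j * b_j)**2 + 2 * lam * sum_j c_j,  with c_j >= 0,
    where b_j are ordinary least squares coefficients.
    Setting the derivative to zero gives c_j = 1 - lam / b_j**2, truncated at 0.
    """
    factors = []
    for b in ols_coefs:
        if b == 0.0:
            # A zero coefficient contributes nothing; the penalty drives c_j to 0.
            factors.append(0.0)
        else:
            factors.append(max(0.0, 1.0 - lam / (b * b)))
    return factors


def garrote_coefs(ols_coefs, lam):
    """Shrunken coefficients c_j * b_j; a factor of 0 drops the variable."""
    factors = garrote_factors(ols_coefs, lam)
    return [c * b for c, b in zip(factors, ols_coefs)]
```

Variables with small least squares coefficients receive a factor of exactly zero and are removed, while large coefficients are shrunk only slightly. In practice the tuning parameter would be chosen by a criterion such as cross-validation, which the keywords above mention.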

    Longitudinal variable selection by cross-validation in the case of many covariates

    Longitudinal models are commonly used for studying data collected on individuals repeatedly through time. While there are now a variety of such models available (Marginal Models, Mixed Effects Models, etc.), far fewer options appear to exist for the closely related issue of variable selection. In addition, longitudinal data typically derive from medical or other large-scale studies where often large numbers of potential explanatory variables and hence even larger numbers of candidate models must be considered. Cross-validation is a popular method for variable selection based on the predictive ability of the model. Here, we propose a cross-validation Markov Chain Monte Carlo procedure as a general variable selection tool which avoids the need to visit all candidate models. Inclusion of a “one-standard error” rule provides users with a collection of good models as is often desired. We demonstrate the effectiveness of our procedure both in a simulation setting and in a real application.
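The "one-standard error" rule mentioned above can be sketched as follows. This is a generic version of the rule, not the authors' procedure: it assumes per-fold cross-validation errors are already available for each candidate model, and it does not show the Markov Chain Monte Carlo search over models. The function name and input layout are hypothetical.

```python
import math


def one_standard_error_set(cv_errors):
    """Return the models whose mean CV error is within one SE of the best.

    cv_errors: dict mapping a model label to a list of per-fold prediction
    errors (at least two folds per model). Hypothetical input layout.
    """
    summary = {}
    for name, errs in cv_errors.items():
        k = len(errs)
        mean = sum(errs) / k
        var = sum((e - mean) ** 2 for e in errs) / (k - 1)  # sample variance
        summary[name] = (mean, math.sqrt(var / k))          # mean and its SE

    # Best model: smallest mean cross-validation error.
    best = min(summary, key=lambda n: summary[n][0])
    threshold = summary[best][0] + summary[best][1]

    # Every model at or below the threshold is treated as a "good" model.
    return sorted(n for n, (mean, _) in summary.items() if mean <= threshold)
```

For example, `one_standard_error_set({"A": [1.0, 1.2, 0.8], "B": [1.05, 1.1, 1.0], "C": [2.0, 2.1, 1.9]})` keeps models A and B and discards C, giving the user a small collection of comparably good models rather than a single winner.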

    Variable selection for marginal longitudinal generalized linear models

    Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this paper we propose a generalized version of Mallows's Cp (GCp) suitable for use with both parametric and nonparametric models. GCp provides an estimate of a measure of a model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in practice: variable selection based on Wald-type or score-type tests. An application to real data further demonstrates the merits of our approach while at the same time emphasizing some important robust features inherent to GCp.

    State-space models' dirty little secrets: even simple linear Gaussian models can have estimation problems

    State-space models (SSMs) are increasingly used in ecology to model time series such as animal movement paths and population dynamics. This type of hierarchical model is often structured to account for two levels of variability: biological stochasticity and measurement error. SSMs are flexible. They can model linear and nonlinear processes using a variety of statistical distributions. Recent ecological SSMs are often complex, with a large number of parameters to estimate. Through a simulation study, we show that even simple linear Gaussian SSMs can suffer from parameter- and state-estimation problems. We demonstrate that these problems occur primarily when measurement error is larger than biological stochasticity, the condition that often drives ecologists to use SSMs. Using an animal movement example, we show how these estimation problems can affect ecological inference. Biased parameter estimates of an SSM describing the movement of polar bears (Ursus maritimus) result in overestimating their energy expenditure. We suggest potential solutions, but show that it often remains difficult to estimate parameters. While SSMs are powerful tools, they can give misleading results, and we urge ecologists to assess whether the parameters can be estimated accurately before drawing ecological conclusions from their results.
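To make the two levels of variability concrete, here is a minimal sketch of one simple linear Gaussian SSM, a random-walk state observed with noise, together with a Kalman filter that treats both variances as known. The specific model, function names, and default values are assumptions for illustration; they are not taken from the paper, and the sketch does not reproduce its simulation study or the parameter-estimation step.

```python
import random


def simulate_local_level(n, process_sd, measurement_sd, seed=0):
    """Simulate x_t = x_{t-1} + process noise and y_t = x_t + measurement noise."""
    rng = random.Random(seed)
    states, observations = [], []
    x = 0.0
    for _ in range(n):
        x += rng.gauss(0.0, process_sd)            # biological stochasticity
        states.append(x)
        observations.append(x + rng.gauss(0.0, measurement_sd))  # measurement error
    return states, observations


def kalman_filter_local_level(observations, process_var, measurement_var,
                              init_mean=0.0, init_var=1e6):
    """Filtered state means and variances, with both variances assumed known."""
    mean, var = init_mean, init_var
    means, variances = [], []
    for y in observations:
        pred_var = var + process_var                # predict one step ahead
        gain = pred_var / (pred_var + measurement_var)
        mean = mean + gain * (y - mean)             # update with the observation
        var = (1.0 - gain) * pred_var
        means.append(mean)
        variances.append(var)
    return means, variances
```

A call such as `simulate_local_level(100, process_sd=0.1, measurement_sd=1.0)` produces the regime the abstract highlights, where measurement error exceeds process variability. Estimating the two standard deviations from such data, rather than assuming them known as the filter above does, is where the reported difficulties arise.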

    Incorporating intra-annual variability in fisheries abundance data to better capture population dynamics

    To reduce the risk of overexploitation and the ensuing conservation and socio-economic consequences, fisheries management relies on receiving accurate scientific advice from stock assessments. Biomass dynamics models used in stock assessment tend to rely primarily on indices of abundance and commercial landings data. Standard practice for calculating the indices used in these models typically involves taking averages of survey tow data over large, diverse spatial domains. The methodologies used to propagate index uncertainty into the assessment model vary widely, and many of them require specifying it through expert knowledge or prior distributions. Here we propose an alternative approach that treats each individual survey tow as an independent estimate of the true underlying biomass in the stock assessment model itself. This reduces information loss and propagates uncertainties into the model directly. A simulation study demonstrates that this approach accurately captures underlying population dynamics and reliably estimates variance parameters. We further demonstrate its utility with data from the Inshore Scallop Fishery of south-west Nova Scotia. Results show significant improvements in parameter estimation over previous models while providing similar predictions of biomass with less uncertainty. This reduced uncertainty can improve the resulting scientific advice and lead to improved decision-making by fisheries managers.

    A Hidden Markov Movement Model for rapidly identifying behavioral states from animal tracks

    Electronic telemetry is frequently used to document animal movement through time. Methods that can identify underlying behaviors driving specific movement patterns can help us understand how and why animals use available space, thereby aiding conservation and management efforts. For aquatic animal tracking data with significant measurement error, a Bayesian state‐space model called the first‐Difference Correlated Random Walk with Switching (DCRWS) has often been used for this purpose. However, for aquatic animals, highly accurate tracking data are now becoming more common. We developed a new hidden Markov model (HMM) for identifying behavioral states from animal tracks with negligible error, called the hidden Markov movement model (HMMM). We used the process equation of the DCRWS as the basis for the HMMM, but fitted the model by maximum likelihood with the R package TMB for rapid model fitting. The HMMM was compared to a modified version of the DCRWS for highly accurate tracks, the DCRWS [Formula: see text], and to a common HMM for animal tracks fitted with the R package moveHMM. We show that the HMMM is both accurate and suitable for multiple species by fitting it to real tracks from a grey seal, lake trout, and blue shark, as well as to simulated data. The HMMM is a fast and reliable tool for making meaningful inference from animal movement data and is ideally suited for ecologists who want to use the popular DCRWS implementation and have highly accurate tracking data. It additionally provides groundwork for the development of more complex models of animal movement with TMB. To facilitate its uptake, we make it available through the R package swim.

    Validation of close‐kin mark–recapture (CKMR) methods for estimating population abundance

    Under embargo until: 2020-06-18.
    1. Knowing how many individuals there are in a population is a fundamental problem in the management and conservation of freshwater and marine fish. We compare abundance estimates (census size, Nc) in seven brook trout Salvelinus fontinalis populations using standard mark–recapture (MR) and the close‐kin mark–recapture (CKMR) method. Our purpose is to validate CKMR as a method for estimating population size. 2. Close‐kin mark–recapture is based on the principle that an individual's genotype can be considered a “recapture” of the genotypes of each of its parents. Assuming offspring and parents are sampled independently, the number of parent–offspring pairs (POPs) genetically identified in these samples can be used to estimate abundance. We genotyped (33 microsatellites) and aged c. 2,400 brook trout individuals collected over 5 consecutive years (2014–2018). 3. We provide an alternative interpretation of CKMR in terms of the Lincoln–Petersen estimator in which the parents are considered as tagging the offspring rather than the offspring “recapturing” the parents. 4. Despite various sources of uncertainty, we find close agreement between standard MR abundance estimates obtained through double‐pass electrofishing and CKMR estimates, which require information on age‐specific fecundity, and population‐ and age‐specific survival rates. Population sizes (N) are estimated to range between 300 and 6,000 adult individuals. Our study constitutes the first in situ validation of CKMR and establishes it as a useful method for estimating population size in aquatic systems where assumptions of random sampling and thorough mixing of individuals can be met.
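The link between mark–recapture and close‐kin mark–recapture can be illustrated with the classical Lincoln–Petersen estimator and a deliberately idealized close‐kin analogue. The second function is a textbook-style simplification, not the estimator used in the study: it assumes every sampled offspring has exactly two parents among the N adults and ignores the age‐specific fecundity and survival information that the abstract says the actual CKMR estimates require. Function and argument names are hypothetical.

```python
def lincoln_petersen(marked, caught, recaptured):
    """Classical estimate N = M * C / R.

    marked: animals marked and released in the first sample (M)
    caught: animals caught in the second sample (C)
    recaptured: marked animals found in the second sample (R)
    """
    if recaptured <= 0:
        raise ValueError("at least one recapture is needed")
    return marked * caught / recaptured


def naive_ckmr(adults_sampled, offspring_sampled, parent_offspring_pairs):
    """Idealized close-kin analogue (assumption, see lead-in).

    Each offspring "marks" its two parents, so a random adult-offspring
    comparison is a parent-offspring pair with probability 2 / N. Equating
    the expected and observed number of pairs gives
        N = 2 * adults_sampled * offspring_sampled / pairs.
    """
    if parent_offspring_pairs <= 0:
        raise ValueError("at least one parent-offspring pair is needed")
    return 2 * adults_sampled * offspring_sampled / parent_offspring_pairs
```

With made-up numbers, `lincoln_petersen(100, 80, 20)` gives 400.0 and `naive_ckmr(200, 300, 60)` gives 2000.0. Both follow the same logic: the rarer a recapture or kin match is relative to the number of comparisons, the larger the implied population.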