Stability-mediated epistasis constrains the evolution of an influenza protein.
John Maynard Smith compared protein evolution to the game where one word is converted into another a single letter at a time, with the constraint that all intermediates are words: WORD→WORE→GORE→GONE→GENE. In this analogy, epistasis constrains evolution, with some mutations tolerated only after the occurrence of others. To test whether epistasis similarly constrains actual protein evolution, we created all intermediates along a 39-mutation evolutionary trajectory of influenza nucleoprotein, and also introduced each mutation individually into the parent. Several mutations were deleterious to the parent despite becoming fixed during evolution without negative impact. These mutations were destabilizing, and were preceded or accompanied by stabilizing mutations that alleviated their adverse effects. The constrained mutations occurred at sites enriched in T-cell epitopes, suggesting they promote viral immune escape. Our results paint a coherent portrait of epistasis during nucleoprotein evolution, with stabilizing mutations permitting otherwise inaccessible destabilizing mutations which are sometimes of adaptive value. DOI: http://dx.doi.org/10.7554/eLife.00631.001
Sex, lies and self-reported counts: Bayesian mixture models for heaping in longitudinal count data via birth-death processes
Surveys often ask respondents to report nonnegative counts, but respondents
may misremember or round to a nearby multiple of 5 or 10. This phenomenon is
called heaping, and the error inherent in heaped self-reported numbers can bias
estimation. Heaped data may be collected cross-sectionally or longitudinally
and there may be covariates that complicate the inferential task. Heaping is a
well-known issue in many survey settings, and inference for heaped data is an
important statistical problem. We propose a novel reporting distribution whose
underlying parameters are readily interpretable as rates of misremembering and
rounding. The process accommodates a variety of heaping grids and allows for
quasi-heaping to values nearly but not equal to heaping multiples. We present a
Bayesian hierarchical model for longitudinal samples with covariates to infer
both the unobserved true distribution of counts and the parameters that control
the heaping process. Finally, we apply our methods to longitudinal
self-reported counts of sex partners in a study of high-risk behavior in
HIV-positive youth.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS809 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
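The heaping phenomenon described in the abstract above can be illustrated with a minimal simulation. This is only a sketch under simple assumptions, not the birth-death reporting distribution the paper proposes: the function name heap_report, the single rounding probability p_round, and the uniform distribution of true counts are all hypothetical choices made for illustration.

```python
import random


def heap_report(true_count, p_round, grid=5, rng=random):
    """Return a self-reported count.

    With probability p_round the respondent rounds the true count to the
    nearest multiple of `grid` (ties round up); otherwise the true count
    is reported unchanged. Illustrative assumption, not the paper's model.
    """
    if rng.random() < p_round:
        return (true_count + grid // 2) // grid * grid
    return true_count


def share_on_grid(counts, grid=5):
    """Fraction of counts that are exact multiples of `grid`."""
    return sum(1 for c in counts if c % grid == 0) / len(counts)


rng = random.Random(0)
# Hypothetical true counts, uniform on 0..30 purely for illustration.
true_counts = [rng.randint(0, 30) for _ in range(2000)]
reported = [heap_report(c, p_round=0.6, grid=5, rng=rng) for c in true_counts]

# Heaping shows up as excess mass on multiples of 5 in the reported data.
print("share on grid, true counts:    ", share_on_grid(true_counts))
print("share on grid, reported counts:", share_on_grid(reported))
```

Comparing the two printed shares shows how rounding concentrates reported values on the heaping grid, which is the distortion a model for heaped data has to undo.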
Scalable Sparse Cox's Regression for Large-Scale Survival Data via Broken Adaptive Ridge
This paper develops a new scalable sparse Cox regression tool for sparse
high-dimensional massive sample size (sHDMSS) survival data. The method is a
local L0-penalized Cox regression via repeatedly performing reweighted
L2-penalized Cox regression. We show that the resulting estimator enjoys the
best of L0- and L2-penalized Cox regressions while overcoming their
limitations. Specifically, the estimator is selection consistent, oracle for
parameter estimation, and possesses a grouping property for highly correlated
covariates. Simulation results suggest that when the sample size is large, the
proposed method with pre-specified tuning parameters has a comparable or better
performance than some popular penalized regression methods. More importantly,
because the method naturally enables adaptation of efficient algorithms for
massive L2-penalized optimization and does not require costly data-driven
tuning parameter selection, it has a significant computational advantage for
sHDMSS data, offering an average of 5-fold speedup over its closest competitor
in empirical studies.
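The idea of repeatedly solving a reweighted ridge problem can be sketched in a deliberately simplified setting. The example below assumes least squares with an orthonormal design rather than the Cox partial likelihood used in the paper, so each coefficient updates independently: minimizing (z_j - b)^2 + lam * b^2 / beta_j^2 over b gives b = z_j * beta_j^2 / (beta_j^2 + lam), where z_j is the unpenalized estimate and beta_j the current iterate. The function name bar_orthonormal, the starting point, and the tuning value lam are hypothetical choices, not taken from the paper.

```python
def bar_orthonormal(z, lam, n_iter=200, tol=1e-12):
    """Iteratively reweighted ridge updates for an orthonormal design.

    z   : unpenalized least-squares estimates (one per coefficient)
    lam : positive tuning parameter (assumed pre-specified)

    Each pass solves a ridge problem whose penalty weights are
    1 / beta_j**2 from the previous iterate. Simplified illustration
    for least squares, not the Cox regression estimator.
    """
    beta = list(z)  # assumption: start from the unpenalized estimates
    for _ in range(n_iter):
        # Closed-form ridge solution per coordinate; lam > 0 keeps the
        # denominator strictly positive even when a coefficient hits 0.
        new = [zj * bj * bj / (bj * bj + lam) for zj, bj in zip(z, beta)]
        converged = max(abs(a - b) for a, b in zip(new, beta)) < tol
        beta = new
        if converged:
            break
    return beta


# Hypothetical estimates: two large effects and two near-zero effects.
z = [3.0, 0.1, -2.5, 0.05]
beta_hat = bar_orthonormal(z, lam=0.25)
print(beta_hat)
```

In this toy setting the small entries are driven to zero while the large ones settle slightly below their unpenalized values, which is the sparsity-inducing behaviour that iterating a ridge-type penalty is meant to produce.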