Relationships between estimated autozygosity and complex traits in the UK Biobank
Inbreeding increases the risk of certain Mendelian disorders in humans but may also reduce fitness through its effects on complex traits and diseases. Such inbreeding depression is thought to occur due to increased homozygosity at causal variants that are recessive with respect to fitness. Until recently it has been difficult to amass large enough sample sizes to investigate the effects of inbreeding depression on complex traits using genome-wide single nucleotide polymorphism (SNP) data in population-based samples. Further, it is difficult to infer causation in analyses that relate degree of inbreeding to complex traits because confounding variables (e.g., education) may influence both the likelihood for parents to outbreed and offspring trait values. The present study used runs of homozygosity (ROH) in genome-wide SNP data in up to 400,000 individuals in the UK Biobank to estimate the proportion of the autosome that exists in autozygous tracts, that is, stretches of the genome which are identical due to a shared common ancestor. After multiple testing corrections and controlling for possible sociodemographic confounders, we found significant relationships in the predicted direction between estimated autozygosity and three of the 26 traits we investigated: age at first sexual intercourse, fluid intelligence, and forced expiratory volume in 1 second. Our findings corroborate those of several published studies. These results may imply that these traits have been associated with Darwinian fitness over evolutionary time. However, some of the autozygosity-trait relationships were attenuated after controlling for background sociodemographic characteristics, suggesting that alternative explanations for these associations have not been eliminated. Care needs to be taken in the design and interpretation of ROH studies in order to glean reliable information about the genetic architecture and evolutionary history of complex traits.
Regularization Paths for Generalized Linear Models via Coordinate Descent
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
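As a rough illustration of the idea in this abstract, the sketch below runs cyclical coordinate descent for the linear-regression case with an elastic-net penalty, warm-starting each fit from the previous one along a decreasing sequence of penalty values. It is a minimal pure-Python sketch, not the authors' implementation: the function names, the objective scaling (1/(2n) times the residual sum of squares plus lam * (alpha * ℓ1 + (1 - alpha)/2 * ℓ2 squared)), and the assumption of centered data with no intercept are all choices made here for illustration.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha=1.0, beta=None, n_sweeps=200, tol=1e-10):
    """Cyclical coordinate descent for penalized least squares.

    Assumes X (list of rows) and y are centered, so no intercept is fitted.
    alpha=1 gives the lasso, alpha=0 gives ridge regression.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    # Residuals for the starting coefficients (supports warm starts).
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Partial-residual correlation for coordinate j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def regularization_path(X, y, lambdas, alpha=1.0):
    """Fit from the largest penalty to the smallest, reusing each solution."""
    path, beta = [], None
    for lam in sorted(lambdas, reverse=True):
        beta = elastic_net_cd(X, y, lam, alpha, beta)
        path.append((lam, list(beta)))
    return path
```

Calling `regularization_path(X, y, [2.0, 0.1])` returns one coefficient vector per penalty value; with a large enough penalty every lasso coefficient is exactly zero, and coefficients enter the model as the penalty decreases.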
Bayesian Item Response Modeling in R with brms and Stan
Item Response Theory (IRT) is widely applied in the human sciences to model
persons' responses on a set of items measuring one or more latent constructs.
While several R packages have been developed that implement IRT models, they
tend to be restricted to respective prespecified classes of models. Further,
most implementations are frequentist while the availability of Bayesian methods
remains comparably limited. We demonstrate how to use the R package brms
together with the probabilistic programming language Stan to specify and fit a
wide range of Bayesian IRT models using flexible and intuitive multilevel
formula syntax. Further, item and person parameters can be related in either a
linear or a non-linear manner. Various distributions for categorical, ordinal,
and continuous responses are supported. Users may even define their own custom
response distribution for use in the presented framework. Common IRT model
classes that can be specified natively in the presented framework include 1PL
and 2PL logistic models optionally also containing guessing parameters, graded
response and partial credit ordinal models, as well as drift diffusion models
of response times coupled with binary decisions. Posterior distributions of
item and person parameters can be conveniently extracted and post-processed.
Model fit can be evaluated and compared using Bayes factors and efficient
cross-validation procedures. Comment: 54 pages, 16 figures, 3 tables
High-dimensional estimation with geometric constraints
Consider measuring an n-dimensional vector x through the inner product with
several measurement vectors, a_1, a_2, ..., a_m. It is common in both signal
processing and statistics to assume the linear response model y_i = ⟨a_i, x⟩ +
e_i, where e_i is a noise term. However, in practice the precise relationship
between the signal x and the observations y_i may not follow the linear model,
and in some cases it may not even be known. To address this challenge, in this
paper we propose a general model where it is only assumed that each observation
y_i may depend on a_i only through ⟨a_i, x⟩. We do not assume that the
dependence is known. This is a form of the semiparametric single index model,
and it includes the linear model as well as many forms of the generalized
linear model as special cases. We further assume that the signal x has some
structure, and we formulate this as a general assumption that x belongs to some
known (but arbitrary) feasible set K. We carefully detail the benefit of using
the signal structure to improve estimation. The theory is based on the mean
width of K, a geometric parameter which can be used to understand its effective
dimension in estimation problems. We determine a simple, efficient two-step
procedure for estimating the signal based on this model -- a linear estimation
followed by metric projection onto K. We give general conditions under which
the estimator is minimax optimal up to a constant. This leads to the intriguing
conclusion that in the high noise regime, an unknown non-linearity in the
observations does not significantly reduce one's ability to determine the
signal, even when the non-linearity may be non-invertible. Our results may be
specialized to understand the effect of non-linearities in compressed sensing. Comment: This version incorporates minor revisions suggested by referee
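The two-step procedure described in this abstract (a linear estimate followed by metric projection onto K) can be sketched as follows. Everything specific here is an assumption made for illustration and is not taken from the abstract: the measurement vectors are drawn as standard Gaussians, the unknown non-linearity is the sign function (one-bit observations), and K is taken to be the set of s-sparse vectors, for which projection reduces to keeping the s largest-magnitude entries. Because the non-linearity hides the scale of x, the sketch compares directions only.

```python
import math
import random


def linear_estimate(A, y):
    """Step 1: the linear estimate x_lin = (1/m) * sum_i y_i * a_i."""
    m, n = len(A), len(A[0])
    return [sum(y[i] * A[i][j] for i in range(m)) / m for j in range(n)]


def project_onto_sparse(v, s):
    """Step 2 for the illustrative K = {s-sparse vectors}:
    keep the s largest-magnitude entries and zero out the rest."""
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    keep = set(order[:s])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]


def cosine(u, v):
    """Cosine similarity, used because the scale of x is not identifiable."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def demo(n=20, m=2000, s=3, seed=0):
    """Simulate one-bit observations and run the two-step estimator."""
    rng = random.Random(seed)
    # Hypothetical unit-norm signal supported on the first s coordinates.
    x = [1.0 / math.sqrt(s)] * s + [0.0] * (n - s)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    # Non-invertible non-linearity: only the sign of <a_i, x> is observed.
    y = [1.0 if sum(a * b for a, b in zip(row, x)) >= 0 else -1.0 for row in A]
    x_hat = project_onto_sparse(linear_estimate(A, y), s)
    return x, x_hat
```

Running `x, x_hat = demo()` and then `cosine(x, x_hat)` gives a quick check of how well the direction of the signal is recovered from sign-only observations in this toy setting.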
Point process-based modeling of multiple debris flow landslides using INLA: an application to the 2009 Messina disaster
We develop a stochastic modeling approach based on spatial point processes of
log-Gaussian Cox type for a collection of around 5000 landslide events provoked
by a precipitation trigger in Sicily, Italy. Through the embedding into a
hierarchical Bayesian estimation framework, we can use the Integrated Nested
Laplace Approximation methodology to make inference and obtain the posterior
estimates. Several mapping units are useful to partition a given study area in
landslide prediction studies. These units hierarchically subdivide the
geographic space from the highest grid-based resolution to the stronger
morphodynamic-oriented slope units. Here we integrate both mapping units into a
single hierarchical model, by treating the landslide triggering locations as a
random point pattern. This approach diverges fundamentally from the unanimously
used presence-absence structure for areal units since we focus on modeling the
expected landslide count jointly within the two mapping units. Predicting this
landslide intensity provides more detailed and complete information as compared
to the classically used susceptibility mapping approach based on relative
probabilities. To illustrate the model's versatility, we compute absolute
probability maps of landslide occurrences and check its predictive power over
space. While the landslide community typically produces spatial predictive
models for landslides only in the sense that covariates are spatially
distributed, no actual spatial dependence has been explicitly integrated so far
for landslide susceptibility. Our novel approach features a spatial latent
effect defined at the slope unit level, allowing us to assess the spatial
influence that remains unexplained by the covariates in the model.
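To make the link between a modeled intensity and an absolute occurrence probability concrete, the sketch below uses the standard Poisson relation: conditional on the intensity, the count in a mapping unit is Poisson with mean equal to the integrated intensity, so the probability of at least one event is 1 - exp(-mean). This is only a minimal sketch under assumptions, not the authors' INLA workflow: the function names are hypothetical, the intensity is approximated as piecewise constant over grid cells, and the log-intensity values are taken as given rather than estimated.

```python
import math


def expected_count(log_intensities, cell_area):
    """Approximate the integrated intensity over a mapping unit by summing
    exp(log-intensity) * area over the grid cells that make up the unit."""
    return sum(math.exp(eta) * cell_area for eta in log_intensities)


def prob_at_least_one(expected):
    """P(N >= 1) for a Poisson count with the given mean."""
    return -math.expm1(-expected)


# Hypothetical slope unit made of three grid cells, each of area 0.5.
unit_mean = expected_count([0.0, -1.0, -2.0], cell_area=0.5)
unit_prob = prob_at_least_one(unit_mean)
```

The same two functions apply at either mapping-unit level; only the set of grid cells summed over changes.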
Fused kernel-spline smoothing for repeatedly measured outcomes in a generalized partially linear model with functional single index
We propose a generalized partially linear functional single index risk score
model for repeatedly measured outcomes where the index itself is a function of
time. We fuse the nonparametric kernel method and regression spline method, and
modify the generalized estimating equation to facilitate estimation and
inference. We use a local smoothing kernel to estimate the unspecified
coefficient functions of time, and use B-splines to estimate the unspecified
function of the single index component. The covariance structure is taken into
account via a working model, which provides valid estimation and inference
procedures whether or not it captures the true covariance. The estimation method
is applicable to both continuous and discrete outcomes. We derive large sample
properties of the estimation procedure and show a different convergence rate
for each component of the model. The asymptotic properties when the kernel and
regression spline methods are combined in a nested fashion have not been studied
prior to this work, even in the independent data case. Comment: Published at
http://dx.doi.org/10.1214/15-AOS1330 in the Annals of Statistics
(http://www.imstat.org/aos/) by the Institute of Mathematical Statistics
(http://www.imstat.org)