Quantitative genetic studies that model complex, multivariate phenotypes are
important for both evolutionary prediction and artificial selection. For
example, changes in gene expression can provide insight into developmental and
physiological mechanisms that link genotype and phenotype. However, classical
analytical techniques are poorly suited to quantitative genetic studies of gene
expression, where the number of traits assayed per individual can reach many
thousands. Here, we derive a Bayesian genetic sparse factor model for estimating
the genetic covariance matrix (G-matrix) of high-dimensional traits, such as
gene expression, in a mixed effects model. The key idea of our model is that we
need only consider G-matrices that are biologically plausible. An organism's
entire phenotype is the result of processes that are modular and have limited
complexity. This implies that the G-matrix will be highly structured. In
particular, we assume that a limited number of intermediate traits (or factors,
e.g., variations in development or physiology) control the variation in the
high-dimensional phenotype, and that each of these intermediate traits is
sparse -- affecting only a few observed traits. The advantages of this approach
are two-fold. First, sparse factors are interpretable and provide biological
insight into mechanisms underlying the genetic architecture. Second, enforcing
sparsity helps prevent sampling errors from swamping the true signal in
high-dimensional data. We demonstrate the advantages of our model on simulated
data and in an analysis of a published Drosophila melanogaster gene expression
data set.

Comment: 35 pages, 7 figures
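
To make the structural assumption concrete, the following is a minimal sketch of how a few sparse factors induce a highly structured G-matrix. It is not the Bayesian sampler described above, only an illustration of the covariance structure. The parametrization G = Lambda diag(factor_var) Lambda^T + diag(resid_var), the function names, and all numerical values (200 traits, 3 factors, 10 traits per factor) are assumptions chosen for illustration.

```python
import random


def sparse_loadings(n_traits, n_factors, n_nonzero, rng):
    """Build an n_traits x n_factors loading matrix in which each factor
    (column) affects only n_nonzero randomly chosen traits."""
    loadings = [[0.0] * n_factors for _ in range(n_traits)]
    for k in range(n_factors):
        for i in rng.sample(range(n_traits), n_nonzero):
            loadings[i][k] = rng.gauss(0.0, 1.0)
    return loadings


def genetic_covariance(loadings, factor_var, resid_var):
    """Assumed form: G = Lambda diag(factor_var) Lambda^T + diag(resid_var)."""
    p = len(loadings)
    n_factors = len(factor_var)
    g = [
        [
            sum(loadings[i][k] * factor_var[k] * loadings[j][k]
                for k in range(n_factors))
            for j in range(p)
        ]
        for i in range(p)
    ]
    # Trait-specific genetic variance not explained by the factors.
    for i in range(p):
        g[i][i] += resid_var[i]
    return g


rng = random.Random(0)            # fixed seed so the sketch is reproducible
p, k, nnz = 200, 3, 10            # hypothetical sizes
Lambda = sparse_loadings(p, k, nnz, rng)
G = genetic_covariance(Lambda, [1.0] * k, [0.1] * p)

# Parameter counts: unstructured G versus the sparse factor form.
n_free_full = p * (p + 1) // 2        # every variance and covariance
n_free_sparse = k * nnz + k + p       # nonzero loadings + factor and residual variances

# Covariances arise only between traits that share a factor.
n_offdiag_nonzero = sum(
    1 for i in range(p) for j in range(p) if i != j and G[i][j] != 0.0
)
print(n_free_full, n_free_sparse, n_offdiag_nonzero)
```

With these sizes, an unstructured G-matrix has 20100 free entries, whereas the sparse factor form is described by 233 numbers, and at most 270 of the 39800 off-diagonal entries of G can be nonzero. This gap is the sense in which enforcing sparsity limits how much sampling error can enter the estimate.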