Fitting Linear Mixed-Effects Models using lme4
Maximum likelihood or restricted maximum likelihood (REML) estimates of the
parameters in linear mixed-effects models can be determined using the lmer
function in the lme4 package for R. As for most model-fitting functions in R,
the model is described in an lmer call by a formula, in this case including
both fixed- and random-effects terms. The formula and data together determine a
numerical representation of the model from which the profiled deviance or the
profiled REML criterion can be evaluated as a function of some of the model
parameters. The appropriate criterion is optimized, using one of the
constrained optimization functions in R, to provide the parameter estimates. We
describe the structure of the model, the steps in evaluating the profiled
deviance or REML criterion, and the structure of classes or types that
represents such a model. Sufficient detail is included to allow specialization
of these structures by users who wish to write functions to fit specialized
linear mixed models, such as models incorporating pedigrees or smoothing
splines, that are not easily expressible in the formula language used by lmer.
Comment: 51 pages, including R code, and an appendix
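The profiled-deviance evaluation described in this abstract can be sketched for the simplest case, a random-intercept model. This is an illustrative numpy reconstruction under assumed data, not lme4's internal (sparse, Cholesky-based) implementation; the names and settings are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated balanced random-intercept design: 10 groups, 20 obs each.
n_groups, per = 10, 20
n = n_groups * per
group = np.repeat(np.arange(n_groups), per)
X = np.column_stack([np.ones(n), rng.normal(size=n)])      # fixed effects
Z = np.zeros((n, n_groups))
Z[np.arange(n), group] = 1.0                               # random-intercept indicators
beta_true = np.array([1.0, 2.0])
b_true = rng.normal(scale=0.8, size=n_groups)
y = X @ beta_true + Z @ b_true + rng.normal(scale=0.5, size=n)

def profiled_deviance(theta):
    """-2 log-likelihood profiled over beta and sigma^2; theta is the
    ratio of the random-effect std dev to the residual std dev."""
    V = theta**2 * (Z @ Z.T) + np.eye(n)       # scaled marginal covariance
    Vinv = np.linalg.inv(V)
    beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    r = y - X @ beta_hat
    sigma2_hat = (r @ Vinv @ r) / n            # profiled residual variance
    _, logdet = np.linalg.slogdet(V)
    return n * np.log(2 * np.pi * sigma2_hat) + logdet + n

# Grid search over theta; a real fitter would use a derivative-free optimizer.
grid = np.linspace(0.01, 5.0, 500)
devs = [profiled_deviance(t) for t in grid]
theta_hat = grid[int(np.argmin(devs))]
print("estimated theta:", theta_hat)
```

The key point the abstract makes is visible here: after profiling out the fixed effects and the residual variance, the criterion depends only on the relative covariance parameter theta, so the optimization is low-dimensional.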
Structured penalties for functional linear models---partially empirical eigenvectors for regression
One of the challenges with functional data is incorporating spatial
structure, or local correlation, into the analysis. This structure is inherent
in the output from an increasing number of biomedical technologies, and a
functional linear model is often used to estimate the relationship between the
predictor functions and scalar responses. Common approaches to the ill-posed
problem of estimating a coefficient function typically involve two stages:
regularization and estimation. Regularization is usually done via dimension
reduction, projecting onto a predefined span of basis functions or a reduced
set of eigenvectors (principal components). In contrast, we present a unified
approach that directly incorporates spatial structure into the estimation
process by exploiting the joint eigenproperties of the predictors and a linear
penalty operator. In this sense, the components in the regression are
`partially empirical' and the framework is provided by the generalized singular
value decomposition (GSVD). The GSVD clarifies the penalized estimation process
and informs the choice of penalty by making explicit the joint influence of the
penalty and predictors on the bias, variance, and performance of the estimated
coefficient function. Laboratory spectroscopy data and simulations are used to
illustrate the concepts.
Comment: 29 pages, 3 figures, 5 tables; typo/notational errors edited and
intro revised per journal review process
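The two-stage "regularize then estimate" approach this abstract contrasts itself with can be illustrated with a generic roughness penalty. The sketch below is a plain Tikhonov fit with a second-difference penalty operator, under simulated data; it is not the paper's GSVD machinery, which would further diagonalize the problem jointly in the predictors and the penalty:

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretized functional predictors: n curves observed on p grid points.
n, p = 100, 50
X = np.cumsum(rng.normal(size=(n, p)), axis=1)   # smooth-ish random curves
t = np.linspace(0, 1, p)
beta_true = np.sin(2 * np.pi * t)                # true coefficient function
y = X @ beta_true / p + rng.normal(scale=0.1, size=n)

# Second-difference penalty operator (discrete curvature), (p-2) x p.
L = np.diff(np.eye(p), n=2, axis=0)

def penalized_fit(lam):
    """Solve argmin ||y - X b||^2 + lam * ||L b||^2 via normal equations."""
    A = X.T @ X + lam * (L.T @ L)
    return np.linalg.solve(A, X.T @ y)

beta_smooth = penalized_fit(10.0)
beta_rough = penalized_fit(1e-8)

# The penalty trades fidelity for smoothness: curvature shrinks as lam grows.
curvature = lambda b: np.sum((L @ b) ** 2)
print(curvature(beta_smooth) < curvature(beta_rough))
```

The GSVD of the pair (X, L) would express this same estimator in a basis that is simultaneously adapted to the predictors and the penalty, which is what makes the bias/variance effect of the penalty explicit.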
Inference for High-Dimensional Sparse Econometric Models
This article is about estimation and inference methods for high dimensional
sparse (HDS) regression models in econometrics. High dimensional sparse models
arise in situations where many regressors (or series terms) are available and
the regression function is well-approximated by a parsimonious, yet unknown set
of regressors. The latter condition makes it possible to estimate the entire
regression function effectively by searching for approximately the right set of
regressors. We discuss methods for identifying this set of regressors and
estimating their coefficients based on ℓ1-penalization and describe key
theoretical results. In order to capture realistic practical situations, we
expressly allow for imperfect selection of regressors and study the impact of
this imperfect selection on estimation and inference results. We focus the main
part of the article on the use of HDS models and methods in the instrumental
variables model and the partially linear model. We present a set of novel
inference results for these models and illustrate their use with applications
to returns to schooling and growth regression.
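The ℓ1-penalized estimation at the core of this abstract can be sketched with a minimal coordinate-descent lasso on simulated data. This is a textbook implementation for illustration, not the authors' estimator, and the problem sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

# High-dimensional sparse setting: p > n, only s regressors matter.
n, p, s = 100, 200, 5
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:s] = 3.0
y = X @ beta_true + rng.normal(size=n)

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for argmin (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                       # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]        # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

beta_hat = lasso_cd(X, y, lam=0.3)
support = np.flatnonzero(np.abs(beta_hat) > 1e-6)
print("selected coordinates:", support)
```

The abstract's caveat about imperfect selection shows up even here: depending on the penalty level, the selected set can include a few spurious coordinates or shrink true ones, which is exactly why post-selection inference needs care.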
Inferring Gene Regulatory Networks from a Population of Yeast Segregants
Constructing gene regulatory networks is crucial to unraveling the genetic architecture of complex traits and to understanding the mechanisms of diseases. On the basis of gene expression and single nucleotide polymorphism data in the yeast Saccharomyces cerevisiae, we constructed gene regulatory networks using a two-stage penalized least squares method. A large system of structural equations via optimal prediction of a set of surrogate variables was established at the first stage, followed by consistent selection of regulatory effects at the second stage. Using this approach, we identified subnetworks that were enriched in gene ontology categories, revealing directional regulatory mechanisms controlling these biological pathways. Our mapping and analysis of expression-based quantitative trait loci uncovered a known alteration of gene expression within a biological pathway that results in regulatory effects on companion pathway genes in the phosphocholine network. In addition, we identified nodes in these gene ontology-enriched subnetworks that are coordinately controlled by transcription factors driven by trans-acting expression quantitative trait loci. Altogether, the integration of documented transcription factor regulatory associations with subnetworks defined by a system of structural equations using quantitative trait loci data is an effective means to delineate the transcriptional control of biological pathways.
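The two-stage idea, first predicting each gene's expression from genetic markers (the surrogate variables), then selecting regulatory effects among the predicted expressions, can be sketched on toy data. Everything below (sizes, penalties, the single planted edge) is an invented illustration of the general strategy, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setting: q genetic markers, g genes; gene 1 regulates gene 0.
n, q, g = 200, 30, 4
M = rng.binomial(1, 0.5, size=(n, q)).astype(float)    # marker genotypes
E = M @ rng.normal(size=(q, g)) + rng.normal(scale=0.5, size=(n, g))
E[:, 0] += 2.0 * E[:, 1]                               # planted edge 1 -> 0

# Stage 1: ridge-predict each gene's expression from the markers.
# The predictions serve as surrogate (instrument-like) variables.
lam1 = 1.0
H = M @ np.linalg.solve(M.T @ M + lam1 * np.eye(q), M.T)
E_hat = H @ E

# Stage 2: regress gene 0 on the *predicted* expression of the other
# genes with an l1 penalty to select its regulators (coordinate descent).
X, y = E_hat[:, 1:], E[:, 0]
b = np.zeros(g - 1)
lam2 = 0.1
col_sq = (X ** 2).sum(axis=0) / n
r = y.copy()
for _ in range(200):
    for j in range(g - 1):
        r += X[:, j] * b[j]
        rho = X[:, j] @ r / n
        b[j] = np.sign(rho) * max(abs(rho) - lam2, 0.0) / col_sq[j]
        r -= X[:, j] * b[j]
print("estimated regulatory effects on gene 0:", b)
```

Using predicted rather than observed expressions in stage 2 is what breaks the feedback/endogeneity of the structural-equation system: the predictions depend only on the genotypes, which are fixed causes.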
Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On one hand, Big Data hold great promises for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottleneck, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require new computational and statistical paradigms. This
article gives an overview of the salient features of Big Data and how these
features change the paradigm of statistical and computational methods as
well as computing architectures. We also provide various new perspectives on
Big Data analysis and computation. In particular, we emphasize the
viability of the sparsest solution in a high-confidence set and point out that
the exogeneity assumptions in most statistical methods for Big Data cannot be
validated because of incidental endogeneity. Violations of these assumptions
can lead to wrong statistical inferences and, consequently, wrong scientific
conclusions.
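The spurious-correlation phenomenon the abstract names is easy to demonstrate: with enough pure-noise candidate variables, some of them will correlate strongly with any response by chance alone. A small simulation (sample size and variable counts chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(4)

# Spurious correlation: among many independent noise regressors, the
# maximum sample correlation with y grows with the number of candidates.
n = 50
y = rng.normal(size=n)

def max_abs_corr(p):
    """Largest |sample correlation| between y and p independent noise vectors."""
    X = rng.normal(size=(n, p))
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    return np.max(np.abs(Xc.T @ yc) / n)

small, large = max_abs_corr(10), max_abs_corr(10_000)
print(f"max |corr| with 10 noise variables:     {small:.2f}")
print(f"max |corr| with 10,000 noise variables: {large:.2f}")
```

The maximum correlation scales roughly like sqrt(2 log p / n), so with tens of thousands of candidate variables and modest n, chance correlations above 0.5 are routine. This is one concrete reason naive variable screening on Big Data produces false discoveries.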