Automatic Search Intervals for the Smoothing Parameter in Penalized Splines
The selection of the smoothing parameter is central to the estimation of
penalized splines. The best value of the smoothing parameter is often the one
that optimizes a smoothness selection criterion, such as the generalized
cross-validation (GCV) error or the restricted maximum likelihood (REML). To correctly
identify the global optimum rather than being trapped in an undesired local
optimum, grid search is recommended for optimization. Unfortunately, the grid
search method requires a pre-specified search interval that contains the
unknown global optimum, yet no guideline is available for providing this
interval. As a result, practitioners have to find it by trial and error. To
overcome such difficulty, we develop novel algorithms to automatically find
this interval. Our automatic search interval has four advantages. (i) It
specifies a smoothing parameter range where the associated penalized least
squares problem is numerically solvable. (ii) It is criterion-independent so
that different criteria, such as GCV and REML, can be explored on the same
parameter range. (iii) It is sufficiently wide to contain the global optimum of
any criterion, so that for example, the global minimum of GCV and the global
maximum of REML can both be identified. (iv) It is computationally cheap
compared with the grid search itself, carrying no extra computational burden in
practice. Our method is ready to use through our recently developed R package
gps (>= version 1.1). It may be embedded in more advanced statistical modeling
methods that rely on penalized splines. Comment: R code is available at
https://github.com/ZheyuanLi/gps-vignettes/blob/main/gps2.pd
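The difficulty described above can be seen in a minimal sketch: a grid search for the smoothing parameter of a simple penalized least squares smoother, scored by GCV. This illustrates only the generic procedure, not the gps package's automatic-interval algorithm; the simulated data, the hand-picked grid bounds, and the Whittaker-style difference penalty are all assumptions made for the example.

```python
# Minimal sketch (not the gps package's algorithm): grid search over the
# smoothing parameter of a penalized-least-squares smoother, scored by GCV.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

# Discrete smoother with a second-order difference penalty (Whittaker-style).
D = np.diff(np.eye(n), n=2, axis=0)        # second-difference matrix (n-2, n)
P = D.T @ D                                # penalty matrix (n, n)

def gcv(lam):
    """GCV score for the ridge-type smoother yhat = (I + lam*P)^{-1} y."""
    A = np.linalg.inv(np.eye(n) + lam * P) # hat (smoother) matrix
    yhat = A @ y
    rss = np.sum((y - yhat) ** 2)          # residual sum of squares
    edf = np.trace(A)                      # effective degrees of freedom
    return n * rss / (n - edf) ** 2

# Grid search on a log scale; the interval below is picked by hand, which is
# exactly the trial-and-error step the automatic search interval removes.
grid = np.logspace(-4, 4, 81)
scores = np.array([gcv(lam) for lam in grid])
best = grid[np.argmin(scores)]
```

The hard-coded bounds of the grid are the kind of guess that can miss the global optimum entirely, which is the motivation for an automatically constructed, criterion-independent interval.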
Systems Mapping: How to Improve the Genetic Mapping of Complex Traits Through Design Principles of Biological Systems
Background: Every phenotypic trait can be viewed as a “system” in which a group of interconnected components function synergistically to yield a unified whole. Once a system’s components and their interactions have been delineated according to biological principles, we can manipulate and engineer functionally relevant components to produce a desirable system phenotype.

Results: We describe a conceptual framework for mapping quantitative trait loci (QTLs) that control complex traits by treating trait formation as a dynamic system. This framework, called systems mapping, incorporates a system of differential equations that quantifies how alterations of different components lead to the global change of trait development and function through genes, and provides a quantitative and testable platform for assessing the interplay between gene action and development. We applied systems mapping to analyze biomass growth data in a mapping population of soybeans and identified specific loci that are responsible for the dynamics of biomass partitioning to leaves, stem, and roots.

Conclusions: We show that systems mapping implemented by design principles of biological systems is quite versatile for deciphering the genetic machineries for size-shape, structural-functional, sink-source and pleiotropic relationships underlying plant physiology and development. Systems mapping should enable geneticists to shed light on the genetic complexity of any biological system in plants and other organisms and predict its physiological and pathological states.
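As a hypothetical illustration of the systems-mapping idea (not the paper's actual model, and with parameter values invented for the example), biomass partitioning to leaves, stem, and roots can be written as a small ODE system and solved numerically. In the QTL setting, each genotype group at a locus would carry its own parameter values, and mapping tests whether those parameters differ across groups.

```python
# Hypothetical sketch: trait formation as a system of ODEs for biomass
# partitioned to leaves (L), stem (S), and roots (R). All parameter values
# here are illustrative, not estimates from the soybean study.
import numpy as np
from scipy.integrate import solve_ivp

def growth(t, y, aL, aS, aR, k):
    """Logistic whole-plant growth with fixed partitioning to three organs."""
    L, S, R = y
    W = L + S + R                      # total biomass
    dW = k * W * (1.0 - W / 100.0)     # logistic growth toward capacity 100
    # Each organ receives a fixed share of new biomass (aL + aS + aR = 1).
    return [aL * dW, aS * dW, aR * dW]

# One hypothetical genotype: 50% of growth to leaves, 30% stem, 20% roots.
sol = solve_ivp(growth, (0.0, 30.0), [0.5, 0.3, 0.2],
                args=(0.5, 0.3, 0.2, 0.4))
L30, S30, R30 = sol.y[:, -1]           # organ biomasses at t = 30
```

Fitting such a system to longitudinal biomass data separately within each genotype class, then comparing the estimated parameters, is the basic statistical move that systems mapping formalizes.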
Deep Learning with Functional Inputs
We present a methodology for integrating functional data into deep densely
connected feed-forward neural networks. The model is defined for scalar
responses with multiple functional and scalar covariates. A by-product of the
method is a set of dynamic functional weights that can be visualized during the
optimization process. This visualization leads to greater interpretability of
the relationship between the covariates and the response relative to
conventional neural networks. The model is shown to perform well in a number of
contexts including prediction of new data and recovery of the true underlying
functional weights; these results were confirmed through real applications and
simulation studies. A forthcoming R package, built on top of a popular deep
learning library (Keras), allows for general use of the approach. Comment: 28 pages, 6 figures, submitted to JCG
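A rough sketch of the central idea, under assumed details (the monomial basis, coefficient values, and grid are all invented for illustration; in the actual model the coefficients are trained inside the network): a functional covariate enters the network as the integral of the data against a weight function beta(t) expanded in a fixed basis, so only the basis coefficients are learned.

```python
# Sketch (assumed architecture, not the paper's exact implementation) of a
# functional-weight input layer: z = integral of x(t) * beta(t) dt, with
# beta(t) expanded in a fixed basis so only the coefficients c are trainable.
import numpy as np

t = np.linspace(0.0, 1.0, 50)                       # observation grid
basis = np.stack([np.ones_like(t), t, t**2, t**3])  # monomial basis (4, 50)
c = np.array([0.1, -0.2, 0.3, 0.05])                # illustrative coefficients

def functional_layer(x_vals):
    """Approximate the integral of x(t)*beta(t) with a Riemann sum."""
    beta = c @ basis                   # beta(t) evaluated on the grid
    dt = t[1] - t[0]                   # uniform grid spacing
    return np.sum(x_vals * beta) * dt  # scalar feature fed to dense layers

z = functional_layer(np.sin(2 * np.pi * t))
```

The scalar z (one per functional covariate) is concatenated with the scalar covariates and passed to ordinary dense layers; plotting beta(t) as its coefficients update during optimization yields the dynamic functional weights described above.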
Functional Autoencoder for Smoothing and Representation Learning
A common pipeline in functional data analysis is to first convert the
discretely observed data to smooth functions, and then represent the functions
by a finite-dimensional vector of coefficients summarizing the information.
Existing methods for data smoothing and dimension reduction mainly focus on
learning linear mappings from the data space to the representation space;
however, linear representations alone may not be sufficient. In
this study, we propose to learn the nonlinear representations of functional
data using neural network autoencoders designed to process data in the form in
which it is usually collected, without the need for preprocessing. We design the
encoder to employ a projection layer that computes the weighted inner product of
the functional data and functional weights over the observed timestamps, and the
decoder to apply a recovery layer that maps the finite-dimensional vector
extracted from the functional data back to functional space using a set of
predetermined basis functions. The developed architecture can accommodate both
regularly and irregularly spaced data. Our experiments demonstrate that the
proposed method outperforms functional principal component analysis in terms of
prediction and classification, and maintains superior smoothing ability and
better computational efficiency in comparison to conventional autoencoders
under both linear and nonlinear settings.
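The encoder/decoder pair described above can be sketched in a linearized form. This is an assumption-laden illustration: the weights here are fixed and random rather than trained, the predetermined basis is a small Fourier set chosen for the example, and the nonlinear hidden layers of the actual autoencoder are omitted.

```python
# Linearized sketch of the functional autoencoder's projection and recovery
# layers; weights are random and untrained, for shape illustration only.
import numpy as np

t = np.linspace(0.0, 1.0, 40)            # observed timestamps (may be irregular)
K = 5                                    # latent (coefficient) dimension

# Predetermined decoder basis: constant + Fourier functions on the grid.
phi = np.stack([np.ones_like(t)] +
               [np.sin(2 * np.pi * k * t) for k in range(1, 3)] +
               [np.cos(2 * np.pi * k * t) for k in range(1, 3)])  # (K, 40)

W = np.random.default_rng(1).normal(size=(K, t.size))  # functional weights

def encode(x):
    """Projection layer: weighted inner products over the observed grid."""
    dt = np.gradient(t)                  # quadrature weights; handles irregular t
    return W @ (x * dt)                  # (K,) coefficient vector

def decode(z):
    """Recovery layer: map coefficients back to a curve via the fixed basis."""
    return z @ phi                       # (40,) reconstructed function values

x = np.sin(2 * np.pi * t)
xhat = decode(encode(x))
```

In the actual model, W and the surrounding nonlinear layers are trained to minimize reconstruction error; with W fixed and no nonlinearities, the architecture reduces to ordinary basis smoothing.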