Automatic Search Intervals for the Smoothing Parameter in Penalized Splines
The selection of the smoothing parameter is central to the estimation of
penalized splines. The best value of the smoothing parameter is often the one
that optimizes a smoothness selection criterion, such as the generalized
cross-validation (GCV) error or the restricted maximum likelihood (REML). To correctly
identify the global optimum rather than being trapped in an undesired local
optimum, grid search is recommended for optimization. Unfortunately, the grid
search method requires a pre-specified search interval that contains the
unknown global optimum, yet no guideline is available for providing this
interval. As a result, practitioners have to find it by trial and error. To
overcome such difficulty, we develop novel algorithms to automatically find
this interval. Our automatic search interval has four advantages. (i) It
specifies a smoothing parameter range where the associated penalized least
squares problem is numerically solvable. (ii) It is criterion-independent so
that different criteria, such as GCV and REML, can be explored on the same
parameter range. (iii) It is sufficiently wide to contain the global optimum of
any criterion, so that for example, the global minimum of GCV and the global
maximum of REML can both be identified. (iv) It is computationally cheap
compared with the grid search itself, carrying no extra computational burden in
practice. Our method is ready to use through our recently developed R package
gps (>= version 1.1). It may be embedded in more advanced statistical modeling
methods that rely on penalized splines. Comment: R code is available at
https://github.com/ZheyuanLi/gps-vignettes/blob/main/gps2.pd
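The difficulty described above can be seen in a minimal sketch: a grid search for the smoothing parameter of a simple penalized least squares smoother, scored by GCV. This illustrates only the generic procedure, not the gps package's automatic-interval algorithm; the simulated data, the hand-picked grid bounds, and the Whittaker-style difference penalty are all assumptions made for the example.

```python
# Minimal sketch (not the gps package's algorithm): grid search over the
# smoothing parameter of a penalized-least-squares smoother, scored by GCV.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

# Discrete smoother with a second-order difference penalty (Whittaker-style).
D = np.diff(np.eye(n), n=2, axis=0)        # second-difference matrix (n-2, n)
P = D.T @ D                                # penalty matrix (n, n)

def gcv(lam):
    """GCV score for the ridge-type smoother yhat = (I + lam*P)^{-1} y."""
    A = np.linalg.inv(np.eye(n) + lam * P) # hat (smoother) matrix
    yhat = A @ y
    rss = np.sum((y - yhat) ** 2)          # residual sum of squares
    edf = np.trace(A)                      # effective degrees of freedom
    return n * rss / (n - edf) ** 2

# Grid search on a log scale; the interval below is picked by hand, which is
# exactly the trial-and-error step the automatic search interval removes.
grid = np.logspace(-4, 4, 81)
scores = np.array([gcv(lam) for lam in grid])
best = grid[np.argmin(scores)]
```

The hard-coded bounds of the grid are the kind of guess that can miss the global optimum entirely, which is the motivation for an automatically constructed, criterion-independent interval.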
Systems Mapping: How to Improve the Genetic Mapping of Complex Traits Through Design Principles of Biological Systems
Background: Every phenotypic trait can be viewed as a “system” in which a group of interconnected components function synergistically to yield a unified whole. Once a system’s components and their interactions have been delineated according to biological principles, we can manipulate and engineer functionally relevant components to produce a desirable system phenotype.

Results: We describe a conceptual framework for mapping quantitative trait loci (QTLs) that control complex traits by treating trait formation as a dynamic system. This framework, called systems mapping, incorporates a system of differential equations that quantifies how alterations of different components lead to the global change of trait development and function through genes, and provides a quantitative and testable platform for assessing the interplay between gene action and development. We applied systems mapping to analyze biomass growth data in a mapping population of soybeans and identified specific loci that are responsible for the dynamics of biomass partitioning to leaves, stem, and roots.

Conclusions: We show that systems mapping implemented by design principles of biological systems is quite versatile for deciphering the genetic machineries for size-shape, structural-functional, sink-source and pleiotropic relationships underlying plant physiology and development. Systems mapping should enable geneticists to shed light on the genetic complexity of any biological system in plants and other organisms and predict its physiological and pathological states.
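As a hypothetical illustration of the systems-mapping idea (not the paper's actual model, and with parameter values invented for the example), biomass partitioning to leaves, stem, and roots can be written as a small ODE system and solved numerically. In the QTL setting, each genotype group at a locus would carry its own parameter values, and mapping tests whether those parameters differ across groups.

```python
# Hypothetical sketch: trait formation as a system of ODEs for biomass
# partitioned to leaves (L), stem (S), and roots (R). All parameter values
# here are illustrative, not estimates from the soybean study.
import numpy as np
from scipy.integrate import solve_ivp

def growth(t, y, aL, aS, aR, k):
    """Logistic whole-plant growth with fixed partitioning to three organs."""
    L, S, R = y
    W = L + S + R                      # total biomass
    dW = k * W * (1.0 - W / 100.0)     # logistic growth toward capacity 100
    # Each organ receives a fixed share of new biomass (aL + aS + aR = 1).
    return [aL * dW, aS * dW, aR * dW]

# One hypothetical genotype: 50% of growth to leaves, 30% stem, 20% roots.
sol = solve_ivp(growth, (0.0, 30.0), [0.5, 0.3, 0.2],
                args=(0.5, 0.3, 0.2, 0.4))
L30, S30, R30 = sol.y[:, -1]           # organ biomasses at t = 30
```

Fitting such a system to longitudinal biomass data separately within each genotype class, then comparing the estimated parameters, is the basic statistical move that systems mapping formalizes.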
Deep Learning with Functional Inputs
We present a methodology for integrating functional data into deep densely
connected feed-forward neural networks. The model is defined for scalar
responses with multiple functional and scalar covariates. A by-product of the
method is a set of dynamic functional weights that can be visualized during the
optimization process. This visualization leads to greater interpretability of
the relationship between the covariates and the response relative to
conventional neural networks. The model is shown to perform well in a number of
contexts including prediction of new data and recovery of the true underlying
functional weights; these results were confirmed through real applications and
simulation studies. A forthcoming R package, built on top of a popular deep
learning library (Keras), allows for general use of the approach. Comment: 28 pages, 6 figures, submitted to JCG
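A rough sketch of the central idea, under assumed details (the monomial basis, coefficient values, and grid are all invented for illustration; in the actual model the coefficients are trained inside the network): a functional covariate enters the network as the integral of the data against a weight function beta(t) expanded in a fixed basis, so only the basis coefficients are learned.

```python
# Sketch (assumed architecture, not the paper's exact implementation) of a
# functional-weight input layer: z = integral of x(t) * beta(t) dt, with
# beta(t) expanded in a fixed basis so only the coefficients c are trainable.
import numpy as np

t = np.linspace(0.0, 1.0, 50)                       # observation grid
basis = np.stack([np.ones_like(t), t, t**2, t**3])  # monomial basis (4, 50)
c = np.array([0.1, -0.2, 0.3, 0.05])                # illustrative coefficients

def functional_layer(x_vals):
    """Approximate the integral of x(t)*beta(t) with a Riemann sum."""
    beta = c @ basis                   # beta(t) evaluated on the grid
    dt = t[1] - t[0]                   # uniform grid spacing
    return np.sum(x_vals * beta) * dt  # scalar feature fed to dense layers

z = functional_layer(np.sin(2 * np.pi * t))
```

The scalar z (one per functional covariate) is concatenated with the scalar covariates and passed to ordinary dense layers; plotting beta(t) as its coefficients update during optimization yields the dynamic functional weights described above.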
Functional Autoencoder for Smoothing and Representation Learning
A common pipeline in functional data analysis is to first convert the
discretely observed data to smooth functions, and then represent the functions
by a finite-dimensional vector of coefficients summarizing the information.
Existing methods for data smoothing and dimension reduction mainly focus on
learning linear mappings from the data space to the representation space;
however, linear representations alone may not be sufficient. In
this study, we propose to learn the nonlinear representations of functional
data using neural network autoencoders designed to process data in the form in
which it is usually collected, without the need for preprocessing. We design the
encoder to employ a projection layer that computes the weighted inner product of
the functional data and functional weights over the observed timestamps, and the
decoder to apply a recovery layer that maps the finite-dimensional vector
extracted from the functional data back to functional space using a set of
predetermined basis functions. The developed architecture can accommodate both
regularly and irregularly spaced data. Our experiments demonstrate that the
proposed method outperforms functional principal component analysis in terms of
prediction and classification, and maintains superior smoothing ability and
better computational efficiency in comparison to conventional autoencoders
under both linear and nonlinear settings.
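The encoder/decoder pair described above can be sketched in a linearized form. This is an assumption-laden illustration: the weights here are fixed and random rather than trained, the predetermined basis is a small Fourier set chosen for the example, and the nonlinear hidden layers of the actual autoencoder are omitted.

```python
# Linearized sketch of the functional autoencoder's projection and recovery
# layers; weights are random and untrained, for shape illustration only.
import numpy as np

t = np.linspace(0.0, 1.0, 40)            # observed timestamps (may be irregular)
K = 5                                    # latent (coefficient) dimension

# Predetermined decoder basis: constant + Fourier functions on the grid.
phi = np.stack([np.ones_like(t)] +
               [np.sin(2 * np.pi * k * t) for k in range(1, 3)] +
               [np.cos(2 * np.pi * k * t) for k in range(1, 3)])  # (K, 40)

W = np.random.default_rng(1).normal(size=(K, t.size))  # functional weights

def encode(x):
    """Projection layer: weighted inner products over the observed grid."""
    dt = np.gradient(t)                  # quadrature weights; handles irregular t
    return W @ (x * dt)                  # (K,) coefficient vector

def decode(z):
    """Recovery layer: map coefficients back to a curve via the fixed basis."""
    return z @ phi                       # (40,) reconstructed function values

x = np.sin(2 * np.pi * t)
xhat = decode(encode(x))
```

In the actual model, W and the surrounding nonlinear layers are trained to minimize reconstruction error; with W fixed and no nonlinearities, the architecture reduces to ordinary basis smoothing.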