New Modeling Approaches Based on Varimax Rotation of Functional Principal Components
Functional Principal Component Analysis (FPCA) is an important dimension reduction
technique to interpret the main modes of functional data variation in terms of a small set of uncorrelated
variables. The principal components cannot always be simply interpreted, and rotation is one of the main
solutions to improve the interpretation. In this paper, two new functional Varimax rotation approaches
are introduced. They are based on the equivalence between FPCA of basis expansion of the sample
curves and Principal Component Analysis (PCA) of a transformation of the matrix of basis coefficients.
The first approach consists of a rotation of the eigenvectors that preserves the orthogonality between the
eigenfunctions but the rotated principal component scores are not uncorrelated. The second approach is
based on rotation of the loadings of the standardized principal component scores that provides uncorrelated
rotated scores but non-orthogonal eigenfunctions. A simulation study and an application with data from
the curves of COVID-19 infections in Spain are developed to study the performance of these
methods by comparing the results with other existing approaches.
Funding: Spanish Ministry of Science, Innovation and Universities (FEDER program), MTM2017-88708-P; Government of Andalusia (Spain), FQM-307; FPU18/0177.
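The first rotation approach described in the abstract above (rotating eigenvectors, which keeps them orthogonal but makes the scores correlated) can be illustrated with a minimal sketch. It assumes an ordinary data matrix standing in for the transformed matrix of basis coefficients, and it uses the standard SVD-based iteration for the raw Varimax criterion; the function names `varimax` and `rotated_pca` are hypothetical and are not taken from the paper.

```python
import numpy as np


def varimax(loadings, max_iter=200, tol=1e-10):
    """Return (rotated_loadings, rotation) for the raw varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        column_ss = (rotated ** 2).sum(axis=0)
        gradient = loadings.T @ (rotated ** 3 - rotated * column_ss / p)
        # Project the gradient onto the orthogonal matrices via the SVD.
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        current = s.sum()
        if current <= previous * (1.0 + tol):
            break
        previous = current
    return loadings @ rotation, rotation


def rotated_pca(data, n_components):
    """PCA followed by a varimax rotation of the leading eigenvectors."""
    centered = data - data.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    # eigh returns ascending eigenvalues; keep the largest n_components.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    rotated_vectors, rotation = varimax(eigenvectors)
    rotated_scores = centered @ rotated_vectors
    return eigenvalues, rotated_vectors, rotated_scores, rotation
```

Because the rotation matrix R is orthogonal, the rotated eigenvectors stay orthonormal, while the covariance of the rotated scores becomes R'ΛR (with Λ the diagonal matrix of retained eigenvalues), which is generally not diagonal. The second approach in the abstract, which rotates the loadings of standardized scores, is not covered by this sketch.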
Bayesian Analysis of Multivariate Matched Proportions with Sparse Response
Multivariate matched proportions (MMP) data appears in a variety of contexts
including post-market surveillance of adverse events in pharmaceuticals,
disease classification, and agreement between care providers. It consists of
multiple sets of paired binary measurements taken on the same subject. While
recent work proposes non-Bayesian methods to address the complexities of MMP
data, the issue of sparse response, where no or very few "yes" responses are
recorded for one or more sets, is unaddressed. The presence of sparse response
sets results in underestimates of variance, loss of coverage, and lowered power
in existing methods. Bayesian methods have not previously been considered for
MMP data but provide a useful framework when sparse responses are present. In
particular, the Bayesian probit model provides an elegant solution to the
problem of variance underestimation. We examine three approaches built on that
model: a naive analysis with flat priors, a penalized analysis using
half-Cauchy priors on the mean model variances, and a multivariate analysis
with a Bayesian functional principal component analysis (FPCA) to model the
latent covariance. We show that the multivariate analysis performs well on MMP
data with sparse responses and outperforms existing non-Bayesian methods. In a
re-analysis of data from a study of the system of care (SOC) framework for
children with mental and behavioral disorders, we are able to provide a more
complete picture of the relationships in the data. Our analysis provides
additional insights into the functioning of the SOC that a previous univariate
analysis missed.
Principal components for multivariate functional data
This is the author's version of a work that was accepted for publication in Computational Statistics and Data Analysis. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in COMPUTATIONAL STATISTICS AND DATA ANALYSIS, Vol 55, Issue 9, (2011) http://dx.doi.org/10.1016/j.csda.2011.03.01
Supervised Functional PCA with Covariate Dependent Mean and Covariance Structure
Incorporating covariate information into functional data analysis methods can
substantially improve modeling and prediction performance. However, many
functional data analysis methods do not make use of covariate or supervision
information, and those that do often have high computational cost or assume
that only the scores are related to covariates, an assumption that is usually
violated in practice. In this article, we propose a functional data analysis
framework that relates both the mean and covariance function to covariate
information. To facilitate modeling and ensure the covariance function is
positive semi-definite, we represent it using splines and design a map from
Euclidean space to the symmetric positive semi-definite matrix manifold. Our
model is combined with a roughness penalty to encourage smoothness of the
estimated functions in both the temporal and covariate domains. We also develop
an efficient method for fast evaluation of the objective and gradient
functions. Cross-validation is used to choose the tuning parameters. We
demonstrate the advantages of our approach through a simulation study and an
astronomical data analysis.
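The abstract above mentions a map from Euclidean space to the symmetric positive semi-definite matrices but does not say which map is used. One common construction, shown here only as an assumption and not as the authors' method, fills a lower-triangular factor L from an unconstrained vector and returns L L'. The function name `vector_to_psd` is hypothetical.

```python
def vector_to_psd(theta, dim):
    """Map an unconstrained vector of length dim*(dim+1)/2 to the matrix L L^T."""
    expected = dim * (dim + 1) // 2
    if len(theta) != expected:
        raise ValueError(f"expected {expected} values, got {len(theta)}")
    # Fill the lower triangle of L row by row.
    lower = [[0.0] * dim for _ in range(dim)]
    index = 0
    for i in range(dim):
        for j in range(i + 1):
            lower[i][j] = float(theta[index])
            index += 1
    # A = L L^T is symmetric, and x^T A x = ||L^T x||^2 >= 0 for every x.
    return [
        [sum(lower[i][k] * lower[j][k] for k in range(dim)) for j in range(dim)]
        for i in range(dim)
    ]
```

Any real vector of the right length yields a valid covariance-like matrix, so an optimizer can search over the vector without explicit constraints.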
Fast Generalized Functional Principal Components Analysis
We propose a new fast generalized functional principal components analysis
(fast-GFPCA) algorithm for dimension reduction of non-Gaussian functional data.
The method consists of: (1) binning the data within the functional domain; (2)
fitting local random intercept generalized linear mixed models in every bin to
obtain the initial estimates of the person-specific functional linear
predictors; (3) using fast functional principal component analysis to smooth
the linear predictors and obtain their eigenfunctions; and (4) estimating the
global model conditional on the eigenfunctions of the linear predictors. An
extensive simulation study shows that fast-GFPCA performs as well as or better
than existing state-of-the-art approaches, is orders of magnitude faster
than existing general purpose GFPCA methods, and scales up well with both the
number of observed curves and observations per curve. Methods were motivated by
and applied to a study of active/inactive physical activity profiles obtained
from wearable accelerometers in the NHANES 2011-2014 study. The method can be
implemented by any user familiar with mixed model software, though the R
package fastGFPCA is provided for convenience.
Bayesian Functional Principal Component Analysis using Relaxed Mutually Orthogonal Processes
Functional Principal Component Analysis (FPCA) is a prominent tool to
characterize variability and reduce dimension of longitudinal and functional
datasets. Bayesian implementations of FPCA are advantageous because of their
ability to propagate uncertainty in subsequent modeling. To ease computation,
many modeling approaches rely on the restrictive assumption that functional
principal components can be represented through a pre-specified basis. Under
this assumption, inference is sensitive to the basis, and misspecification can
lead to erroneous results. Alternatively, we develop a flexible Bayesian FPCA
model using Relaxed Mutually Orthogonal (ReMO) processes. We define ReMO
processes to enforce mutual orthogonality between principal components to
ensure identifiability of model parameters. The joint distribution of ReMO
processes is governed by a penalty parameter that determines the degree to
which the processes are mutually orthogonal and is related to ease of posterior
computation. In comparison to other methods, FPCA using ReMO processes provides
a more flexible, computationally convenient approach that facilitates accurate
propagation of uncertainty. We demonstrate our proposed model using extensive
simulation experiments and in an application to study the effects of
breastfeeding status, illness, and demographic factors on weight dynamics in
early childhood. Code is available on GitHub at
https://github.com/jamesmatuk/ReMO-FPC
Functional data analytics for wearable device and neuroscience data
This thesis uses methods from functional data analysis (FDA) to solve problems from three scientific areas of study. While the areas of application are quite distinct, the common thread of functional data analysis ties them together. The first chapter describes interactive open-source software for explaining and disseminating results of functional data analyses. Chapters two and three use curve alignment, or registration, to solve common problems in accelerometry and neuroimaging, respectively. The final chapter introduces a novel regression method for modeling functional outcomes that are trajectories over time. The first chapter of this thesis details a software package for interactively visualizing functional data analyses. The software is designed to work for a wide range of datasets and several types of analyses. This chapter describes that software and provides an overview of FDA in different contexts. The second chapter introduces a framework for curve alignment, or registration, of exponential family functional data. The approach distinguishes itself from previous registration methods in its ability to handle dense binary observations with computational efficiency. Motivation comes from the Baltimore Longitudinal Study on Aging, in which accelerometer data provides valuable insights into the timing of sedentary behavior. The third chapter takes lessons learned about curve registration from the second chapter and uses them to develop methods in an entirely new context: large multisite brain imaging studies. Scanner effects in multisite imaging studies are non-biological variability due to technical differences across sites and scanner hardware. This method identifies and removes scanner effects by registering cumulative distribution functions of image intensity values. In the final chapter the focus shifts from curve registration to regression. Described within this chapter is an entirely new nonlinear regression framework that draws from both functional data analysis and systems of ordinary differential equations. This model is motivated by the neurobiology of skilled movement, and was developed to capture the relationship between neural activity and arm movement in mice.
Methods in functional data analysis and functional genomics
This thesis has two overall themes, both of which involve the word functional, albeit in different contexts. The theme that motivates two of the chapters is the development of methods that enable a deeper understanding of the variability of functional data. The theme of the final chapter is the development of methods that enable a deeper understanding of the landscape of functionality across the human genome in different human tissues.
The first chapter of this thesis provides a framework for quantifying the variability of functional data and for analyzing the factors that affect this variability. We extend functional principal components analysis by modeling the variance of principal component scores. We pose a Bayesian model, which we estimate using variational Bayes methods. We illustrate our model with an application to a kinematic dataset of two-dimensional planar reaching motions by healthy subjects, showing the effect of learning on motion variability.
The second chapter of this thesis provides an alternative method for decomposing functional data that follows a Poisson distribution. Classical methods pose a latent Gaussian process that is then linked to the observed data via a logarithmic link function. We pose an alternative model that draws on ideas from non-negative matrix factorization, in which we constrain both scores and spline coefficient vectors for the functional prototypes to be non-negative. We impose smoothness on the functional prototypes. We estimate our model using the method of alternating minimization. We illustrate our model with an application to a dataset of accelerometer readings from elderly healthy Americans.
The third chapter of this thesis focuses on functional genomics, rather than functional data analysis. Here we pose a method for unsupervised clustering of functional genomics data. Our method is non-parametric, allowing for flexible modeling of the functional genomics data without binarization. We estimate our model using variational Bayes methods, and illustrate it by calculating genome-wide functional scores (based on a partition of our clusters into functional and non-functional clusters) for 127 different human tissues. We show that these genome-wide and tissue-specific functional scores provide state-of-the-art functional prediction.
From Points to Probability Measures: Statistical Learning on Distributions with Kernel Mean Embedding
The dissertation presents a novel learning framework on probability measures which has abundant real-world applications. In the classical setup, it is assumed that the data are points that have been drawn independently and identically distributed (i.i.d.) from some unknown distribution. In many scenarios, however, representing data as distributions may be preferable. For instance, when the measurement is noisy, we may tackle the uncertainty by treating the data themselves as distributions, which is often the case for microarray data and astronomical data where the measurement process is imprecise and replication is often required. Distributions not only embody individual data points, but also carry information about their interactions which can be beneficial for structural learning in high-energy physics, cosmology, causality, and so on. Moreover, classical problems in statistics such as statistical estimation, hypothesis testing, and causal inference may be interpreted in a decision-theoretic sense as machine learning problems on empirical distributions. Rephrasing these problems as such leads to novel approaches for statistical inference and estimation. Hence, allowing learning algorithms to operate directly on distributions prompts a wide range of future applications.
To work with distributions, the key methodology adopted in this thesis is the kernel mean embedding of distributions, which represents each distribution as a mean function in a reproducing kernel Hilbert space (RKHS). In particular, the kernel mean embedding has been applied successfully in two-sample testing, graphical models, and probabilistic inference. This thesis, in contrast, focuses mainly on predictive learning on distributions, i.e., when the observations are distributions and the goal is to make predictions about previously unseen distributions. More importantly, the thesis investigates kernel mean estimation, which is one of the most fundamental problems of kernel methods.
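As a minimal sketch of the kernel mean embedding described above, the empirical embedding of a sample x_1, ..., x_n is the function t -> (1/n) Σ k(x_i, t), and the squared RKHS distance between two embeddings gives the statistic commonly used in two-sample testing. The Gaussian kernel, the bandwidth, and all function names below are illustrative assumptions, not choices stated in the thesis abstract.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mean_embedding(sample, kernel=gaussian_kernel):
    """Return the empirical mean embedding as a function t -> mean_i k(x_i, t)."""
    def embedding(t):
        return sum(kernel(x, t) for x in sample) / len(sample)
    return embedding


def mmd_squared(sample_p, sample_q, kernel=gaussian_kernel):
    """Biased estimate of the squared RKHS distance between two mean embeddings."""
    def mean_kernel(a, b):
        return sum(kernel(x, y) for x in a for y in b) / (len(a) * len(b))
    return (
        mean_kernel(sample_p, sample_p)
        + mean_kernel(sample_q, sample_q)
        - 2.0 * mean_kernel(sample_p, sample_q)
    )
```

For example, `mmd_squared(p, q)` is zero when the two samples are identical and grows as the samples separate relative to the kernel bandwidth.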
Probability distributions, as opposed to data points, carry information at a higher level, such as the aggregate behavior of data points, how the underlying process evolves over time and domains, and complex concepts that cannot be described merely by individual points. Intelligent organisms have the ability to recognize and exploit such information naturally. Thus, this work may shed light on the future development of intelligent machines and, most importantly, may provide clues on the true meaning of intelligence.