12 research outputs found

    New Modeling Approaches Based on Varimax Rotation of Functional Principal Components

    Get PDF
    Functional Principal Component Analysis (FPCA) is an important dimension reduction technique for interpreting the main modes of functional data variation in terms of a small set of uncorrelated variables. The principal components cannot always be simply interpreted, and rotation is one of the main solutions for improving interpretation. In this paper, two new functional Varimax rotation approaches are introduced. They are based on the equivalence between FPCA of the basis expansion of the sample curves and Principal Component Analysis (PCA) of a transformation of the matrix of basis coefficients. The first approach consists of a rotation of the eigenvectors that preserves the orthogonality between the eigenfunctions, but the rotated principal component scores are not uncorrelated. The second approach is based on rotation of the loadings of the standardized principal component scores, which provides uncorrelated rotated scores but non-orthogonal eigenfunctions. A simulation study and an application to curves of COVID-19 infections in Spain are developed to study the performance of these methods by comparing the results with other existing approaches. Funding: Spanish Ministry of Science, Innovation and Universities (FEDER program) MTM2017-88708-P; Government of Andalusia (Spain) FQM-307; FPU18/0177
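
    As a rough illustration of the first approach, the Python sketch below applies a standard varimax rotation to the leading PCA eigenvectors of a basis-coefficient matrix. It assumes an orthonormal basis (so PCA can act on the coefficients directly); the matrix A and the varimax helper are illustrative stand-ins, not the authors' code.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a (p x k) loading matrix via the standard SVD updates."""
    p, k = Phi.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = Phi @ R
        U, S, Vt = np.linalg.svd(
            Phi.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        )
        R = U @ Vt
        crit_new = S.sum()
        if crit_new < crit * (1 + tol):
            break
        crit = crit_new
    return Phi @ R, R

# Hypothetical basis coefficients of 100 sample curves in an 8-function basis.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 8))
A -= A.mean(axis=0)

# PCA of the coefficient matrix; keep the first 3 eigenvectors.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:3].T

# First approach: the rotation is orthogonal, so the rotated eigenfunctions
# stay orthonormal, but the rotated scores are in general correlated.
V_rot, R = varimax(V)
scores_rot = A @ V_rot
```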

    Bayesian Analysis of Multivariate Matched Proportions with Sparse Response

    Full text link
    Multivariate matched proportions (MMP) data appear in a variety of contexts, including post-market surveillance of adverse events in pharmaceuticals, disease classification, and agreement between care providers. Such data consist of multiple sets of paired binary measurements taken on the same subject. While recent work proposes non-Bayesian methods to address the complexities of MMP data, the issue of sparse response, where no or very few "yes" responses are recorded for one or more sets, is unaddressed. The presence of sparse response sets results in underestimates of variance, loss of coverage, and lowered power in existing methods. Bayesian methods have not previously been considered for MMP data but provide a useful framework when sparse responses are present. In particular, the Bayesian probit model provides an elegant solution to the problem of variance underestimation. We examine three approaches built on that model: a naive analysis with flat priors, a penalized analysis using half-Cauchy priors on the mean model variances, and a multivariate analysis with a Bayesian functional principal component analysis (FPCA) to model the latent covariance. We show that the multivariate analysis performs well on MMP data with sparse responses and outperforms existing non-Bayesian methods. In a re-analysis of data from a study of the system of care (SOC) framework for children with mental and behavioral disorders, we are able to provide a more complete picture of the relationships in the data. Our analysis provides additional insights into the functioning of the SOC that a previous univariate analysis missed.
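
    A minimal sketch of the penalized variant described above, written with PyMC (the data sizes and variable names are invented, and the sketch ignores the paired structure and the latent-covariance FPCA of the full multivariate analysis):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n, K = 60, 4                                # subjects and response sets (hypothetical)
y = rng.binomial(1, 0.05, size=(n, K))      # sparse "yes" responses
y[:, 0] = 0                                 # one completely sparse set

with pm.Model() as probit_model:
    # Half-Cauchy prior penalizes the mean-model variance; a flat prior here
    # would give the naive analysis, which suffers under sparse response.
    tau = pm.HalfCauchy("tau", beta=1.0)
    mu = pm.Normal("mu", mu=0.0, sigma=tau, shape=K)
    # Probit link: P(y_k = 1) = Phi(mu_k), the standard normal CDF.
    p = pm.math.invprobit(mu)
    pm.Bernoulli("y_obs", p=p, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=1)
```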

    Principal components for multivariate functional data

    Get PDF
    This is the author's version of a work that was accepted for publication in Computational Statistics and Data Analysis. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in COMPUTATIONAL STATISTICS AND DATA ANALYSIS, Vol 55, Issue 9, (2011) http://dx.doi.org/10.1016/j.csda.2011.03.01

    Supervised Functional PCA with Covariate Dependent Mean and Covariance Structure

    Full text link
    Incorporating covariate information into functional data analysis methods can substantially improve modeling and prediction performance. However, many functional data analysis methods do not make use of covariate or supervision information, and those that do often have high computational cost or assume that only the scores are related to covariates, an assumption that is usually violated in practice. In this article, we propose a functional data analysis framework that relates both the mean and covariance function to covariate information. To facilitate modeling and ensure the covariance function is positive semi-definite, we represent it using splines and design a map from Euclidean space to the symmetric positive semi-definite matrix manifold. Our model is combined with a roughness penalty to encourage smoothness of the estimated functions in both the temporal and covariate domains. We also develop an efficient method for fast evaluation of the objective and gradient functions. Cross-validation is used to choose the tuning parameters. We demonstrate the advantages of our approach through a simulation study and an astronomical data analysis. (24 pages, 15 figures)
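
    The abstract does not spell out the authors' map to the positive semi-definite manifold; one standard construction, sketched below under that assumption, takes the matrix exponential of a symmetric matrix built from unconstrained parameters (this yields strictly positive-definite matrices):

```python
import numpy as np
from scipy.linalg import expm

def euclidean_to_spd(theta, d):
    """Map an unconstrained vector of length d*(d+1)/2 to a d x d
    symmetric positive-definite matrix via the matrix exponential."""
    S = np.zeros((d, d))
    iu = np.triu_indices(d)
    S[iu] = theta
    S = S + S.T - np.diag(np.diag(S))   # symmetric matrix from theta
    return expm(S)                      # expm of a symmetric matrix is SPD

theta = np.random.default_rng(2).standard_normal(6)
C = euclidean_to_spd(theta, 3)
print(np.linalg.eigvalsh(C))            # all eigenvalues strictly positive
```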

    Fast Generalized Functional Principal Components Analysis

    Full text link
    We propose a new fast generalized functional principal components analysis (fast-GFPCA) algorithm for dimension reduction of non-Gaussian functional data. The method consists of: (1) binning the data within the functional domain; (2) fitting local random-intercept generalized linear mixed models in every bin to obtain initial estimates of the person-specific functional linear predictors; (3) using fast functional principal component analysis to smooth the linear predictors and obtain their eigenfunctions; and (4) estimating the global model conditional on the eigenfunctions of the linear predictors. An extensive simulation study shows that fast-GFPCA performs as well as or better than existing state-of-the-art approaches, is orders of magnitude faster than existing general-purpose GFPCA methods, and scales well with both the number of observed curves and the number of observations per curve. The methods were motivated by and applied to a study of active/inactive physical activity profiles obtained from wearable accelerometers in the NHANES 2011-2014 study. The method can be implemented by any user familiar with mixed-model software, though the R package fastGFPCA is provided for convenience.
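
    A much-simplified numpy sketch of the four steps follows. Step (2)'s local mixed models are replaced by per-subject empirical logits and step (3)'s fast FPCA by a plain eigendecomposition, so this illustrates the pipeline rather than reproducing the fastGFPCA implementation:

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, n_bins = 50, 200, 20                 # subjects, grid points, bins (toy sizes)

# Toy binary functional data with one smooth latent component per subject.
t = np.linspace(0, 1, T)
eta = -1.0 + rng.standard_normal((n, 1)) * np.sin(2 * np.pi * t)
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# Step 1: bin the functional domain.
bins = np.array_split(np.arange(T), n_bins)

# Step 2 (simplified): an empirical logit per subject and bin stands in for
# the random intercept of a local logistic mixed model.
eta_hat = np.empty((n, n_bins))
for b, idx in enumerate(bins):
    p_hat = (Y[:, idx].sum(axis=1) + 0.5) / (len(idx) + 1.0)
    eta_hat[:, b] = np.log(p_hat / (1.0 - p_hat))

# Step 3 (simplified): eigendecomposition of the covariance of the estimated
# linear predictors in place of smoothing-based fast FPCA.
eta_c = eta_hat - eta_hat.mean(axis=0)
evals, evecs = np.linalg.eigh(eta_c.T @ eta_c / n)
phi = evecs[:, ::-1][:, :2]                # leading two eigenfunctions on bins
scores = eta_c @ phi

# Step 4 would refit the global generalized model with these eigenfunctions
# treated as known covariates; omitted here.
```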

    Bayesian Functional Principal Component Analysis using Relaxed Mutually Orthogonal Processes

    Full text link
    Functional Principal Component Analysis (FPCA) is a prominent tool to characterize variability and reduce dimension of longitudinal and functional datasets. Bayesian implementations of FPCA are advantageous because of their ability to propagate uncertainty in subsequent modeling. To ease computation, many modeling approaches rely on the restrictive assumption that functional principal components can be represented through a pre-specified basis. Under this assumption, inference is sensitive to the basis, and misspecification can lead to erroneous results. Alternatively, we develop a flexible Bayesian FPCA model using Relaxed Mutually Orthogonal (ReMO) processes. We define ReMO processes to enforce mutual orthogonality between principal components to ensure identifiability of model parameters. The joint distribution of ReMO processes is governed by a penalty parameter that determines the degree to which the processes are mutually orthogonal and is related to ease of posterior computation. In comparison to other methods, FPCA using ReMO processes provides a more flexible, computationally convenient approach that facilitates accurate propagation of uncertainty. We demonstrate our proposed model using extensive simulation experiments and in an application to study the effects of breastfeeding status, illness, and demographic factors on weight dynamics in early childhood. Code is available on GitHub at https://github.com/jamesmatuk/ReMO-FPC
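
    To illustrate the role of the penalty parameter only (the actual ReMO prior construction in the paper is more involved), a hypothetical relaxed-orthogonality penalty on discretized processes might look like:

```python
import numpy as np

def remo_like_penalty(F, lam, dt):
    """Illustrative penalty on K discretized functions F (K x T):
    lam times the sum of squared pairwise L2 inner products.
    Larger lam pushes the processes toward mutual orthogonality."""
    G = (F @ F.T) * dt                  # K x K matrix of L2 inner products
    off = G - np.diag(np.diag(G))       # keep only cross inner products
    return lam * np.sum(off**2) / 2.0   # halve: each pair appears twice

t = np.linspace(0, 1, 100)
F = np.vstack([np.sin(2 * np.pi * t), np.sin(2 * np.pi * t + 0.1)])
print(remo_like_penalty(F, lam=10.0, dt=t[1] - t[0]))  # large: nearly parallel
```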

    From Points to Probability Measures: Statistical Learning on Distributions with Kernel Mean Embedding

    Get PDF
    The dissertation presents a novel learning framework on probability measures which has abundant real-world applications. In the classical setup, it is assumed that the data are points that have been drawn independently and identically (i.i.d.) from some unknown distribution. In many scenarios, however, representing data as distributions may be preferable. For instance, when the measurement is noisy, we may tackle the uncertainty by treating the data themselves as distributions, which is often the case for microarray data and astronomical data where the measurement process is imprecise and replication is often required. Distributions not only embody individual data points, but also constitute information about their interactions which can be beneficial for structural learning in high-energy physics, cosmology, causality, and so on. Moreover, classical problems in statistics such as statistical estimation, hypothesis testing, and causal inference may be interpreted in a decision-theoretic sense as machine learning problems on empirical distributions. Rephrasing these problems in this way leads to novel approaches to statistical inference and estimation. Hence, allowing learning algorithms to operate directly on distributions prompts a wide range of future applications. To work with distributions, the key methodology adopted in this thesis is the kernel mean embedding of distributions, which represents each distribution as a mean function in a reproducing kernel Hilbert space (RKHS). In particular, the kernel mean embedding has been applied successfully in two-sample testing, graphical models, and probabilistic inference. On the other hand, this thesis focuses mainly on predictive learning on distributions, i.e., when the observations are distributions and the goal is to make predictions about previously unseen distributions. More importantly, the thesis investigates kernel mean estimation, which is one of the most fundamental problems of kernel methods. Probability distributions, as opposed to data points, constitute information at a higher level, such as the aggregate behavior of data points, how the underlying process evolves over time and domains, and complex concepts that cannot be described merely by individual points. Intelligent organisms have the ability to recognize and exploit such information naturally. Thus, this work may shed light on the future development of intelligent machines and, most importantly, may provide clues about the true meaning of intelligence.
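
    The empirical kernel mean embedding, and the RKHS distance it induces between two samples (the squared maximum mean discrepancy used in two-sample testing), can be sketched in a few lines of numpy; the Gaussian kernel and its bandwidth below are illustrative choices:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of the squared MMD: the RKHS distance between the
    empirical kernel mean embeddings of samples X (n x d) and Y (m x d)."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 2))           # sample from P
Y = rng.standard_normal((200, 2)) + 0.5     # sample from a shifted Q
print(mmd2(X, Y))                           # clearly positive when P != Q
```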