Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
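The basis-function idea the abstract highlights can be sketched in a few lines: represent each curve and the coefficient function through a small set of smooth basis functions, then regress on the basis scores. Everything below (the Fourier basis, grid, sample sizes, and simulated data) is an illustrative assumption, not an example from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 200, 50                 # number of curves, grid points per curve
t = np.linspace(0, 1, m)      # common observation grid
dt = t[1] - t[0]

# Fourier basis: regularization works through a small set of smooth basis
# functions rather than the raw grid values
B = np.column_stack([np.ones(m)]
                    + [np.sin(2 * np.pi * k * t) for k in range(1, 4)]
                    + [np.cos(2 * np.pi * k * t) for k in range(1, 4)])  # m x 7

# Simulate functional predictors X_i(t) and a smooth coefficient function
X = rng.normal(size=(n, B.shape[1])) @ B.T      # n x m curves
beta_true = np.sin(2 * np.pi * t)               # true beta(t), inside the basis span
y = X @ beta_true * dt + rng.normal(scale=0.1, size=n)  # y_i = int X_i(t) beta(t) dt + noise

# Project each curve onto the basis, then fit least squares on the scores
scores = X @ B * dt                             # approximate inner products <X_i, B_k>
coef, *_ = np.linalg.lstsq(scores, y, rcond=None)
beta_hat = B @ coef                             # estimated coefficient function

print(round(float(np.corrcoef(beta_hat, beta_true)[0, 1]), 2))
```

Replacing the least-squares step with a roughness-penalized fit is where the "regularization" of the abstract enters; the projection onto a common basis is what lets information be combined across replicate curves.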
Doctor of Philosophy dissertation
Latent structures play a vital role in many data analysis tasks. By providing compact yet expressive representations, such structures can offer useful insights into the complex and high-dimensional datasets encountered in domains such as computational biology, computer vision, natural language processing, etc. Specifying the right complexity of these latent structures for a given problem is an important modeling decision. Instead of using models with an a priori fixed complexity, it is desirable to have models that can adapt their complexity as the data warrant. Nonparametric Bayesian models are motivated precisely by this desideratum, offering a flexible modeling paradigm for data without limiting the model complexity a priori. The flexibility comes from the model's ability to adjust its complexity adaptively with data. This dissertation is about nonparametric Bayesian learning of two specific types of latent structures: (1) low-dimensional latent features underlying high-dimensional observed data, where the latent features could exhibit interdependencies, and (2) latent task structures that capture how a set of learning tasks relate to each other, a notion critical in the paradigm of Multitask Learning, where the goal is to solve multiple learning tasks jointly in order to borrow information across similar tasks. Another focus of this dissertation is on designing efficient approximate inference algorithms for nonparametric Bayesian models. Specifically, for the nonparametric Bayesian latent feature model, where the goal is to infer the binary-valued latent feature assignment matrix for a given set of observations, the dissertation proposes two approximate inference methods. The first one is a search-based algorithm to find the maximum-a-posteriori (MAP) solution for the latent feature assignment matrix.
The second one is a sequential Monte-Carlo-based approximate inference algorithm that allows processing the data one example at a time while being space-efficient in terms of the storage required to represent the posterior distribution of the latent feature assignment matrix.
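The binary latent feature assignment matrix described above is typically given an Indian Buffet Process (IBP) prior, the standard nonparametric Bayesian prior that leaves the number of features unbounded. A minimal sketch of drawing such a matrix from the IBP prior (the values of `alpha` and `n` are illustrative choices, not taken from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_ibp(n, alpha):
    """Draw an n x K binary feature matrix from the IBP prior; K is random."""
    Z = np.zeros((n, 0), dtype=int)
    for i in range(n):  # customer i+1 enters the buffet
        if Z.shape[1] > 0:
            # sample each existing feature k with probability m_k / (i+1),
            # where m_k is how many earlier customers chose it
            probs = Z[:i].sum(axis=0) / (i + 1)
            Z[i] = rng.random(Z.shape[1]) < probs
        # then try Poisson(alpha / (i+1)) brand-new features
        k_new = rng.poisson(alpha / (i + 1))
        if k_new > 0:
            new_cols = np.zeros((n, k_new), dtype=int)
            new_cols[i] = 1
            Z = np.hstack([Z, new_cols])
    return Z

Z = sample_ibp(n=20, alpha=3.0)
print(Z.shape)  # number of columns grows roughly like alpha * log(n)
```

The MAP search and sequential Monte Carlo algorithms the dissertation proposes operate on matrices of exactly this form, so the prior sampler above is the natural starting point for experimenting with either.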
Kernel methods for detecting coherent structures in dynamical data
We illustrate relationships between classical kernel-based dimensionality
reduction techniques and eigendecompositions of empirical estimates of
reproducing kernel Hilbert space (RKHS) operators associated with dynamical
systems. In particular, we show that kernel canonical correlation analysis
(CCA) can be interpreted in terms of kernel transfer operators and that it can
be obtained by optimizing the variational approach for Markov processes (VAMP)
score. As a result, we show that coherent sets of particle trajectories can be
computed by kernel CCA. We demonstrate the efficiency of this approach with
several examples, namely the well-known Bickley jet, ocean drifter data, and a
molecular dynamics problem with a time-dependent potential. Finally, we propose
a straightforward generalization of dynamic mode decomposition (DMD) called
coherent mode decomposition (CMD). Our results provide a generic machine
learning approach to the computation of coherent sets with an objective score
that can be used for cross-validation and the comparison of different methods.
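A toy sketch of regularized kernel CCA in the spirit of the abstract, using the standard reduction to an ordinary eigenproblem (in the style of Hardoon et al., 2004). The two "views", kernel bandwidth, and regularization strength below are illustrative assumptions, not the paper's Bickley-jet or drifter examples:

```python
import numpy as np

rng = np.random.default_rng(1)

def gram(X, sigma=1.0):
    """Gaussian (RBF) Gram matrix."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def center(K):
    """Double-center a Gram matrix (centering in feature space)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

n = 150
z = rng.uniform(-2, 2, size=n)                         # shared latent signal
X = np.column_stack([z, rng.normal(size=n)])           # view 1
Y = np.column_stack([np.sin(z), rng.normal(size=n)])   # view 2, nonlinearly related

Kx, Ky = center(gram(X)), center(gram(Y))
eps = 1e-2 * n                                         # Tikhonov regularization

# Regularized kernel CCA reduces to the eigenproblem
#   (Kx + eps I)^{-1} Ky (Ky + eps I)^{-1} Kx  a = rho^2 a
A = np.linalg.solve(Kx + eps * np.eye(n), Ky) @ np.linalg.solve(Ky + eps * np.eye(n), Kx)
rho = np.sqrt(max(float(np.max(np.linalg.eigvals(A).real)), 0.0))
print(rho)  # leading regularized canonical correlation
```

In the paper's setting, the two views would be particle positions at two times, and the eigenfunctions (rather than just the leading correlation) indicate the coherent sets.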
Bayesian methodologies for constrained spaces
Due to advances in technology, directional data now arise in a wide variety of fields. Distributions for modeling directional data are often defined on manifolds or other constrained spaces. Standard statistical methods applied to data on such special geometries can give misleading results, which demands new statistical theory. This dissertation addresses two such problems and develops Bayesian methodologies to improve inference in these arenas. It consists of two projects: 1. A Bayesian Methodology for Estimation for Sparse Canonical Correlation, and 2. Bayesian Analysis of Finite Mixture Model for Spherical Data. It can be challenging to integrate data measured on the same individuals but arising from different experiments and to model them jointly to gain a larger understanding of the problem. Canonical Correlation Analysis (CCA) provides a useful tool for establishing relationships between such data sets. For high dimensional data sets, Structured Sparse CCA (ScSCCA) is a rapidly developing methodological area that seeks to represent the interrelations using sparse direction vectors for CCA. Bayesian methodology in this area is less developed. We propose a novel Bayesian ScSCCA method built on a Bayesian infinite factor model. Using a multiplicative half-Cauchy prior process, we induce sparsity at the level of the projection matrix. Additionally, we promote further sparsity in the covariance matrix by using a graphical horseshoe prior or a diagonal structure. We compare the results of our proposed model with competing frequentist and Bayesian methods and apply the developed method to omics data arising from a breast cancer study. In the second project, we perform Bayesian analysis for the von Mises-Fisher (vMF) distribution on the sphere, a common and important distribution for directional data.
In the first part of this project, we propose a new conjugate prior for the mean vector and concentration parameter of the vMF distribution. Further, we prove properties of this prior such as finiteness and unimodality, and provide interpretations of its hyperparameters. In the second part, we utilize a popular prior structure for a mixture of vMF distributions. In this case, the posterior of the concentration parameter involves an intractable Bessel function of the first kind. We propose a novel Data Augmentation Strategy (DAS) using a Negative Binomial distribution that removes this intractable Bessel function. Furthermore, we apply the developed methodology to Diffusion Tensor Imaging (DTI) data, clustering voxels to explore connectivity in the human brain.
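As a rough illustration of basic vMF estimation (not the dissertation's conjugate prior or data-augmentation scheme), one can estimate the mean direction as the normalized resultant vector and the concentration kappa via the closed-form Banerjee et al. (2005) approximation, which sidesteps the Bessel function entirely. The synthetic sampler below is a crude Gaussian-perturbation stand-in for an exact vMF sampler, so kappa is only loosely recovered:

```python
import numpy as np

rng = np.random.default_rng(2)

p, n, kappa_true = 3, 5000, 20.0
mu = np.array([0.0, 0.0, 1.0])                 # true mean direction on the sphere

# Crude vMF-like sampler: Gaussian perturbation of mu, projected to the sphere
# (a stand-in for an exact sampler such as Wood's algorithm)
X = rng.normal(loc=mu, scale=1 / np.sqrt(kappa_true), size=(n, p))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# MLE of the mean direction: the normalized resultant vector
resultant = X.sum(axis=0)
mu_hat = resultant / np.linalg.norm(resultant)

# Banerjee et al. closed-form approximation for the concentration kappa
rbar = np.linalg.norm(resultant) / n
kappa_hat = rbar * (p - rbar**2) / (1 - rbar**2)

print(mu_hat.round(2), round(float(kappa_hat), 1))
```

The intractability the abstract refers to shows up as soon as kappa gets a prior: its posterior involves the modified Bessel function I_{p/2-1}(kappa), which is what the proposed negative-binomial data augmentation removes.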