    Clustering multivariate functional data with phase variation

    When functional data come as multiple curves per subject, characterizing the source of variation is not a trivial problem, and it becomes more complex when there is phase variation in addition to amplitude variation. We consider the clustering problem for multivariate functional data that exhibit phase variation among the functional variables. We propose a conditional subject-specific warping framework to extract relevant features for clustering. Using multivariate growth curves of various parts of the body as a motivating example, we demonstrate the effectiveness of the proposed approach. The resulting clusters consist of individuals who show different relative growth patterns across different parts of the body.
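
    The warping framework itself is developed in the paper; the sketch below only illustrates the general pipeline in Python, assuming simulated curves, a crude shift-based registration in place of the authors' conditional subject-specific warping, and clustering on the resulting relative-phase features. All names and settings are illustrative.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 100)

    # 40 subjects x 2 functional variables, each curve with its own phase shift.
    n_subj, n_var = 40, 2
    shifts = rng.uniform(-0.1, 0.1, size=(n_subj, n_var))
    curves = np.array([[np.sin(2 * np.pi * (t - shifts[i, j]))
                        for j in range(n_var)] for i in range(n_subj)])

    # Estimate each curve's phase shift against the overall mean curve by
    # cross-correlation -- a crude stand-in for subject-specific warping.
    template = curves.mean(axis=(0, 1))

    def estimate_shift(y, max_lag=20):
        lags = np.arange(-max_lag, max_lag + 1)
        scores = [np.dot(np.roll(y, -lag), template) for lag in lags]
        return lags[int(np.argmax(scores))] / t.size

    phase = np.array([[estimate_shift(curves[i, j]) for j in range(n_var)]
                      for i in range(n_subj)])

    # Cluster subjects on the relative phase between the two variables.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
        phase[:, :1] - phase[:, 1:])
    print(labels)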

    High dimension, low sample size data analysis

    This dissertation consists of three research topics in High Dimension, Low Sample Size (HDLSS) data analysis. The first topic is a study of the sample covariance matrix of a data set with extremely large dimensionality but relatively small sample size; the focus is the asymptotic behavior of the eigenvalues and eigenvectors of the sample covariance matrix. Assuming that the true population covariance matrix is not too far from the identity matrix (i.e., spherical in the Gaussian case), we show that the sample eigenvalues and eigenvectors tend to behave as if the data indeed came from the identity covariance. Based on this, the asymptotic geometric representation of HDLSS data is extended to a wide range of underlying distributions. The representation essentially states that the data vectors form a regular simplex in the data space, with the number of vertices equal to the sample size. The second part of the dissertation studies a discriminant direction vector that is interesting only in HDLSS settings. This direction is characterized by the property that it projects all the data vectors, generated from two classes, to two distinct values, one for each class. This Maximal Data Piling (MDP) direction lies within the hyperplane generated by all the data vectors, while it is orthogonal to the hyperplanes generated by each class. It has the largest distance between piling sites among all possible piling direction vectors and also maximizes the amount of piling. The MDP formula is equivalent to Fisher's linear discriminant when the dimension is less than the sample size. As a classification method, MDP is heuristically desirable when the data are well approximated by the HDLSS geometric representation. The third topic concerns kernel methods in statistical learning, especially kernel-based classification. Taking the Gaussian kernel for support vector machines and mean difference methods as a case study, we propose a novel approach to selecting the bandwidth parameter of the kernel function. The derivation is based on the fact that the bandwidth parameter determines the geometry of the high-dimensional kernel-embedded feature space. Compared with cross-validation and other tuning criteria from the literature, our approach is demonstrated to be robust to sampling variation, while maintaining comparable classification power and low computing cost, in real and simulated data examples.
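
    To make the data-piling property concrete, the following is a small numerical check in Python, assuming the pseudo-inverse characterization of the MDP direction found in the HDLSS literature (the direction is proportional to the Moore-Penrose inverse of the globally centered sample covariance applied to the mean difference); the simulation settings are our own.

    import numpy as np

    rng = np.random.default_rng(1)
    d, n1, n2 = 500, 10, 10                 # dimension >> sample size
    X1 = rng.normal(0, 1, (n1, d)) + 1.0    # class 1, shifted mean
    X2 = rng.normal(0, 1, (n2, d))          # class 2

    X = np.vstack([X1, X2])
    Xc = X - X.mean(axis=0)                 # center with the global mean
    S = Xc.T @ Xc / (n1 + n2)               # rank-deficient sample covariance
    v = np.linalg.pinv(S) @ (X1.mean(axis=0) - X2.mean(axis=0))
    v /= np.linalg.norm(v)

    # In the HDLSS regime each class projects ("piles") onto a single value.
    p1, p2 = X1 @ v, X2 @ v
    print(np.ptp(p1), np.ptp(p2))           # both spreads are numerically zero
    print(p1.mean() - p2.mean())            # distance between the piling sites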

    Support vector machines with adaptive penalty

    The standard Support Vector Machine (SVM) minimizes the hinge loss function subject to the L2 penalty or the roughness penalty. Recently, the L1 SVM was suggested for variable selection because it produces sparse solutions (Bradley and Mangasarian, 1998; Zhu et al., 2003). These learning methods are non-adaptive, since their penalty forms are pre-determined before looking at the data, and they often perform well only in certain situations. For instance, the L2 SVM generally works well except when there are too many noise inputs, while the L1 SVM is preferable in the presence of many noise variables. In this article we propose and explore an adaptive learning procedure called the Lq SVM, where the best q > 0 is chosen automatically from the data. Both two-class and multi-class classification problems are considered. We show that the new adaptive approach combines the benefits of a class of non-adaptive procedures and gives the best performance of this class across a variety of situations. Moreover, we observe that the proposed Lq penalty is more robust to noise variables than the L1 and L2 penalties. An iterative algorithm is suggested to solve the Lq SVM efficiently. Simulations and real data applications support the effectiveness of the proposed procedure.
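
    The paper's iterative algorithm is not reproduced here; the sketch below is a crude, self-contained stand-in that fits a linear SVM under an Lq penalty by smoothed subgradient descent and picks q on a validation set. The smoothing constant and step sizes are ad hoc assumptions.

    import numpy as np

    def fit_lq_svm(X, y, q, lam=0.1, lr=0.01, iters=2000, eps=1e-3):
        """Linear SVM with an Lq penalty, fitted by smoothed subgradient descent."""
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            margin = y * (X @ w)
            # Subgradient of the average hinge loss.
            grad = -(X * y[:, None])[margin < 1].sum(axis=0) / len(y)
            # Smoothed gradient of lam * sum_j |w_j|^q; eps avoids the
            # singularity at w_j = 0 when q < 1.
            grad += lam * q * np.sign(w) * (np.abs(w) + eps) ** (q - 1)
            w -= lr * grad
        return w

    def choose_q(X_tr, y_tr, X_val, y_val, grid=(0.5, 1.0, 1.5, 2.0)):
        """Pick the q in the grid with the best validation accuracy."""
        acc = {q: np.mean(np.sign(X_val @ fit_lq_svm(X_tr, y_tr, q)) == y_val)
               for q in grid}
        return max(acc, key=acc.get)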

    Sparse Functional Linear Discriminant Analysis

    Functional linear discriminant analysis offers a simple yet efficient method for classification, with the possibility of achieving perfect classification. Several methods proposed in the literature mostly address the dimensionality of the problem. On the other hand, there is growing interest in the interpretability of the analysis, which favors a simple and sparse solution. In this work, we propose a new approach that incorporates a type of sparsity identifying nonzero sub-domains in the functional setting, offering a solution that is easier to interpret without compromising performance. Since additional constraints must be embedded in the solution, we reformulate functional linear discriminant analysis as a regularization problem with an appropriate penalty. Inspired by the success of ℓ1-type regularization at inducing zero coefficients for scalar variables, we develop a new regularization method for functional linear discriminant analysis that incorporates an ℓ1-type penalty, ∫|f|, to induce zero regions. We demonstrate that our formulation has a well-defined solution that contains zero regions, achieving functional sparsity in the sense of domain selection. In addition, the misclassification probability of the regularized solution is shown to converge to the Bayes error if the data are Gaussian. Our method does not presume that the underlying function has zero regions in the domain, but produces a sparse estimator that consistently estimates the true function whether or not the latter is sparse. Numerical comparisons with existing methods demonstrate this property in finite samples, with both simulated and real data examples.
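
    As a rough illustration of domain-selecting sparsity (not the paper's formulation), one can discretize the curves on a grid, note that a grid sum of absolute coefficients approximates ∫|f|, and exploit the classical connection between two-class LDA and least-squares regression on ±1 labels, so that an off-the-shelf Lasso yields a discriminant function that vanishes outside the informative sub-domain. All settings below are illustrative.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(2)
    t = np.linspace(0, 1, 200)
    n = 100
    y = np.repeat([1.0, -1.0], n // 2)

    # Two classes whose mean curves differ only on a sub-domain near t = 0.5.
    bump = np.exp(-((t - 0.5) / 0.05) ** 2)
    X = rng.normal(0, 1, (n, t.size)) + np.outer((y + 1) / 2, bump)

    # The L1 penalty on grid values approximates the integral of |f|, so the
    # fitted coefficient function is zero outside the informative region.
    f_hat = Lasso(alpha=0.02).fit(X, y).coef_
    nonzero = t[f_hat != 0]
    print("estimated nonzero sub-domain:",
          (nonzero.min(), nonzero.max()) if nonzero.size else "none")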

    Perinatal Docosahexaenoic Acid Supplementation Improves Cognition and Alters Brain Functional Organization in Piglets.

    Epidemiologic studies associate maternal intake of docosahexaenoic acid (DHA) and DHA-containing seafood with enhanced cognitive development, although interventional trials show inconsistent findings. We examined the effects of perinatal DHA supplementation on cognitive performance, brain anatomical and functional organization, and brain monoamine neurotransmitter status of offspring using a piglet model. Sows were fed a control diet (CON) or a diet containing DHA (DHA) from late gestation throughout lactation. Piglets underwent an open field test (OFT), an object recognition test (ORT), and magnetic resonance imaging (MRI) to acquire anatomical, diffusion tensor imaging (DTI), and resting-state functional MRI (rs-fMRI) data at weaning. Piglets from DHA-fed sows spent 95% more time sniffing the walls than CON piglets in the OFT and exhibited an elevated interest in the novel object in the ORT, while CON piglets demonstrated no preference. Maternal DHA supplementation increased fiber length and tended to increase fractional anisotropy in the hippocampus of offspring compared with CON. DHA piglets exhibited increased functional connectivity in the cerebellar, visual, and default mode networks and decreased activity in the executive control and sensorimotor networks compared with CON piglets. Brain monoamine neurotransmitter levels did not differ between groups in healthy offspring. Perinatal DHA supplementation may increase exploratory behaviors, improve recognition memory, enhance fiber tract integrity, and alter brain functional organization in offspring at weaning.

    Reproducing Kernels and New Approaches in Compositional Data Analysis

    Compositional data, such as human gut microbiome data, consist of non-negative variables of which only the values relative to the other variables are available. Analyzing compositional data requires careful treatment of the geometry of the data, which is commonly understood via a regular simplex. The majority of existing approaches rely on log-ratio or power transformations to overcome the innate simplicial geometry. In this work, based on the key observation that compositional data are projective in nature, and on the intrinsic connection between projective and spherical geometry, we re-interpret the compositional domain as the quotient topology of a sphere modded out by a group action. This re-interpretation allows us to understand the function space on compositional domains in terms of that on spheres and to use spherical harmonics theory, along with reflection group actions, to construct a compositional Reproducing Kernel Hilbert Space (RKHS). This construction of an RKHS for compositional data opens broad research avenues for future methodology development; in particular, well-developed kernel embedding methods can now be introduced to compositional data analysis. The polynomial nature of the compositional RKHS has both theoretical and computational benefits. The wide applicability of the proposed theoretical framework is exemplified with nonparametric density estimation and kernel exponential families for compositional data.
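
    The construction below is only a toy illustration of the geometric idea, not the paper's RKHS: the square-root map places a composition on the unit sphere, and a kernel that depends on the coordinates only through their squares is invariant under coordinate sign flips (a reflection group action), hence well defined on the quotient. The specific kernel and the positive semi-definiteness check are our own choices.

    import numpy as np

    def sqrt_map(x):
        # Composition (nonnegative, summing to one) -> point on the unit sphere.
        x = np.asarray(x, dtype=float)
        return np.sqrt(x / x.sum())

    def quotient_kernel(u, v, degree=2):
        # <u*u, v*v>^degree depends on u and v only through their squares,
        # so it is invariant under sign flips and descends to the quotient.
        return float(np.sum(u**2 * v**2)) ** degree

    rng = np.random.default_rng(3)
    X = rng.dirichlet(np.ones(4), size=30)      # random compositions
    U = np.array([sqrt_map(x) for x in X])
    K = np.array([[quotient_kernel(a, b) for b in U] for a in U])
    print("numerically PSD:", np.linalg.eigvalsh(K).min() > -1e-10)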