
    Common and Distinct Components in Data Fusion

    In many areas of science, multiple sets of data are collected pertaining to the same system. Examples are food products characterized by different sets of variables, bio-processes sampled on-line with different instruments, or biological systems for which different genomics measurements are obtained. Data fusion is concerned with analyzing such sets of data simultaneously to arrive at a global view of the system under study. One emerging area of data fusion is exploring what the data sets do and do not have in common. This gives insight into the common and distinct variation in each data set, thereby facilitating understanding of the relationships between the data sets. Unfortunately, research on methods to distinguish common and distinct components is fragmented, both in terminology and in methods: there is no common ground, which hampers comparing methods and understanding their relative merits. This paper provides a unifying framework for this subfield of data fusion using rigorous arguments from linear algebra. The most frequently used methods for distinguishing common and distinct components are explained in this framework, and some practical examples of these methods are given in the areas of (medical) biology and food science. Comment: 50 pages, 12 figures.
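
    The subspace comparison at the heart of this framework can be illustrated with a short sketch. The snippet below is a minimal illustration, not any particular method from the paper: it uses principal angles between the dominant score spaces of two data blocks measured on the same samples, where cosines near 1 signal common variation and cosines near 0 signal distinct variation. All names, ranks, and sizes are hypothetical choices.

```python
# Illustrative sketch: quantify common vs. distinct variation between two
# data blocks measured on the same samples, via principal angles between
# their dominant score spaces. Ranks and sizes are hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)
n = 50                                       # samples shared by both blocks
common = rng.normal(size=(n, 2))             # variation present in both
X1 = common @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(n, 10))
X2 = (common @ rng.normal(size=(2, 15))
      + rng.normal(size=(n, 3)) @ rng.normal(size=(3, 15)))  # plus distinct part

def score_basis(X, rank):
    """Orthonormal basis for the dominant score space of a centered block."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :rank]

Q1, Q2 = score_basis(X1, 4), score_basis(X2, 4)
# Singular values of Q1.T @ Q2 are cosines of the principal angles between
# the two score spaces; values near 1 indicate common directions.
cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
print("cosines of principal angles:", np.round(cosines, 3))
```

    Roughly two cosines close to 1 are expected here, matching the two shared components planted in the toy data.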

    Some Topics Concerning the Singular Value Decomposition and Generalized Singular Value Decomposition

    This dissertation involves three problems that are all related by the use of the singular value decomposition (SVD) or generalized singular value decomposition (GSVD). The specific problems are (i) derivation of a generalized singular value expansion (GSVE), (ii) analysis of the properties of the chi-squared method for regularization parameter selection in the case of nonnormal data, and (iii) formulation of a partial canonical correlation concept for continuous-time stochastic processes. The finite-dimensional SVD has an infinite-dimensional generalization to compact operators. However, the form of the finite-dimensional GSVD developed by, e.g., Van Loan does not extend directly to infinite dimensions, because a key step in the proof is specific to the matrix case. Thus, the first problem of interest is to find an infinite-dimensional version of the GSVD. One such GSVE for compact operators on separable Hilbert spaces is developed. The second problem concerns regularization parameter estimation. The chi-squared method for nonnormal data is considered. A form of the optimized regularization criterion that pertains to measured data or signals with nonnormal noise is derived. Large sample theory for phi-mixing processes is used to derive a central limit theorem for the chi-squared criterion that holds under certain conditions. Departures from normality are seen to manifest in the need for a possibly different scale factor in normalization than would be used under the assumption of normality. The consequences of our large sample work are illustrated by empirical experiments. For the third problem, a new approach is examined for studying the relationships between a collection of functional random variables. The idea is based on the work of Sunder, which provides mappings to connect the elements of algebraic and orthogonal direct sums of subspaces in a Hilbert space. When combined with a key isometry associated with a particular Hilbert space indexed stochastic process, this leads to a useful formulation for situations that involve the study of several second order processes. In particular, using our approach with two processes provides an independent derivation of the functional canonical correlation analysis (CCA) results of Eubank and Hsing. For more than two processes, a rigorous derivation of the functional partial canonical correlation analysis (PCCA) concept that applies to both finite- and infinite-dimensional settings is obtained.
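
    As a concrete, finite-dimensional illustration of the second problem, the sketch below applies a chi-squared-style criterion to Tikhonov regularization: the regularization parameter is chosen so that the regularized functional matches its expected value m·sigma^2 under Gaussian noise. This is a simplified stand-in under a normality assumption, not the dissertation's nonnormal-noise theory; all sizes and constants are hypothetical.

```python
# Minimal sketch of a chi-squared-style criterion for picking the Tikhonov
# regularization parameter: choose lam so the regularized functional equals
# its expected value m*sigma^2 under Gaussian noise. An illustrative
# finite-dimensional stand-in, not the dissertation's nonnormal-noise theory.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
m, n, sigma = 100, 20, 0.05
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
b = A @ x_true + sigma * rng.normal(size=m)

def tikhonov(lam):
    """Solve min ||Ax - b||^2 + lam*||x||^2 via the normal equations."""
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def chi2_gap(lam):
    """Regularized functional minus its expected value m*sigma^2."""
    x = tikhonov(lam)
    J = np.sum((A @ x - b) ** 2) + lam * np.sum(x ** 2)
    return J - m * sigma**2

lam = brentq(chi2_gap, 1e-8, 1e4)   # root-find for the matching lambda
print("selected lambda:", lam)
```

    Because the minimized functional is nondecreasing in lambda, the gap function changes sign exactly once on a wide bracket, so the root found here is unique.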

    Optimal selection of reduced rank estimators of high-dimensional matrices

    We introduce a new criterion, the Rank Selection Criterion (RSC), for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. The corresponding RSC estimator minimizes the Frobenius norm of the fit plus a regularization term proportional to the number of parameters in the reduced rank model. The rank of the RSC estimator provides a consistent estimator of the rank of the coefficient matrix; in general, the rank of our estimator is a consistent estimate of the effective rank, which we define to be the number of singular values of the target matrix that are appropriately large. The consistency results are valid not only in the classic asymptotic regime, when n, the number of responses, and p, the number of predictors, stay bounded, and m, the number of observations, grows, but also when either, or both, n and p grow, possibly much faster than m. We establish minimax optimal bounds on the mean squared errors of our estimators. Our finite sample performance bounds for the RSC estimator show that it achieves the optimal balance between the approximation error and the penalty term. Furthermore, our procedure has very low computational complexity, linear in the number of candidate models, making it particularly appealing for large scale problems. We contrast our estimator with the nuclear norm penalized least squares (NNP) estimator, which has an inherently higher computational complexity than RSC, for multivariate regression models. We show that NNP has estimation properties similar to those of RSC, albeit under stronger conditions. However, it is not as parsimonious as RSC. We offer a simple correction of the NNP estimator which leads to consistent rank estimation. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/11-AOS876 by the Institute of Mathematical Statistics (http://www.imstat.org) (some typos corrected).
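
    The thresholding idea behind an estimator of this kind can be sketched in a few lines: project Y onto the column space of X, then retain the singular values of the projected fit that exceed a penalty-derived threshold. The constant in the penalty and the noise-variance estimate below are illustrative guesses, not the paper's exact tuning.

```python
# Hedged sketch of rank selection for reduced-rank regression: project Y
# onto col(X), keep singular values of the projected fit above a
# penalty-derived threshold. Penalty constant is an illustrative guess.
import numpy as np

rng = np.random.default_rng(2)
m, p, n, true_rank = 200, 12, 8, 3
X = rng.normal(size=(m, p))
B = rng.normal(size=(p, true_rank)) @ rng.normal(size=(true_rank, n))
Y = X @ B + 0.5 * rng.normal(size=(m, n))

Q, _ = np.linalg.qr(X)                 # orthonormal basis for col(X)
PY = Q @ (Q.T @ Y)                     # projection of Y onto col(X)
U, d, Vt = np.linalg.svd(PY, full_matrices=False)

q = np.linalg.matrix_rank(X)
sigma2 = np.sum((Y - PY) ** 2) / ((m - q) * n)   # residual variance estimate
mu = 2.0 * sigma2 * (n + q)            # penalty level; the constant 2.0 is a guess
r = int(np.sum(d > np.sqrt(mu)))       # keep singular values above sqrt(mu)
print("selected rank:", r)             # should typically recover true_rank = 3

# Reduced-rank fitted values: truncate the SVD of the projected response.
Y_hat = U[:, :r] @ np.diag(d[:r]) @ Vt[:r, :]
```

    Note the low cost: one QR factorization and one SVD cover all candidate ranks at once, which reflects the complexity advantage the abstract claims over nuclear norm penalization.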

    Generalized Singular Value Decomposition with Additive Components

    The singular value decomposition (SVD) technique is extended to incorporate additive components in the approximation of a rectangular matrix by outer products of vectors. While the dual vectors of the regular SVD can be expressed one via a linear transformation of the other, the modified SVD corresponds to a general linear transformation with an additive part. The resulting method is related to the family of principal component and correspondence analyses, and can be reduced to an eigenproblem for a specific transformation of the data matrix. This technique is applied to constructing dual eigenvectors for visualizing data in a two-dimensional space.
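
    A simplified stand-in for the additive idea follows: model the additive part as row and column offsets (double centering) and let an ordinary SVD supply the outer-product terms. This mirrors the reduction to an eigenproblem of a transformed data matrix, but it is only a hedged approximation, not the paper's general construction.

```python
# Illustrative sketch: approximate a data matrix by outer products of
# vectors plus an additive part, here modeled as row/column offsets
# (double centering) followed by an ordinary SVD. A simplified stand-in.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 12)) + rng.normal(size=(30, 1)) + rng.normal(size=(1, 12))

row = X.mean(axis=1, keepdims=True)    # additive row offsets
col = X.mean(axis=0, keepdims=True)    # additive column offsets
grand = X.mean()
R = X - row - col + grand              # doubly centered residual

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                  # two outer products for a 2-D view
X_hat = row + col - grand + U[:, :k] * s[:k] @ Vt[:k, :]
print("relative approximation error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```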

    A generalization of partial least squares regression and correspondence analysis for categorical and mixed data: An application with the ADNI data

    The present and future of large-scale studies of human brain and behavior, in typical and disease populations, is multi-omics, deep phenotyping, and other types of multi-source and multi-domain data collection initiatives. These massive studies rely on highly interdisciplinary teams that collect extremely diverse types of data across numerous systems and scales of measurement (e.g., genetics, brain structure, behavior, and demographics). Such large, complex, and heterogeneous data require relatively simple methods that allow for flexibility in analyses without the loss of the inherent properties of various data types. Here we introduce a method designed…

    * Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at…
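
    The sketch below is emphatically not the authors' method; it only illustrates the general idea of coupling disjunctive (one-hot) coding of categorical variables with ordinary PLS regression. The toy data frame and all column names are invented for illustration.

```python
# Hedged illustration: ordinary PLS regression on disjunctively (one-hot)
# coded categorical data plus a scaled numeric column. NOT the paper's
# method; data frame and column names are invented.
import pandas as pd
from sklearn.cross_decomposition import PLSRegression

# Toy mixed data: one categorical and one numeric predictor (hypothetical).
df = pd.DataFrame({
    "genotype": ["AA", "AG", "GG", "AG", "AA", "GG"],
    "age": [71.0, 68.0, 75.0, 80.0, 66.0, 73.0],
    "memory_score": [28.0, 26.0, 21.0, 19.0, 29.0, 22.0],
})

X = pd.get_dummies(df[["genotype"]]).astype(float)            # disjunctive coding
X["age"] = (df["age"] - df["age"].mean()) / df["age"].std()   # scale numeric column
y = df[["memory_score"]]

pls = PLSRegression(n_components=2)
pls.fit(X, y)
print("R^2 on training data:", pls.score(X, y))
```

    Plain one-hot coding discards the chi-squared metric that correspondence analysis places on categories, which is precisely the kind of inherent data property the abstract argues should be preserved.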
