Common and Distinct Components in Data Fusion
In many areas of science multiple sets of data are collected pertaining to
the same system. Examples are food products which are characterized by
different sets of variables, bio-processes which are on-line sampled with
different instruments, or biological systems of which different genomics
measurements are obtained. Data fusion is concerned with analyzing such sets of
data simultaneously to arrive at a global view of the system under study. One
of the emerging areas of data fusion is exploring whether the data sets have
something in common. This gives insight into the common and distinct
variation in each data set, thereby facilitating an understanding of the
relationships between the data sets. Unfortunately, research on methods for
distinguishing common and distinct components is fragmented, both in
terminology and in methods: there is no common ground, which hampers comparing
methods and understanding their relative merits. This paper provides a unifying
framework for this subfield of data fusion by using rigorous arguments from
linear algebra. The most frequently used methods for distinguishing common and
distinct components are explained in this framework and some practical examples
are given of these methods in the areas of (medical) biology and food science.
Comment: 50 pages, 12 figures
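As a hypothetical illustration of the common-versus-distinct idea (not any specific method from the paper), one can compare the column spaces of two data blocks measured on the same objects via principal angles; directions with cosines near one correspond to shared (common) variation. All names and the 0.9 threshold below are illustrative choices.

```python
import numpy as np

# Two data blocks on the same N objects, sharing a 2-dimensional
# common source of variation plus block-specific (distinct) parts.
rng = np.random.default_rng(0)
N = 100
common = rng.standard_normal((N, 2))                    # shared variation
X1 = np.hstack([common, rng.standard_normal((N, 3))])   # common + distinct
X2 = np.hstack([common, rng.standard_normal((N, 4))])   # common + distinct

# Orthonormal bases for the column space of each block
Q1, _ = np.linalg.qr(X1)
Q2, _ = np.linalg.qr(X2)

# Singular values of Q1.T @ Q2 are cosines of the principal angles
cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
n_common = int(np.sum(cosines > 0.9))   # heuristic threshold
print(n_common)  # → 2, the dimension of the shared subspace
```

Because both column spaces contain the shared columns exactly, two principal cosines equal one, while the cosines between the random block-specific parts stay small in this high-dimensional setting.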
Some Topics Concerning the Singular Value Decomposition and Generalized Singular Value Decomposition
Abstract: This dissertation involves three problems that are all related by the use of the singular value decomposition (SVD) or generalized singular value decomposition (GSVD). The specific problems are (i) derivation of a generalized singular value expansion (GSVE), (ii) analysis of the properties of the chi-squared method for regularization parameter selection in the case of nonnormal data, and (iii) formulation of a partial canonical correlation concept for continuous time stochastic processes.

The finite dimensional SVD has an infinite dimensional generalization to compact operators. However, the form of the finite dimensional GSVD developed in, e.g., Van Loan does not extend directly to infinite dimensions as a result of a key step in the proof that is specific to the matrix case. Thus, the first problem of interest is to find an infinite dimensional version of the GSVD. One such GSVE for compact operators on separable Hilbert spaces is developed.

The second problem concerns regularization parameter estimation. The chi-squared method for nonnormal data is considered. A form of the optimized regularization criterion that pertains to measured data or signals with nonnormal noise is derived. Large sample theory for phi-mixing processes is used to derive a central limit theorem for the chi-squared criterion that holds under certain conditions. Departures from normality are seen to manifest in the need for a possibly different scale factor in normalization rather than what would be used under the assumption of normality. The consequences of our large sample work are illustrated by empirical experiments.

For the third problem, a new approach is examined for studying the relationships between a collection of functional random variables. The idea is based on the work of Sunder that provides mappings to connect the elements of algebraic and orthogonal direct sums of subspaces in a Hilbert space.
When combined with a key isometry associated with a particular Hilbert space indexed stochastic process, this leads to a useful formulation for situations that involve the study of several second order processes. In particular, using our approach with two processes provides an independent derivation of the functional canonical correlation analysis (CCA) results of Eubank and Hsing. For more than two processes, a rigorous derivation of the functional partial canonical correlation analysis (PCCA) concept that applies to both finite and infinite dimensional settings is obtained.
Dissertation/Thesis, Ph.D. Statistics, 201
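The classical finite-dimensional counterpart of the functional CCA mentioned above can be sketched concisely: the canonical correlations between two centered data blocks are the singular values of the product of their orthonormalized bases. This is a generic illustration, not a reconstruction of the dissertation's Hilbert-space derivation; all variable names are illustrative.

```python
import numpy as np

# Finite-dimensional CCA via QR + SVD: canonical correlations are the
# singular values of Qx.T @ Qy, where Qx, Qy orthonormalize the blocks.
rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal((n, 1))                  # shared latent signal
X = np.hstack([z, rng.standard_normal((n, 2))])  # block 1
Y = np.hstack([z, rng.standard_normal((n, 3))])  # block 2

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

Qx, _ = np.linalg.qr(Xc)
Qy, _ = np.linalg.qr(Yc)
canon_corrs = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
print(canon_corrs.round(2))   # leading correlation is (numerically) 1
```

Since both blocks contain the latent column `z` exactly, the leading canonical correlation is one up to floating-point error, while the remaining correlations reflect only chance alignment of the noise columns.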
Optimal selection of reduced rank estimators of high-dimensional matrices
We introduce a new criterion, the Rank Selection Criterion (RSC), for
selecting the optimal reduced rank estimator of the coefficient matrix in
multivariate response regression models. The corresponding RSC estimator
minimizes the Frobenius norm of the fit plus a regularization term proportional
to the number of parameters in the reduced rank model. The rank of the RSC
estimator provides a consistent estimator of the rank of the coefficient
matrix; in general, the rank of our estimator is a consistent estimate of the
effective rank, which we define to be the number of singular values of the
target matrix that are appropriately large. The consistency results are valid
not only in the classic asymptotic regime, in which the number of responses
and the number of predictors stay bounded while the number of observations
grows, but also when either or both of these dimensions grow, possibly much
faster than the number of observations. We establish minimax optimal bounds
on the mean squared
errors of our estimators. Our finite sample performance bounds for the RSC
estimator show that it achieves the optimal balance between the approximation
error and the penalty term. Furthermore, our procedure has very low
computational complexity, linear in the number of candidate models, making it
particularly appealing for large scale problems. We contrast our estimator with
the nuclear norm penalized least squares (NNP) estimator, which has an
inherently higher computational complexity than RSC, for multivariate
regression models. We show that NNP has estimation properties similar to those
of RSC, albeit under stronger conditions. However, it is not as parsimonious as
RSC. We offer a simple correction of the NNP estimator which leads to
consistent rank estimation.
Comment: Published at http://dx.doi.org/10.1214/11-AOS876 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) (some typos corrected).
Generalized Singular Value Decomposition with Additive Components
The singular value decomposition (SVD) technique is extended to incorporate additive components for approximation of a rectangular matrix by the outer products of vectors. While the dual vectors of the regular SVD can be expressed one via a linear transformation of the other, the modified SVD corresponds to a general linear transformation with an additive part. The method obtained can be related to the family of principal component and correspondence analyses, and can be reduced to an eigenproblem for a specific transformation of a data matrix. This technique is applied to constructing dual eigenvectors for data visualization in a two-dimensional space.
A generalization of partial least squares regression and correspondence analysis for categorical and mixed data: An application with the ADNI data
The present and future of large scale studies of human brain and behavior in typical and disease populations is multi-omics, deep-phenotyping, or other types of multi-source and multi-domain data collection initiatives. These massive studies rely on highly interdisciplinary teams that collect extremely diverse types of data across numerous systems and scales of measurement (e.g., genetics, brain structure, behavior, and demographics). Such large, complex, and heterogeneous data require relatively simple methods that allow for flexibility in analyses without the loss of the inherent properties of various data types. Here we introduce a method designed
* Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found a
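The classical building block behind PLS-style methods like the one introduced above can be sketched in a few lines: the first pair of PLS weight vectors is given by the leading singular vectors of the cross-product matrix between the two centered data blocks. This is only the standard numeric case; the paper's contribution is the generalization to categorical and mixed data, which this sketch does not attempt.

```python
import numpy as np

# First PLS component via the SVD of the cross-product X'Y:
# the leading singular vector pair maximizes the covariance
# between the block scores among unit-norm weight vectors.
rng = np.random.default_rng(3)
n = 100
X = rng.standard_normal((n, 5))
Y = rng.standard_normal((n, 4))
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc.T @ Yc)
w, c = U[:, 0], Vt[0]      # first pair of weight vectors
t, u = Xc @ w, Yc @ c      # corresponding score vectors
# The inner product of the scores equals the leading singular value
print(float(t @ u))
```

By construction `t @ u = w' (Xc' Yc) c = s[0]`, which is exactly the maximized covariance; subsequent components are obtained from the remaining singular vector pairs after deflation.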