Common and Distinct Components in Data Fusion
In many areas of science multiple sets of data are collected pertaining to
the same system. Examples are food products which are characterized by
different sets of variables, bio-processes which are on-line sampled with
different instruments, or biological systems of which different genomics
measurements are obtained. Data fusion is concerned with analyzing such sets of
data simultaneously to arrive at a global view of the system under study. One
of the emerging areas of data fusion is exploring whether the data sets have
something in common. This gives insight into the common and distinct
variation in each data set, thereby facilitating an understanding of the
relationships between the data sets. Unfortunately, research on methods for
distinguishing common and distinct components is fragmented, both in
terminology and in methods: there is no common ground, which hampers comparing
methods and understanding their relative merits. This paper provides a unifying
framework for this subfield of data fusion by using rigorous arguments from
linear algebra. The most frequently used methods for distinguishing common and
distinct components are explained in this framework and some practical examples
are given of these methods in the areas of (medical) biology and food science.
Comment: 50 pages, 12 figures
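As a hypothetical illustration of the common-versus-distinct idea (not any specific method from the paper), one can compare the column spaces of two data blocks measured on the same objects via principal angles; directions with cosines near one correspond to shared (common) variation. All names and the 0.9 threshold below are illustrative choices.

```python
import numpy as np

# Two data blocks on the same N objects, sharing a 2-dimensional
# common source of variation plus block-specific (distinct) parts.
rng = np.random.default_rng(0)
N = 100
common = rng.standard_normal((N, 2))                    # shared variation
X1 = np.hstack([common, rng.standard_normal((N, 3))])   # common + distinct
X2 = np.hstack([common, rng.standard_normal((N, 4))])   # common + distinct

# Orthonormal bases for the column space of each block
Q1, _ = np.linalg.qr(X1)
Q2, _ = np.linalg.qr(X2)

# Singular values of Q1.T @ Q2 are cosines of the principal angles
cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
n_common = int(np.sum(cosines > 0.9))   # heuristic threshold
print(n_common)  # → 2, the dimension of the shared subspace
```

Because both column spaces contain the shared columns exactly, two principal cosines equal one, while the cosines between the random block-specific parts stay small in this high-dimensional setting.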
Some Topics Concerning the Singular Value Decomposition and Generalized Singular Value Decomposition
Abstract: This dissertation involves three problems that are all related by the use of the singular value decomposition (SVD) or generalized singular value decomposition (GSVD). The specific problems are (i) derivation of a generalized singular value expansion (GSVE), (ii) analysis of the properties of the chi-squared method for regularization parameter selection in the case of nonnormal data, and (iii) formulation of a partial canonical correlation concept for continuous time stochastic processes.

The finite dimensional SVD has an infinite dimensional generalization to compact operators. However, the form of the finite dimensional GSVD developed in, e.g., Van Loan does not extend directly to infinite dimensions as a result of a key step in the proof that is specific to the matrix case. Thus, the first problem of interest is to find an infinite dimensional version of the GSVD. One such GSVE for compact operators on separable Hilbert spaces is developed.

The second problem concerns regularization parameter estimation. The chi-squared method for nonnormal data is considered. A form of the optimized regularization criterion that pertains to measured data or signals with nonnormal noise is derived. Large sample theory for phi-mixing processes is used to derive a central limit theorem for the chi-squared criterion that holds under certain conditions. Departures from normality are seen to manifest in the need for a possibly different scale factor in normalization rather than what would be used under the assumption of normality. The consequences of our large sample work are illustrated by empirical experiments.

For the third problem, a new approach is examined for studying the relationships between a collection of functional random variables. The idea is based on the work of Sunder that provides mappings to connect the elements of algebraic and orthogonal direct sums of subspaces in a Hilbert space.
When combined with a key isometry associated with a particular Hilbert space indexed stochastic process, this leads to a useful formulation for situations that involve the study of several second order processes. In particular, using our approach with two processes provides an independent derivation of the functional canonical correlation analysis (CCA) results of Eubank and Hsing. For more than two processes, a rigorous derivation of the functional partial canonical correlation analysis (PCCA) concept that applies to both finite and infinite dimensional settings is obtained.
Dissertation/Thesis, Ph.D. Statistics, 201
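The classical finite-dimensional counterpart of the functional CCA mentioned above can be sketched concisely: the canonical correlations between two centered data blocks are the singular values of the product of their orthonormalized bases. This is a generic illustration, not a reconstruction of the dissertation's Hilbert-space derivation; all variable names are illustrative.

```python
import numpy as np

# Finite-dimensional CCA via QR + SVD: canonical correlations are the
# singular values of Qx.T @ Qy, where Qx, Qy orthonormalize the blocks.
rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal((n, 1))                  # shared latent signal
X = np.hstack([z, rng.standard_normal((n, 2))])  # block 1
Y = np.hstack([z, rng.standard_normal((n, 3))])  # block 2

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

Qx, _ = np.linalg.qr(Xc)
Qy, _ = np.linalg.qr(Yc)
canon_corrs = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
print(canon_corrs.round(2))   # leading correlation is (numerically) 1
```

Since both blocks contain the latent column `z` exactly, the leading canonical correlation is one up to floating-point error, while the remaining correlations reflect only chance alignment of the noise columns.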
Optimal selection of reduced rank estimators of high-dimensional matrices
We introduce a new criterion, the Rank Selection Criterion (RSC), for
selecting the optimal reduced rank estimator of the coefficient matrix in
multivariate response regression models. The corresponding RSC estimator
minimizes the Frobenius norm of the fit plus a regularization term proportional
to the number of parameters in the reduced rank model. The rank of the RSC
estimator provides a consistent estimator of the rank of the coefficient
matrix; in general, the rank of our estimator is a consistent estimate of the
effective rank, which we define to be the number of singular values of the
target matrix that are appropriately large. The consistency results are valid
not only in the classic asymptotic regime, in which the number of responses
and the number of predictors stay bounded while the number of observations
grows, but also when either or both of these dimensions grow, possibly much
faster than the number of observations. We establish minimax optimal bounds
on the mean squared
errors of our estimators. Our finite sample performance bounds for the RSC
estimator show that it achieves the optimal balance between the approximation
error and the penalty term. Furthermore, our procedure has very low
computational complexity, linear in the number of candidate models, making it
particularly appealing for large scale problems. We contrast our estimator with
the nuclear norm penalized least squares (NNP) estimator, which has an
inherently higher computational complexity than RSC, for multivariate
regression models. We show that NNP has estimation properties similar to those
of RSC, albeit under stronger conditions. However, it is not as parsimonious as
RSC. We offer a simple correction of the NNP estimator which leads to
consistent rank estimation.
Comment: Published at http://dx.doi.org/10.1214/11-AOS876 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) (some typos corrected).
Generalized Singular Value Decomposition with Additive Components
The singular value decomposition (SVD) technique is extended to incorporate additive components for approximation of a rectangular matrix by the outer products of vectors. While the dual vectors of the regular SVD can be expressed one via a linear transformation of the other, the modified SVD corresponds to a general linear transformation with an additive part. The method obtained can be related to the family of principal component and correspondence analyses, and can be reduced to an eigenproblem for a specific transformation of a data matrix. This technique is applied to constructing dual eigenvectors for data visualization in a two-dimensional space.
A generalization of partial least squares regression and correspondence analysis for categorical and mixed data: An application with the ADNI data
The present and future of large scale studies of human brain and behavior in typical and disease populations is multi-omics, deep-phenotyping, or other types of multi-source and multi-domain data collection initiatives. These massive studies rely on highly interdisciplinary teams that collect extremely diverse types of data across numerous systems and scales of measurement (e.g., genetics, brain structure, behavior, and demographics). Such large, complex, and heterogeneous data require relatively simple methods that allow for flexibility in analyses without the loss of the inherent properties of various data types. Here we introduce a method designed
* Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found a
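The classical building block behind PLS-style methods like the one introduced above can be sketched in a few lines: the first pair of PLS weight vectors is given by the leading singular vectors of the cross-product matrix between the two centered data blocks. This is only the standard numeric case; the paper's contribution is the generalization to categorical and mixed data, which this sketch does not attempt.

```python
import numpy as np

# First PLS component via the SVD of the cross-product X'Y:
# the leading singular vector pair maximizes the covariance
# between the block scores among unit-norm weight vectors.
rng = np.random.default_rng(3)
n = 100
X = rng.standard_normal((n, 5))
Y = rng.standard_normal((n, 4))
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc.T @ Yc)
w, c = U[:, 0], Vt[0]      # first pair of weight vectors
t, u = Xc @ w, Yc @ c      # corresponding score vectors
# The inner product of the scores equals the leading singular value
print(float(t @ u))
```

By construction `t @ u = w' (Xc' Yc) c = s[0]`, which is exactly the maximized covariance; subsequent components are obtained from the remaining singular vector pairs after deflation.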