International Migration, Integration and Social Cohesion online publications

UvA-DARE

Proceedings - University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

Dissertations of the University of Groningen

DISCO-SCA and Properly Applied GSVD as Swinging Methods to Find Common and Distinctive Processes

Author: A Subramanian
A Tanay
Age K. Smilde
AK Smilde
Anna Tramontano
Bart De Moor
C Hennig
CC Paige
CF Van Loan
HAL Kiers
Henk A. L. Kiers
I Måge
IT Jolliffe
Iven Van Mechelen
J Ihmels
J Westerhuis
JA Hageman
JM Stuart
K Devarajan
K Lemmens
K Van Deun
KA Bernstein
Katrijn Van Deun
Lieven De Lathauwer
Lieven Thorrez
M Schouteden
Mariët J. van der Werf
Martijn Schouteden
ME Timmerman
MJ van der Werf
MW Browne
NS Holter
O Alter
P Howland
P Tamayo
RA van den Berg
S Bergmann
S Friedland
SP Ponnapalli
T Dahl
T Löfstedt
U Lorenzo-Seva
VK Mootha
Z Bai
Publication venue: Public Library of Science
Publication date: 01/01/2012

BACKGROUND: In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA).
RESULTS: Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question.
CONCLUSIONS: Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided.
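The idea of common versus distinctive components can be made concrete with a small sketch. The Python below is not the authors' code and does not perform the DISCO rotation or a GSVD; it only runs a plain simultaneous component analysis (components of the column-wise concatenated blocks, found by power iteration) on hypothetical toy data built so that one pattern appears in both blocks and another in block 1 only. All function names and the data are assumptions made for illustration. The block-wise sum of squared loadings then shows which component is shared and which is tied to a single source.

```python
import math

def matvec(rows, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in rows]

def leading_component(X, iters=500):
    """Unit-length scores t and loadings p = X^T t of the leading component,
    found by power iteration on X X^T."""
    Xt = [list(col) for col in zip(*X)]
    t = [1.0 / (i + 1) for i in range(len(X))]  # arbitrary start vector
    for _ in range(iters):
        t = matvec(X, matvec(Xt, t))
        norm = math.sqrt(sum(a * a for a in t))
        t = [a / norm for a in t]
    return t, matvec(Xt, t)

def deflate(X, t, p):
    """Remove the fitted component t p^T from X."""
    return [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]

def block_sums_of_squares(p, block_sizes):
    """Sum of squared loadings within each data block."""
    out, start = [], 0
    for size in block_sizes:
        out.append(sum(a * a for a in p[start:start + size]))
        start += size
    return out

# Hypothetical toy data: 4 entities, block 1 has 3 variables, block 2 has 2.
common = [1.0, -1.0, 1.0, -1.0]      # pattern present in both blocks
distinct = [1.0, 1.0, -1.0, -1.0]    # pattern present in block 1 only
X = [[2 * c, c, 3 * d, 2 * c, 1.5 * c] for c, d in zip(common, distinct)]
block_sizes = [3, 2]

t1, p1 = leading_component(X)
t2, p2 = leading_component(deflate(X, t1, p1))
ss1 = block_sums_of_squares(p1, block_sizes)  # close to [20, 25]: common
ss2 = block_sums_of_squares(p2, block_sizes)  # close to [36, 0]: distinctive
print(ss1, ss2)
```

In real data the unrotated components rarely separate this cleanly, which is the motivation the abstract gives for rotating toward a common/distinctive target.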

University of Groningen

International Migration, Integration and Social Cohesion online publications

UvA-DARE

The Francis Crick Institute

Public Library of Science (PLOS)

Proceedings - University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Orthogonal rotation in PCAMIX

Author: B Escofier
C Eckart
H Neudecker
HAL Kiers
HF Kaiser
J Leeuw de
J Pagès
JMF ten Berge
Jérôme Saracco
K Adachi
K Nevels
L-S Urbano
M Velden van de
Marie Chavent
MW Browne
RI Jennrich
Vanessa Kuentz-Simonet
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2011

Kiers (1991) considered the orthogonal rotation in PCAMIX, a principal component method for a mixture of qualitative and quantitative variables. PCAMIX includes the ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases. In this paper, we give a new presentation of PCAMIX where the principal components and the squared loadings are obtained from a Singular Value Decomposition. The loadings of the quantitative variables and the principal coordinates of the categories of the qualitative variables are also obtained directly. In this context, we propose a computationally efficient procedure for varimax rotation in PCAMIX and a direct solution for the optimal angle of rotation. A simulation study shows the good computational behavior of the proposed algorithm. An application on a real data set illustrates the interest of using rotation in MCA. All source code is available in the R package "PCAmixdata".
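To illustrate what a direct solution for a rotation angle looks like, here is a minimal Python sketch of the classical planar (two-component) raw varimax rotation for purely quantitative loadings, where the maximizing angle has a closed form. This is an assumption-laden stand-in, not the PCAMIX-specific formula of the paper and not the PCAmixdata implementation; the function names and the example loadings are hypothetical.

```python
import math

def varimax_criterion(loadings):
    """Raw varimax: for each component, sum of fourth powers of the loadings
    minus the squared sum of squares divided by the number of variables."""
    p = len(loadings)
    total = 0.0
    for col in zip(*loadings):
        sq = [a * a for a in col]
        total += sum(s * s for s in sq) - sum(sq) ** 2 / p
    return total

def rotate(loadings, phi):
    """Rotate two-column loadings by angle phi (radians)."""
    c, s = math.cos(phi), math.sin(phi)
    return [(x * c + y * s, -x * s + y * c) for x, y in loadings]

def optimal_angle(loadings):
    """Closed-form angle maximizing the raw varimax criterion for two components."""
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2 * x * y for x, y in loadings]
    A, B = sum(u), sum(v)
    C = sum(a * a - b * b for a, b in zip(u, v))
    D = 2 * sum(a * b for a, b in zip(u, v))
    return math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4

# Hypothetical loadings: a simple structure hidden by a rotation of 0.5 rad.
simple = [(0.9, 0.0), (0.8, 0.1), (0.1, 0.7), (0.0, 0.8)]
L = rotate(simple, 0.5)

phi = optimal_angle(L)
rotated = rotate(L, phi)
print(varimax_criterion(L), varimax_criterion(rotated))
```

With more than two components, the usual approach is to apply such planar rotations to each pair of components in turn until the criterion stops improving.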

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Oskar Bordeaux

A flexible framework for sparse simultaneous component based data integration

Author: AE Hoerl
AL Barabasi
Anestis Antoniadis
D Lee
DM Witten
GJ McLachlan
H Kiers
H Zou
HAL Kiers
I Borg
I Jolliffe
IT Jolliffe
Iven Van Mechelen
J de Leeuw
J Friedman
J Huang
JMF Ten Berge
K Lange
K Lemmens
K Van Deun
KA Le Cao
Katrijn Van Deun
KR Gabriel
L Meier
M de Tayrac
M Kowalski
M Yuan
MJ van der Werf
N Ishii
O Alter
P Zhao
PJF Groenen
R Jenatton
R Tibshirani
R van den Berg
Robert A van den Berg
S Hochreiter
S Ma
T Wilderjans
TF Wilderjans
Tom F Wilderjans
WJ Heiser
Y Kim
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: High throughput data are complex, and methods that reveal the structure underlying the data are most useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Nowadays the challenge is often to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities under study. Simultaneous component methods are most promising in this respect. However, the interpretation of the principal and simultaneous components is often daunting because the contributions of each of the biomolecules (transcripts, proteins) have to be taken into account.
Results: We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties can be tuned that account in different ways for the block structure present in the integrated data. This yields known sparse approaches such as the lasso, the ridge penalty, the elastic net, the group lasso, the sparse group lasso, and the elitist lasso. In addition, the algorithmic results can be easily transposed to the context of regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of different penalties with respect to sparseness across and within data blocks.
Conclusion: Sparse simultaneous component analysis is a useful method for data integration: first, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses, and second, interpretation of the results is greatly facilitated by their sparseness. The approach offered is flexible and allows the block structure to be taken into account in different ways. As such, structures can be found that are exclusively tied to one data platform (group lasso approach) as well as structures that involve all data platforms (elitist lasso approach).
Availability: The additional file contains a MATLAB implementation of the sparse simultaneous component method.
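The difference between sparseness within and across data blocks can be seen from the shrinkage step that each penalty induces on a vector of loadings. The Python sketch below shows the standard proximal operators for the lasso, ridge, and group lasso penalties. It is not the paper's MATLAB implementation; the penalty scaling (noted in the comments), the function names, and the example loadings are assumptions, and the elitist lasso is not covered.

```python
import math

def prox_lasso(w, lam):
    """Soft thresholding: minimizes 0.5*(z - x)^2 + lam*|z| elementwise."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in w]

def prox_ridge(w, lam):
    """Minimizes 0.5*(z - x)^2 + 0.5*lam*z^2 elementwise (shrinks, never zeroes)."""
    return [x / (1.0 + lam) for x in w]

def prox_group_lasso(blocks, lam):
    """Minimizes 0.5*||z - g||^2 + lam*||z||_2 per block (unweighted groups,
    an assumption): a whole block is either shrunk or set to zero."""
    out = []
    for g in blocks:
        norm = math.sqrt(sum(x * x for x in g))
        scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
        out.append([scale * x for x in g])
    return out

# Hypothetical loadings of one component on two data blocks.
blocks = [[3.0, -0.5, 1.2], [0.2, -0.1]]
lam = 1.0

lasso = [prox_lasso(g, lam) for g in blocks]   # zeroes small entries in every block
ridge = [prox_ridge(g, lam) for g in blocks]   # shrinks all entries, keeps them nonzero
group = prox_group_lasso(blocks, lam)          # drops block 2 entirely, keeps all of block 1
print(lasso, ridge, group)
```

Here the lasso removes the small loading inside block 1 as well as all of block 2, whereas the group lasso keeps every variable of block 1 and removes block 2 as a whole, which corresponds to a component tied to a single platform.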

Lirias

Hal - Université Grenoble Alpes

Springer - Publisher Connector

Central Archive at the University of Reading

Generic, network schema agnostic sparse tensor factorization for single-pass clustering of heterogeneous information networks

Author: A Strehl
Atta Badii
B Ermi
BW Bader
DD Lee
HAL Kiers
Harshman
Hongbin Huang
J Yang
Jibing Wu
LD Lathauwer
LR Tucker
M Mørup
M Zhang
Q Zhao
Qinggang Meng
S Brin
SJ Wang
Su Deng
TG Kolda
X Cao
Y Sun
Yahui Wu
Z Gao
Z Zhang
Zhong-Ke Gao
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2017

Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks follow some simple schema, such as a bi-typed network or star network schema, and they can only cluster one type of object in the network at a time. In this paper, a novel clustering framework is proposed based on sparse tensor factorization for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information networks are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets demonstrate that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable, network-schema-agnostic single-pass clustering method for heterogeneous networks.

Public Library of Science (PLOS)

Loughborough University Institutional Repository

The Francis Crick Institute

Individual differences in metabolomics: individualised responses and between-metabolite relationships

Author: Age K. Smilde
AK Smilde
CD Broeckling
DA Herms
Ewa Szymańska
FJ Picard
G Zwanenburg
H Leur van
HAL Kiers
Huub C. J. Hoefsloot
J Trygg
JC Lindon
JD Carroll
Jeroen J. Jansen
JJ Jansen
JL Ward
JMF Ten Berge
JW Fahey
M Dyrby
ME Timmerman
N Dam van
R Millsap
R Steuer
RA Fisher
RA Harshman
RJ Hopkins
RP Bodnaryk
RR Sokal
S Smit
SJ Steppan
SRX Dall
W Weckwerth
W Wu
WB Dunn
Publication venue: Springer US
Publication date: 01/01/2012

Many metabolomics studies aim to find ‘biomarkers’: sets of molecules that are consistently elevated or decreased upon experimental manipulation. Biological effects, however, often manifest themselves along a continuum of individual differences between the biological replicates in the experiment. Such differences are overlooked or even diminished by methods in standard use for metabolomics, although they may contain a wealth of information on the experiment. Properly understanding individual differences is crucial for generating knowledge in fields like personalised medicine, evolution and ecology. We propose to use simultaneous component analysis with individual differences constraints (SCA-IND), a data analysis method from psychology that focuses on these differences. This method constructs axes along the natural biochemical differences between biological replicates, comparable to principal components. The model may shed light on changes in the individual differences between experimental groups, but also on whether these differences correspond to, e.g., responders and non-responders or to distinct chemotypes. Moreover, SCA-IND reveals the individuals that respond most to a manipulation and are best suited for further experimentation. The method is illustrated by the analysis of individual differences in the metabolic response of cabbage plants to herbivory. The model reveals individual differences in the response to shoot herbivory, where two ‘response chemotypes’ may be identified. In the response to root herbivory the model shows that individual plants differ strongly in response dynamics. SCA-IND thereby provides a hitherto unavailable view on the chemical diversity of the induced plant response that greatly increases understanding of the system.

Springer - Publisher Connector

Radboud Repository (Radboud Univ.)

International Migration, Integration and Social Cohesion online publications

A Novel Semi-Supervised Methodology for Extracting Tumor Type-Specific MRS Sources in Human Brain Data

Author: A Devos
A Gibb
A Hyvärinen
A Pérez-Ruiz
A Vellido
A Vilamala
AK Jain
Alfredo Vellido
AP Candiota
AR Tate
C Ding
C Jutten
Carles Arús
Daniel Monleon
DD Lee
DN Louis
DW Ellison
Enrique Romero
FA Howe
G Fan
H Ishimaru
H Ohgaki
HAL Kiers
Héctor Ruiz
I Barba
Ian H. Jarman
Iván Olier
JF Cardoso
JM García-Gómez
JM Kros
José D. Martín
JW Sammon
KS Opstad
L Lukas
LM DeAngelis
M Esposito
M Julià-Sapé
M Law
M Murphy
Margarida Julià-Sapé
MC Martínez-Bisbal
MG Kounelakis
P Paatero
P Sajda
Paulo J. G. Lisboa
PJG Lisboa
S Amari
S Herminghaus
S Zafeiriou
Sandra Ortega-Martorell
SW Coons
X Castells
Y Huang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Background: The clinical investigation of human brain tumors often starts with a non-invasive imaging study, providing information about the tumor extent and location, but little insight into the biochemistry of the analyzed tissue. Magnetic Resonance Spectroscopy can complement imaging by supplying a metabolic fingerprint of the tissue. This study analyzes single-voxel magnetic resonance spectra, which represent signal information in the frequency domain. Given that a single voxel may contain a heterogeneous mix of tissues, signal source identification is a relevant challenge for the problem of tumor type classification from the spectroscopic signal.
Methodology/Principal Findings: Non-negative matrix factorization techniques have recently shown their potential for the identification of meaningful sources from brain tissue spectroscopy data. In this study, we use a convex variant of these methods that is capable of handling negatively-valued data and generating sources that can be interpreted as tumor class prototypes. A novel approach to convex non-negative matrix factorization is proposed, in which prior knowledge about class information is utilized in model optimization. Class-specific information is integrated into this semi-supervised process by setting the metric of a latent variable space where the matrix factorization is carried out. The reported experimental study comprises 196 cases from different tumor types drawn from two international, multi-center databases. The results indicate that the proposed approach outperforms a purely unsupervised process by achieving near perfect correlation of the extracted sources with the mean spectra of the tumor types. It also improves tissue type classification.
Conclusions/Significance: We show that source extraction by unsupervised matrix factorization benefits from the integration of the available class information, so operating in a semi-supervised learning manner, for discriminative source identification and brain tumor labeling from single-voxel spectroscopy data. We are confident that the proposed methodology has wider applicability for biomedical signal processing.

Keele Research Repository

Public Library of Science (PLOS)

LJMU Research Online (Liverpool John Moores University)

E-space: Manchester Metropolitan University's Research Repository

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura