
122 research outputs found

Evaluation and improvement of data quality and methods in structural genomics

Author: Saccenti Edoardo
Publication date: 01/01/2007

Approaches to Sample Size Determination for Multivariate Data: Applications to PCA and PLS-DA of Omics Data

Author: Saccenti Edoardo
Timmerman Marieke E.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2016

Sample size determination is a fundamental step in the design of experiments. Methods for sample size determination are abundant for univariate analysis methods, but scarce in the multivariate case. Omics data are multivariate in nature and are commonly investigated using multivariate statistical methods, such as principal component analysis (PCA) and partial least-squares discriminant analysis (PLS-DA). No simple approaches to sample size determination exist for PCA and PLS-DA. In this paper we will introduce important concepts and offer strategies for (minimally) required sample size estimation when planning experiments to be analyzed using PCA and/or PLS-DA.

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

Wageningen University & Research Publications

Dissertations of the University of Groningen

Considering Horn’s parallel analysis from a random matrix theory point of view

Author: Saccenti Edoardo
Timmerman Marieke E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Horn’s parallel analysis is a widely used method for assessing the number of principal components and common factors. We discuss the theoretical foundations of parallel analysis for principal components based on a covariance matrix by making use of arguments from random matrix theory. In particular, we show that (i) for the first component, parallel analysis is an inferential method equivalent to the Tracy–Widom test, (ii) its use to test high-order eigenvalues is equivalent to the use of the joint distribution of the eigenvalues, and thus should be discouraged, and (iii) a formal test for higher-order components can be obtained based on a Tracy–Widom approximation. We illustrate the performance of the two testing procedures using simulated data generated under both a principal component model and a common factors model. For the principal component model, the Tracy–Widom test performs consistently in all conditions, while parallel analysis shows unpredictable behavior for higher-order components. For the common factor model, including major and minor factors, both procedures are heuristic approaches, with variable performance. We conclude that the Tracy–Widom procedure is preferred over parallel analysis for statistically testing the number of principal components based on a covariance matrix.
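For readers unfamiliar with the procedure named in this abstract, the following is a minimal sketch of the classic form of Horn's parallel analysis, not the authors' implementation. It assumes the correlation-matrix variant (the paper itself analyzes the covariance-matrix case), standard normal reference data, and a sequential stopping rule. The function name `parallel_analysis` and the parameters `n_sim`, `quantile`, and `seed` are hypothetical choices made for illustration.

```python
import numpy as np


def parallel_analysis(X, n_sim=200, quantile=0.95, seed=0):
    """Sketch of Horn's parallel analysis on the correlation matrix.

    Returns (k, obs, thresh): the number of retained components, the
    observed eigenvalues, and the per-rank reference thresholds.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Observed eigenvalues of the sample correlation matrix, largest first.
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

    # Reference eigenvalues from uncorrelated normal data of the same shape.
    sim = np.empty((n_sim, p))
    for i in range(n_sim):
        Z = rng.standard_normal((n, p))
        sim[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]

    # Per-rank threshold: the chosen quantile of the simulated eigenvalues.
    thresh = np.quantile(sim, quantile, axis=0)

    # Retain components sequentially until one fails to exceed its threshold.
    k = 0
    while k < p and obs[k] > thresh[k]:
        k += 1
    return k, obs, thresh
```

Calling `parallel_analysis(X)` on an observations-by-variables array returns the retained count together with both eigenvalue vectors, so the comparison can be inspected rank by rank. Note that the abstract argues this rank-by-rank comparison is only a formal test for the first component.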

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

Wageningen University & Research Publications

Dissertations of the University of Groningen

Age- and Sex-Dependent Changes of Free Circulating Blood Metabolite and Lipid Abundances, Correlations, and Ratios

Author: Di Cesare Francesca
Saccenti Edoardo
Luchinat Claudio
Tenori Leonardo
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2021

Florence Research

Group-wise Partial Least Square Regression

Author: Camacho Páez José
Saccenti Edoardo
Publication venue: 'Wiley'
Publication date: 01/01/2017

This paper introduces the Group-wise Partial Least Squares (GPLS) regression. GPLS is a new Sparse PLS (SPLS) technique where the sparsity structure is defined in terms of groups of correlated variables, similarly to what is done in the related Group-wise Principal Component Analysis (GPCA). These groups are found in correlation maps derived from the data to be analyzed. GPLS is especially useful for exploratory data analysis, since suitable values for its metaparameters can be inferred upon visualization of the correlation maps. Following this approach, we show GPLS solves an inherent problem of SPLS: its tendency to confound the data structure as a result of setting its metaparameters using standard approaches for optimizing prediction, like cross-validation. Results are shown for both simulated and experimental data.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Repositorio Institucional Universidad de Granada

Wageningen University & Research Publications

On the use of the observation-wise k-fold operation in PCA cross-validation

Author: Camacho Páez José
Saccenti Edoardo
Publication venue: 'Wiley'
Publication date: 01/01/2015

Cross-validation (CV) is a common approach for determining the optimal number of components in a principal component analysis model. To guarantee the independence between model testing and calibration, the observation-wise k-fold operation is commonly implemented in each cross-validation step. This operation renders the CV algorithm computationally intensive and is the main limitation to applying CV to very large data sets. In this paper we carry out an empirical and theoretical investigation of the use of this operation in the element-wise k-fold (ekf) algorithm, the state-of-the-art CV algorithm. We show that when very large data sets need to be cross-validated and the computational time is a matter of concern, the observation-wise k-fold operation can be skipped. The theoretical properties of the resulting modified algorithm, referred to as the column-wise k-fold (ckf) algorithm, are derived. Also, its performance is evaluated with several artificial and real data sets. We suggest the ckf algorithm to be a valid alternative to the standard ekf to reduce the computational time needed to cross-validate a data set.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Granada

Wageningen University & Research Publications

Age and Sex Effects on Plasma Metabolite Association Networks in Healthy Subjects

Author: Luchinat Claudio
Saccenti Edoardo
Tenori Leonardo
Vignoli Alessia
Publication date: 01/01/2017

In the era of precision medicine, the analysis of simple information like sex and age can increase the potential to better diagnose and treat conditions that occur more frequently in one of the two sexes, present sex-specific symptoms and outcomes, or are characteristic of a specific age group. We present here a study of the association networks constructed from an array of 22 plasma metabolites measured on a cohort of 844 healthy blood donors. Through differential network analysis we show that specific association networks can be associated with sex and age: Different connectivity patterns were observed, suggesting sex-related variability in several metabolic pathways (branched-chain amino acids, ketone bodies, and propanoate metabolism). Reduction in metabolite hub connectivity was also found to be associated with age in both sex groups. Network analysis was complemented with standard univariate and multivariate statistical analysis that revealed age- and sex-specific metabolic signatures. Our results demonstrate that the characterization of metabolite-metabolite association networks is a promising and powerful tool to investigate the human phenotype at a molecular level.

Florence Research

Wageningen University & Research Publications

FigShare

Comparative transcriptomics reveal developmental turning points during embryogenesis of a hemimetabolous insect, the damselfly Ischnura elegans

Author: Amato George
Brugler Mercer R
DeSalle Rob
Hadrys Heike
Saccenti Edoardo
Sagasser Sven
Schranz M. Eric
Simon Sabrina
Publication venue: CUNY Academic Works
Publication date: 01/01/2017

Identifying transcriptional changes during embryogenesis is of crucial importance for unravelling evolutionary, molecular and cellular mechanisms that underpin patterning and morphogenesis. However, comparative studies focusing on early/embryonic stages during insect development are limited to a few taxa. Drosophila melanogaster is the paradigm for insect development, whereas comparative transcriptomic studies of embryonic stages of hemimetabolous insects are completely lacking. We reconstructed the first comparative transcriptome covering the daily embryonic developmental progression of the blue-tailed damselfly Ischnura elegans (Odonata), an ancient hemimetabolous representative. We identified a “core” set of 6,794 transcripts – shared by all embryonic stages – which are mainly involved in anatomical structure development and cellular nitrogen compound metabolic processes. We further used weighted gene co-expression network analysis to identify transcriptional changes during Odonata embryogenesis. Based on these analyses, distinct clusters of transcriptionally active sequences could be revealed, indicating that embryos at different developmental stages have their own transcriptomic profile according to the developmental events and leading to sequential reprogramming of metabolic and developmental genes. Interestingly, a major change in transcriptionally active sequences is correlated with katatrepsis (revolution) during mid-embryogenesis, a 180° rotation of the embryo within the egg and specific to hemimetabolous insects.

City University of New York

Crossref

Directory of Open Access Journals

Comparative Transcriptomics Reveal Developmental Turning Points during Embryogenesis of a Hemimetabolous Insect, the Damselfly Ischnura elegans

Author: Amato George
Brugler Mercer R
DeSalle Rob
Hadrys Heike
Saccenti Edoardo
Sagasser Sven
Schranz M. Eric
Simon Sabrina
Publication venue: CUNY Academic Works
Publication date: 19/10/2017

City University of New York

Semi-supervised Multivariate Statistical Network Monitoring for Learning Security Threats

Author: Camacho Páez José
Fuentes García Noemí Marta
Macía Fernández Gabriel
Saccenti Edoardo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

This paper presents a semi-supervised approach for intrusion detection. The method extends the unsupervised Multivariate Statistical Network Monitoring approach based on Principal Component Analysis by introducing a supervised optimization technique to learn the optimum scaling in the input data. It combines the advantage of the unsupervised strategy, which is capable of uncovering new threats, with that of supervised strategies, which are able to learn the pattern of a targeted threat. The supervised learning is based on an extension of the gradient descent method based on Partial Least Squares (PLS). Moreover, we enhance this method by using sparse PLS variants. The practical application of the system is demonstrated on a recently published real case study, showing relevant improvements in detection performance and in the interpretation of the attacks.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Granada

Wageningen University & Research Publications