Search CORE

183 research outputs found

Regularized-Generalized PLS-DA

Author: Amenta Pietro
Publication venue: Università del Salento
Publication date: 27/10/2008

Linear Discriminant Analysis leads to unstable models and poor predictions in the presence of quasi-collinearity among variables, or when the number of variables is large relative to the number of samples. Partial Least Squares Discriminant Analysis (PLS-DA) was then proposed to overcome the multicollinearity problem and was defined as a straightforward extension of PLS regression. Generalized PLS-DA (GPLS-DA) and “Between” PLS-DA (B-PLS-DA) are two suitable extensions of PLS-DA. A simple regularization procedure is proposed to cope with quasi-collinearity or multicollinearity. It is shown that GPLS-DA and Between PLS-DA are the two end points of a continuum approach.

Università del Salento: ESE - Salento University Publishing
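The abstract describes PLS-DA as a straightforward extension of PLS regression. A minimal sketch of plain PLS-DA follows, assuming the common formulation: class labels are dummy-coded into an indicator matrix Y, a PLS2 model (NIPALS) is fitted to Y, and each sample is assigned to the class with the largest fitted value. It does not implement the regularized GPLS-DA/B-PLS-DA continuum proposed in the paper, and the function names `plsda_fit` and `plsda_predict` are hypothetical.

```python
import numpy as np

def dummy_code(labels):
    """One-hot (dummy) encode class labels into an n x G indicator matrix."""
    classes = np.unique(labels)
    Y = (np.asarray(labels)[:, None] == classes[None, :]).astype(float)
    return Y, classes

def plsda_fit(X, labels, n_components=2, tol=1e-10, max_iter=500):
    """Basic PLS-DA: PLS2 (NIPALS) regression of the dummy-coded labels on X."""
    X = np.asarray(X, dtype=float)
    Y, classes = dummy_code(labels)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean            # centred X and Y residual matrices
    W, P, Q = [], [], []
    for _ in range(n_components):
        u = F[:, [np.argmax(F.var(axis=0))]]  # start from the Y column with most variance
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)            # X weights
            t = E @ w                         # X scores
            q = F.T @ t / (t.T @ t)           # Y loadings
            u_new = F @ q / (q.T @ q)         # Y scores
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)               # X loadings
        E = E - t @ p.T                       # deflate X
        F = F - t @ q.T                       # deflate Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T      # regression coefficients (p x G)
    return {"B": B, "x_mean": x_mean, "y_mean": y_mean, "classes": classes}

def plsda_predict(model, X):
    """Assign each sample to the class with the largest fitted dummy value."""
    Y_hat = (np.asarray(X, dtype=float) - model["x_mean"]) @ model["B"] + model["y_mean"]
    return model["classes"][np.argmax(Y_hat, axis=1)]
```

Keeping the number of components below the number of variables is what lets PLS-DA stay stable when the columns of X are nearly collinear, the situation in which ordinary LDA breaks down.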

PLS dimension reduction for classification of microarray data

Author: Boulesteix Anne-Laure
Publication date: 01/01/2004

PLS dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, PLS is compared with some of the best state-of-the-art classification methods. In addition, a simple procedure to choose the number of components is suggested. The connection between PLS dimension reduction and gene selection is examined, and a property of the first PLS component for binary classification is proven. PLS can also be used as a visualization tool for high-dimensional data in the classification framework. The whole study is based on 9 real microarray cancer data sets.

CiteSeerX

Open Access LMU
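For binary classification, the first PLS component has a simple closed form that helps explain its link to gene selection. With X and y column-centred and y coded 0/1, the first weight vector satisfies w1 ∝ Xcᵀ yc = (n0·n1/n)(mean of class 1 minus mean of class 0), so it points along the difference of the two class means. This is a standard algebraic identity and not necessarily the property proven in the paper; the sketch below, with hypothetical function names, simply checks it numerically.

```python
import numpy as np

def first_pls_weight(X, y):
    """First PLS weight vector w1 = Xc^T yc / ||Xc^T yc|| for centred X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

def mean_difference_direction(X, y):
    """Unit vector along (mean of class 1) - (mean of class 0)."""
    d = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return d / np.linalg.norm(d)
```

Because the weights are proportional to per-gene mean differences, ranking genes by the absolute first-component weight amounts to ranking them by how strongly their class means differ on the original scale.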


Multimodal MRI-based Imputation of the Aβ+ in Early Mild Cognitive Impairment.

Author: Joshi Sarang
the Alzheimer’s Disease Neuroimaging Initiative
Tosun Duygu
Weiner Michael W
Publication venue: eScholarship, University of California
Publication date: 01/03/2014

Objective: To identify the brain-atrophy patterns from structural MRI and the cerebral blood flow (CBF) patterns from arterial spin labeling perfusion MRI that best predict the Aβ burden, measured as composite 18F-AV45-PET uptake, in individuals with early mild cognitive impairment (MCI), and to assess the relative importance of the imaging modalities in classifying Aβ+ and Aβ- early MCI.

Methods: Sixty-seven ADNI-GO/2 participants with early MCI were included. Voxel-wise anatomical shape variation measures were computed by estimating the initial diffeomorphic mapping momenta from an unbiased control template. CBF measures normalized to average motor cortex CBF were mapped onto the template space. Using partial least squares regression, we identified the structural and CBF signatures of Aβ after accounting for the normal confounding effects of age, sex, and education.

Results: 18F-AV45-positive early-MCI individuals could be identified with 83% classification accuracy, 87% positive predictive value, and 84% negative predictive value by multidisciplinary classifiers combining demographic data, ApoE ε4 genotype, and a multimodal MRI-based Aβ score.

Interpretation: Multimodal MRI can be used to predict the amyloid status of early-MCI individuals. MRI is a very attractive candidate for identifying inexpensive, non-invasive surrogate biomarkers of Aβ deposition. Our approach is expected to be valuable for identifying individuals likely to be Aβ+ when cost or logistical problems prevent Aβ detection by cerebrospinal fluid analysis or Aβ-PET. It can also be used in clinical settings and clinical trials, aiding subject recruitment and the evaluation of treatment efficacy. Imputing Aβ-positivity status could also complement Aβ-PET by identifying the individuals who would benefit most from this assessment.

eScholarship - University of California
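The abstract says the PLS signatures were identified after accounting for age, sex, and education, but not how. One common approach, assumed here and not confirmed by the source, is to regress every imaging feature on the confounders by least squares and keep only the residuals before fitting PLS. The function name `residualize` and the toy covariates in the check are hypothetical.

```python
import numpy as np

def residualize(X, confounders):
    """Return X with the part linearly explained by the confounders removed.

    X:           (n_samples, n_features) imaging features
    confounders: (n_samples, n_covariates), e.g. age, sex coded 0/1, years of education
    """
    X = np.asarray(X, dtype=float)
    # Design matrix with an intercept column followed by the confounders.
    C = np.column_stack([np.ones(X.shape[0]), np.asarray(confounders, dtype=float)])
    # Least-squares fit X ~ C @ B for every feature column at once.
    B, *_ = np.linalg.lstsq(C, X, rcond=None)
    return X - C @ B
```

The residuals are orthogonal to every confounder column, so a PLS model fitted on them cannot pick up directions that are merely linear proxies for age, sex, or education.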

Multivariate paired data analysis: multilevel PLSDA versus OPLSDA

Author: Age K. Smilde
E Pohjanen
EJJ Velzen van
Ewoud J. J. van Velzen
F Lindgren
H Nocairi
HC Bertram
Huub C. J. Hoefsloot
J Trygg
JA Westerhuis
JJ Jansen
Johan A. Westerhuis
M Barker
M Bylesjo
RR Sokal
S Rezzi
S Smit
S Wiklund
T Skov
UG Indahl
W Wu
Publication venue: Springer US
Publication date: 01/01/2009

Metabolomics data obtained from (human) nutritional intervention studies can have a rather complex structure that depends on the underlying experimental design. In this paper we discuss the complex structure in data caused by a cross-over designed experiment. In such a design, each subject in the study population acts as his or her own control, which makes the data paired. For a single univariate response, a paired t-test or repeated measures ANOVA can be used to test the differences between the paired observations. The same principle holds for multivariate data. In the current paper we compare a method that exploits the paired data structure in cross-over multivariate data (multilevel PLSDA) with a method that is often used by default but that ignores the paired structure (OPLSDA). The results from both methods have been evaluated in a small simulated example as well as in a genuine data set from a cross-over designed nutritional metabolomics study. It is shown that exploiting the paired data structure underlying the cross-over design considerably improves the power and the interpretability of the multivariate solution. Furthermore, the multilevel approach provides complementary information about (I) the diversity and abundance of the treatment effects within the different (subsets of) subjects across the study population, and (II) the intrinsic differences between these study subjects.

Crossref

Springer - Publisher Connector

PubMed Central

UvA-DARE

International Migration, Integration and Social Cohesion online publications
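Multilevel PLSDA exploits the pairing by separating between-subject variation from within-subject (treatment) variation before the discriminant model is fitted; in the multilevel approach, the within-subject part is the matrix that is then analysed with PLSDA against the treatment labels. A minimal sketch of the decomposition, assuming one row per subject-treatment observation and a vector of subject identifiers, is below. The function name is hypothetical and the PLSDA step itself is not shown.

```python
import numpy as np

def multilevel_split(X, subject):
    """Split X into overall mean, between-subject part and within-subject part.

    X = overall + between + within, where `between` holds each subject's mean
    deviation from the overall mean (repeated on that subject's rows) and
    `within` holds each observation's deviation from its own subject mean.
    """
    X = np.asarray(X, dtype=float)
    subject = np.asarray(subject)
    overall = X.mean(axis=0)
    subject_means = np.empty_like(X)
    for s in np.unique(subject):
        rows = subject == s
        subject_means[rows] = X[rows].mean(axis=0)
    between = subject_means - overall
    within = X - subject_means
    return overall, between, within
```

For a two-treatment cross-over, each subject's two within-subject rows are plus and minus half of that subject's paired difference, which is the multivariate counterpart of the paired t-test mentioned in the abstract.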

Improving stacking methodology for combining classifiers: applications to cosmetic industry

Author: Gomes Charles
Nocairi Hisham
Saporta Gilbert
Thomas Marie
Publication venue: HAL CCSD
Publication date: 14/10/2016

Stacking (Wolpert (1992), Breiman (1996)) is known to be a successful way of linearly combining several models. We modify the usual stacking methodology when the response is binary and the predictions are highly correlated, by combining the predictions with PLS Discriminant Analysis instead of ordinary least squares. For small data sets we develop a strategy based on repeated split samples in order to select relevant variables and ensure the robustness of the final model. Five base (or level-0) classifiers are combined to obtain an improved rule, which is applied to a classic benchmark from the UCI Machine Learning Repository. Our methodology is then applied to predicting the dangerousness of 165 chemicals used in the cosmetic industry, described by 35 in vitro and in silico characteristics: given safety constraints, one cannot rely on a single prediction method, especially when the sample size is small.

HAL Descartes

Hal-Diderot
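The stacking variant replaces the ordinary least-squares combiner with PLS-DA, which copes better when the level-0 predictions are highly correlated. A minimal sketch follows, assuming a binary 0/1 response, a matrix P of out-of-fold level-0 scores (one column per base classifier), and a single PLS component. The one-component choice, the names, and the synthetic scores in the check are assumptions for illustration; the paper's repeated split-sample variable selection is not reproduced.

```python
import numpy as np

def pls_combiner_fit(P, y):
    """Fit a one-component PLS1 model of a 0/1 response on level-0 predictions.

    P: (n_samples, n_base_models) out-of-fold scores from the base classifiers.
    y: (n_samples,) binary response coded 0/1.
    Returns (coef, intercept) so that the stacked score is P @ coef + intercept.
    """
    P, y = np.asarray(P, dtype=float), np.asarray(y, dtype=float)
    p_mean, y_mean = P.mean(axis=0), y.mean()
    Pc, yc = P - p_mean, y - y_mean
    # Each base model's weight is proportional to its covariance with y, so
    # near-duplicate columns share the weight instead of receiving the large,
    # opposite-signed coefficients OLS can produce under strong collinearity.
    w = Pc.T @ yc
    w /= np.linalg.norm(w)
    t = Pc @ w                     # latent score of the single PLS component
    b = (t @ yc) / (t @ t)         # regression of centred y on that score
    coef = b * w
    return coef, y_mean - p_mean @ coef

def pls_combiner_predict(coef, intercept, P, threshold=0.5):
    """Classify as 1 when the stacked score exceeds the threshold."""
    return (np.asarray(P, dtype=float) @ coef + intercept > threshold).astype(int)
```

In practice the columns of P should come from cross-validated (out-of-fold) predictions; fitting the combiner on in-sample base-model predictions would reward overfitted base models.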

Online monitoring of H2S scavenging reactions in aqueous phase using Raman spectroscopy

Author: Kucheryavskiy Sergey
Maschietti Marco
Romero Logrono Iveth Alexandra
Publication date: 01/01/2021

VBN

Application of Raman spectroscopy for monitoring of hydrogen sulfide scavenging reactions using biomass-based chemicals

Author: Kucheryavskiy Sergey
Maschietti Marco
Montero Fernando
Publication date: 01/01/2021

VBN

Partial Least Squares and Principal Component Analysis with Non-metric Variables for Composite Indices

Author: Yoon Jisu
Publication date: 24/04/2015

A composite index is an aggregated variable made up of individual indicators and weights, where the weights represent the relative importance of each indicator. Composite indices are often used to describe latent phenomena or to summarize complex information in a small number of variables. Choosing appropriate weights for the variables that make up a composite index is therefore very important. Principal component analysis (PCA) is a popular approach for deriving weights, but it is unsuitable when the informative variation corresponds to only small variances of the variables in a composite index. This study therefore proposes applying Partial Least Squares (PLS), which exploits the relationship between target variables and the variables in a composite index. Our simulation study shows that PLS performs as well as PCA or substantially outperforms it. In addition, the variables in a composite index are often non-metric in practice, and such variables require special procedures before PCA or PLS can be applied. This study examines several PCA and PLS algorithms for non-metric variables from the existing literature and compares them through extensive simulation studies in order to give recommendations for practice. Dummy coding often shows satisfactory performance compared with more complicated methods. As applications, we consider wealth, globalization, gender equality, and corruption, using PCA- and PLS-based composite indices. PLS produces composite indices tailored to the respective target variables, which often performed better than those from PCA. A comparison of PCA and PLS weights and coefficients shows which variables are particularly relevant for each target variable.

Georg-August-University Göttingen
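The abstract contrasts PCA weights, which follow the variance of the indicators, with PLS weights, which follow their relationship to a target variable, and reports that dummy coding is often satisfactory for non-metric indicators. The sketch below illustrates both ideas under simple assumptions: first-component weights only, column-centred indicators, and reference-category dummy coding. The function names and the toy data in the check are hypothetical and do not reproduce the study's algorithms.

```python
import numpy as np

def dummy_code(values):
    """Dummy-code a non-metric indicator, using the first sorted level as reference."""
    values = np.asarray(values)
    levels = np.unique(values)
    return (values[:, None] == levels[None, 1:]).astype(float)

def pca_weights(X):
    """First principal-component weights of the column-centred indicators."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    w = Vt[0]
    return -w if w.sum() < 0 else w   # resolve the arbitrary sign of the component

def pls_weights(X, y):
    """First PLS weights: proportional to each indicator's covariance with the target."""
    Xc = X - X.mean(axis=0)
    w = Xc.T @ (y - y.mean())
    return w / np.linalg.norm(w)
```

With one high-variance indicator that is unrelated to the target and one low-variance indicator that drives it, PCA puts most weight on the first while PLS puts most weight on the second, which is the failure mode of PCA the abstract describes.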

Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques

Author: A Golbraikh
A Kamb
A Linusson
A Navia-Vázquez
B Ustün
CC Chang
D Aha
E Freyhult
G Cruciani
G Manning
G Scapin
H Daub
H Drucker
HM Berman
I Dubchak
IH Witten
J Trygg
Jarl ES Wikberg
JD Griffin
JE Wikberg
K Illergård
KC Chou
LH Alifrangis
M Bhasin
M Lapinsh
M Reczko
M Sandberg
M Van Heel
MA Fabian
MA Larkin
Maris Lapins
MS Cohen
MW Karaman
NP Shah
O Devos
P Bamborough
P Geladi
QB Gao
RJ Quinlan
S Hua
S Madhusudan
S Wold
SD Peterson
T Lundstedt
TA Carter
V Vapnik
ZR Li
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Protein kinases play crucial roles in cell growth, differentiation, and apoptosis. Abnormal function of protein kinases can lead to many serious diseases, such as cancer. Kinase inhibitors have potential for treating these diseases. However, current inhibitors interact with a broad variety of kinases and interfere with multiple vital cellular processes, which causes toxic effects. Bioinformatics approaches that can predict inhibitor-kinase interactions from the chemical properties of the inhibitors and the kinase macromolecules might aid in the design of more selective therapeutic agents that show better efficacy and lower toxicity.

Results: We applied proteochemometric modelling to correlate the properties of 317 wild-type and mutated kinases and 38 inhibitors (12,046 inhibitor-kinase combinations) with each combination's interaction dissociation constant (Kd). We compared six approaches for describing protein kinases and several linear and non-linear correlation methods. The best-performing models encoded kinase sequences with amino acid physico-chemical z-scale descriptors and used support vector machines or partial least-squares projections to latent structures for the correlations. Modelling performance was estimated by double cross-validation. The best models showed high predictive ability: the squared correlation coefficient for new kinase-inhibitor pairs ranged from P² = 0.67 to 0.73, and for new kinases from P²kin = 0.65 to 0.70. The models could also separate interacting from non-interacting inhibitor-kinase pairs with high sensitivity and specificity, with areas under the ROC curves of AUC = 0.92-0.93. We also investigated the relationship between the number of protein kinases in the dataset and the modelling results. Using only 10% of all data, a valid model was still obtained, with P² = 0.47, P²kin = 0.42 and AUC = 0.83.

Conclusions: Our results strongly support the applicability of proteochemometrics to kinome-wide interaction modelling. Proteochemometrics might be used to speed up the identification and optimization of protein kinase targeted and multi-targeted inhibitors.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
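The abstract reports predictive ability as P² and AUC estimated by double cross-validation. Assuming the usual definition P² = 1 - Σ(y - ŷ)² / Σ(y - ȳ)², where every ŷ comes from an outer-loop model that never saw that sample, and the rank-based (Mann-Whitney) formulation of the ROC AUC, both metrics can be computed as below. The function names are hypothetical, and the nested inner/outer cross-validation loop itself is not shown.

```python
import numpy as np

def p_squared(y_obs, y_pred):
    """Predictive squared correlation: 1 - PRESS / total sum of squares.

    y_pred must come from models that did not see the corresponding samples
    (e.g. the outer loop of a double cross-validation).
    """
    y_obs, y_pred = np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float)
    press = np.sum((y_obs - y_pred) ** 2)
    ss = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - press / ss

def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (normalised Mann-Whitney U)."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, because three of the four positive-negative pairs are ranked correctly. Unlike a squared Pearson correlation, P² becomes negative when the predictions are worse than simply predicting the mean.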

Multivariate Prediction Models for Bio-Analytical Data

Author: Rantalainen Mattias John
Publication venue: Biomolecular Medicine, Imperial College London
Publication date: 01/01/2008

Quantitative bio-analytical techniques that enable parallel measurements of large numbers of biomolecules generate vast amounts of information for studying and characterising biological systems. These analytical methods are commonly referred to as omics technologies, and can be applied to measure, e.g., mRNA transcript, protein or metabolite abundances in a biological sample. The work presented in this thesis focuses on the application of multivariate prediction models for modelling and analysis of biological data generated by omics technologies. Omics data commonly contain up to tens of thousands of variables, which are often both noisy and multicollinear. Multivariate statistical methods have previously been shown to be valuable for visualisation and predictive modelling of biological and chemical data with properties similar to those of omics data. In this thesis, currently available multivariate modelling methods are used in new applications, and new methods are developed to address some of the specific challenges associated with modelling biological data. Three closely related areas of multivariate modelling of biological data are described and demonstrated in this thesis. First, a multivariate projection method is used in a novel application for predictive modelling between omics data sets, demonstrating how data from two analytical sources can be integrated and modelled together by exploring covariation patterns between the data sets. This approach is exemplified by modelling data from two studies, the first containing proteomic and metabolic profiling data and the second containing transcriptomic and metabolic profiling data. Second, a method for piecewise multivariate modelling of short time-series data is developed and demonstrated by modelling simulated data as well as metabolic profiling data from a toxicity study, providing a new method for characterising multivariate bio-analytical time-series data.
Third, a kernel-based method is developed and applied for non-linear multivariate prediction modelling of omics data, addressing the specific challenge of modelling non-linear variation in biological data.

Spiral - Imperial College Digital Repository
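The third part of the thesis uses a kernel-based method for non-linear prediction. Kernel latent-variable methods of this kind typically replace the linear cross-product matrix X Xᵀ with a kernel matrix K and centre K in feature space before fitting the model. A minimal sketch of those two preparatory steps follows, assuming a Gaussian (RBF) kernel with a hypothetical width parameter `sigma`; it is not the specific method developed in the thesis.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian kernel matrix k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    X, Z = np.asarray(X, dtype=float), np.asarray(Z, dtype=float)
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2 * X @ Z.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))

def center_kernel(K):
    """Centre a training kernel matrix in feature space: Kc = J K J, J = I - 11^T/n."""
    n = K.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    return J @ K @ J
```

For the linear kernel K = X Xᵀ, centring reproduces Xc Xcᵀ exactly, which is a quick sanity check; with the RBF kernel, a model that is linear in the centred kernel becomes non-linear in the original variables.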