43 research outputs found

    Evaluation of the effect of chance correlations on variable selection using Partial Least Squares-Discriminant Analysis

    Variable subset selection is often mandatory in high throughput metabolomics and proteomics. However, depending on the variable to sample ratio, variable selection is highly susceptible to chance correlations. The evaluation of the predictive capabilities of PLSDA models estimated by cross-validation after feature selection provides overly optimistic results if the selection is performed on the entire set and no external validation set is available. In this work, a simulation of the statistical null hypothesis is proposed to test whether the discrimination capability of a PLSDA model after variable selection, estimated by cross-validation, is statistically higher than that attributed to the presence of chance correlations in the original data set. Statistical significance of PLSDA CV-figures of merit obtained after variable selection is expressed by means of p-values calculated using a permutation test that includes the variable selection step. The reliability of the approach is evaluated using two variable selection methods on experimental and simulated data sets with and without induced class differences. The proposed approach can be considered a useful tool when no external validation set is available and provides a straightforward way to evaluate differences between variable selection methods.

    JE and JK acknowledge the "Sara Borrell" Grants (CD11/00154 and CD12/00667) from the Instituto Carlos III (Ministry of Economy and Competitiveness). DPG acknowledges the "V Segles" Grant provided by the University of Valencia to carry out this study. MV acknowledges the FISPI11/0313 Grant from the Instituto Carlos III (Ministry of Economy and Competitiveness). AF acknowledges the DPI2011-28112-C04-02 Grant from the Spanish Ministry of Science and Innovation (MICINN). GQ acknowledges the financial support from the Spanish Ministry of Economy and Competitiveness (SAF2012-39948).

    Kuligowski, J.; Pérez Guaita, D.; Escobar, J.; Guardia, MDL.; Vento, M.; Ferrer Riquelme, AJ.; Quintás, G. (2013). Evaluation of the effect of chance correlations on variable selection using Partial Least Squares-Discriminant Analysis. Talanta. 116:835-840. https://doi.org/10.1016/j.talanta.2013.07.048
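The permutation scheme summarized in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for PLSDA, variables are ranked by the absolute difference of class means, and the data are simulated. The function names, the choice of `k`, the number of permutations, and the size of the induced class shift are all hypothetical. The point it shows is that the variable selection step is repeated inside every permutation, so the null distribution carries the same selection bias as the observed figure of merit.

```python
import random
from statistics import mean

def select_variables(X, y, k):
    """Rank variables by absolute difference of class means; return the top-k indices."""
    scores = []
    for j in range(len(X[0])):
        m0 = mean(row[j] for row, label in zip(X, y) if label == 0)
        m1 = mean(row[j] for row, label in zip(X, y) if label == 1)
        scores.append(abs(m0 - m1))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for PLSDA)."""
    hits = 0
    for i in range(len(X)):
        train = [(X[t], y[t]) for t in range(len(X)) if t != i]
        dist = {}
        for c in (0, 1):
            rows = [r for r, label in train if label == c]
            centroid = [mean(r[j] for r in rows) for j in cols]
            dist[c] = sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroid))
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(X)

def permutation_p_value(X, y, k=3, n_perm=200, seed=0):
    """p-value of the CV accuracy; selection is redone for every label permutation."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y, select_variables(X, y, k))
    exceed = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        null_acc = loo_accuracy(X, y_perm, select_variables(X, y_perm, k))
        exceed += null_acc >= observed
    return observed, (exceed + 1) / (n_perm + 1)

# Simulated data (assumption): 20 samples, 50 noise variables,
# with a shift of 4 added to the first three variables of class 1.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[data_rng.gauss(0, 1) + (4.0 if label == 1 and j < 3 else 0.0) for j in range(50)]
     for label in y]

observed, p_value = permutation_p_value(X, y)
print(f"CV accuracy after selection: {observed:.2f}, permutation p-value: {p_value:.3f}")
```

Running the same function on data with no induced shift gives a sense of how high the cross-validated accuracy can climb from chance correlations alone.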

    Biomarker Discovery and Pattern Recognition with application to Metabolomics

    EThOS - Electronic Theses Online Service, GB, United Kingdom

    Paper spray mass spectrometry as an effective tool for differentiating coffees based on their geographical origins

    With the rising trend of valuing flavor complexity in coffees, a means of distinguishing the properties of individual coffee sources is vital to the sustainable growth of the coffee industry. Herein, paper spray mass spectrometry (PS–MS), a simple technique requiring little sample preparation, was used to collect mass data from aqueous extracts of coffees from various sources. Thereafter, principal component analysis and linear discriminant analysis were used to classify coffee samples with 80–100 % accuracy in several studies, including the differentiation of Arabica and Robusta coffees, Arabica coffees from different countries, Robusta coffees from different geographical locations, and Arabica coffees from different locations within the same province in Thailand. With further insight from significance testing via Fisher weight determination, this method proved practical for differentiating coffees based on type and geographical origin, thus paving the way for broader applications.
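The abstract does not state which definition of the Fisher weight was used. As a hedged illustration, one common two-class form divides the squared difference of the class means by the sum of the class variances, so a variable scores highly when the classes are far apart relative to their spread. The function name and the toy intensities below are hypothetical and are not taken from the study.

```python
from statistics import mean, variance

def fisher_weight(values_a, values_b):
    """Two-class Fisher weight (assumed form): (mean_a - mean_b)^2 / (var_a + var_b)."""
    numerator = (mean(values_a) - mean(values_b)) ** 2
    denominator = variance(values_a) + variance(values_b)  # sample variances
    return numerator / denominator

# Toy intensities for one variable measured in two classes (made-up numbers).
separated = fisher_weight([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
overlapping = fisher_weight([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(separated, overlapping)
```

Ranking every variable by this weight gives a simple ordering of candidate discriminating signals.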

    Identification of Volatile Compounds and Selection of Discriminant Markers for Elephant Dung Coffee Using Static Headspace Gas Chromatography—Mass Spectrometry and Chemometrics

    Elephant dung coffee (Black Ivory Coffee) is a unique Thai coffee produced from Arabica coffee cherries consumed by Asian elephants and collected from their feces. In this work, elephant dung coffee and controls were analyzed using static headspace gas chromatography hyphenated with mass spectrometry (SHS GC-MS), and chemometric approaches were applied for multivariate analysis and the selection of marker compounds that are characteristic of the coffee. Seventy-eight volatile compounds belonging to 13 chemical classes were tentatively identified, including six alcohols, five aldehydes, one carboxylic acid, three esters, 17 furans, one furanone, 13 ketones, two oxazoles, four phenolic compounds, 14 pyrazines, one pyridine, eight pyrroles and three sulfur-containing compounds. Moreover, four potential discriminant markers of elephant dung coffee were established: 3-methyl-1-butanol, 2-methyl-1-butanol, 2-furfurylfuran and 3-penten-2-one. The proposed method may be useful for elephant dung coffee authentication and quality control.

    Supervised Self Organising Maps for Classification and Determination of Potentially Discriminatory Variables: Illustrated by Application to Nuclear Magnetic Resonance Metabolomic Profiling

    The article describes the extension of the self organizing maps discrimination index (SOMDI) to cases where there are more than two classes and more than one factor that may influence the group of samples, using supervised SOMs to determine which variables, and how many, are responsible for the different types of separation. The methods are illustrated by an application in the area of metabolic profiling, consisting of a nuclear magnetic resonance (NMR) data set of 96 samples of human saliva, which is characterized by three factors, namely, whether the sample has been treated or not, 16 donors, and 3 sampling days, differing for each donor. The sampling days can be considered a null factor as they should have no significant influence on the metabolic profile. Methods for supervised SOMs involve including a classifier for organizing the map, and we report a method for optimizing this by using an additional weight that determines the importance of the classifier relative to the overall experimental data set in order to avoid overfitting. Supervised SOMs can be obtained for each of the three factors, and we develop a multiclass SOM discrimination index (SOMDI) to determine which variables (or regions of the NMR spectra) are considered significant for each of the three potential factors. By dividing the data iteratively into training and test sets 100 times, we define variables as significant for a given factor if they have a positive SOMDI in the training set for the factor and class of interest over all iterations.
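The closing criterion of this abstract (a variable counts as significant only if its index is positive in the training set of every one of the repeated splits) can be sketched independently of the SOM itself. The example below is an assumption-laden illustration: it does not compute the SOMDI, and instead uses a simple stand-in index, the mean of a variable in the class of interest minus its mean elsewhere. The two-thirds training fraction, the function names, and the simulated data are likewise hypothetical.

```python
import random
from statistics import mean

def discrimination_index(X, y, j, target):
    """Stand-in for the SOMDI: mean of variable j in the target class minus its mean elsewhere."""
    inside = [row[j] for row, label in zip(X, y) if label == target]
    outside = [row[j] for row, label in zip(X, y) if label != target]
    return mean(inside) - mean(outside)

def stratified_training_indices(y, fraction, rng):
    """Draw the same fraction of samples from every class for the training set."""
    chosen = []
    for c in sorted(set(y)):
        members = [i for i, label in enumerate(y) if label == c]
        chosen += rng.sample(members, max(1, round(fraction * len(members))))
    return chosen

def consistently_positive(X, y, target, n_iter=100, fraction=2 / 3, seed=0):
    """Keep only variables whose index is positive in every training split."""
    rng = random.Random(seed)
    survivors = set(range(len(X[0])))
    for _ in range(n_iter):
        idx = stratified_training_indices(y, fraction, rng)
        X_train = [X[i] for i in idx]
        y_train = [y[i] for i in idx]
        survivors = {j for j in survivors
                     if discrimination_index(X_train, y_train, j, target) > 0}
    return sorted(survivors)

# Simulated data (assumption): variable 0 is raised and variable 1 lowered in class 1;
# the remaining four variables are pure noise.
data_rng = random.Random(2)
shift = {0: 5.0, 1: -5.0}
y = [0] * 12 + [1] * 12
X = [[data_rng.gauss(0, 1) + (shift.get(j, 0.0) if label == 1 else 0.0) for j in range(6)]
     for label in y]

significant = consistently_positive(X, y, target=1)
print("Variables positive in all 100 training splits:", significant)
```

Requiring agreement over all iterations is deliberately strict: a noise variable can be positive in some splits by chance, but it is less likely to stay positive in every one.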

    Flavor Profile in Fresh-squeezed Juice of Four Thai Lime Cultivars: Identification of Compounds that Influence Fruit Selection by Master Chefs

    The flavor and sensory profiles that influenced the selection of 4 commercial Thai lime cultivars (Citrus aurantifolia Swingle cv. ‘Pan Rumpai’, ‘Pan Puang’, and ‘Pan Pijit’ and Citrus latifolia Tanaka cv. ‘Tahiti’) by Thai chefs were examined. Twenty-eight volatiles (7 monoterpenes, 13 sesquiterpenes, 4 monoterpene alcohols, 1 aldehyde, 2 monoterpene aldehydes, and 1 monoterpene ester) and 9 non-volatiles (citric acid, malic acid, succinic acid, ascorbic acid, sucrose, fructose, glucose, limonin, and naringin) contributing to the flavor of Thai lime juice were identified using dynamic headspace-gas chromatography-olfactometry-mass spectrometry and high-performance liquid chromatography, respectively. An interview with master chefs and an acceptance test with culinary students revealed that Pan Puang was the most preferred lime cultivar owing to its moderately sour taste and its unique floral aroma contributed by terpinolene and linalool, along with its low content of β-myrcene, which contributes balsamic and pungent aroma notes.

    Chiral discrimination by TERS

    Tip-enhanced Raman scattering (TERS) using a silver tip chemically modified with an achiral para-mercaptopyridine (pMPY) probe molecule has been utilized for chiral discrimination. Differences in the relative intensities of the pMPY bands in the TERS spectra were used to monitor three pairs of enantiomers containing hydroxy (-OH) and/or amino (-NH2) groups. The ND or N+-H functionality of the pMPY-modified tip is involved in hydrogen-bond interactions with a particular molecular orientation of each chiral isomer. The asymmetric arrangement of silver atoms at the apex of the tip causes an asymmetric electric field, which makes the tip a chiral center. Variations in the charge-transfer (CT) states of the metal-achiral probe system in conjunction with the asymmetric electric field cause different enhancements in the Raman signals of the two enantiomers. The near-field effect of the asymmetric electric field induces further chiral discrimination.

    Self organising maps for variable selection: application to human saliva analysed by nuclear magnetic resonance spectroscopy to investigate the effect of an oral healthcare product

    SOMs (Self Organising Maps) are derived from the machine learning literature and serve as a valuable method for representing data. In this paper, the use of SOMs as a technique for determining the most significant variables (or markers) in a dataset is described. The method is applied to the NMR spectra of 96 human saliva samples, half of which have been treated with an oral rinse formulation and half of which are controls, and 49 variables consisting of bucketed intensities. In addition, three simulations are described, two of which consist of the same number of samples and variables as the experimental dataset and a third that contains a much larger number of variables. Two of the simulations contain known discriminatory variables, and the remaining one is treated as a null dataset without any specific discriminatory variables added. The described SOM method is contrasted with Partial Least Squares Discriminant Analysis: a list of the markers determined to be most significant by each approach is obtained, and the differences between the lists are discussed. A SOM Discrimination Index (SOMDI) is defined, whose magnitude relates to how strongly a variable is considered to be a discriminator. In order to ensure that the model is stable and not dependent on the random starting point of the SOM, one hundred iterations were performed and variables that were consistently of high rank were selected. A variety of approaches for data representation are illustrated, and the main theoretical principles of employing SOMs for determining which variables are most significant are outlined. Software used in this paper was written in-house, allowing greater flexibility than existing packages, and tailored to the specific application in hand.