
    Critical evaluation of assessor difference correction approaches in sensory analysis

    In sensory data analysis, assessor-dependent scaling effects may hinder the analysis of product differences. Romano et al. (2008) compared several approaches for reducing scaling differences between assessors by their ability to maximise the product-effect F-values in a mixed ANOVA. Their study on a sensory dataset of 14 cheese samples assessed by 12 assessors on a continuous scale showed that some of these approaches apparently improved the F-value of the product effect. However, this direct comparison is only legitimate if the F-values originate from the same null distribution. To obtain the null distributions of the different correction methods, we employed a permutation approach on the same cheese dataset used by Romano et al. (2008), as well as a random-noise simulation approach. Based on the empirically obtained null distributions, we calculated the corrected product-effect significance to directly compare the performance of the preprocessing methods. Our results show that the null distributions of some preprocessing methods do not correspond to the expected F-distribution. In particular, for the ten Berge method, the null distribution is shifted towards higher F-values. An observed increase of the product-effect F-value relative to the F-value on raw data therefore does not necessarily imply increased product-effect significance, and p-values calculated from such inflated F-values may overestimate significance. In contrast, calculating p-values directly from the empirical null distributions obtained by permutation provides a common ground for properly comparing method performance. Moreover, we show that differences in reproducibility between assessors, as they exist in real-world sensory datasets, may lead to overestimation of product-effect significance by the mixed assessor model (MAM).
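    The core of this permutation approach is easy to sketch. The Python fragment below is a minimal sketch, not the authors' implementation: it assumes a long-format pandas DataFrame with columns assessor, product and score, and uses a plain one-way ANOVA F-value as a stand-in for the mixed-ANOVA product-effect F. Product labels are permuted within each assessor, and the preprocessing is re-applied to every permuted dataset, so any F-inflation caused by the method shows up in the empirical null distribution.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

def product_f(df):
    """F-value for the product effect (one-way ANOVA across products)."""
    groups = [g["score"].to_numpy() for _, g in df.groupby("product")]
    return f_oneway(*groups).statistic

def permutation_p_value(df, preprocess, n_perm=1000, seed=0):
    """Empirical p-value of the product effect after a given preprocessing.

    `preprocess` maps the raw DataFrame to a corrected one. Product labels
    are permuted within each assessor, preserving the assessor structure
    while destroying the product effect.
    """
    rng = np.random.default_rng(seed)
    f_obs = product_f(preprocess(df))
    null_f = np.empty(n_perm)
    for i in range(n_perm):
        perm = df.copy()
        perm["product"] = (
            perm.groupby("assessor")["product"]
                .transform(lambda s: rng.permutation(s.to_numpy()))
        )
        # Re-applying the preprocessing here is what lets the null
        # distribution reflect method-specific F-inflation.
        null_f[i] = product_f(preprocess(perm))
    return f_obs, (np.sum(null_f >= f_obs) + 1) / (n_perm + 1)

# Example correction method: assessor-wise standardization.
standardize = lambda d: d.assign(
    score=d.groupby("assessor")["score"].transform(
        lambda s: (s - s.mean()) / s.std(ddof=1)
    )
)
```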

    Metabolic network discovery through reverse engineering of metabolome data

    Reverse engineering of high-throughput omics data to infer underlying biological networks is one of the challenges in systems biology. However, applications in the field of metabolomics are rather limited. We have focused on a systematic analysis of metabolic network inference from in silico metabolome data based on statistical similarity measures. Three data types based on biological/environmental variability around steady state were analyzed to compare their relative information content for inferring the network. Comparing the inference power of different similarity scores indicated the clear superiority of conditioning- or pruning-based scores, as they are able to eliminate indirect interactions. We also show that a mathematical measure based on the Fisher information matrix gives clues about how well different data types represent the underlying metabolic network topology. Results on several datasets of increasing complexity consistently show that metabolic variations observed at steady state, the simplest experimental analysis, are already informative enough to reveal the connectivity of the underlying metabolic network with a low false-positive rate, provided that proper similarity-score approaches are employed. For experimental situations, this implies that a single organism under slightly varying conditions may already generate enough information to correctly infer networks. Detailed examination of the interaction strengths in the underlying metabolic networks demonstrates that the edges that cannot be captured by similarity scores mainly belong to metabolites connected with weak interaction strengths.
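    To make the distinction between plain and conditioning-based similarity scores concrete, here is a small Python sketch contrasting Pearson correlation with partial correlation derived from the precision matrix; the latter can condition away indirect metabolite-metabolite associations. The data layout (samples by metabolites) and the thresholds are illustrative assumptions, not the scores evaluated in the study.

```python
import numpy as np

def correlation_network(X, threshold=0.5):
    """Edges from absolute Pearson correlation above a threshold."""
    r = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(r, 0.0)
    return np.abs(r) > threshold

def partial_correlation_network(X, threshold=0.2):
    """Edges from partial correlations, conditioning on all other metabolites."""
    prec = np.linalg.pinv(np.cov(X, rowvar=False))  # precision matrix
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)                   # standardize to [-1, 1]
    np.fill_diagonal(pcor, 0.0)
    return np.abs(pcor) > threshold

# If A-B and B-C are true interactions, plain correlation often also links
# A and C (an indirect edge); partial correlation conditions it away.
```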

    Metabolomics variable selection and classification in the presence of observations below the detection limit using an extension of ERp

    A compressed folder (XERp Software.zip) containing the Matlab scripts to perform XERp, as well as an example application (ZIP, 11 kb).

    Divide et impera: How disentangling common and distinctive variability in multiset data analysis can aid industrial process troubleshooting and understanding

    The possibility of addressing the problem of process troubleshooting and understanding by modelling common and distinctive sources of variation (factors or components) underlying two sets of measurements was explored in a real-world industrial case study. The strategy used includes a novel approach to systematically detect the number of common and distinctive components. An extension of this strategy to the analysis of a larger number of data blocks, which allows the comparison of data from multiple processing units, is also discussed.

    Funding: Spanish Ministry of Economy and Competitiveness, Grant/Award Number DPI2017-82896-C2-1-R.

    Citation: Vitale, R.; Noord, O. E. D.; Westerhuis, J. A.; Smilde, A. K.; Ferrer, A. (2021). Divide et impera: How disentangling common and distinctive variability in multiset data analysis can aid industrial process troubleshooting and understanding. Journal of Chemometrics, 35(2), 1-12. https://doi.org/10.1002/cem.3266
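    One simple way to see what "common versus distinctive" means in practice is to compare the dominant subspaces of the two measurement blocks. The sketch below computes principal angles between block score subspaces as a rough diagnostic; it is a hypothetical stand-in for the detection approach proposed in the paper, assuming two blocks measured on the same observations.

```python
import numpy as np

def principal_angles(X1, X2, n_comp=3):
    """Principal angles (degrees) between rank-n_comp subspaces of two blocks.

    Both blocks must share the same rows (observations); each block is
    column-centered before the SVD.
    """
    U1 = np.linalg.svd(X1 - X1.mean(axis=0), full_matrices=False)[0][:, :n_comp]
    U2 = np.linalg.svd(X2 - X2.mean(axis=0), full_matrices=False)[0][:, :n_comp]
    sv = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return np.degrees(np.arccos(np.clip(sv, -1.0, 1.0)))

# Angles near 0 deg suggest common components shared by the two blocks;
# angles near 90 deg suggest distinctive variation in each block.
```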

    Lipidomic Response to Coffee Consumption

    Coffee is widely consumed and contains many bioactive compounds, any of which may impact pathways related to disease development. Our objective was to identify individual lipid changes in response to coffee drinking. We profiled the lipidome of fasting serum samples collected from a previously reported single-blind, three-stage clinical trial. Forty-seven habitual coffee consumers refrained from drinking coffee for 1 month, consumed 4 cups of coffee/day in the second month and 8 cups/day in the third month. Samples collected after each coffee stage were subjected to quantitative lipidomic profiling using ion-mobility spectrometry-mass spectrometry. A total of 853 lipid species mapping to 14 lipid classes were included for univariate analysis. Three lysophosphatidylcholine (LPC) species, including LPC (20:4), LPC (22:1) and LPC (22:2), significantly decreased after coffee intake (p < 0.05); 58 of these decreased after coffee intake. In conclusion, coffee intake leads to lower levels of specific LPC species, with potential impacts on glycerophospholipid metabolism more generally.
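    The univariate step can be sketched as follows: each of the 853 lipid species is tested for a change between stages, with multiplicity control across species. The data layout (subjects-by-lipids arrays for two stages), the paired t-test and the Benjamini-Hochberg correction are illustrative assumptions rather than the trial's exact statistical model.

```python
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

def lipid_changes(no_coffee, coffee, alpha=0.05):
    """Per-lipid paired tests; rows are subjects, columns are lipid species."""
    res = ttest_rel(coffee, no_coffee, axis=0)
    reject, p_adj, _, _ = multipletests(res.pvalue, alpha=alpha, method="fdr_bh")
    direction = np.sign(np.mean(coffee - no_coffee, axis=0))
    return reject, p_adj, direction  # significant lipids and change direction
```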

    Variable selection and validation in multivariate modelling

    Motivation: Validation of variable selection and predictive performance is crucial in the construction of robust multivariate models that generalize well, minimize overfitting and facilitate interpretation of results. Inappropriate variable selection instead leads to selection bias, thereby increasing the risk of model overfitting and false-positive discoveries. Although several algorithms exist to identify a minimal set of the most informative variables (i.e. the minimal-optimal problem), few can select all variables related to the research question (i.e. the all-relevant problem). Robust algorithms combining identification of both minimal-optimal and all-relevant variables with proper cross-validation are urgently needed.

    Results: We developed the MUVR algorithm to improve predictive performance and minimize overfitting and false positives in multivariate analysis. In the MUVR algorithm, minimal variable selection is achieved by performing recursive variable elimination in a repeated double cross-validation (rdCV) procedure. The algorithm supports partial least squares and random forest modelling, and simultaneously identifies minimal-optimal and all-relevant variable sets for regression, classification and multilevel analyses. Using three authentic omics datasets, MUVR yielded parsimonious models with minimal overfitting and improved model performance compared with state-of-the-art rdCV. Moreover, MUVR showed advantages over other variable selection algorithms, i.e. Boruta and VSURF, including a simultaneous variable selection and validation scheme and wider applicability.
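    The elimination logic at the heart of MUVR can be caricatured in a few lines. The sketch below (Python with scikit-learn, whereas MUVR itself is an R package) uses a single cross-validation loop instead of the full repeated double cross-validation, and recovers both a minimal-optimal and an all-relevant variable set from the resulting performance curve; the tolerance and drop fraction are illustrative parameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def recursive_elimination(X, y, drop_frac=0.2, cv=5, tol=0.01, seed=0):
    """Repeatedly drop the least important variables; track CV performance."""
    keep = np.arange(X.shape[1])
    curve = []  # (variable indices, mean CV accuracy) per elimination step
    while len(keep) >= 2:
        model = RandomForestClassifier(n_estimators=200, random_state=seed)
        score = cross_val_score(model, X[:, keep], y, cv=cv).mean()
        model.fit(X[:, keep], y)
        curve.append((keep.copy(), score))
        n_drop = max(1, int(drop_frac * len(keep)))
        order = np.argsort(model.feature_importances_)
        keep = keep[order[n_drop:]]  # discard the least important variables
    # Minimal-optimal: fewest variables within tolerance of the best score;
    # all-relevant: most variables within the same tolerance.
    best = max(s for _, s in curve)
    good = [(v, s) for v, s in curve if s >= best - tol]
    minimal = min(good, key=lambda t: len(t[0]))[0]
    all_relevant = max(good, key=lambda t: len(t[0]))[0]
    return minimal, all_relevant
```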

    Heterofusion: Fusing genomics data of different measurement scales

    In systems biology, it is becoming increasingly common to measure biochemical entities at different levels of the same biological system, so data fusion problems are abundant in the life sciences. With the availability of a multitude of measurement techniques, one of the central problems is the heterogeneity of the data. In this paper, we discuss a specific form of heterogeneity, namely measurements obtained at different measurement scales, such as binary, ordinal, interval, and ratio-scaled variables. Three generic fusion approaches are presented, of which two are new to the systems biology community. The methods are presented, put in context, and illustrated with a real-life genomics example.
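    As a flavour of what such fusion involves, the sketch below encodes each block according to its measurement scale, gives the blocks equal weight, and factors the concatenated matrix jointly. The per-scale encodings (ranks for ordinal variables, centering only for binary ones) are simple assumptions for illustration and are not necessarily the three approaches discussed in the paper.

```python
import numpy as np
from scipy.stats import rankdata

def encode_block(X, scale_type):
    """Encode one data block according to its measurement scale."""
    X = np.asarray(X, dtype=float)
    if scale_type == "ordinal":
        X = np.apply_along_axis(rankdata, 0, X)   # replace levels by ranks
    X = X - X.mean(axis=0)                        # center all variables
    if scale_type in ("interval", "ratio", "ordinal"):
        sd = X.std(axis=0, ddof=1)
        X = X / np.where(sd == 0, 1.0, sd)        # unit variance per variable
    return X / np.linalg.norm(X)                  # equal weight per block

def fuse(blocks, scale_types, n_comp=2):
    """Joint factorization of heterogeneous blocks sharing the same rows."""
    Z = np.hstack([encode_block(X, t) for X, t in zip(blocks, scale_types)])
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_comp] * s[:n_comp]             # common sample scores
```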