6 research outputs found

    GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies

    Left-censored missing values are common in targeted metabolomics datasets and can be considered missing not at random (MNAR). Improper handling of missing values adversely affects subsequent statistical analyses. However, few imputation methods have been developed and applied to the MNAR situation in metabolomics, so a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler-based left-censored missing value imputation approach (GSimp). We compared GSimp with three other imputation methods on two real-world targeted metabolomics datasets and one simulation dataset using our imputation evaluation pipeline. The results show that GSimp outperforms the other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for large-scale metabolomics datasets. The R code for GSimp, the evaluation pipeline, a tutorial, and the real-world and simulated targeted metabolomics datasets are available at https://github.com/WandeRum/GSimp.
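As a rough illustration of what left-censored imputation means, the sketch below fills values missing below a detection limit by sampling from a normal distribution truncated to the interval between a floor and that limit. This is not GSimp itself: the abstract does not describe the algorithm's internals, and the function names, the `floor` parameter, and the use of the observed mean and standard deviation are all assumptions made for this example. A real method would estimate the distribution more carefully, since the observed values are themselves a truncated sample.

```python
import random
import statistics
from statistics import NormalDist


def sample_truncated_normal(mean, sd, lower, upper, rng):
    """Draw one value from N(mean, sd^2) restricted to [lower, upper] (inverse-CDF method)."""
    dist = NormalDist(mean, sd)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    if p_hi - p_lo < 1e-12:
        # Almost no probability mass inside the interval: use the bound nearest the mean.
        return upper if mean > upper else lower
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1 - 1e-12)  # inv_cdf requires 0 < p < 1
    return min(max(dist.inv_cdf(u), lower), upper)


def impute_left_censored(values, detection_limit, rng, floor=0.0):
    """Replace None entries with draws below the detection limit (hypothetical helper)."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # needs at least two observed values
    return [
        v if v is not None
        else sample_truncated_normal(mean, sd, floor, detection_limit, rng)
        for v in values
    ]


rng = random.Random(0)
measurements = [5.2, None, 6.1, 4.8, None, 5.5]  # None marks a value below the limit
imputed = impute_left_censored(measurements, detection_limit=4.5, rng=rng)
```

Every imputed entry lies between `floor` and `detection_limit`, while observed entries are returned unchanged.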

    Evaluations of different imputation methods using labeled approaches.

    Pearson's correlation between log-transformed p-values of Student's t-tests on the FFA dataset (upper left) and the BA dataset (upper right) across different numbers of missing variables for four imputation methods: HM (red circle), QRILC (green triangle), GSimp (blue square), and kNN-TN (purple cross). PLS-Procrustes sum of squared errors on the FFA dataset (lower left) and the BA dataset (lower right) across different numbers of missing variables for the same four imputation methods, shown with the same markers.
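The first metric in this caption can be sketched as follows. The example assumes the two lists of p-values come from t-tests on the original complete data and on the imputed data, which the caption does not state explicitly, and it assumes a base-10 logarithm. The function names are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def log_p_correlation(p_reference, p_imputed):
    """Correlation between log10-transformed p-values from two analyses."""
    log_ref = [math.log10(p) for p in p_reference]
    log_imp = [math.log10(p) for p in p_imputed]
    return pearson(log_ref, log_imp)


# Hypothetical p-values for five variables, before and after imputation.
p_original = [0.001, 0.02, 0.30, 0.04, 0.60]
p_after_imputation = [0.002, 0.03, 0.25, 0.06, 0.55]
score = log_p_correlation(p_original, p_after_imputation)
```

A score near 1 would indicate that imputation preserved the ranking and scale of the univariate test results.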

    Evaluations of different imputation methods using labeled approaches.

    Pearson's correlation between log-transformed p-values of Student's t-tests on the FFA dataset (upper left) and the BA dataset (upper right) across different numbers of missing variables for four imputation methods: HM (red circle), QRILC (green triangle), GSimp (blue square), and kNN-TN (purple cross). PLS-Procrustes sum of squared errors on the FFA dataset (lower left) and the BA dataset (lower right) across different numbers of missing variables for the same four imputation methods, shown with the same markers.

    Comparisons of imputed values and original values on one variable.

    Scatter plots of imputed values (X-axis) against original values (Y-axis) for one example missing variable, with non-missing elements shown as blue dots and missing elements as red dots, for four imputation methods: HM (upper left), QRILC (upper right), kNN-TN (lower left), and GSimp (lower right). Rug plots show the distributions of imputed values and original values.

    Sequentially parameters updating in GSimp.

    The first 500 iterations out of a total of 2000 (100×20) iterations using GSimp, where ŷ, ỹ and σ represent the fitted value, the sample value and the standard deviation, respectively.

    Evaluations of different imputation methods using TPR for various p-cutoffs on simulation dataset.

    TPR across different numbers of missing variables for three imputation methods: QRILC (green triangle), GSimp (blue square), and kNN-TN (purple cross), at p-cutoffs of 0.05 (left panel) and 0.01 (right panel).
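For readers unfamiliar with the metric, a true positive rate at a given p-cutoff can be computed as in the sketch below. It assumes ground-truth labels for truly differential variables are available, as they would be in a simulation. The caption does not give the exact definition used, so the function name and inputs are assumptions for illustration.

```python
def true_positive_rate(p_values, is_truly_different, cutoff):
    """Fraction of truly differential variables whose p-value falls below the cutoff."""
    positives = [p for p, truth in zip(p_values, is_truly_different) if truth]
    if not positives:
        raise ValueError("no truly differential variables to evaluate")
    detected = sum(1 for p in positives if p < cutoff)
    return detected / len(positives)


# Hypothetical p-values after imputation, with simulated ground truth.
p_values = [0.001, 0.03, 0.20, 0.04]
truth = [True, True, True, False]
tpr_05 = true_positive_rate(p_values, truth, cutoff=0.05)  # 2 of 3 detected
tpr_01 = true_positive_rate(p_values, truth, cutoff=0.01)  # 1 of 3 detected
```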