
    Assessment and Selection of Competing Models for Zero-Inflated Microbiome Data

    <div><p>Typical data in a microbiome study consist of operational taxonomic unit (OTU) counts characterized by excess zeros, which investigators often ignore. In this paper, we compare the performance of competing methods for modeling zero-inflated data through extensive simulations and an application to a microbiome study. These methods include standard parametric and non-parametric models, hurdle models, and zero-inflated models. We examine varying degrees of zero inflation, with or without dispersion in the count component, as well as different magnitudes and directions of the covariate effect on the structural zeros and the count component. We focus on the assessment of type I error, power to detect the overall covariate effect, measures of model fit, and the bias and efficiency of parameter estimation. We also evaluate the ability of model selection strategies based on the Akaike information criterion (AIC) or the Vuong test to identify the correct model. The simulation studies show that hurdle and zero-inflated models have well-controlled type I errors, higher power, and better goodness-of-fit measures, and are more accurate and efficient in parameter estimation. In addition, hurdle models have goodness of fit and parameter estimates for the count component similar to those of their corresponding zero-inflated models. However, the estimation and interpretation of the parameters for the zero component differ, and hurdle models are more stable when structural zeros are absent. We then discuss the model selection strategy for zero-inflated data and implement it in a gut microbiome study of more than 400 independent subjects.</p></div>

    The type I error rate estimations.

    <p>The type I error rate estimations.</p>

    The power of test for ZINB simulated data.

    <p>The <i>X</i> axis is the value of the covariate effect on the count data, <i>γ</i><sub>1</sub>, and the <i>Y</i> axis is the power of the test at a significance level of 0.05. Three different cases of covariate effect, i.e., the consonant (<i>ϕ</i><sub><i>t</i></sub> = <i>ϕ</i><sub><i>c</i></sub> − 5%), neutral (<i>ϕ</i><sub><i>t</i></sub> = <i>ϕ</i><sub><i>c</i></sub>) and dissonant (<i>ϕ</i><sub><i>t</i></sub> = <i>ϕ</i><sub><i>c</i></sub> + 5%) effects, are presented in panels <b>(A)</b> and <b>(B)</b>; <b>(C)</b> and <b>(D)</b>; and <b>(E)</b> and <b>(F)</b>, respectively. Each column reflects a different proportion of zero inflation in the unexposed group: 20% in <b>(A)</b>, <b>(C)</b> and <b>(E)</b> (left column), and 50% in <b>(B)</b>, <b>(D)</b> and <b>(F)</b> (right column).</p>

    The estimate of <i>β</i><sub>1</sub> (or <i>β̃</i><sub>1</sub>) and its standard error for data simulated under ZINB when <b><i>ϕ</i></b><sub><b>c</b></sub> = <b>20%</b> and <b>γ</b><sub><b>1</b></sub> = <b>0.4</b>.

    <p>The figure displays box-plots of the estimates and their standard errors for the covariate effect on the log-odds of structural zeros for the ZIP and ZINB methods, and on the log-odds of zeros for the hurdle models, from 1000 replications when <i>γ</i><sub>1</sub> = 0.4. In each box, the center line represents the median, the bottom line the 25th percentile and the top line the 75th percentile. The whiskers extend 1.5 interquartile ranges (IQR) below the 25th percentile and 1.5 IQR above the 75th percentile, and outliers are represented by small circles. Panels <b>(A1)</b>, <b>(C1)</b> and <b>(E1)</b> show the estimates of <i>β</i><sub>1</sub> for the consonant, neutral and dissonant effect cases, respectively. The horizontal line in these panels represents the true value of <i>β</i><sub>1</sub>, which is −0.349 in <b>(A1)</b>, 0 in <b>(C1)</b> and 0.287 in <b>(E1)</b>. Panels <b>(A2)</b>, <b>(C2)</b> and <b>(E2)</b> show the estimates of <math><mrow><msub><mover><mi>β</mi><mo>˜</mo></mover><mn>1</mn></msub></mrow></math> for the consonant, neutral and dissonant effect cases, respectively. The horizontal line in these panels represents the true value of <math><mrow><msub><mover><mi>β</mi><mo>˜</mo></mover><mn>1</mn></msub></mrow></math>, which is −0.420 in <b>(A2)</b>, −0.240 in <b>(C2)</b> and −0.070 in <b>(E2)</b>. The bias, standard deviation (SD), and root mean square error (RMSE) of the estimates are shown above the box-plot for each method. Panels <b>(B1)</b>, <b>(D1)</b> and <b>(F1)</b> show the SEs of the estimates of <i>β</i><sub>1</sub>, and panels <b>(B2)</b>, <b>(D2)</b> and <b>(F2)</b> show the SEs of the estimates of <math><mrow><msub><mover><mi>β</mi><mo>˜</mo></mover><mn>1</mn></msub></mrow></math>. The mean and SD of the estimated standard errors (SE) are shown above the box-plot for each method.</p>

    The empirical probability of choosing a model using AIC criterion for ZIP distributed data.

    <p>The <i>X</i> axis is the value of the covariate effect on the count data, <i>γ</i><sub>1</sub>, and the <i>Y</i> axis is the empirical probability of choosing a model using the AIC. Three different cases of covariate effect, i.e., the consonant (<i>ϕ</i><sub><i>t</i></sub> = <i>ϕ</i><sub><i>c</i></sub> − 5%), neutral (<i>ϕ</i><sub><i>t</i></sub> = <i>ϕ</i><sub><i>c</i></sub>) and dissonant (<i>ϕ</i><sub><i>t</i></sub> = <i>ϕ</i><sub><i>c</i></sub> + 5%) effects, are presented in <b>(A)</b>, <b>(B)</b> and <b>(C)</b>; <b>(D)</b>, <b>(E)</b> and <b>(F)</b>; and <b>(G)</b>, <b>(H)</b> and <b>(I)</b>, respectively. Each column reflects a different proportion of zero inflation in the unexposed group: 20% in <b>(A)</b>, <b>(D)</b> and <b>(G)</b>; 50% in <b>(B)</b>, <b>(E)</b> and <b>(H)</b>; and 80% in <b>(C)</b>, <b>(F)</b> and <b>(I)</b>, from the first to the third column.</p>
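
    <p>To make the idea of AIC-based model choice concrete, the following is a minimal sketch, not the authors' procedure. It assumes an intercept-only setting with no covariates and compares only a Poisson model with a zero-inflated Poisson (ZIP) model, fitted by a simple EM iteration. The function names, the toy counts, and the fixed number of EM iterations are all illustrative assumptions.</p>

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)

def zip_loglik(counts, pi, lam):
    """Log-likelihood of i.i.d. ZIP counts: structural zero w.p. pi, else Poisson(lam)."""
    total = 0.0
    for y in counts:
        if y == 0:
            total += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            total += math.log(1.0 - pi) + y * math.log(lam) - lam - math.lgamma(y + 1)
    return total

def fit_zip_em(counts, n_iter=500):
    """Intercept-only ZIP fit by EM (assumes at least one positive count)."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi = 0.5 * n_zero / n                 # crude starting values
    lam = total / max(n - n_zero, 1)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing proportion and the Poisson mean
        pi = n_zero * w / n
        lam = total / (n - n_zero * w)
    return pi, lam

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * loglik

# Hypothetical toy counts with many zeros (not data from the study).
counts = [0] * 60 + [1] * 5 + [2] * 10 + [3] * 12 + [4] * 8 + [5] * 5

lam_pois = sum(counts) / len(counts)      # Poisson MLE is the sample mean
pi_zip, lam_zip = fit_zip_em(counts)

aics = {
    "Poisson": aic(poisson_loglik(counts, lam_pois), 1),
    "ZIP": aic(zip_loglik(counts, pi_zip, lam_zip), 2),
}
best = min(aics, key=aics.get)
print(aics, "selected:", best)
```

    <p>Repeating such a comparison over many simulated data sets and recording how often each model attains the smallest AIC would give an empirical selection probability of the kind plotted on the <i>Y</i> axis.</p>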

    The estimate of <i>γ</i><sub>1</sub> and its standard error for data simulated under ZINB with <b><i>ϕ</i></b><sub><b>c</b></sub> = <b>20%</b>.

    <p>The figure displays box-plots of the estimates of <i>γ</i><sub>1</sub> and their standard errors from 1000 replications in <b>(A)</b> and <b>(B)</b>; <b>(C)</b> and <b>(D)</b>; and <b>(E)</b> and <b>(F)</b> for the consonant (<i>ϕ</i><sub><i>t</i></sub> = <i>ϕ</i><sub><i>c</i></sub> − 5%), neutral (<i>ϕ</i><sub><i>t</i></sub> = <i>ϕ</i><sub><i>c</i></sub>) and dissonant (<i>ϕ</i><sub><i>t</i></sub> = <i>ϕ</i><sub><i>c</i></sub> + 5%) effect cases, respectively. In each box, the center line represents the median, the bottom line the 25th percentile and the top line the 75th percentile. The whiskers extend 1.5 interquartile ranges (IQR) below the 25th percentile and 1.5 IQR above the 75th percentile, and outliers are represented by small circles. The horizontal line in <b>(A)</b>, <b>(C)</b> and <b>(E)</b> represents the true value of <i>γ</i><sub>1</sub> (= 0.4), and the bias, standard deviation (SD), and root mean square error (RMSE) of the estimates of <i>γ</i><sub>1</sub> are shown above the box-plot for each method. The mean and SD of the estimated standard errors (SE) are shown above the box-plot for each method in panels <b>(B)</b>, <b>(D)</b> and <b>(F)</b>.</p>
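
    <p>The bias, SD and RMSE summaries mentioned in this caption can be computed from a set of replicate estimates roughly as sketched below. The function name and the example values are hypothetical, and using the sample SD (divisor n − 1) is an assumption, since the caption does not say which divisor was used.</p>

```python
import math
import statistics

def summarize_estimates(estimates, true_value):
    """Return (bias, sd, rmse) of replicate estimates around a known true value."""
    bias = statistics.fmean(estimates) - true_value
    sd = statistics.stdev(estimates)  # sample SD with divisor n - 1 (assumption)
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    rmse = math.sqrt(statistics.fmean(squared_errors))
    return bias, sd, rmse

# Hypothetical replicate estimates of gamma_1 (true value 0.4), for illustration only.
example_estimates = [0.35, 0.42, 0.39, 0.47, 0.38]
bias, sd, rmse = summarize_estimates(example_estimates, true_value=0.4)
print(f"bias={bias:.4f}  SD={sd:.4f}  RMSE={rmse:.4f}")
```

    <p>With the population SD (divisor n), the three quantities satisfy RMSE² = bias² + SD²; with the sample SD used here, the identity holds only approximately for a finite number of replications.</p>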

    The flowchart for microbiome real data analysis.

    <p>The flowchart for microbiome real data analysis.</p>

    The comparison plots of the observed and expected counts of bacteria for Campylobacter, Anaerotruncus and Dehalobacterium for females and males using the best three models judging by AIC criterion.

    <p>The <i>X</i> axis shows the possible values of the OTU counts. The bars are the observed counts; the red, green and blue lines connect the expected counts produced by the models with the smallest, second smallest and third smallest AIC values, respectively. The first, second and third rows of plots are for the bacteria Campylobacter, Anaerotruncus, and Dehalobacterium, respectively.</p>

    The simulation scenario.

    <p><i>γ</i><sub>0</sub> = 1 for all simulation scenarios. The over-dispersion parameter <i>κ</i> is set to 1 for all ZINB simulation scenarios. <i>β</i><sub>0</sub> reflects the log odds of zero inflation in the unexposed group, and is equal to {−1.386, 0, 1.386} for {20%, 50%, 80%} zero inflation in this group. <i>β</i><sub>1</sub> reflects the change in the log odds of zero inflation when changing from the unexposed to the exposed group. The values of <i>β</i><sub>1</sub> corresponding to changes of {−5%, 0, +5%} in the zero inflation are {−0.349, 0, 0.287}, {−0.201, 0, 0.201}, and {−0.287, 0, 0.349} for 20%, 50% and 80% zero inflation in the unexposed group, respectively.</p>
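
    <p>The relation between the zero-inflation proportions and the coefficients in this caption can be checked with a short calculation. The sketch below assumes a logit link for the zero-inflation probability, so that <i>β</i><sub>0</sub> is the logit of the proportion in the unexposed group and <i>β</i><sub>1</sub> is the difference in logits between the exposed and unexposed groups. The function names are illustrative.</p>

```python
import math

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def zero_inflation_coefficients(phi_c, shifts=(-0.05, 0.0, 0.05)):
    """Return beta0 and the beta1 values for the given shifts in zero inflation.

    phi_c is the zero-inflation proportion in the unexposed group; each shift is
    the absolute change in that proportion for the exposed group.
    """
    beta0 = logit(phi_c)
    beta1 = [logit(phi_c + s) - beta0 for s in shifts]
    return beta0, beta1

for phi_c in (0.2, 0.5, 0.8):
    beta0, beta1 = zero_inflation_coefficients(phi_c)
    print(f"phi_c={phi_c:.0%}  beta0={beta0:.3f}  beta1={[round(b, 3) for b in beta1]}")
```

    <p>The printed values agree with the magnitudes listed in the caption to within rounding in the third decimal place; a decrease in zero inflation gives a negative <i>β</i><sub>1</sub> and an increase gives a positive one.</p>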