Normalization can reduce false discovery and improve the power of association analysis.
In (a), the samples are randomly divided into two groups, and the count data in the first group are rarefied. In (b), the synthetic data include differentially abundant taxa. The significance level is 0.05 in both (a) and (b). Normalization is an essential step to avoid false discovery and improve power.
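As a rough illustration of the rarefying step mentioned in (a), the sketch below subsamples reads without replacement from one sample down to a fixed depth. The function name `rarefy`, the seed, and the toy counts are assumptions made for illustration; the source does not say which rarefaction routine or depth was used.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample's taxon counts.

    Hypothetical helper for illustration only, not the authors' implementation.
    """
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the sample's total read count")
    rng = random.Random(seed)
    # One list entry per read, labelled by the index of its taxon.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    picked = Counter(rng.sample(reads, depth))
    return [picked.get(taxon, 0) for taxon in range(len(counts))]


# Toy sample with four taxa and 1000 reads (made-up numbers).
sample = [500, 300, 150, 50]
rarefied = rarefy(sample, depth=100)
print(rarefied, sum(rarefied))
```

Rarefying only one group lowers its sequencing depth relative to the other, which is the kind of depth difference the caption uses to provoke false discoveries.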
Comparisons of normalization methods in estimating sampling fraction.
The numerical experiments are performed when the signal strength of the differentially abundant taxa is (a) weak, (b) moderate, and (c) strong. In (a), (b), and (c), the x-axis represents the true sampling fractions, while the y-axis represents the sampling fractions estimated by the normalization methods. We scale the estimated sampling fractions so that their average equals the average of the true sampling fractions. The black line in these figures represents equality between the estimated and true sampling fractions, and the color of the points indicates which group the differentially abundant taxa belong to. In (d), the bias in sampling fraction estimation is compared across normalization methods as the signal strength and proportion (p = 0.1, 0.2, 0.3) of differentially abundant taxa vary. The reference-based method corrects the compositional bias better than existing methods, especially when there is a large proportion of strong differentially abundant taxa.
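The rescaling described in this caption (matching the average of the estimated sampling fractions to the average of the true ones) can be sketched as follows. The function name and the numbers are hypothetical; the caption does not give the exact bias measure used in (d), so none is computed here.

```python
def rescale_to_match_mean(estimated, true):
    """Scale `estimated` so that its mean equals the mean of `true`.

    Illustrative helper; the name and inputs are assumptions, not from the source.
    """
    mean_true = sum(true) / len(true)
    mean_estimated = sum(estimated) / len(estimated)
    scale = mean_true / mean_estimated
    return [value * scale for value in estimated]


# Made-up sampling fractions for three samples.
true_fractions = [0.1, 0.2, 0.3]
estimated_fractions = [0.2, 0.4, 0.6]
rescaled = rescale_to_match_mean(estimated_fractions, true_fractions)
print(rescaled)
```

Because sampling fractions are only identifiable up to a common multiplicative constant, a rescaling of this kind puts all methods on the same footing before they are compared with the equality line.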
False pattern caused by compositional bias leads to a misleading conclusion.
(a) shows the PCoA plots colored by days after the experiment started. (b) presents the PCoA plots colored by sequencing depth. (c) shows the relationship between time and sequencing depth. The pattern of time in the PCoA plots largely overlaps with the pattern of sequencing depth, which can be explained by the deterministic relationship between time and sequencing depth. (PNG)
Compositional bias can create false clusters in PCoA plots.
In (a) and (b), samples are randomly divided into two groups. No modification is applied in (a), while the count data in group 1 are rarefied in (b). In (c), samples are divided into two groups based on the sequencing depth (>10000 belongs to the first group, and <5000 belongs to the second group). In these figures, RSim normalization helps remove the false clusters resulting from compositional bias. Euclidean distance with log transformation is used in all PCoA plots.
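A minimal sketch of a PCoA on Euclidean distances between log-transformed counts is given below, using classical multidimensional scaling with NumPy. The pseudocount of 1 added before the logarithm, the function name, and the toy count table are assumptions; the caption does not say how zeros were handled or which software produced the plots.

```python
import numpy as np


def pcoa_log_euclidean(counts, n_axes=2, pseudocount=1.0):
    """Classical PCoA on Euclidean distances between log-transformed samples.

    Rows of `counts` are samples, columns are taxa. The pseudocount is an
    assumption made here so that zero counts can be log-transformed.
    """
    x = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Squared Euclidean distances between all pairs of samples.
    diff = x[:, None, :] - x[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    # Double-center the squared distances to obtain a Gram matrix.
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # Keep the axes with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_axes]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy count table: four samples by four taxa (made-up numbers).
counts = [[120, 30, 0, 5], [80, 45, 2, 9], [10, 200, 40, 1], [15, 180, 55, 0]]
coords = pcoa_log_euclidean(counts)
print(coords)
```

With Euclidean distance, this construction gives the same coordinates as a principal component analysis of the log-transformed table, up to the sign of each axis.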
RSim normalization helps the two-sample t-test control false discovery.
Samples are divided into two groups based on the sequencing depth (20000 belongs to the second group), and the FDR is shown at different significance levels. In (a), seven normalization methods are compared. In (b), a two-sample t-test equipped with RSim normalization is compared with state-of-the-art differential abundance tests.
Evolving Accumulation of a Complex Profile of Polychlorinated Alkanes in Canadian Polar Bears
Approximately 33 million t of polychlorinated alkanes (PCAs), also known as chlorinated paraffins, has been globally produced and used. Despite the higher bioaccumulation potential of PCAs in terrestrial ecosystems than in marine ecosystems, North American terrestrial PCA data are sparse and Arctic studies largely focus on short-chain PCAs, with minimal attention to longer-chain homologues in wildlife. This research delves into the dynamics of PCA accumulation and temporal changes across a broad spectrum of PCA homologues in polar bears (Ursus maritimus) from Hudson Bay. Subcutaneous fat samples collected over the past decade from adult male polar bears of the Western Hudson Bay (WHB) and Southern Hudson Bay (SHB) subpopulations were analyzed, identifying 109 of 545 PCA homologues, ranging from C8 to C26. Analysis of 37 dietary fatty acids provided insights into dietary shifts and their influence on PCA profiles. Notably, SHB bears exhibited a decrease in PCA concentrations, reflecting marine food web influences. In contrast, WHB bears displayed increasing PCA levels, likely due to the use of more terrestrial and anthropogenic food sources. This study underscores the critical yet overlooked role of longer-chain PCAs in the Arctic food web and polar bear exposure, emphasizing the variance between subpopulations and the significant impact of dietary factors.
Differential abundant phyla detected by different differential abundance analysis methods.
Three methods are considered: a t-test on unnormalized data, a t-test on data normalized by RSim, and the RDB test on unnormalized data. (PDF)
Comparison of different normalization methods’ effect on the differential abundance analysis.
(a) and (b) are the FDR and sensitivity plots of the t-test after applying seven normalization methods. (c) and (d) are the FDR and sensitivity plots of the Pearson correlation test after applying seven normalization methods. The x-axis is the signal strength of the differentially abundant taxa. RSim helps the t-test and the Pearson correlation test control FDR and maintain detection power.
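For readers unfamiliar with the two quantities plotted here, the sketch below computes the false discovery proportion and the sensitivity of a single simulated run from the set of taxa a test rejects and the set of taxa that are truly differentially abundant. The FDR is the average of the false discovery proportion over repeated runs. All names and the toy sets are hypothetical.

```python
def false_discovery_proportion(rejected, truly_differential):
    """Share of rejected taxa that are not truly differentially abundant."""
    rejected = set(rejected)
    if not rejected:
        return 0.0  # Convention: no rejections means no false discoveries.
    false_hits = rejected - set(truly_differential)
    return len(false_hits) / len(rejected)


def sensitivity(rejected, truly_differential):
    """Share of truly differentially abundant taxa that were rejected."""
    truth = set(truly_differential)
    if not truth:
        return 0.0
    return len(truth & set(rejected)) / len(truth)


# Made-up result of one run: four taxa flagged, five truly differential.
flagged = {"t1", "t2", "t3", "t7"}
truth = {"t1", "t2", "t3", "t4", "t5"}
fdp = false_discovery_proportion(flagged, truth)
sens = sensitivity(flagged, truth)
print(fdp, sens)
```

In this toy run one of the four flagged taxa is a false discovery and three of the five true signals are recovered.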
Illustration demonstrating the procedure of RSim normalization.
Step 1: the median of the pairwise rank similarity of taxa is evaluated to construct a statistic for the differential abundance level of each taxon. Step 2: a new empirical Bayes method provides misclassification rate control in identifying non-differentially abundant taxa. The estimated non-differentially abundant taxa are used as the reference set in reference-based normalization.
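Step 1 can be pictured with the sketch below, which computes, for each taxon, the median of its rank similarity with every other taxon across samples. The caption does not define "rank similarity", so Spearman's rank correlation is used here as an assumption, and the input table is made up. The empirical Bayes step (Step 2) is not sketched because the caption gives no details about it.

```python
from statistics import median


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def median_rank_similarity(abundance_by_taxon):
    """Per-taxon median of Spearman correlations with all other taxa.

    Assumption: Spearman correlation stands in for the paper's rank similarity.
    Each inner list holds one taxon's abundances across the samples.
    """
    ranks = [average_ranks(taxon) for taxon in abundance_by_taxon]
    stats = []
    for i, rank_i in enumerate(ranks):
        sims = [pearson(rank_i, rank_j) for j, rank_j in enumerate(ranks) if j != i]
        stats.append(median(sims))
    return stats


# Made-up table: three taxa that rise together and one that moves the other way.
table = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [2, 4, 6, 8, 11], [5, 4, 3, 2, 1]]
stats = median_rank_similarity(table)
print(stats)
```

In this toy table the fourth taxon gets a much lower statistic than the other three, which illustrates how a per-taxon summary of this kind could separate taxa that behave differently from the majority.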
Misclassification rate control when the target misclassification rate η and parameter γ vary.
In Fig (a), the x-axis is the target misclassification rate, while the y-axis represents the empirical misclassification rate of the estimated reference set. In all settings, the misclassification rate of the estimated reference set is well controlled. In Fig (b), we vary the value of γ from 0.5 to 0.95. The same three settings are used in both figures. Setting 1: 10% of taxa are randomly selected as differentially abundant taxa, and the latent variable of the differentially abundant taxa is binary; Setting 2: the differentially abundant taxa are the top 10% most abundant taxa, and the latent variable of the differentially abundant taxa is binary; Setting 3: the differentially abundant taxa are the top 10% most abundant taxa, and the latent variable of the differentially abundant taxa is continuous. (PNG)
