Sequence count data are poorly fit by the negative binomial distribution
Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that non-parametric tests should be preferred over parametric methods
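As a rough illustration of what overdispersion means for count data, the sketch below assumes the common parametrization in which an NB count with mean mu and dispersion phi has variance mu + phi * mu^2; the abstract itself does not state which parametrization the authors use. The function names and the example counts are hypothetical, and the method-of-moments estimate shown here is only a quick diagnostic, not the goodness of fit test proposed in the paper.

```python
import statistics


def nb_variance(mean, dispersion):
    """Variance of an NB count under the assumed form: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate, truncated at zero.

    A value near zero means the counts look roughly Poisson
    (variance close to the mean); larger values indicate overdispersion.
    """
    m = statistics.mean(counts)
    v = statistics.variance(counts)  # sample variance (n - 1 denominator)
    if m == 0:
        return 0.0
    return max((v - m) / m ** 2, 0.0)


# Hypothetical counts for one feature across five samples.
overdispersed = [0, 2, 3, 10, 25]
poisson_like = [3, 4, 5, 4, 4]

print(moment_dispersion(overdispersed))  # variance far exceeds the mean
print(moment_dispersion(poisson_like))   # variance below the mean, so 0.0
```

A large estimate only suggests that a Poisson model is too narrow; whether the NB distribution itself fits is the separate question the abstract addresses.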
A unified framework for unconstrained and constrained ordination of microbiome read count data
Explorative visualization techniques provide a first summary of microbiome read count datasets through dimension reduction. A plethora of dimension reduction methods exists, but many of them focus primarily on sample ordination, failing to elucidate the role of the bacterial species. Moreover, implicit but often unrealistic assumptions underlying these methods fail to account for overdispersion and differences in sequencing depth, which are two typical characteristics of sequencing data. We combine log-linear models with a dispersion estimation algorithm and flexible response function modelling into a framework for unconstrained and constrained ordination. The method is able to cope with differences in dispersion between taxa and varying sequencing depths, to yield meaningful biological patterns. Moreover, it can correct for observed technical confounders, whereas other methods are adversely affected by these artefacts. Unlike distance-based ordination methods, the assumptions underlying our method are stated explicitly and can be verified using simple diagnostics. The combination of unconstrained and constrained ordination in the same framework is unique in the field and facilitates microbiome data exploration. We illustrate the advantages of our method on simulated and real datasets, while pointing out flaws in existing methods. The algorithms for fitting and plotting are available in the R-package RCM
Model-based joint visualization of multiple compositional omics datasets
The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data are inherently compositional, e.g. sequence count data. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce a flexible model-based approach to data integration to address these current limitations: COMBI. We combine concepts, such as compositional biplots and log-ratio link functions with latent variable models, and propose an attractive visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate and compare our method with other data integration techniques. Our algorithm is available in the R-package combi
ViVaMBC: estimating viral sequence variation in complex populations from illumina deep-sequencing data using model-based clustering
Background: Deep sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology-associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores, which are derived from a quadruplet of intensities, one channel for each nucleotide type in Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatched bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets), which enables immediate biological interpretation of the variants with respect to their antiviral drug responses.
Results: Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step.
Conclusions: ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which, however, does not warrant the additional computational cost of running the offline base caller. Apparently, a lot of information is already contained in the quality scores, enabling the model-based clustering procedure to adjust for the majority of the sequencing errors. Overall, the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low-frequency variant detection
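The two ingredients mentioned in this abstract, quality scores and second best base calls, can be sketched as follows. The example assumes the standard Phred convention, in which a quality score Q corresponds to an error probability of 10^(-Q/10); the abstract does not spell out ViVaMBC's actual error model, so this is not the authors' implementation. The function names and the intensity values are hypothetical.

```python
def phred_to_error_probability(quality):
    """Convert a Phred quality score Q to an error probability 10**(-Q/10)."""
    return 10 ** (-quality / 10)


def top_two_bases(intensities):
    """Return (called base, second best base) from a quadruplet of intensities.

    `intensities` maps each nucleotide to its channel intensity; the called
    base is the one with the highest intensity, the second best base the
    one with the second highest.
    """
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    return ranked[0], ranked[1]


# Hypothetical intensity quadruplet for a single sequencing cycle.
quadruplet = {"A": 120.0, "C": 950.0, "G": 40.0, "T": 610.0}
called, second_best = top_two_bases(quadruplet)
print(called, second_best)

# Under the Phred convention, Q = 20 means a 1% chance of a wrong call
# and Q = 30 a 0.1% chance.
for q in (20, 30):
    print(q, phred_to_error_probability(q))
```

Under this convention, detecting variants at frequencies near 0.5% requires distinguishing them from base-call errors that occur at comparable rates for quality scores in the low twenties.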
Host and environmental predictors of exhaled breath temperature in the elderly
BACKGROUND: Exhaled breath temperature has been suggested as a new method to detect and monitor pathological processes in the respiratory system. The putative mechanism of this approach is based upon changes in the blood flow. So far, potential factors that influence breath temperature have not been studied in the general population. METHODS: Exhaled breath temperature was measured in 151 healthy non-smoking elderly persons (aged 60–80 years) at room temperature with the X-halo device, which has an accuracy of 0.03°C. Using regression models, we related exhaled breath temperature to potential predictors, including host factors (sex, age) and environmental factors (BMI, physical activity, and traffic indicators). RESULTS: Exhaled breath temperature was lower in women than in men and was inversely associated with age and physical activity. BMI and daily average ambient temperature were positively associated with exhaled breath temperature. Independent of the aforementioned covariates, exhaled breath temperature was significantly associated with several traffic indicators. Residential proximity to major road was inversely associated with exhaled breath temperature: a doubling of the distance to the nearest major intense road was associated with a decrease of 0.17°C (95% CI: -0.33 to -0.01; p = 0.036). CONCLUSIONS: Exhaled breath temperature has been suggested as a noninvasive method for the evaluation of airway inflammation. We provide evidence that several factors known to be involved in proinflammatory conditions, including BMI, physical activity and residential proximity to traffic, affect exhaled breath temperature. In addition, we identified potential confounders that should be taken into account in clinical and epidemiological studies on exhaled breath temperature, including sex, age, and ambient temperature
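An effect reported "per doubling" of a predictor usually comes from entering that predictor on a log2 scale in the regression. The sketch below shows the arithmetic under that assumption, which the abstract implies but does not confirm; the function name and the distances in metres are hypothetical, and only the -0.17°C per doubling estimate is taken from the abstract.

```python
import math


def predicted_change(beta_per_doubling, distance_old, distance_new):
    """Predicted change in the outcome when distance goes from old to new.

    Assumes the model is linear in log2(distance), so each doubling of the
    distance shifts the outcome by `beta_per_doubling`.
    """
    return beta_per_doubling * math.log2(distance_new / distance_old)


BETA = -0.17  # degrees Celsius per doubling of distance, from the abstract

# Hypothetical distances in metres.
print(predicted_change(BETA, 100, 200))  # one doubling
print(predicted_change(BETA, 100, 400))  # two doublings
print(predicted_change(BETA, 200, 100))  # halving reverses the sign
```

Because the effect depends only on the ratio of the two distances, moving from 100 m to 200 m gives the same predicted change as moving from 1000 m to 2000 m.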
Intergenerational transfer of antibiotic-perturbed microbiota enhances colitis in susceptible mice
Antibiotic exposure in children has been associated with the risk of Inflammatory Bowel Disease (IBD). Since antibiotic use in children or in their pregnant mother can affect how the intestinal microbiome develops, we asked whether the transfer of an antibiotic-perturbed microbiota from mothers to their children could affect their risk of developing IBD. Here we demonstrate that germ-free adult pregnant mice inoculated with a gut microbial community shaped by antibiotic exposure transmitted their perturbed microbiota to their offspring with high fidelity. Without any direct or continued exposure to antibiotics, this dysbiotic microbiota in the offspring remained distinct from controls for at least 21 weeks. By using both IL-10-deficient and wild type mothers, we showed that both inoculum and genotype shape the microbiota populations in the offspring. Since IL10−/− mice are genetically susceptible to colitis, we could assess the risk due to maternal transmission of an antibiotic-perturbed microbiota. We found that the IL10−/− offspring that had received the perturbed gut microbiota developed markedly increased colitis. Taken together, our findings indicate that antibiotic exposure shaping the maternal gut microbiota has effects that extend to their offspring with both ecological and long-term disease consequences
Response to comments on Jaki et al., A proposal for a new PhD level curriculum on quantitative methods for drug development. Pharm Stat 17(5):593-606, Sep/Oct 2018., DOI https://doi.org/10.1002/pst.1873
FABIA: factor analysis for bicluster acquisition
Motivation: Biclustering of transcriptomic data groups genes and samples simultaneously. It is emerging as a standard tool for extracting knowledge from gene expression measurements. We propose a novel generative approach for biclustering called ‘FABIA: Factor Analysis for Bicluster Acquisition’. FABIA is based on a multiplicative model, which accounts for linear dependencies between gene expression and conditions, and also captures heavy-tailed distributions as observed in real-world transcriptomic data. The generative framework makes it possible to use well-founded model selection methods and to apply Bayesian techniques
Overview of disability policy in Belgium by life domain (original Dutch title: Overzicht van het gehandicaptenbeleid in België naar leefsferen)
nrpages: 180; status: published