26 research outputs found

    Sequence count data are poorly fit by the negative binomial distribution

    Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness-of-fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This explains the poor behaviour of NB-based tests in many published evaluation studies. We conclude that non-parametric tests should be preferred over parametric methods.
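
    As a rough illustration of the kind of departure such a test looks for, the sketch below fits an NB distribution to a single feature by the method of moments (assuming the parameterisation Var = mu + mu^2/theta) and compares the observed fraction of zeros with the fraction the fitted NB implies. This is not the goodness-of-fit test proposed in the paper: the function names are hypothetical, the counts are made up, and the check ignores covariates and sequencing depth.

```python
from statistics import mean, pvariance


def nb_moment_fit(counts):
    """Method-of-moments estimates (mu, theta) for an NB with Var = mu + mu**2 / theta."""
    mu = mean(counts)
    var = pvariance(counts, mu)
    if var <= mu:
        # Without overdispersion the moment estimate of theta is undefined or negative.
        raise ValueError("sample variance does not exceed the mean")
    theta = mu * mu / (var - mu)
    return float(mu), float(theta)


def nb_zero_probability(mu, theta):
    """P(Y = 0) under NB(mu, theta): (theta / (theta + mu)) ** theta."""
    return (theta / (theta + mu)) ** theta


def zero_fraction_check(counts):
    """Return (observed zero fraction, zero fraction implied by the moment-fitted NB)."""
    mu, theta = nb_moment_fit(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    return observed, nb_zero_probability(mu, theta)


# Hypothetical counts for one feature across ten samples.
observed, expected = zero_fraction_check([0, 0, 0, 0, 0, 0, 1, 2, 10, 30])
print(f"observed zeros: {observed:.2f}, NB-implied zeros: {expected:.2f}")
```

    A gap between the two numbers only hints at misfit; judging whether it is significant requires a formal test such as the one the paper proposes.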

    A unified framework for unconstrained and constrained ordination of microbiome read count data

    Explorative visualization techniques provide a first summary of microbiome read count datasets through dimension reduction. A plethora of dimension reduction methods exists, but many of them focus primarily on sample ordination and fail to elucidate the role of the bacterial species. Moreover, the implicit but often unrealistic assumptions underlying these methods fail to account for overdispersion and differences in sequencing depth, two typical characteristics of sequencing data. We combine log-linear models with a dispersion estimation algorithm and flexible response function modelling into a framework for unconstrained and constrained ordination. The method copes with differences in dispersion between taxa and with varying sequencing depths, and yields meaningful biological patterns. Moreover, it can correct for observed technical confounders, whereas other methods are adversely affected by these artefacts. Unlike distance-based ordination methods, the assumptions underlying our method are stated explicitly and can be verified using simple diagnostics. The combination of unconstrained and constrained ordination in the same framework is unique in the field and facilitates microbiome data exploration. We illustrate the advantages of our method on simulated and real datasets, while pointing out flaws in existing methods. The algorithms for fitting and plotting are available in the R package RCM.
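
    Log-linear row-column models for a count table commonly start from the independence model, in which the expected count of taxon j in sample i is the sample total (sequencing depth) times the taxon's overall share: E_ij = x_i+ * x_+j / x_++. The sketch below computes these expected counts and Pearson residuals, with an optional NB dispersion theta in the variance. It only shows how such a baseline absorbs differences in sequencing depth; it is not the RCM fitting algorithm, and the function names and counts are hypothetical.

```python
def independence_expected(table):
    """Expected counts E[i][j] = row_total[i] * col_total[j] / grand_total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_residuals(table, theta=None):
    """(x - E) / sqrt(Var), with Var = E (Poisson) or E + E**2 / theta (NB)."""
    expected = independence_expected(table)

    def variance(e):
        return e if theta is None else e + e * e / theta

    return [
        [(x - e) / variance(e) ** 0.5 for x, e in zip(row, exp_row)]
        for row, exp_row in zip(table, expected)
    ]


# Rows are samples with different sequencing depths, columns are taxa (made-up counts).
counts = [[10, 20], [30, 40]]
print(independence_expected(counts))  # [[12.0, 18.0], [28.0, 42.0]]
```

    Ordination methods in this family then look for low-rank structure in the departures from this baseline rather than in the raw counts.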

    Model-based joint visualization of multiple compositional omics datasets

    The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data, such as sequence count data, are inherently compositional. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article, we introduce COMBI, a flexible model-based approach to data integration that addresses these limitations. We combine concepts such as compositional biplots and log-ratio link functions with latent variable models, and propose an attractive visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate our method and compare it with other data integration techniques. Our algorithm is available in the R package combi.
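
    Log-ratio transformations are the usual way to respect compositional structure. As an illustration only (not COMBI's estimation procedure), the sketch below applies the centred log-ratio (clr) transform to one sample's counts; the pseudocount used to handle zeros is an assumption of this sketch.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centred log-ratio: log(x_j + c) minus the mean of the logs over all components."""
    logs = [math.log(x + pseudocount) for x in counts]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]


# Scaling a composition (e.g. a deeper sequencing run) leaves its clr unchanged.
a = clr([1, 2, 4], pseudocount=0)
b = clr([10, 20, 40], pseudocount=0)
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))  # True
```

    With a nonzero pseudocount this scale invariance holds only approximately, which is one reason zeros are a recurring difficulty for log-ratio methods.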

    The impact of electricity prices on jobs and investment in the Belgian manufacturing industry

    Belgium is losing manufacturing jobs, and at a faster pace than most other European countries. Whilst the impact of labour costs on the competitiveness of our industry is much debated and documented, the impact of the price of electricity remains unquantified. Using data from 10 highly industrialised European countries, we estimate the impact of electricity prices on jobs and investment in Belgian manufacturing. We estimate that the elasticity of employment with respect to the electricity price is on average -0.30 and the elasticity of investment is on average -0.55. This means that a 1% drop in the price of electricity would lead, all other things being equal, to 0.30% more manufacturing jobs and 0.55% more manufacturing investment. Our findings are robust to different calculation methods. Others have estimated that electricity prices in Belgium are 10%-35% higher than in the neighbouring countries. Combining this information with the estimated elasticities, we calculate that a 10% drop in the Belgian electricity price would lead to an increase of 12,000 full-time jobs and €550 million in yearly investment within the manufacturing industry. These numbers are likely to underestimate the impact, because we take a conservative stance on the price handicap and because Belgium has historically specialised in the most electricity-intensive sectors. Furthermore, our approach does not quantify spillovers to other manufacturing or to services industries.
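
    The calculation in this abstract rests on the constant-elasticity approximation %change in outcome = elasticity * %change in price. The sketch below reproduces the percentage changes for a 10% price drop using the reported average elasticities and, as a back-of-the-envelope check, the baseline levels implied by the rounded headline figures. Those baselines are derived here under the assumption that the average elasticities were applied to the whole sector; they are not reported in the abstract.

```python
def percent_change(elasticity, price_change_pct):
    """Constant-elasticity approximation: %change in outcome = elasticity * %change in price."""
    return elasticity * price_change_pct


def implied_baseline(absolute_change, percent):
    """Baseline level consistent with an absolute change and its percentage change."""
    return absolute_change / (percent / 100)


price_drop = -10.0                               # a 10% fall in the electricity price
jobs_pct = percent_change(-0.30, price_drop)     # about +3.0% employment
invest_pct = percent_change(-0.55, price_drop)   # about +5.5% investment

print(round(jobs_pct, 2), round(invest_pct, 2))              # 3.0 5.5
print(round(implied_baseline(12_000, jobs_pct)))             # 400000 full-time jobs
print(round(implied_baseline(550, invest_pct)))              # 10000 (EUR million per year)
```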

    The impact of electricity prices on European manufacturing jobs

    Increased investment in clean electricity in combination with a rising cost of carbon will most likely lead to higher electricity prices. We examine the impact of changing electricity prices on European manufacturing employment and find a negative elasticity for the most electricity-intensive sectors. Since these sectors are unevenly spread across countries and regions, the negative employment impact of increasing electricity prices will also be unevenly spread. Policymakers should be well aware of this and take mitigating actions to ensure a positive public sentiment towards environment-related price increases. (JEL J23, H23, Q28, Q43)

    A broken promise: microbiome differential abundance methods do not control the false discovery rate

    High-throughput sequencing technologies allow easy characterization of the human microbiome, but the statistical methods to analyze microbiome data are still in their infancy. Differential abundance methods aim at detecting associations between the abundances of bacterial species and subject grouping factors. The results of such methods are important for identifying the microbiome as a prognostic or diagnostic biomarker or for demonstrating the efficacy of prodrug or antibiotic drugs. Because of a lack of benchmarking studies in the microbiome field, no consensus exists on the performance of the statistical methods. We have compared a large number of popular methods through extensive parametric and nonparametric simulation as well as real data shuffling algorithms. The results are consistent over the different approaches and all point to an alarming excess of false discoveries. This raises great doubts about the reliability of discoveries in past studies and imperils the reproducibility of microbiome experiments. To further improve method benchmarking, we introduce a new simulation tool that generates correlated count data following any univariate count distribution; the correlation structure may be inferred from real data. Most simulation studies discard the correlation between species, but our results indicate that this correlation can negatively affect the performance of statistical methods.
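
    In a simulation study the truly null taxa are known, so the false discovery proportion (FDP) of a method can be computed directly and averaged over replicates to estimate its FDR; a method controls the FDR if that average stays at or below the nominal level. The sketch below pairs this with the Benjamini-Hochberg step-up procedure, a common way to turn p-values into discoveries at a nominal FDR. The abstract does not say which adjustment each compared method uses, and the p-values and null set below are made up.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the BH step-up procedure at nominal FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value is at or below its BH threshold rank * alpha / m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    return set(order[:k])


def false_discovery_proportion(discoveries, true_nulls):
    """Share of discoveries that are truly null (0 when there are no discoveries)."""
    if not discoveries:
        return 0.0
    return len(discoveries & true_nulls) / len(discoveries)


# Made-up p-values for four taxa; taxa 2 and 3 are null in this toy simulation.
pvals = [0.01, 0.02, 0.03, 0.5]
found = benjamini_hochberg(pvals, alpha=0.05)
print(sorted(found), false_discovery_proportion(found, {2, 3}))
```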

    Analysis of parallel software: a parallel implementation

    KU Leuven Campusbibliotheek Exacte Wetenschappen / UCL - Université Catholique de Louvain; SIGLE; BE; Belgium